open-telemetry / wg-prometheus

Workgroup for building Prometheus-OTLP interoperability for the OTEL Collector, and for Prometheus-related discussions.
Apache License 2.0

Clarify how Prometheus uses the OpenMetrics "Created" timestamp #46

Closed jmacd closed 3 years ago

jmacd commented 3 years ago

The OpenMetrics specification states for Counter metrics:

A MetricPoint in a Metric with the type Counter SHOULD have a Timestamp value called Created. This can help ingestors discern between new metrics and long-running ones it did not see before.

A MetricPoint in a Metric's Counter's Total MAY reset to 0. If present, the corresponding Created time MUST also be set to the timestamp of the reset.
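For concreteness, in the OpenMetrics text format the Created timestamp is exposed as a `_created` sample alongside the counter's `_total` sample (the series name here is hypothetical):

```text
# TYPE http_requests counter
http_requests_total 1027
http_requests_created 1617183600.0
```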

The OpenTelemetry data model agrees that this field is useful, and that it should be optional. We have argued that when the Created / Start time is not set, it is possible to miss process restarts, and thus undercount metrics for short-lived processes.

We are trying to define the proper translation into OTLP for metric points when the Created time is not known. This is relevant in https://github.com/lightstep/opentelemetry-prometheus-sidecar, which reads the WAL and writes OTLP metric streams. We believe that a Created / Start time can be filled in by any stateful observer that is able to remember the last value and its timestamp.

When a stateful observer possesses this information, we believe that such a processor SHOULD fill in the missing start timestamp.
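A minimal sketch of such a stateful observer, assuming it keys on a series identifier and remembers each series' last value and timestamp (the class and method names are illustrative, not an OTLP API):

```python
class StartTimeFiller:
    """Hypothetical stateful processor that fills in a missing
    Created/Start timestamp for cumulative counter points."""

    def __init__(self):
        # series key -> (start_time, last_value)
        self._state = {}

    def process(self, series_key, timestamp, value):
        """Return (start_time, timestamp, value) for an OTLP-style point."""
        if series_key not in self._state:
            # First observation: the true start time is unknown, so use the
            # first observed timestamp as the start of the stream.
            self._state[series_key] = (timestamp, value)
        else:
            start, last = self._state[series_key]
            if value < last:
                # Counter reset detected: the reset time becomes the new start.
                start = timestamp
            self._state[series_key] = (start, value)
        start, _ = self._state[series_key]
        return (start, timestamp, value)

f = StartTimeFiller()
f.process("http_requests_total", 100, 5.0)  # → (100, 100, 5.0)
f.process("http_requests_total", 160, 9.0)  # → (100, 160, 9.0)
f.process("http_requests_total", 220, 2.0)  # reset → (220, 220, 2.0)
```

Note that the first point's start time is only a lower bound; the true process start may be earlier, which is exactly the undercounting risk described above.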

The issue here is investigatory: does Prometheus have plans to use the OpenMetrics Created timestamp and eventually include it in its WAL?

jeromeinsf commented 3 years ago

Is the proposal more general, i.e. to include bitemporal modeling that could be used as a hint for timing the computation of recording rules when data is updated or arrives late?

jmacd commented 3 years ago

@jeromeinsf If I understand your use of the term correctly, this is probably not the bitemporal-modeling conversation you're looking for.

This "Created" or "Start" timestamp is used to support knowing when a cumulative series was reset.
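To illustrate why the timestamp matters (a hypothetical consumer-side sketch; the function name and units are illustrative): a counter that resets and then climbs past a simple difference can only be counted correctly if the consumer knows the reset happened.

```python
def counter_increase(prev_value, cur_value, prev_created, cur_created):
    """Compute the increase between two cumulative samples, using the
    Created/Start timestamp to detect resets that values alone may miss."""
    if cur_created > prev_created:
        # The series was reset between samples, so the entire current
        # value accumulated since the reset.
        return cur_value
    return cur_value - prev_value

# Counter at 100, process restarts at t=15, counter reaches 150 by the
# next scrape. The values alone suggest +50; the Created timestamp
# reveals the reset, so the increase since the reset is 150.
counter_increase(100, 150, prev_created=5, cur_created=15)  # → 150

# No reset: plain difference.
counter_increase(100, 150, prev_created=5, cur_created=5)   # → 50
```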

Your question @jeromeinsf relates to late-arriving data, and this is definitely an important discussion. Right now, especially in this working group, we are focused on pull-based metrics, and Prometheus uses "staleness markers" to consistently indicate missing data. I see two follow-on questions for push-based systems:

  1. For a system pushing OTLP metrics from SDK to Collector, can a stateful processor in the Collector indicate that no data arrived? This is discussed in https://github.com/open-telemetry/opentelemetry-specification/issues/1078
  2. For a system re-aggregating OTLP metrics inside a Collector, one that is aware of late-arriving data, can the process correctly update its state and issue new data points that reflect a later understanding of the world? This has not yet been discussed in a dedicated issue, but I'd like to connect this discussion with https://github.com/open-telemetry/wg-prometheus/issues/35.

     Prometheus currently uses external labels to describe the Prometheus process that collects data. When there is a High-Availability configuration, each replica has a distinct value of some spatial dimension that a downstream processor can erase (see https://github.com/open-telemetry/opentelemetry-specification/issues/1297) to reconstruct a single stream of data.

     I would like to use external labels to model late-arriving data; in other words, I think we can use external labels to express temporal replication. A stateful processor that is aware of late-arriving data could re-issue an identical data point with a new resource attribute indicating a real or virtual timestamp associated with the update. A downstream processor can then correctly compute the state of the world as seen by an observer at a given point in time. (Note I'm ignoring clock synchronization issues!)

jsuereth commented 3 years ago

This has been clarified and we will account for this in our data model.