open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.9k stars 444 forks source link

[Bug]: ObservableGauge no longer dropping data points from old Attributes values #2213

Open markdingram opened 1 month ago

markdingram commented 1 month ago

Related Problems?

No response

Describe the solution you'd like:

A change in behaviour is present between 0.24 & 0.26 for ObservableGauge & possibly other Async Instruments.

In 0.26 once any reading is made for a given combination of attributes that reading is always sent in subsequent collections. There's no way to remove any attributes that are no longer present in the data. This doesn't match the SDK Spec:

For asynchronous instruments with Delta or Cumulative aggregation temporality, MetricReader.Collect MUST only receive data points with measurements recorded since the previous collection.

Adapted one of the existing tests to show an error - https://github.com/markdingram/opentelemetry-rust/commit/68d375b320c1e2cbcec5030a2f0269cd9553a7fb

Comment from @cijothomas on CNCF Slack:

OTel Rust currently is likely violating the above part of spec, as it never forgets data points for Cumulative temporality, but the spec says it should forget data points for CUMULATIVE too, for Observable instruments.

Considered Alternatives

No response

Additional Context

No response

kensimon commented 3 weeks ago

I believe this bug was introduced here: https://github.com/open-telemetry/opentelemetry-rust/pull/2021

The LastValue aggregation function used to have a comment saying "[Builder::temporality] is ignored and delta is always used", and "Delta temporality is the only temporality that makes semantic sense for a last-value aggregate", but now it obeys self.temporality and will apply cumulative data if self.temporality is not Delta.

Interestingly the doc comments for AggregateBuilder still say that Delta temporality will be used for last_value: https://github.com/open-telemetry/opentelemetry-rust/blob/52f6d9214eca58c32e3fc41e3829038b868c07e1/opentelemetry-sdk/src/metrics/internal/aggregate.rs#L60

It would seem to me that the original comments were right... last_value should mean last value even if the AggregateBuilder's temporality is cumulative.

This is effecting me because the opentelemetry_prometheus's PrometheusExporter only supports Temporality::Cumulative: https://github.com/open-telemetry/opentelemetry-rust/blob/52f6d9214eca58c32e3fc41e3829038b868c07e1/opentelemetry-prometheus/src/lib.rs#L155

    /// Note: Prometheus only supports cumulative temporality so this will always be
    /// [Temporality::Cumulative].

Meaning that trying to use the prometheus exporter, there's no way to clear any gauge fields in v0.26.

cijothomas commented 3 weeks ago

Thanks for reporting! Yes, this looks like a bug.

(Co-incidentally, this bug was also discovered in OTel .NET recently : https://github.com/open-telemetry/opentelemetry-dotnet/pull/5952)