open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Add min/max aggregation for metrics #3863

Closed (fugafree closed this issue 9 months ago)

fugafree commented 9 months ago

What are you trying to achieve? Max aggregation for sync gauges. Specific task: I'd like to monitor the max size of file exports. I'm interested in all exports without sampling.

Additional context. I need to monitor file export sizes (bytes) and make sure none of them exceeds a specific value. I wanted to solve this with a sync gauge, but min/max aggregation is not available.

MrAlias commented 9 months ago

Gauges are observable instruments. Why can't you maintain the state of a min or max in a closure you register and just observe that value in your callback?

fugafree commented 9 months ago

I cannot use a callback because I have events, not a state that I can measure whenever I want (see my example). That's why I thought a sync gauge would work, but there is no max aggregation for it. (Note that sync gauges are fairly new: https://opentelemetry.io/docs/specs/otel/metrics/api/#gauge)

MrAlias commented 9 months ago

https://github.com/open-telemetry/opentelemetry-specification/issues/3062#issuecomment-1412306635

fugafree commented 9 months ago

As I understand it, that code stores the values locally and fetches them on each tick of the async callback. In a nutshell, it converts synchronous observation into asynchronous observation, which is already solved for gauges now that we have sync gauges. My problem is with the aggregation: I want to use max aggregation. And even if I computed the max in this callback, that would introduce sampling, leaving me unable to make sure I process every event.

jack-berg commented 9 months ago

Histogram aggregations (explicit bucket and exponential bucket) include min and max. You could configure a synchronous gauge to use an explicit bucket histogram aggregation with zero bucket boundaries (i.e. a single bucket spanning [-infinity, infinity]), which produces a lightweight summary with min, max, sum, and count. You might ignore the sum, since it's probably not meaningful; if it were, you'd use a histogram metric point in the first place.
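To illustrate what such a single-bucket summary carries, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK: the `SingleBucketSummary` class, its field names, and the sample export sizes are all hypothetical, chosen only to show that min, max, sum, and count can be tracked per event and merged without losing the max.

```python
import math
from dataclasses import dataclass


@dataclass
class SingleBucketSummary:
    """Hypothetical stand-in for a histogram point with zero bucket boundaries."""

    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def record(self, value: float) -> None:
        # Every event updates the summary, so nothing is sampled away.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "SingleBucketSummary") -> "SingleBucketSummary":
        # Merging two summaries keeps the overall min and max.
        return SingleBucketSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


# Hypothetical file export sizes in bytes.
export_sizes = [1200, 48000, 512]

summary = SingleBucketSummary()
for size in export_sizes:
    summary.record(size)

print(summary.minimum, summary.maximum, summary.total, summary.count)
```

In a real setup the SDK would maintain this state; the sketch only shows why the resulting point is enough to alert on the largest export.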

The problem with introducing min and max aggregations is determining their semantics and deciding when they're appropriate to use. They would probably produce points of type gauge, since it doesn't make sense to sum the maxes produced by gauge instruments, but the merge semantic for gauges is last value (not max):

Gauges do not provide an aggregation semantic, instead "last sample value" is used when performing operations like temporal alignment or adjusting resolution.

This means that using a synchronous gauge with a max aggregation and performing spatial or temporal reaggregation will not yield the max value.
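A small sketch of that effect, in plain Python with made-up numbers: each point is a hypothetical (timestamp, per-interval max) pair, and `merge_last_value` and `merge_max` are illustrative helpers, not OpenTelemetry functions. Applying the gauge's last-value rule when reducing resolution drops the peak.

```python
# Hypothetical gauge points: (timestamp in seconds, max observed in that interval).
interval_maxes = [(0, 900), (10, 48000), (20, 700)]


def merge_last_value(points):
    """Gauge merge semantic: keep the value with the latest timestamp."""
    return max(points, key=lambda point: point[0])[1]


def merge_max(points):
    """What a max-preserving merge would need to do instead."""
    return max(value for _, value in points)


last_value_result = merge_last_value(interval_maxes)
true_max = merge_max(interval_maxes)

print(last_value_result, true_max)
```

Under these assumptions the 48000-byte peak in the middle interval disappears from the last-value result, which is the mismatch described above.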

fugafree commented 9 months ago

I see. Thank you for the detailed answer and explanation. And you are right, using a histogram for my use case seems to be a better choice. With this, I'm closing the issue, thank you!