open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

Are OTEL histograms different than Prometheus ones? #3298

Closed Gr1N closed 1 year ago

Gr1N commented 1 year ago

Hey!

First of all, thank you for the project you're doing.

I have a question regarding histograms. I'm using SDK 1.17 and seeing behavior I did not expect. According to the Prometheus documentation:

Example: Let's assume we want to observe the time taken to process API requests. Instead of storing the request time for each request, histograms allow us to store them in buckets. We define buckets for time taken, for example lower or equal (le) 0.3, le 0.5, le 0.7, le 1, and le 1.2. Once the time taken for a request is calculated, it is added to the count of every bucket whose boundary is higher than or equal to the measured value.

But the OTel behavior is quite different. Each value is added to only one bucket: https://github.com/open-telemetry/opentelemetry-python/blob/3732fd4a196555715e541cfa936ddf3c43ccd4ee/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py#L281
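
To make the difference concrete, here is a minimal sketch in plain Python (not the SDK code itself). It assumes the bucket boundaries from the quoted example and a handful of made-up request durations. The names `record` and `to_cumulative` are hypothetical helpers for illustration. Per-bucket counts, where each value lands in exactly one bucket, hold the same information as Prometheus-style cumulative `le` counts, and a running sum converts one into the other.

```python
from bisect import bisect_left
from itertools import accumulate

# Bucket upper bounds taken from the quoted example; the final
# implicit bucket catches everything above 1.2 (le="+Inf").
boundaries = [0.3, 0.5, 0.7, 1.0, 1.2]


def record(bucket_counts, value):
    """Add the value to exactly one bucket (per-bucket style)."""
    # bisect_left gives the first boundary >= value, so each upper
    # bound is inclusive, matching the meaning of "le".
    bucket_counts[bisect_left(boundaries, value)] += 1


def to_cumulative(bucket_counts):
    """Running sum: turns per-bucket counts into cumulative le counts."""
    return list(accumulate(bucket_counts))


counts = [0] * (len(boundaries) + 1)
for duration in (0.2, 0.4, 0.4, 0.9, 2.0):  # made-up sample durations
    record(counts, duration)

cumulative = to_cumulative(counts)

labels = [str(b) for b in boundaries] + ["+Inf"]
for label, per_bucket, cum in zip(labels, counts, cumulative):
    print(f"le={label:>4}  per-bucket={per_bucket}  cumulative={cum}")
```

With these samples, the per-bucket counts are `[1, 2, 0, 1, 0, 1]` and the cumulative counts are `[1, 3, 3, 4, 4, 5]`. The last cumulative value equals the total number of observations.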

Is this correct and expected? I'm trying to set everything up with the Grafana stack, using the OTel receiver and the Prometheus exporter, and I'm worried about the basic queries I want to visualize, e.g.:

histogram_quantile(0.95, sum by(le) (rate(http_server_duration_bucket[$__rate_interval])))
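
For context on why cumulative `le` buckets matter to this query, here is a simplified sketch of the linear interpolation that a classic-histogram quantile estimate performs. It is an illustration under stated assumptions, not Prometheus's actual implementation. The function name `estimate_quantile`, the bucket boundaries, and the counts are all hypothetical, and edge cases such as empty histograms are only handled crudely.

```python
import math


def estimate_quantile(q, upper_bounds, cumulative_counts):
    """Estimate a quantile from cumulative le-style buckets.

    upper_bounds must be sorted and end with math.inf.
    """
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, c in enumerate(cumulative_counts) if c >= rank)
    if idx == len(upper_bounds) - 1:
        # Rank falls into the +Inf bucket: report the highest finite bound.
        return upper_bounds[-2]
    # Assumes non-negative observations, so the first bucket starts at 0.
    lower = upper_bounds[idx - 1] if idx > 0 else 0.0
    below = cumulative_counts[idx - 1] if idx > 0 else 0
    in_bucket = cumulative_counts[idx] - below
    # Interpolate linearly inside the selected bucket.
    return lower + (upper_bounds[idx] - lower) * (rank - below) / in_bucket


# Hypothetical data: boundaries from the quoted example plus +Inf.
bounds = [0.3, 0.5, 0.7, 1.0, 1.2, math.inf]
cumulative = [1, 3, 3, 4, 4, 5]

p50 = estimate_quantile(0.5, bounds, cumulative)
p95 = estimate_quantile(0.95, bounds, cumulative)
print(p50, p95)
```

Because the estimate relies on counts that never decrease from one `le` bucket to the next, feeding it raw per-bucket counts without first converting them to cumulative counts would give misleading results.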

Gr1N commented 1 year ago

The issue was on my side. I resolved it by switching to the right exporter: https://grafana.com/docs/agent/latest/flow/reference/components/otelcol.exporter.otlphttp/