grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0
4.1k stars 526 forks source link

Metrics values registered at rounded timestamps are counted twice in contiguous time aggregates #9680

Open laurent149 opened 3 days ago

laurent149 commented 3 days ago

Describe the bug

When metrics values are registered with a precisely rounded timestamp like hh:mm:ss.000, hh:mm:00.000, hh:00:00.000, etc. (getting increasingly worse as you read further right), those values are at serious risk of getting counted twice by *_over_time aggregate functions. Full disclosure: I'm not entirely sure whether this is a Mimir issue or a Prometheus issue; please let me know if I should move this report to a different repo.

To Reproduce

  1. Send some "load per minute" metrics to Mimir once per minute, with the metrics timestamps exactly rounded to hh:mm:00.000
  2. Create a time series panel over this metrics with expression = sum_over_time(...[$sampling])
  3. With sampling = 1m, you'll see that each aggregated data point is in fact the result of the sum of 2 data points from the original metrics: the expected one, but also the one that directly follows.
  4. Actually, there is even a more obvious way to see it: use count_over_time instead of sum_over_time and you'll see that it's at a constant 2 instead of a constant 1. With sampling = 10m, it will show a constant 11 instead of a constant 10, etc.

Expected behavior

Data sent at timestamp hh:mm:00.000 should be counted in minute aggregate [hh:mm:00.000 - hh:mm+1:00.000[ but not in aggregate [hh:mm-1:00.000 - hh:mm:00.000[ Same goes for all "rounding levels" (hh:mm:ss.000, hh:mm:00.000, hh:00:00.000, etc.) and for all sampling aggregates (1s, 1m, 1h, etc.). But of course the issue is at its most visible when the two are at the same level; hh:mm:00.000 aggregated per 1m → data overestimated by 100% (each aggregate includes 2 data points instead of 1), hh:mm:00.000 aggregated per 10m → data overestimated by 10% (each aggregate includes 11 data points instead of 10), etc. I'd venture there's a <= somewhere that really should be a <.

Environment

Apologies, it's a company-managed Grafana/Mimir infra; I don't have the details myself but I can ask around if needed, let me know.

Additional Context

Most certainly not the right place to mention this but I'll try anyway, sorry: I believe this issue of not including both bounds (although in this case it would probably be about excluding the lower bound rather than the upper one) is also present when picking a time window. Both the lower bound and the upper bound of the selected time window will appear on the graph. In my opinion, this is an issue; a minor one compared to the one previously described, but still. Say I select a fixed time window (of "round" width, like a full day or whatever), never change it through the whole experiment, and I display the "Total" value in the legend. With sampling = 1m, this total is going to include an extra minute of data; with sampling = 1h, it will include an extra hour of data, etc. So I'm going to see wildly varying totals for what is in effect the very same metrics in the very same time window. Of course this is allowed to happen if the window's width is not an exact multiple of all the sampling periods but in this example, let's say it is → the total should remain unchanged. As it stands though, one would need to "manually" subtract the value of the very first data point from the displayed total, and only then do we indeed get a constant total as expected. And here again, not sure at all that I'm in the right place to report the issue. Maybe rather for Grafana?

charleskorn commented 1 day ago

I believe this issue of not including both bounds

Yep, this is exactly the problem.

I'm not entirely sure whether this is a Mimir issue or a Prometheus issue

This is Prometheus' behaviour, which Mimir inherits.

The good news is that this will change in Prometheus 3, and Mimir will inherit the change when we bring in the changes from Prometheus 3. https://github.com/prometheus/prometheus/pull/13904 is the Prometheus PR that made this change.