rfmoz / grafana-dashboards

Grafana dashboards
Apache License 2.0

Convert to using [$__rate_interval] #72

Closed candlerb closed 3 years ago

candlerb commented 3 years ago

Grafana 7.2 (released 23 Sep 2020) introduces a new variable for prometheus queries, $__rate_interval - see doc link.

This is intended to get rid of the problems around irate() and rate() queries missing spikes where the graph interval skips over them.

To apply this on the node exporter full dashboard, you'd change every instance of

irate(...[5m])

to

rate(...[$__rate_interval])
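A minimal sketch of how that substitution could be scripted against a dashboard's JSON model, rather than edited by hand. The helper names (`convert_expr`, `convert_dashboard`) and the regular expression are illustrative assumptions, not part of this repo. It assumes Prometheus queries live in string fields named `expr`, and it deliberately skips queries whose selector contains square brackets (for example a regex label matcher), which would need manual review.

```python
import re

# Hypothetical pattern: irate(<selector without square brackets>[5m])
_IRATE_5M = re.compile(r"\birate\(([^\[\]]*)\[5m\]\)")


def convert_expr(expr: str) -> str:
    """Rewrite irate(...[5m]) to rate(...[$__rate_interval]) in one query string."""
    return _IRATE_5M.sub(
        lambda m: "rate(" + m.group(1) + "[$__rate_interval])", expr
    )


def convert_dashboard(node):
    """Walk a parsed dashboard JSON tree and convert every string 'expr' field."""
    if isinstance(node, dict):
        return {
            key: convert_expr(value)
            if key == "expr" and isinstance(value, str)
            else convert_dashboard(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [convert_dashboard(item) for item in node]
    return node


if __name__ == "__main__":
    # Tiny stand-in for a dashboard; real ones are loaded with json.load().
    panel = {"targets": [{"expr": 'irate(node_cpu_seconds_total{mode="idle"}[5m])'}]}
    print(convert_dashboard(panel))
```

Restricting the rewrite to `expr` fields avoids touching titles or descriptions that happen to mention irate().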

The way it works: $__rate_interval is equal to the sum of the graph step (the time interval between horizontal data points) and the Prometheus scrape interval (set as the scrape interval in the data source definition), subject to a minimum of 4 x the scrape interval.

Remember that rate() calculates the rate between the first and last data points contained within the window. So, say you are scraping node_exporter at 1 minute intervals. Then rate(...[6m]) contains 6 data points, and calculates the rate over the 5 minute period between the first and last point in the window.
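As a rough illustration of that description, here is a simplified model of the calculation. The function `simple_rate` is a hypothetical helper, not Prometheus code: it only divides the counter increase by the time between the first and last samples, and ignores the counter-reset handling and boundary extrapolation that the real rate() performs.

```python
def simple_rate(samples):
    """Per-second rate between the first and last samples in a window.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Six samples scraped one minute apart, as in the rate(...[6m]) example:
# the counter grows by 100 per minute, and the samples span 5 minutes.
window = [(60 * i, 100 * i) for i in range(6)]
print(simple_rate(window))       # increase of 500 over 300 seconds
print(simple_rate(window[:1]))   # a single sample gives no rate
```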

Consider different Grafana zoom levels, which determine the interval between data points on the X axis.

In each case, the rate correctly calculates the average over the time period between two data points. Spikes are never missed - although of course if you're averaging a spike over a longer time period then the peak shown will be lower.
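To make the zoom-level cases concrete, the sketch below evaluates $__rate_interval for a few graph steps. It assumes the formula given in the Grafana documentation, max(step + scrape interval, 4 x scrape interval), and a 1 minute scrape interval; the function name `rate_interval` and the chosen step values are illustrative only.

```python
def rate_interval(step_s: int, scrape_s: int) -> int:
    """Approximate $__rate_interval in seconds.

    Assumes max(step + scrape interval, 4 * scrape interval).
    """
    return max(step_s + scrape_s, 4 * scrape_s)


SCRAPE_S = 60  # scraping node_exporter once a minute

for step_s in (15, 60, 300, 3600):
    print(f"step={step_s}s -> window={rate_interval(step_s, SCRAPE_S)}s")
```

With a 5 minute step the window is 6 minutes, which is the rate(...[6m]) case described above: six samples, with the rate taken over the 5 minutes between the first and the last, so consecutive graph points cover adjacent periods with no gap.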

The only downside I can see for doing this is that it will make the dashboard only usable with grafana 7.2 and later.

rfrail3 commented 3 years ago

That is an annoying thing; great to know that there is now a reliable alternative.

It is definitely the way to go, but the new release requirement could be a problem for people who haven't upgraded Grafana yet.

But since I updated the dashboard version to 7.3.7 in #70, the change can live only in this repo for a few weeks or months before I publish it on the Grafana website, giving the user base time to upgrade.

alarsyo commented 3 years ago

I'm getting "No data" when zooming in using this change. It works for "last 24 hours" and "last 12 hours" but breaks under 6h and less :/

Reverting to irate and 5m fixes it

candlerb commented 3 years ago

Presumably you have a reasonably modern version of grafana?

My guess is that you've not set the correct scrape interval in the grafana data source you've defined.

Look under Configuration > Data Sources > (your prometheus)

Suppose the scrape interval is set to 15s here (which is the default if you haven't set it), but you're actually scraping at 1 minute intervals in your Prometheus config. As you zoom in, $__rate_interval will drop to (graph_step + 15s), eventually reaching its minimum value, which is 4 x data source interval = 1m.

At this point you're doing rate(...[1m]), which will display no data if you are scraping at 1 minute intervals; you need at least rate(...[2m]). You can achieve this by setting the data source interval to 30s.

You can set the data source interval to the "correct" value of 1m to match your scraping, but that will give you a rate calculated over the points in a 4m window, which is actually the rate over 3 minutes (from the first to last data point in that window). I think this is wrong, and I raised it here, but that was rejected.
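The three data source settings discussed here (15s, 30s, 1m) can be compared with a small sketch. It assumes the minimum $__rate_interval is 4 x the data source interval, that samples arrive exactly once per real scrape interval, and the typical case where a window of N scrape intervals holds N samples. Both helper names are hypothetical.

```python
def min_rate_interval(ds_interval_s: int) -> int:
    """Smallest $__rate_interval Grafana will use, assuming 4 x data source interval."""
    return 4 * ds_interval_s


def covered_span(window_s: int, scrape_s: int):
    """Seconds between the first and last samples that fit in the window.

    Returns None when fewer than two samples fit, in which case rate()
    has nothing to compute and the panel shows "No data".
    """
    points = window_s // scrape_s
    if points < 2:
        return None
    return (points - 1) * scrape_s


ACTUAL_SCRAPE_S = 60  # what Prometheus is really doing

for ds_interval_s in (15, 30, 60):
    window_s = min_rate_interval(ds_interval_s)
    span = covered_span(window_s, ACTUAL_SCRAPE_S)
    print(f"data source interval={ds_interval_s}s window={window_s}s span={span}")
```

Under these assumptions the 15s setting yields a 1m window with no usable pair of samples, 30s yields a 2m window covering 1 minute, and 1m yields a 4m window covering 3 minutes.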

alarsyo commented 3 years ago

You're right, that was absolutely it! Thanks for the detailed answer, this makes sense now!