shatteredsilicon / ssm-submodules

GNU Affero General Public License v3.0

Investigate what happens when there is a telemetry dropout (counter with rate/irate) #251

Open oblitorum opened 1 month ago

oblitorum commented 1 month ago

Sometimes there is a telemetry dropout. I'm not 100% sure whether that means the sample is recorded as 0 or as null.

But for counters that we apply rate/irate to, e.g. mysql_global_status_sort_rows, a 0 sample means the next scrape shows a jump from 0 to the current Sort_rows value. rate/irate treat the drop to 0 as a counter reset and count the whole jump back up as an increase, so you end up with an enormous spike that makes the graph useless. So:

- If mysqld_exporter is returning 0 instead of null/empty/missing, then that's a bug in mysqld_exporter.
- If prometheus is saving null as 0, that's a bug in prometheus.
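To make the difference between a 0 sample and a missing sample concrete, here is a minimal sketch of counter-reset handling as rate/irate-style functions apply it. It is a simplified model (no window extrapolation, no timestamps), and the function name and sample values are made up for illustration, not taken from the affected server.

```python
def increase_with_reset_handling(samples):
    """Sum the increases between consecutive counter samples.

    Any decrease is treated as a counter reset: the counter is assumed
    to have restarted from 0, so the whole post-reset value counts as
    an increase. This is a simplified model of reset handling only.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            total += cur          # reset: count everything since 0
        else:
            total += cur - prev   # normal monotonic growth
    return total


# Hypothetical Sort_rows-like counter values, one per scrape.
healthy = [1_000_000, 1_000_050, 1_000_100, 1_000_150]
zero_written = [1_000_000, 0, 1_000_100, 1_000_150]  # dropout stored as 0
missing = [1_000_000, 1_000_100, 1_000_150]          # dropout stored as nothing

print(increase_with_reset_handling(healthy))       # 150.0
print(increase_with_reset_handling(zero_written))  # 1000150.0
print(increase_with_reset_handling(missing))       # 150.0
```

In this model a missing sample is harmless, while a single stored 0 inflates the increase by roughly the full counter value, which matches the spike described above.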

There is a possibility that MariaDB (10.6.latest in this case) is itself returning bogus data, i.e. a 0, but let's assume for a moment that isn't the case until we have excluded the behaviour of mysqld_exporter and prometheus.
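One way to start narrowing this down is to look at the raw stored samples around a spike and check whether the dropout shows up as a gap or as an explicit 0. The sketch below assumes the samples have already been fetched as (timestamp, value) pairs; the function name, the `tolerance` parameter, and the example data are all hypothetical.

```python
def classify_dropouts(samples, scrape_interval, tolerance=1.5):
    """Classify dropouts in raw counter samples.

    samples: list of (timestamp_seconds, value), sorted by timestamp.
    Returns a list of (timestamp, kind) where kind is:
      "missing" - the gap after this timestamp exceeds
                  tolerance * scrape_interval (no sample was stored)
      "zero"    - a 0 was stored right after a non-zero value
    """
    findings = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        if t_cur - t_prev > tolerance * scrape_interval:
            findings.append((t_prev, "missing"))
        if v_cur == 0 and v_prev > 0:
            findings.append((t_cur, "zero"))
    return findings


# Made-up samples with a 15 s scrape interval.
raw = [(0, 100), (15, 110), (30, 0), (45, 130), (90, 140)]
print(classify_dropouts(raw, scrape_interval=15))
# [(30, 'zero'), (45, 'missing')]
```

A "zero" finding only shows that a 0 was stored, not which component produced it, and a genuine server restart also resets the counter, so the exporter's own output at that moment would still need to be checked.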

Note: this often happens on hideously overloaded virtual infrastructure that is always disk I/O starved, so that could also be a factor.

@gordan-bobic