We are running the dashboards and alert rules on our OSISM infrastructure, and they are working well.
I noticed, though, that the MySQL "slow queries" alert is firing constantly, because the rule triggers as soon as there has ever been more than one slow query.
https://github.com/osism/kolla-operations/blob/main/prometheus/mysql.rules#L63
Wouldn't it be better to base this on the rate over time, so the alert only fires while new slow queries are actually occurring?
Something like this:
expr: "rate(mysql_global_status_slow_queries[5m]) > 0"
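For context, here is a rough sketch of how the complete rule could look with that expression. The group name, alert name, `for` duration, severity, and annotation text are my assumptions for illustration and are not taken from the existing mysql.rules file:

```yaml
groups:
  - name: mysql-slow-queries-sketch    # hypothetical group name
    rules:
      - alert: MysqlSlowQueries        # hypothetical alert name
        # mysql_global_status_slow_queries is a cumulative counter, so
        # rate() over 5m is only non-zero while new slow queries are logged.
        expr: rate(mysql_global_status_slow_queries[5m]) > 0
        for: 2m                         # assumed hold time to skip one-off spikes
        labels:
          severity: warning             # assumed severity
        annotations:
          summary: "MySQL slow queries on {{ $labels.instance }}"
```

With this the alert should resolve on its own once no new slow queries have been recorded within the 5-minute window, instead of staying active until the server restarts. The threshold could also be raised above 0 if occasional slow queries are acceptable.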