medic / cht-watchdog

Configuration for deploying a monitoring/alerting stack for CHT
GNU Affero General Public License v3.0

Support ignoring specific CHT instances for provisioned alerts #97

Open jkuester opened 5 months ago

jkuester commented 5 months ago

To get the most value from alerts, noise reduction is essential. When monitoring multiple CHT instances, it may be helpful to receive alerts for only some of them, while other instances (e.g. training instances) do not trigger alerts.

Grafana alerts are highly customizable and you can tweak all kinds of factors on when to send a custom alert (and what instances it should trigger for). However, our provisioned alerts cannot be edited. If someone wants to not trigger any alerts from a particular instance, they have to copy/delete all of the provisioned alerts.

Instead, it would be nice to just be able to include some kind of configuration where admins could disable alerts for an instance.

mrjones-plip commented 4 months ago

@eljhkrr - while you're in there redoing the alerts in 832 and 98, you likely can solve this ticket for free.

basically, a way to not alert on dev instances, but still have them in watchdog!

eljhkrr commented 4 months ago

Completed in #832

mrjones-plip commented 4 months ago

@eljhkrr - we'd love to expose a way for end users of Watchdog to know how to ignore specific CHT instances. Is there a config change we can make to upstream your work into the default alerts in Watchdog? If not, is there some content we can add to the docs?

eljhkrr commented 4 months ago

One way to automatically ignore specific CHT instances would be to add a condition to the alert expression so that instances matching, for example, .dev are not alerted on. However, my take is that this decision should be made by a human making manual updates. In my experience, dev instances have periods when a team would want to know if something isn't right, e.g. during training. Reopening to add to documentation.
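
To make the idea of a condition on the expression concrete, here is a minimal sketch of the filtering logic in Python. It is not the actual Watchdog alert configuration or the change made in #832. The function names, the instance names and the exact pattern are all hypothetical; only the .dev example comes from the comment above.

```python
import re

# Hypothetical ignore list: regular expressions matched against instance names.
# The ".dev" example is taken from the discussion; the exact pattern is an assumption.
IGNORE_PATTERNS = [r"\.dev(\.|$)"]


def should_alert(instance, ignore_patterns=IGNORE_PATTERNS):
    """Return False when the instance name matches any ignore pattern."""
    return not any(re.search(pattern, instance) for pattern in ignore_patterns)


def filter_alertable(instances, ignore_patterns=IGNORE_PATTERNS):
    """Keep only the instances that alerts should still fire for."""
    return [name for name in instances if should_alert(name, ignore_patterns)]


# Made-up instance names, for illustration only.
instances = ["cht.example.org", "cht.dev.example.org", "training.dev"]
alertable = filter_alertable(instances)
```

Keeping the ignore list as data rather than hard-coding it fits the point made above: a human can add or remove a pattern by hand, for example dropping the .dev pattern while a training is running.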