Open jpds opened 5 years ago
Yes, that's a nice one.
The inhibition rule is actually quite easy:
inhibit_rules:
- source_match:
    alertname: 'ClusterIsDown'
  equal: ['cluster']
It works just fine with our self-inhibition prevention. However, it contradicts the recommendation given in https://prometheus.io/docs/alerting/configuration/#inhibit_rule : “However, we recommend to choose target and source matchers in a way that alerts never match both sides.” Is there a better way to write the inhibition rule? @stuartnelson3 @brian-brazil
It's kind of weird that the first example use case we list can only be solved by not following a recommendation given later.
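For what it's worth, one way to keep the two sides disjoint might be to exclude the source alert from the target side explicitly. This is only a sketch, and it assumes an Alertmanager release new enough to support the `source_matchers` / `target_matchers` syntax with negative matchers; the alert name and the `cluster` label are the ones from the rule above.

```yaml
inhibit_rules:
  - source_matchers:
      # The alert that signals the whole cluster is down.
      - 'alertname = "ClusterIsDown"'
    target_matchers:
      # Everything except the source alert itself, so no alert matches both sides.
      - 'alertname != "ClusterIsDown"'
    # Only inhibit alerts that share the source alert's cluster label value.
    equal: ['cluster']
```

Whether this is clearer than relying on the self-inhibition prevention is a judgment call; it mainly makes the intent explicit.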
I'd usually have some severity label on both sides, as often you'd want a class of alerts to do this rather than one particular alertname.
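A minimal sketch of that severity-based approach, assuming alerts carry a `severity` label with the values `critical` and `warning` (both values are illustrative, not taken from this thread):

```yaml
inhibit_rules:
  - source_match:
      # Any critical alert can act as the inhibitor...
      severity: 'critical'
    target_match:
      # ...and it mutes warning-level alerts only, so the two sides never overlap.
      severity: 'warning'
    # Restrict the inhibition to alerts from the same cluster.
    equal: ['cluster']
```

Because the source and target require different `severity` values, no single alert can match both sides, which is in line with the recommendation in the documentation.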
@beorn7 Thanks for the snippet!
In our particular use case, we have "clusters" of devices (with multiple exporters running) in the field connected to the Internet by various [sometimes unreliable] means and I don't think severity labels fit with what we're trying to prevent.
For example, when a 4G connection/router fails for a group of devices in a particular area, we do not want to be flooded with notifications for all of our devices/alerts; a single "something fell off the Internet" notification is enough.
Would a good example for the documentation be combining blackbox_exporter's ICMP probe_success == 0 metric for the alert with an inhibition rule?
I have a feeling that I should label my devices as router/end-device and set up an inhibition rule so that, if the router is down, we don't alert on the devices behind it...
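A rough sketch of how that could look, under several assumptions: the blackbox_exporter ICMP probes run under a job named `blackbox_icmp`, every target carries a `cluster` label and a `role` label of either `router` or `end-device`, and the alert name `RouterUnreachable` and the 5-minute `for` duration are made up for illustration.

Prometheus alerting rule:

```yaml
groups:
  - name: connectivity
    rules:
      # Fires when the ICMP probe against a cluster's router keeps failing.
      # The job, role, and alert names here are hypothetical.
      - alert: RouterUnreachable
        expr: probe_success{job="blackbox_icmp", role="router"} == 0
        for: 5m
```

Alertmanager inhibition rule:

```yaml
inhibit_rules:
  - source_match:
      alertname: 'RouterUnreachable'
    target_match:
      # Only alerts from devices behind the router are muted.
      role: 'end-device'
    # The router and the devices must belong to the same cluster.
    equal: ['cluster']
```

Since the source alert only fires for `role="router"` series and the target side requires `role="end-device"`, no alert should match both sides, provided the labels are applied consistently.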
The documentation here states that inhibition can be used to suppress the same alert coming from an entire cluster:
However, none of the examples I can find online show how this can be done easily in either of these places: