prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

[feature request] matchers to work with annotations #3957

Open fhalde opened 3 months ago

fhalde commented 3 months ago

What did you do? In Prometheus, labels have very specific semantics: they define the identity of a time series. For a given time series with labels A, B, C, any change to those label values results in a completely different time series.

For alerts, this is especially troublesome. We often use a severity label to denote the criticality of an alert. Changing that label means we lose continuity between the alert that already fired and the newly defined alert.

The new alert goes through its own lifecycle in Prometheus (starting from pending -> firing). This becomes a nuisance when configuring Alertmanager, because the previous alert with the different criticality ends up "resolved", giving misleading information about the ongoing problem.

Annotations, on the other hand, are orthogonal to labels: they are not part of an alert's identity, so changing them preserves continuity.

What did you expect to see? Matchers that can match against annotations. Severity could then be set as an annotation rather than a label.

What did you see instead? Under which circumstances? Matchers only work with labels

Environment: k8s

Alertmanager version: 0.24.0

Prometheus version: 2.49.1

simonpasquier commented 3 months ago

> Often, we use severity labels to denote the criticality of alerts. Changing these labels means that we lose continuity between the fired alerts vs the newly defined alert
>
> This new alert goes through its own alert lifecycle in prometheus ( starts from pending -> fired ). This becomes a nuisance while configuring the alertmanager as previous alert with a different criticality ends up "resolved" giving misinformation about the ongoing problem

The general guideline for alerting rules with different severities is to write the alert expressions such that the lower-severity alert always fires whenever the higher-severity alert fires. E.g. the timeline should be:

  1. Alert Foo with sev=warning firing.
  2. Alert Foo with sev=warning and Alert Foo with sev=critical both firing.
  3. Alert Foo with sev=critical resolved and Alert Foo with sev=warning firing.
  4. Alert Foo with sev=warning resolved.

See the node_exporter NodeFilesystemSpaceFillingUp alerts for instance.
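
As a rough illustration of the guideline above, a rule pair along the following lines would produce that timeline. This is only a sketch: the metric name `foo_usage_ratio`, the thresholds, the group name, and the annotation text are all hypothetical and not taken from the node_exporter rules. The point is that the critical expression is a strict subset of the warning expression and both use the same `for` duration, so the warning alert is already firing whenever the critical one fires.

```yaml
groups:
  - name: foo-example            # hypothetical group name
    rules:
      # Lower severity: fires first and keeps firing while the
      # critical condition holds, because > 0.9 implies > 0.8.
      - alert: Foo
        expr: foo_usage_ratio > 0.8   # hypothetical metric and threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "foo usage is above 80%"

      # Higher severity: a stricter threshold on the same metric, so
      # it can only fire while the warning alert is also firing.
      - alert: Foo
        expr: foo_usage_ratio > 0.9   # hypothetical metric and threshold
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "foo usage is above 90%"
```

With this shape, the two alerts are distinct label sets that overlap in time, so the warning alert never shows up as "resolved" while the problem is still ongoing.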