jnadler opened this issue 5 years ago
The notification data includes both firing and resolved alerts. If you want the Slack message to only display the firing ones, you could do: {{ range .Alerts.Firing }}...{{ end }}
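For example, a minimal sketch of a receiver whose Slack `text` only iterates over the firing alerts (the receiver name, channel, and annotation key here are placeholders, not taken from this issue):

```yaml
receivers:
  - name: slack-firing-only          # placeholder receiver name
    slack_configs:
      - channel: '#alerts'           # placeholder channel
        # Only render alerts that are still firing; resolved alerts are skipped.
        text: >-
          {{ range .Alerts.Firing }}
          {{ .Labels.instance }}: {{ .Annotations.summary }}
          {{ end }}
```

Restricting the range to `.Alerts.Firing` means resolved alerts simply drop out of the message at the next notification.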
Wow, thanks! I studied the docs before filing this issue and couldn't find this. Might it be helpful if it were doc'd here? https://prometheus.io/docs/alerting/notifications/
You're right. The source is at https://github.com/prometheus/docs/blob/master/content/docs/alerting/notifications.md
How's this https://github.com/prometheus/docs/pull/1411
I've confirmed (with 3 HA AlertManagers) that intermittently some alerts do disappear from the .Alerts collection, and thus from Slack alerts based on it.
I suppose that the alert for docker.for.mac.localhost:9111 got resolved? In this case, AlertManager sees that the alert group has changed and it will trigger a new notification at the next group evaluation.
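For context, a hedged sketch of the routing options that control when that next group evaluation produces a notification (the receiver name comes from the config quoted below; the other values are illustrative, not the reporter's):

```yaml
route:
  receiver: pandora-alerts
  group_by: ['alertname']   # illustrative grouping labels
  group_wait: 30s           # delay before the first notification for a new group
  group_interval: 5m        # minimum wait before re-notifying when the group's contents change
  repeat_interval: 4h       # re-send even if the group is unchanged
```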
I'm not getting that result consistently. The substantial majority of the time, when the alert for 9111 gets resolved, at the next group evaluation that alert is still in the .Alerts list.

Intermittently I'm able to trigger behavior where the alert for 9111 is resolved and on the next grouping it's removed from the .Alerts list, as in the example above. I don't have a strong feeling as to what the behavior should be, but it should be reliable and consistent.
Ah, I thought that your last screenshot displayed only the firing alerts (e.g. .Alerts.Firing) but IIUC it still uses .Alerts.
That's correct, still using .Alerts. I can probably document how to reproduce this if that's helpful - it's just a bunch of scripts that start docker containers. Not a super high priority now that I know about .Alerts.Firing.
What did you do? Running 11 node exporters, 3 Prometheus instances scraping all 11, and 1 Alertmanager, all locally in Docker using the latest published images.
Trigger a grouped alert by taking down 7 node exporters, wait for the Slack alert to arrive, resolve some of the members of the group.
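A rough, untested sketch of that setup using the public images (container names, ports, and the exact number of exporters stopped and restarted are illustrative; the reporter's actual scripts are not included in this issue):

```bash
# Start 11 node exporters on consecutive host ports (9101-9111).
for i in $(seq 1 11); do
  docker run -d --name node-exporter-$i -p "91$(printf '%02d' "$i"):9100" prom/node-exporter
done

# One Alertmanager and three Prometheus instances (config file mounts omitted;
# the default config paths differ between image versions).
docker run -d --name alertmanager -p 9093:9093 prom/alertmanager
for i in 1 2 3; do
  docker run -d --name prometheus-$i -p "909$i:9090" prom/prometheus
done

# Trigger a grouped alert by stopping 7 exporters, wait for the Slack
# notification, then resolve some members of the group.
docker stop node-exporter-{1..7}
docker start node-exporter-{1..3}
```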
What did you expect to see? After group_interval has passed, another Slack alert with the updated, smaller set of grouped alerts.

What did you see instead? Under which circumstances? After group_interval, another Slack alert including all grouped alerts, even the resolved ones. The AlertManager UI shows the correct (smaller, now that some are resolved) set of grouped alerts.

This behavior is easily reproducible but not 100% consistent - possibly racy. The Slack alert accumulates group members when new alerts are added to the group (they are added reliably), but alerts rarely leave the group (the AM UI is always correct - just the Slack alert retains the resolved alerts).
While building a simple repro environment for this issue I think I did occasionally see an alert removed from the Slack alert list, but it's certainly more common for it to be retained in the Slack message.
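One way to make the discrepancy visible in the message itself is to print the firing and resolved counts next to a per-alert status; a minimal sketch (the channel is a placeholder, not from the reporter's config):

```yaml
slack_configs:
  - channel: '#alertmanager-debug'   # placeholder channel
    # Counts of firing vs. resolved alerts in the notification data.
    title: '{{ .CommonLabels.alertname }}: {{ len .Alerts.Firing }} firing / {{ len .Alerts.Resolved }} resolved'
    text: >-
      {{ range .Alerts }}
      {{ .Labels.instance }} ({{ .Status }})
      {{ end }}
```

Comparing those counts with what the Alertmanager UI shows makes it easier to tell whether the resolved alerts are still present in the notification data or only in the rendered message.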
Environment
System information:
Darwin 18.6.0 x86_64
Alertmanager version:
0.18.0
Prometheus version:
2.11.1
Alertmanager configuration file:
- name: pandora-alerts
  slack_configs:
- name: eng-observability-debug
  slack_configs:
Prometheus configuration file:
alerts.yml
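The rule file itself is not quoted above; a hedged sketch of the kind of per-instance rule that would produce the grouped alert described (the job name, duration, and severity are assumptions):

```yaml
# alerts.yml - sketch of a rule that fires once per node exporter instance,
# so stopping several exporters produces one grouped notification.
groups:
  - name: node-exporter
    rules:
      - alert: InstanceDown
        expr: 'up{job="node-exporter"} == 0'   # assumed job label
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }} is down'
```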