natalia-k closed 7 years ago
Do you have the log lines showing the notifications being sent?
No, I can't find them in the log, but they were resent to the webhook on every reload.
This seems to happen for us, too. I've added several predictive MySQL capacity alerts that have extremely high timing settings:
ALERT ... FOR 1h
repeat_interval: 108h
group_wait: 5m
group_interval: 12h
I still get quite a few complaints about 'alert storms': all alerts firing in the same second. It looks like we are hitting the issue described here.
From the log:
2017-04-04_07:23:33.61307 time="2017-04-04T07:23:33Z" level=info msg="Loading configuration file" file="/srv/prometheus/alertmanager/alertmanager.yml" source="main.go:200"
2017-04-04_07:23:33.71081 time="2017-04-04T07:23:33Z" level=debug msg="Received alert" alert=MySQLTableSizeWarning[ace3d54][active] component=dispatcher source="dispatch.go:187"
2017-04-04_07:23:33.71119 time="2017-04-04T07:23:33Z" level=debug msg="Received alert" alert=MySQLTableSizeWarning[35c738e][active] component=dispatcher source="dispatch.go:187"
2017-04-04_07:23:33.71135 time="2017-04-04T07:23:33Z" level=debug msg="flushing [MySQLTableSizeWarning[ace3d54][active]]" aggrGroup=c372a46868630640 source="dispatch.go:428"
2017-04-04_07:23:33.71143 time="2017-04-04T07:23:33Z" level=debug msg="flushing [MySQLTableSizeWarning[35c738e][active]]" aggrGroup=8197482606ef9270 source="dispatch.go:428"
The flush happens less than a second after the config reload.
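To put a number on that gap, here is a minimal sketch that measures the delay between a reload and the flushes that follow it. It assumes the leading `YYYY-MM-DD_HH:MM:SS.fffff` prefix seen in the excerpt above is present on every line. The helper names `parse_stamp` and `flush_delays` are made up for this illustration and are not part of Alertmanager.

```python
import re
from datetime import datetime

# Two lines copied from the log excerpt above.
LOG = '''\
2017-04-04_07:23:33.61307 time="2017-04-04T07:23:33Z" level=info msg="Loading configuration file" file="/srv/prometheus/alertmanager/alertmanager.yml" source="main.go:200"
2017-04-04_07:23:33.71135 time="2017-04-04T07:23:33Z" level=debug msg="flushing [MySQLTableSizeWarning[ace3d54][active]]" aggrGroup=c372a46868630640 source="dispatch.go:428"
'''

# Assumption: every line starts with a "date_time.fraction" prefix.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d+)")


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H:%M:%S.%f")


def flush_delays(log_text):
    """Seconds between the most recent reload and each later flush."""
    last_reload = None
    delays = []
    for line in log_text.splitlines():
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if 'msg="Loading configuration file"' in line:
            last_reload = stamp
        elif 'msg="flushing ' in line and last_reload is not None:
            delays.append((stamp - last_reload).total_seconds())
    return delays


delays = flush_delays(LOG)
print(delays)
```

Running the same function over a full log would show whether every flush clusters right after a reload or only some of them do.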
Issue seems to be here: https://github.com/prometheus/alertmanager/blob/master/cmd/alertmanager/main.go#L227-L240
Reloading tears down and completely recreates the dispatcher.
@stuartnelson3 that shouldn't be an issue as the notification log should still be populated from previous notifications regardless of the dispatcher being recreated. Maybe loading the notification log from disk races with the notification queue accepting ingestion though.
This should not be racy. The disk snapshot is fully loaded before the constructor of the notification log returns. That only happens on startup.
Reloading of Alertmanager only rebuilds the pipelines according to the new configuration.
I added a basic test in #716, which works as expected. We need some more information to reproduce this condition.
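The expectation described above, that a reload rebuilds the pipelines while the notification log stays populated, can be illustrated with a toy model. This is only a sketch of the idea and not Alertmanager's code. The `NotificationLog` and `Pipeline` classes and the `flush` method are hypothetical names invented for the example, and the deduplication rule is simplified to a single repeat-interval check.

```python
from datetime import datetime, timedelta


class NotificationLog:
    """Hypothetical stand-in: remembers when each group was last notified."""

    def __init__(self):
        self._last_sent = {}

    def record(self, group_key, when):
        self._last_sent[group_key] = when

    def last_sent(self, group_key):
        return self._last_sent.get(group_key)


class Pipeline:
    """Hypothetical stand-in for the state that is rebuilt on reload."""

    def __init__(self, nflog, repeat_interval):
        self.nflog = nflog
        self.repeat_interval = repeat_interval

    def flush(self, group_key, now):
        """Return True if a notification goes out, False if it is suppressed."""
        last = self.nflog.last_sent(group_key)
        if last is not None and now - last < self.repeat_interval:
            return False
        self.nflog.record(group_key, now)
        return True


t0 = datetime(2017, 4, 4, 7, 0, 0)
repeat = timedelta(hours=108)

nflog = NotificationLog()
before_reload = Pipeline(nflog, repeat)
first = before_reload.flush("group-a", t0)  # nothing logged yet, so this notifies

# "Reload": a fresh pipeline is built, but it receives the same log.
after_reload = Pipeline(nflog, repeat)
second = after_reload.flush("group-a", t0 + timedelta(minutes=24))

# Counter-case: a pipeline handed an empty log has nothing to dedupe against.
lossy = Pipeline(NotificationLog(), repeat)
third = lossy.flush("group-a", t0 + timedelta(minutes=24))

print(first, second, third)
```

In this model the immediate resend only appears in the counter-case, where the pipeline cannot see the earlier log entry. That is the kind of condition a reproduction would need to pin down.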
I will close here as there has been no further progress. @natalia-k feel free to reopen with further information to reproduce the issue, as suggested by @fabxc. Thanks for the bug report!
Can anyone please help me with the remediation steps when we get an alertmanagerconfigreloadfailed alert?
I ran into this issue too.
I set repeat_interval to 2000h. After reloading Alertmanager, it resends the alert immediately instead of waiting for the repeat interval, though not every time. This is on Alertmanager 0.17.
@fabxc @mxinden @brancz any idea?
Same here. Any help?
Hi,
I find that reloading Alertmanager (curl -XPOST http://localhost:9093/-/reload) without making any changes to alertmanager.yml resends notifications for all existing alerts. I am running version 0.5.1.
For example: % journalctl -u alertmanager.service | grep -e " alert=NOC_MemoryUsage_per_container[bbd0ed6]" -e "Loading configuration file"
Alert in Prometheus:
I have errors in the log that may be the cause:
Could you help me fix it? Thanks! Natalia
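For anyone trying to reproduce the report above from a script rather than with curl, the reload can be triggered with the Python standard library. This is a sketch that assumes Alertmanager listens on `http://localhost:9093`, as in the curl command. The function names `build_reload_request` and `reload_alertmanager` are made up for the example.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9093"):
    """Build the POST request for the /-/reload endpoint without sending it."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body with an explicit method mirrors `curl -XPOST`.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_alertmanager(base_url="http://localhost:9093", timeout=5):
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_reload_request()
print(request.get_method(), request.full_url)
# Call reload_alertmanager() against a running instance to trigger the reload.
```

Calling `reload_alertmanager()` in a loop while watching the webhook receiver would show whether each reload produces a resend or only some of them.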