Closed grobinson-grafana closed 1 year ago
Thinking about this again, I don't know if it makes sense for Alertmanager to allow configurations to contain a repeat_interval < group_interval.
For example, if repeat_interval is 15s and group_interval is 5m, would it make sense to send notifications every 15 seconds until the alerts in the aggregation group change, at which point Alertmanager starts a timer for 5m?
Thinking about this again, I don't know if it makes sense for Alertmanager to allow configurations to contain a repeat_interval < group_interval.
I agree. I'm not sure how I feel about breaking people's setups on the first run, but at the very minimum we should print a warning when Alertmanager starts.
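A startup check of the kind suggested here could look roughly like the sketch below. It is written in Python for illustration only and is not Alertmanager's Go code. The function name `interval_warning`, the use of plain seconds instead of duration strings, and the warning text are all assumptions.

```python
from typing import Optional


def interval_warning(group_interval_s: int, repeat_interval_s: int) -> Optional[str]:
    """Return a warning message when repeat_interval < group_interval, else None.

    Hypothetical helper: both intervals are given in whole seconds.
    """
    if repeat_interval_s < group_interval_s:
        return (
            f"repeat_interval ({repeat_interval_s}s) is less than "
            f"group_interval ({group_interval_s}s); repeats may not be sent "
            "more often than once per group_interval"
        )
    return None


# Example: the 15s / 5m case from the first comment triggers the warning.
message = interval_warning(group_interval_s=300, repeat_interval_s=15)
if message is not None:
    print("warning:", message)
```

Returning a message rather than raising keeps existing configurations loadable, which matches the concern about not breaking setups on the first run.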
Late to the party but setting repeat_interval < group_interval is an undocumented way to get repeated notifications at a predictable interval. For instance when using a watchdog/deadmansnitch alert, you can ensure that the notification will be emitted every group interval...
Perhaps we should revert the warning I added then, opinions? @gotjosh @simonpasquier
@grobinson-grafana Reverting it really makes sense. Below is the configuration recommended by Grafana OnCall for a heartbeat alert that always triggers events every 50s, while the heartbeat check is evaluated every 1min.
config:
  route:
    routes:
      - match:
          alertname: heartbeat
        receiver: 'grafana-oncall-heartbeat'
        group_wait: 0s
        group_interval: 1m
        repeat_interval: 50s
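As a rough illustration of the cadence this configuration produces: if, as the original report in this thread suggests, a repeat can only go out when the aggregation group is flushed, then the spacing between repeats is the smallest multiple of group_interval that is at least repeat_interval. The helper below is a hypothetical Python sketch of that simplified model, not Alertmanager code. The name `effective_repeat_seconds` and the whole-second arguments are assumptions.

```python
def effective_repeat_seconds(group_interval_s: int, repeat_interval_s: int) -> int:
    """Spacing between repeats under a simplified flush-driven model.

    Assumes a repeat can only be sent on a group flush, and that flushes
    happen exactly every group_interval_s seconds.
    """
    if group_interval_s <= 0 or repeat_interval_s <= 0:
        raise ValueError("intervals must be positive")
    # Ceiling division: number of flushes needed to cover repeat_interval.
    flushes = -(-repeat_interval_s // group_interval_s)
    return flushes * group_interval_s


# Heartbeat route above: group_interval 1m, repeat_interval 50s.
print(effective_repeat_seconds(60, 50))  # 60
```

Under this model, the heartbeat route notifies once per 1m flush rather than every 50s. Setting repeat_interval slightly below group_interval just guarantees that every flush is eligible to send.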
Thanks for the feedback!
I don't think we should revert this. In @simonpasquier's own words:
is an undocumented way to get repeated notifications at a predictable interval
This signals that this is an exceptional use case and not relevant for most users.
@gotjosh Technically, I was suggesting making it a documented way to get notifications at a predictable interval, and maybe even adding a test to preserve the behaviour in future versions, as it is widely used and recommended by the Grafana team themselves.
What did you do?
It appears that repeat_interval doesn't work if it's less than group_interval. I'm not sure if this is on purpose or not, and I couldn't see it documented in https://prometheus.io/docs/alerting/latest/configuration/.
I created the following configuration file:
and sent an alert to Alertmanager using cURL:
The first notification was received:
However, the repeat notification was not sent until 22:48:16, when it should have been sent at 22:47:31:
The same happens again at 22:49:16:
If I increase group_interval to 5m, then repeat notifications aren't received until 5 minutes after the first notification:
I believe this happens because the aggregation group is flushed after group_interval, so if repeat_interval < group_interval, the earliest a notification can repeat is group_interval.
What did you expect to see?
Notifications repeated once per repeat_interval.
What did you see instead? Under which circumstances?
Notifications repeated once per group_interval (repeat_interval must be less than group_interval).
Environment
I am testing on main, commit https://github.com/prometheus/alertmanager/commit/5adc7369c838c31fcbaa7d413951a2dc01ae87ae.
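The explanation given under "What did you do?" (the aggregation group is flushed once per group_interval, so a repeat cannot go out sooner) can be sketched as a small simulation. This is a simplified Python model, not Alertmanager's implementation. It assumes the alert set never changes, that flushes happen exactly on schedule, and that a flush sends a notification only when at least repeat_interval has passed since the last one. The function name and the example values (15s and 1m) are assumptions, since the configuration file from the report is not reproduced here.

```python
from typing import List, Optional


def notification_times(
    group_wait_s: int,
    group_interval_s: int,
    repeat_interval_s: int,
    horizon_s: int,
) -> List[int]:
    """Return the seconds (from alert arrival) at which notifications are sent.

    Simplified model: the group is flushed at group_wait_s and then every
    group_interval_s; each flush notifies only if nothing was sent yet or the
    last notification is at least repeat_interval_s old.
    """
    if group_interval_s <= 0:
        raise ValueError("group_interval_s must be positive")
    sent: List[int] = []
    last_sent: Optional[int] = None
    t = group_wait_s
    while t <= horizon_s:
        if last_sent is None or t - last_sent >= repeat_interval_s:
            sent.append(t)
            last_sent = t
        t += group_interval_s  # the next opportunity is the next flush
    return sent


# Assumed values: repeat_interval 15s, group_interval 1m, no group_wait.
print(notification_times(0, 60, 15, 180))   # [0, 60, 120, 180]
# Raising group_interval to 5m pushes the first repeat out to 300s.
print(notification_times(0, 300, 15, 600))  # [0, 300, 600]
```

In both runs the repeats follow group_interval, not the 15s repeat_interval, which matches what the report observed.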