prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

AlertManager not respecting repeat_interval timer setting on duplicate alert #3846

Open geeteq opened 5 months ago

geeteq commented 5 months ago

What did you do? I configured the route below to limit repeated notifications for duplicate alerts, but AlertManager is not respecting the repeat_interval.

I've got the following routes:

route:
  group_by: ['alertname','site','host_name']
  group_wait: 10m
  group_interval: 15m
  repeat_interval: 76h
  receiver: 'slack-only'

  routes:

What did you expect to see? Within a short while I see two alerts with the same alert ID hash

First fires: 2:28 PM Alert ID: 03fb6064079b070a [AM1.3] (1)

Fires again 30 minutes later: 2:58 PM Alert ID: 03fb6064079b070a [AM1.3] (1)

I would expect AlertManager to respect the repeat_interval: 600h for this alert, which is flapping but keeps the same alert ID hash, but it does not.

What did you see instead? Under which circumstances?

Environment

alertmanager, version 0.27.0 (branch: HEAD, revision: 0aa3c2aad14cff039931923ab16b26b7481783b5)
  build user: root@22cd11f671e9
  build date: 20240228-11:51:20
  go version: go1.21.7
  platform: linux/amd64
  tags: netgo

templates: ['alert.tmpl']

route:
  group_by: ['alertname','site','host_name']
  group_wait: 10m
  group_interval: 15m
  repeat_interval: 76h
  receiver: 'slack-only'

  routes:

receivers:

inhibit_rules:

grobinson-grafana commented 5 months ago

Repeat interval suppresses notifications unless the alert state has changed. You mentioned that the alert fired, resolved, and then fired again 30 minutes later. Repeat interval would not work here because the alert resolved somewhere in that 30-minute period, and the second firing counts as a state change.
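
To make that concrete, here is a rough Python sketch of the decision described above. It is a simplified model for illustration only, not Alertmanager's actual code: `LastNotification` and `needs_notification` are hypothetical names, the resolve time of 2:43 PM is an assumption (the report only says the alert resolved somewhere between 2:28 PM and 2:58 PM), and the date is arbitrary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class LastNotification:
    """Hypothetical record of the last notification sent for an alert group."""
    timestamp: datetime
    firing: Set[str] = field(default_factory=set)


def needs_notification(
    last: Optional[LastNotification],
    firing: Set[str],
    now: datetime,
    repeat_interval: timedelta,
) -> bool:
    """Decide whether a group should notify (simplified model, not Alertmanager code)."""
    if last is None:
        # Never notified before: send only if something is firing.
        return bool(firing)
    if not firing <= last.firing:
        # An alert is firing that the last notification did not list as firing.
        # The state changed, so repeat_interval is not consulted.
        return True
    if not firing:
        # Everything resolved: notify once if the last notification had firing alerts.
        return bool(last.firing)
    # Nothing changed: repeat only after repeat_interval has elapsed.
    return now - last.timestamp >= repeat_interval


REPEAT = timedelta(hours=76)
ALERT = "03fb6064079b070a"
t_fire = datetime(2024, 1, 1, 14, 28)     # arbitrary date; time taken from the report
t_resolve = datetime(2024, 1, 1, 14, 43)  # assumed: the exact resolve time is not given
t_refire = datetime(2024, 1, 1, 14, 58)

first = needs_notification(None, {ALERT}, t_fire, REPEAT)
last = LastNotification(t_fire, {ALERT})

resolved = needs_notification(last, set(), t_resolve, REPEAT)
last = LastNotification(t_resolve, set())

refire = needs_notification(last, {ALERT}, t_refire, REPEAT)

# For contrast: the alert never resolved and is still firing 30 minutes later.
steady = needs_notification(LastNotification(t_fire, {ALERT}), {ALERT}, t_refire, REPEAT)

print(first, resolved, refire, steady)  # prints: True True True False
```

In this model the second notification at 2:58 PM goes out because the firing set differs from what was last sent, so the 76h repeat_interval never comes into play. Only the steady case, where the alert keeps firing without resolving, is held back by repeat_interval.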