mindhash opened 2 years ago
Just following up. Can I submit a PR to switch the dedup key to GroupLabels.Fingerprint instead of sending the route key?
@mindhash also curious about this, but not familiar with the relevant AlertManager source code.
Is the whole alert group being sent to PagerDuty as one event or is each alert being sent individually?
We're currently working around this in Robusta. We receive alerts from AlertManager and forward them to PagerDuty with a de-dupe key based on the fingerprint.
https://docs.robusta.dev/master/catalog/sinks/PagerDuty.html
So I think the fingerprint-based solution is solid. It's been working for us and would be good to get fixed in AlertManager itself.
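To make the workaround concrete, here is a minimal Go sketch of that kind of forwarder, assuming the standard Events API v2 fields. It is not Robusta's actual implementation; the routing key, fingerprint, severity, and summary values are placeholders.

```go
// Minimal sketch of the workaround idea, not Robusta's actual code:
// forward an alert to the PagerDuty Events API v2 with dedup_key set to
// the alert's fingerprint, so distinct alerts don't collapse into one incident.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// pdEvent mirrors the documented Events API v2 request body.
type pdEvent struct {
	RoutingKey  string    `json:"routing_key"`
	EventAction string    `json:"event_action"`
	DedupKey    string    `json:"dedup_key"`
	Payload     pdPayload `json:"payload"`
}

type pdPayload struct {
	Summary  string `json:"summary"`
	Source   string `json:"source"`
	Severity string `json:"severity"`
}

func forwardAlert(routingKey, fingerprint, summary string) error {
	ev := pdEvent{
		RoutingKey:  routingKey,
		EventAction: "trigger",
		// The fingerprint identifies the alert (or alert group), so
		// PagerDuty deduplicates per alert rather than per route.
		DedupKey: fingerprint,
		Payload: pdPayload{
			Summary:  summary,
			Source:   "alertmanager",
			Severity: "critical",
		},
	}
	body, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	resp, err := http.Post("https://events.pagerduty.com/v2/enqueue",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("PagerDuty responded with", resp.Status)
	return nil
}

func main() {
	// Placeholder routing key and fingerprint, just to show the call shape.
	_ = forwardAlert("YOUR_ROUTING_KEY", "1a2b3c4d5e6f7a8b", "HighCPU on node-1")
}
```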
I don't see a group_by: [alertname] line in your Alertmanager configuration. Am I missing something?
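For reference, grouping by alert name is declared on the route; a minimal illustration (not taken from the reporter's config) would look like:

```yaml
route:
  receiver: default-receiver
  group_by: ['alertname']
```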
What did you do?
Set up Alertmanager integration with PagerDuty. The configuration is set up to group by ['alertname']. I also have event orchestration set up in PD to create an incident for each alert.

What did you expect to see?
Each alert with a different name in Alertmanager should result in a separate alert (+ incident) in PagerDuty.

What did you see instead? Under which circumstances?
For some reason, PagerDuty considers every new alert an update to an existing alert and performs the update. In the image below, each of the updates is in fact a separate alert (with a different alert name).

After reviewing the code (notifyV2), I think the dedup key is being generated from the route key, which may be the reason for the issue. I only have one route, with a receiver set up. The de-dup key should have been GroupLabels.Fingerprint to allow PagerDuty to identify updates to the same group.
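To make the proposal concrete, here is a rough sketch of deriving a dedup key from the group labels' fingerprint using LabelSet.Fingerprint from prometheus/common. This is not the actual notifyV2 code; the helper name and the map[string]string input (mirroring the templated GroupLabels) are assumptions made for illustration.

```go
// Rough sketch (not the actual Alertmanager notifyV2 code): derive the
// PagerDuty dedup key from the group labels' fingerprint instead of the
// route/group key, so different groups map to different incidents.
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

// dedupKeyFromGroupLabels is an illustrative helper; the name and input
// shape are assumptions for this example.
func dedupKeyFromGroupLabels(groupLabels map[string]string) string {
	ls := make(model.LabelSet, len(groupLabels))
	for k, v := range groupLabels {
		ls[model.LabelName(k)] = model.LabelValue(v)
	}
	// Fingerprint() is independent of label order, so the same group
	// always yields the same dedup key across notifications.
	return ls.Fingerprint().String()
}

func main() {
	// With group_by: ['alertname'], two alert names give two dedup keys,
	// so PagerDuty opens two incidents instead of updating one.
	fmt.Println(dedupKeyFromGroupLabels(map[string]string{"alertname": "HighCPU"}))
	fmt.Println(dedupKeyFromGroupLabels(map[string]string{"alertname": "DiskFull"}))
}
```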
Environment
AlertManager, PagerDuty

System information:
Darwin 19.6.0 x86_64

Alertmanager version:
0.23.0

Alertmanager configuration file:
```yaml
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_from: alertmanager@x.io
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default-receiver
  continue: false
  routes:
templates: []
```