mindhash opened 2 years ago
Just following up. Can I submit a PR to switch the dedup key to GroupLabels.Fingerprint instead of sending the route key?
@mindhash also curious about this, but not familiar with the relevant AlertManager source code.
Is the whole alert group being sent to PagerDuty as one event or is each alert being sent individually?
We're currently working around this in Robusta. We receive alerts from AlertManager and forward them to PagerDuty with a de-dupe key based on the fingerprint.
https://docs.robusta.dev/master/catalog/sinks/PagerDuty.html
So I think the fingerprint-based solution is solid. It's been working for us and would be good to get fixed in AlertManager itself.
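To make the workaround concrete, here is a minimal Go sketch of that kind of forwarder, assuming the standard Events API v2 fields. It is not Robusta's actual implementation; the routing key, fingerprint, severity, and summary values are placeholders.

```go
// Minimal sketch of the workaround idea, not Robusta's actual code:
// forward an alert to the PagerDuty Events API v2 with dedup_key set to
// the alert's fingerprint, so distinct alerts don't collapse into one incident.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// pdEvent mirrors the documented Events API v2 request body.
type pdEvent struct {
	RoutingKey  string    `json:"routing_key"`
	EventAction string    `json:"event_action"`
	DedupKey    string    `json:"dedup_key"`
	Payload     pdPayload `json:"payload"`
}

type pdPayload struct {
	Summary  string `json:"summary"`
	Source   string `json:"source"`
	Severity string `json:"severity"`
}

func forwardAlert(routingKey, fingerprint, summary string) error {
	ev := pdEvent{
		RoutingKey:  routingKey,
		EventAction: "trigger",
		// The fingerprint identifies the alert (or alert group), so
		// PagerDuty deduplicates per alert rather than per route.
		DedupKey: fingerprint,
		Payload: pdPayload{
			Summary:  summary,
			Source:   "alertmanager",
			Severity: "critical",
		},
	}
	body, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	resp, err := http.Post("https://events.pagerduty.com/v2/enqueue",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("PagerDuty responded with", resp.Status)
	return nil
}

func main() {
	// Placeholder routing key and fingerprint, just to show the call shape.
	_ = forwardAlert("YOUR_ROUTING_KEY", "1a2b3c4d5e6f7a8b", "HighCPU on node-1")
}
```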
I don't see a group_by: [alertname] line in your Alertmanager configuration. Am I missing something?
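For reference, grouping by alert name is declared on the route; a minimal illustration (not taken from the reporter's config) would look like:

```yaml
route:
  receiver: default-receiver
  group_by: ['alertname']
```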
What did you do?
Set up Alertmanager integration with PagerDuty. The configuration is set up to group by ['alertname']. I also have event orchestration set up in PD to create an incident for each alert.

What did you expect to see?
Each alert with a different name in Alertmanager should result in a separate alert (+ incident) in PagerDuty.

What did you see instead? Under which circumstances?
For some reason, PagerDuty considers every new alert an update to an existing alert and performs the update. In the image below, each of the updates is in fact a separate alert (with a different alert name).

After reviewing the code (notifyV2), I think the dedup key is being generated from the route key, which may be the reason for the issue. I only have one route, with a receiver set up. The de-dup key should have been GroupLabels.Fingerprint to allow PagerDuty to identify updates to the same group.
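To make the proposal concrete, here is a rough sketch of deriving a dedup key from the group labels' fingerprint using LabelSet.Fingerprint from prometheus/common. This is not the actual notifyV2 code; the helper name and the map[string]string input (mirroring the templated GroupLabels) are assumptions made for illustration.

```go
// Rough sketch (not the actual Alertmanager notifyV2 code): derive the
// PagerDuty dedup key from the group labels' fingerprint instead of the
// route/group key, so different groups map to different incidents.
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

// dedupKeyFromGroupLabels is an illustrative helper; the name and input
// shape are assumptions for this example.
func dedupKeyFromGroupLabels(groupLabels map[string]string) string {
	ls := make(model.LabelSet, len(groupLabels))
	for k, v := range groupLabels {
		ls[model.LabelName(k)] = model.LabelValue(v)
	}
	// Fingerprint() is independent of label order, so the same group
	// always yields the same dedup key across notifications.
	return ls.Fingerprint().String()
}

func main() {
	// With group_by: ['alertname'], two alert names give two dedup keys,
	// so PagerDuty opens two incidents instead of updating one.
	fmt.Println(dedupKeyFromGroupLabels(map[string]string{"alertname": "HighCPU"}))
	fmt.Println(dedupKeyFromGroupLabels(map[string]string{"alertname": "DiskFull"}))
}
```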
Environment
AlertManager, PagerDuty

System information:
Darwin 19.6.0 x86_64

Alertmanager version:
0.23.0

Alertmanager configuration file:
```yaml
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_from: alertmanager@x.io
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default-receiver
  continue: false
  routes:
templates: []
```