prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

Pagerduty Integration (events API V2): Updates same alert instead of creating new one #2874

Open mindhash opened 2 years ago

mindhash commented 2 years ago

What did you do? Set up Alertmanager integration with PagerDuty. The configuration is set up to group by ['alertname']. I also have event orchestration set up in PagerDuty to create an incident for each alert.

What did you expect to see? Each alert with a different name in Alertmanager should result in a separate alert (and incident) in PagerDuty.

What did you see instead? Under which circumstances? For some reason, PagerDuty treats every new alert as an update to an existing alert and applies it as an update.

[image]

In this image, each of the updates is in fact a separate alert (with a different alert name), as you can see in the image below.

[image]

After reviewing the code (NotifyV2), I think the DedupKey is being generated from the route key, which may be the reason for the issue. I only have one route with a receiver set up.

The dedup key should instead be the fingerprint of the group labels, so that PagerDuty can identify updates to the same group.

&pagerDutyMessage{
        Client:      tmpl(n.conf.Client),
        ClientURL:   tmpl(n.conf.ClientURL),
        RoutingKey:  tmpl(string(n.conf.RoutingKey)),
        EventAction: eventType,
        DedupKey:    key.Hash(),
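
To illustrate why the inputs to the dedup key matter, here is a minimal sketch. It is not Alertmanager's actual code: the helpers `hashKey` and `groupKey` are hypothetical, and the assumption that a key is built from a route identifier plus the group labels is made only for illustration. The point is that if no grouping labels reach the key, every alert on the same route hashes to the same value, while including `alertname` gives each alert name its own value.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// hashKey is a hypothetical stand-in for a group-key hash:
// it returns the hex-encoded SHA-256 of the input string.
func hashKey(s string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(s)))
}

// groupKey builds an illustrative key from a route identifier and the
// group labels. Label names are sorted so the result is deterministic.
func groupKey(route string, groupLabels map[string]string) string {
	names := make([]string, 0, len(groupLabels))
	for name := range groupLabels {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString(route)
	b.WriteString(":{")
	for i, name := range names {
		if i > 0 {
			b.WriteString(",")
		}
		fmt.Fprintf(&b, "%s=%q", name, groupLabels[name])
	}
	b.WriteString("}")
	return b.String()
}

func main() {
	route := "{}" // hypothetical identifier for a single top-level route

	// No grouping labels: any two alerts on this route share one key.
	fmt.Println(hashKey(groupKey(route, nil)) == hashKey(groupKey(route, nil)))

	// Grouping by alertname: each alert name gets its own key.
	cpu := hashKey(groupKey(route, map[string]string{"alertname": "HighCPU"}))
	disk := hashKey(groupKey(route, map[string]string{"alertname": "DiskFull"}))
	fmt.Println(cpu == disk)
}
```

Under these assumptions, whether PagerDuty sees one alert or several depends on which labels end up in the hashed key, so it is worth checking that the grouping labels are really in effect on the route that matches.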

Environment: Alertmanager, PagerDuty

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_from: alertmanager@x.io
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default-receiver
  continue: false
  routes:
templates: []

mindhash commented 2 years ago

Just following up. Can I submit a PR that switches the dedup key to GroupLabels.Fingerprint instead of sending the route key?

aantn commented 2 years ago

@mindhash I'm also curious about this, but I'm not familiar with the relevant Alertmanager source code.

Is the whole alert group being sent to PagerDuty as one event or is each alert being sent individually?

aantn commented 1 year ago

We're currently working around this in Robusta. We receive alerts from Alertmanager and forward them to PagerDuty with a dedup key based on the alert fingerprint.

https://docs.robusta.dev/master/catalog/sinks/PagerDuty.html

So I think the fingerprint-based solution is solid. It has been working for us, and it would be good to get this fixed in Alertmanager itself.
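
For readers who want to see the shape of such a workaround, below is a rough sketch of mapping one alert from an Alertmanager webhook payload to a PagerDuty Events API v2 event, using the alert's `fingerprint` as the `dedup_key`. This is not Robusta's implementation. The type and function names (`webhookAlert`, `pdEvent`, `toPagerDutyEvent`), the routing key value, the fixed `error` severity, and the choice of `summary` and `source` fields are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// webhookAlert holds the fields of a single alert in an Alertmanager
// webhook payload that this sketch uses.
type webhookAlert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	Fingerprint string            `json:"fingerprint"`
}

// pdPayload and pdEvent model a subset of a PagerDuty Events API v2 request.
type pdPayload struct {
	Summary  string `json:"summary"`
	Source   string `json:"source"`
	Severity string `json:"severity"`
}

type pdEvent struct {
	RoutingKey  string    `json:"routing_key"`
	EventAction string    `json:"event_action"`
	DedupKey    string    `json:"dedup_key"`
	Payload     pdPayload `json:"payload"`
}

// toPagerDutyEvent (hypothetical helper) maps one alert to one event,
// so each distinct alert fingerprint becomes its own PagerDuty alert.
func toPagerDutyEvent(routingKey string, a webhookAlert) pdEvent {
	action := "trigger"
	if a.Status == "resolved" {
		action = "resolve"
	}
	return pdEvent{
		RoutingKey:  routingKey,
		EventAction: action,
		DedupKey:    a.Fingerprint,
		Payload: pdPayload{
			Summary:  a.Labels["alertname"],
			Source:   a.Labels["instance"],
			Severity: "error", // assumption: fixed severity for the sketch
		},
	}
}

func main() {
	alert := webhookAlert{
		Status:      "firing",
		Labels:      map[string]string{"alertname": "HighCPU", "instance": "host-1"},
		Fingerprint: "example-fingerprint", // placeholder value
	}
	event := toPagerDutyEvent("EXAMPLE_ROUTING_KEY", alert)

	body, err := json.MarshalIndent(event, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Keying on the per-alert fingerprint means every alert is tracked separately in PagerDuty, which differs from keying on a group: alerts that Alertmanager would batch together arrive as individual events.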

simonpasquier commented 1 year ago

I don't see a group_by: [alertname] line in your Alertmanager configuration. Am I missing something?