prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0
6.45k stars 2.12k forks source link

#3121 breaks Slack integration targeting RocketChat servers #3546

Open ae-govau opened 9 months ago

ae-govau commented 9 months ago

What did you do?

We use the Slack integration to send notifications to a RocketChat server, which is reasonably compatible with Slack for this purpose. This has worked well for 4+ years, however after our last update we keep getting an ever increasing list of "Resolved" alerts every 5 minutes.

What did you expect to see?

We expect to see an alert posted, then a resolved posted, then silence.

What did you see instead? Under which circumstances?

We see the alert posted, then the resolved posted over and over again (every 5 minutes) until the AlertManager instance is restarted.

Environment

We believe that change #3121 triggered this condition. That change checks a 200 status code to ensure if contains "ok": true (previously 200 meant OK without further checking).

Our RocketChat instances returns a different message on success:

$ curl ...
< HTTP/1.1 200 OK
< Content-Type: application/json
...
{"success":true}

We believe that this new code identifies this (incorrectly) as failed POST and thus doesn't remove the now resolved alert from its list, and dutifully tries again every 5 minutes for eternity.

(Credit to @bg-govau for pin-pointing the issue. PR forthcoming.)

roidelapluie commented 9 months ago

This should be fixed in rocketchat, not alertmanager

ae-govau commented 9 months ago

I appreciate that view, but it's also worked just fine for the past 4+ years that we've been using it. It's the recent change to alertmanager which broke the behaviour, not the other way around (or I would be opening a PR there).

This is a very small change, we'd appreciate if you'd consider accepting it.

grobinson-grafana commented 9 months ago

I'm afraid I agree with Julien here. Keep in mind this is a Slack integration. I appreciate that this has worked for you for the past 4 years, but that's purely coincidental rather than intentional.

I also disagree that Alertmanager should change its Slack integration to support both Slack and RocketChat. It's not meant to be a multi-vendor integration. If RocketChat want's to provide a Slack-compatible API then that's great, but it needs to be 100% compatible, including in how it sends responses.

roidelapluie commented 9 months ago

I am also happy to support native rocketchat integration as we are using it internally at my company.

TheMeier commented 8 months ago

I am very keen on getting native rocketchat support, so I started some implementation. This is nowhere near finished. Especially documentation and tests are completely missing still. But it is functional. But before I spend more time on this I wanted to ask some questions @roidelapluie

Is it ok to use the rocketchat SDK? It looks quite different then most of the other integrations, bringing it's own rest-client, especially not supporting httpOpts I needed to import the SDK models into config/notifiers.go, I wonder if that is acceptable.

Since the rocketchat SDK defines field.Short as bool instead of *bool there is no real nice way of checking if it was actually set.

Most integrations allow to configure secrets directly or via a file to load the secret from, I that a requirement?

This is what I currently have: https://github.com/prometheus/alertmanager/compare/main...TheMeier:alertmanager:rocketchat?expand=1 If you prefer I can open a draft pull request. But as I said this is nowhere near done.