Open ae-govau opened 9 months ago
This should be fixed in rocketchat, not alertmanager
I appreciate that view, but it's also worked just fine for the past 4+ years that we've been using it. It's the recent change to alertmanager which broke the behaviour, not the other way around (or I would be opening a PR there).
This is a very small change, we'd appreciate if you'd consider accepting it.
I'm afraid I agree with Julien here. Keep in mind this is a Slack integration. I appreciate that this has worked for you for the past 4 years, but that's purely coincidental rather than intentional.
I also disagree that Alertmanager should change its Slack integration to support both Slack and RocketChat. It's not meant to be a multi-vendor integration. If RocketChat wants to provide a Slack-compatible API then that's great, but it needs to be 100% compatible, including in how it sends responses.
I am also happy to support native rocketchat integration as we are using it internally at my company.
I am very keen on getting native rocketchat support, so I started some implementation. This is nowhere near finished; documentation and tests in particular are still completely missing. It is functional, though. Before I spend more time on this, I wanted to ask some questions, @roidelapluie:
Is it ok to use the rocketchat SDK?
It looks quite different from most of the other integrations: it brings its own REST client and, in particular, does not support httpOpts.
I needed to import the SDK models into config/notifiers.go; I wonder if that is acceptable.
Since the rocketchat SDK defines field.Short as bool instead of *bool, there is no really nice way of checking whether it was actually set.
Most integrations allow secrets to be configured either directly or via a file to load the secret from. Is that a requirement?
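To illustrate the field.Short question above, here is a minimal Go sketch of why a plain bool cannot tell "not configured" apart from "explicitly false", while a *bool can. The types valueField and pointerField and the helper functions are hypothetical stand-ins written for this example; they are not the rocketchat SDK's models or Alertmanager's config types, and JSON is used only because it is in the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// valueField is a hypothetical stand-in for a model that declares Short as a
// plain bool: an omitted key and an explicit false both decode to false.
type valueField struct {
	Title string `json:"title"`
	Short bool   `json:"short"`
}

// pointerField is the same hypothetical model with Short as *bool: an omitted
// key leaves the pointer nil, so "unset" stays distinguishable from false.
type pointerField struct {
	Title string `json:"title"`
	Short *bool  `json:"short,omitempty"`
}

// decodeValueShort decodes raw into valueField and returns the Short value.
func decodeValueShort(raw string) (bool, error) {
	var f valueField
	err := json.Unmarshal([]byte(raw), &f)
	return f.Short, err
}

// describeShort decodes raw into pointerField and reports whether Short was
// set at all, and if so to which value.
func describeShort(raw string) (string, error) {
	var f pointerField
	if err := json.Unmarshal([]byte(raw), &f); err != nil {
		return "", err
	}
	if f.Short == nil {
		return "unset", nil
	}
	return fmt.Sprintf("set to %t", *f.Short), nil
}

func main() {
	inputs := []string{
		`{"title":"a"}`,
		`{"title":"a","short":false}`,
		`{"title":"a","short":true}`,
	}
	for _, raw := range inputs {
		v, _ := decodeValueShort(raw)
		p, _ := describeShort(raw)
		fmt.Printf("%s bool=%t *bool=%s\n", raw, v, p)
	}
}
```

With the plain bool, the first two inputs are indistinguishable after decoding, which is the limitation described above. One possible workaround, if the SDK type has to stay as it is, would be to keep a *bool in the Alertmanager-side config and copy it into the SDK model only when it is non-nil.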
This is what I currently have: https://github.com/prometheus/alertmanager/compare/main...TheMeier:alertmanager:rocketchat?expand=1 If you prefer I can open a draft pull request. But as I said this is nowhere near done.
What did you do?
We use the Slack integration to send notifications to a RocketChat server, which is reasonably compatible with Slack for this purpose. This has worked well for 4+ years; however, after our last update we keep getting an ever-increasing list of "Resolved" alerts every 5 minutes.
What did you expect to see?
We expect to see an alert posted, then a resolved posted, then silence.
What did you see instead? Under which circumstances?
We see the alert posted, then the resolved posted over and over again (every 5 minutes) until the AlertManager instance is restarted.
Environment
We believe that change #3121 triggered this condition. That change checks the body of a response with a 200 status code to ensure it contains "ok": true (previously, 200 meant OK without further checking). Our RocketChat instance returns a different message on success:
We believe that this new code identifies this (incorrectly) as failed POST and thus doesn't remove the now resolved alert from its list, and dutifully tries again every 5 minutes for eternity.
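The difference between the two behaviours described above can be sketched in Go as follows. This is not the code from change #3121; looseCheck and strictCheck are hypothetical names, and the second response body in main is an invented placeholder for "some other success message", not the actual RocketChat response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// looseCheck models the older behaviour as described in this issue:
// a 200 status alone counts as a successful delivery.
func looseCheck(status int, _ []byte) bool {
	return status == http.StatusOK
}

// strictCheck models the newer behaviour as described in this issue:
// a 200 status only counts as success if the JSON body has "ok": true.
func strictCheck(status int, body []byte) bool {
	if status != http.StatusOK {
		return false
	}
	var resp struct {
		OK bool `json:"ok"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false
	}
	return resp.OK
}

func main() {
	bodies := [][]byte{
		[]byte(`{"ok":true}`),       // shape the strict check expects
		[]byte(`{"status":"done"}`), // hypothetical placeholder for a non-Slack success body
	}
	for _, b := range bodies {
		fmt.Printf("body=%s loose=%t strict=%t\n",
			b, looseCheck(http.StatusOK, b), strictCheck(http.StatusOK, b))
	}
}
```

Under these assumptions, a server that answers 200 with any body lacking "ok": true passes the loose check but fails the strict one, so the notification is treated as undelivered and becomes eligible for another attempt.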
(Credit to @bg-govau for pin-pointing the issue. PR forthcoming.)