prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

Implement proper retries for integrations #2205

Open simonpasquier opened 4 years ago

simonpasquier commented 4 years ago

Looking at the various integrations, some of them provide hints on how you should retry once you're rate-limited:

Currently the retry behavior for all integrations is a simple back-off mechanism using the github.com/cenkalti/backoff/v4 library.

Other integrations (VictorOps, WeChat) don't provide much information about retry strategies so the current implementation is probably good enough for them. The same is true for the webhook integration.

This is a follow-up to #2121, #2128 and #2119. In particular this comment is relevant.

vears91 commented 3 years ago

Hi, is this issue still relevant? I'd like to work on it if it is.

tommysitehost commented 1 year ago

Hey, I think this is still relevant, as more and more backends adopt the HTTP 429 status code.

vvxxvvxx commented 1 year ago

Hi @simonpasquier, has this "unexpected status code 429: {\"retry_after\":1,\"ok\":false,\"error\":\"rate_limited\"}" issue been fixed already?

aleem-99 commented 1 year ago

Facing the same error for Slack notifications with Alertmanager v0.24.0: notify retry canceled due to unrecoverable error after 1 attempts: channel \"\": unexpected status code 429: {\"retry_after\":1,\"ok\":false,\"error\":\"rate_limited\"}"