idealista / prom2teams

prom2teams is an HTTP server built with Python that receives alert notifications from a previously configured Prometheus Alertmanager instance and forwards them to Microsoft Teams using defined connectors.
Apache License 2.0

Microsoft Teams endpoint returned HTTP error 429 #141

Closed NiasSt90 closed 4 years ago

NiasSt90 commented 5 years ago

Description

Alertmanager sends some alerts and prom2teams tries to forward them to MS Teams. After a few messages have been forwarded successfully, the MS Teams service replies with HTTP status code 429.

2019-09-04 06:34:11,948 - flask.app - ERROR - Exception on /v2/Connector [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/site-packages/flask_restplus-0.12.1-py3.5.egg/flask_restplus/api.py", line 583, in error_router
    return original_handler(e)
  File "/usr/local/lib/python3.5/site-packages/flask_restplus-0.12.1-py3.5.egg/flask_restplus/api.py", line 583, in error_router
    return original_handler(e)
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.5/site-packages/flask_restplus-0.12.1-py3.5.egg/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/Flask-1.0.2-py3.5.egg/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/flask_restplus-0.12.1-py3.5.egg/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/prom2teams-2.4.0-py3.5.egg/prom2teams/app/versions/v2/namespace.py", line 27, in post
    self.sender.send_alarms(alerts, app.config['MICROSOFT_TEAMS'][connector])
  File "/usr/local/lib/python3.5/site-packages/prom2teams-2.4.0-py3.5.egg/prom2teams/app/sender.py", line 30, in send_alarms
    post(teams_webhook_url, team_alarm)
  File "/usr/local/lib/python3.5/site-packages/prom2teams-2.4.0-py3.5.egg/prom2teams/app/teams_client.py", line 17, in post
    str(response.text)))
prom2teams.app.exceptions.MicrosoftTeamsRequestException: Error performing request to: https://outlook.office.com/webhook/**. Returned status code: 200. Returned data: Webhook message delivery failed with error: Microsoft Teams endpoint returned HTTP error 429 with ContextId tcid=7728202280260441716,server=AM3PEPF00000A5B,cv=t2bhHDB4DEie8pMbHqcw8w.0.

The Prometheus Alertmanager gets back HTTP status code 500:

level=error ts=2019-09-04T03:31:08.009985306Z caller=dispatch.go:280 component=dispatcher msg="Notify for alerts failed" num_alerts=42 err="unexpected status code 500 from http://prom2teams-service:8089/v2/Connector"

and tries to resend after a short timeout, which triggers the error again and again.

The major problem: the first alerts do come through to MS Teams (we can see them in the channel). But because Alertmanager receives HTTP status code 500, it tries to resend all 42 alerts. Last night we got more than 500 alert messages from 42 real alerts. Only restarting Alertmanager stopped this endless loop.

The status code 429 from Teams should be handled carefully (for example, by sending more slowly).
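One way to "send slower" is to retry with exponential backoff when a 429 comes back. The following is only a minimal sketch, not prom2teams code: `send_with_backoff` and the `send` callable (which is assumed to return a status code and a header dict) are hypothetical names, and whether Teams actually supplies a `Retry-After` header is an assumption.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send() until it stops reporting 429 or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers).
    Returns the last status code seen.
    """
    status = None
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        try:
            # Assumption: the server may send Retry-After in seconds.
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: fall back to exponential backoff.
            delay = base_delay * (2 ** attempt)
        sleep(min(delay, max_delay))
    return status
```

The `sleep` parameter is injectable so the delays can be checked without actually waiting. A caller would still need to decide what to report to Alertmanager once the attempts are exhausted.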

Versions

2.4.0

miguel-chacon commented 5 years ago

Hi @NiasSt90. Since release 2.5.0, prom2teams returns the same status code it receives in the Teams response, so Alertmanager should handle the 429 itself instead of retrying because of a 500 error from prom2teams.
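Note that in the log quoted in the issue description, the webhook answered with status code 200 and reported the 429 only in the response body. A minimal sketch of how a forwarder could surface that embedded code follows; it assumes the body wording seen in that log ("HTTP error 429"), the function name `effective_status` is hypothetical, and nothing here is confirmed to match what prom2teams 2.5.0 does.

```python
import re

# Pattern taken from the response body quoted in this thread:
# "... Microsoft Teams endpoint returned HTTP error 429 with ContextId ..."
_EMBEDDED_ERROR = re.compile(r"HTTP error (\d{3})")


def effective_status(status_code, body):
    """Return the status code to report upstream.

    If the webhook answered 2xx but the body carries an embedded
    "HTTP error NNN" message, report NNN instead.
    """
    if 200 <= status_code < 300:
        match = _EMBEDDED_ERROR.search(body or "")
        if match:
            return int(match.group(1))
    return status_code
```

Non-2xx responses are passed through unchanged, so a real error status from the webhook is never overwritten by text in the body.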

vishnubraj commented 5 years ago

I am also getting the same error.
prom2teams sends a few alerts, and after that the Microsoft Teams endpoint returns HTTP error 429. Please let us know how to fix this.

rromanovic commented 5 years ago

Hi, I'm also getting HTTP error 429. After some research, it looks like the 429 is caused by SharePoint throttling.

The guidance at https://docs.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-or-blocked-in-sharepoint-online#how-to-decorate-your-http-traffic-to-avoid-throttling suggests decorating HTTP requests with an appropriate User-Agent string to avoid throttling.

Is there a way to get this done through configuration?
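For illustration, decorating a request could look like the sketch below. It is not a prom2teams configuration option: `build_teams_request`, the company and app names, and the webhook URL are hypothetical, the `NONISV|CompanyName|AppName/Version` format follows the SharePoint guidance linked above, and whether Teams webhooks honor it is an assumption.

```python
import json
import urllib.request


def build_teams_request(webhook_url, card, company="ExampleCorp",
                        app="prom2teams-relay", version="1.0"):
    """Build (but do not send) a POST request with a decorated User-Agent."""
    # Format suggested by the SharePoint throttling guidance for
    # non-ISV applications: NONISV|CompanyName|AppName/Version
    user_agent = "NONISV|{}|{}/{}".format(company, app, version)
    data = json.dumps(card).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json",
                 "User-Agent": user_agent},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(req)`; sending is left out here so the header can be inspected first.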

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.