mozilla-services / pagerstatus

A service to automatically update Statuspage.io based on Pagerduty incidents
Apache License 2.0

Handle invalid component tags #20

Open sciurus opened 5 years ago

sciurus commented 5 years ago

Right now, if someone tags an incident with a component that does not exist, we die with an unhandled KeyError. That's good because I find out there's a problem, but bad because we don't process any of the other incidents that did have valid tags. I should catch this error, log it, and continue processing the remaining incidents.
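A minimal sketch of the catch-and-continue behavior described above. The function name `sync_incidents`, its arguments, and the shape of `incidents` are hypothetical stand-ins, not the actual code in `app.py`; the only detail taken from the traceback below is that opening an incident for an unknown component raises `KeyError`.

```python
import logging

logger = logging.getLogger("pagerstatus")


def sync_incidents(incidents, open_incident):
    """Open a status page incident per tagged incident, skipping bad tags.

    `incidents` is an iterable of (incident_id, component_id) pairs and
    `open_incident` is a callable taking a component id. Both are
    hypothetical; the real sync() may be structured differently.
    """
    failed = []
    for incident_id, component_id in incidents:
        try:
            open_incident(component_id)
        except KeyError:
            # Unknown component: log it and move on to the next incident
            # instead of aborting the whole run.
            logger.error(
                "Incident %s is tagged with unknown component %s; skipping",
                incident_id,
                component_id,
            )
            failed.append(incident_id)
    return failed
```

Catching a bare `KeyError` around the whole call could also hide unrelated lookup bugs, so a narrower check (or a dedicated exception raised by the lookup) may be preferable in practice.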

Logs alone don't give great visibility, so I'll open another issue about reporting metrics for unexpected conditions (e.g. invalid components, multiple components) that we can alert on.
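As a rough illustration of what such metrics might count, here is a sketch that tallies the two conditions mentioned above. The function name, the metric names, and the input shapes are all assumptions made for the example; how the counts would be shipped to a metrics backend is left open.

```python
from collections import Counter


def count_unexpected_conditions(incident_tags, known_component_ids):
    """Count unexpected tagging conditions across incidents.

    `incident_tags` maps an incident id to the list of component ids it is
    tagged with; `known_component_ids` is the set of component ids that
    exist on the status page. Both shapes are hypothetical.
    """
    counts = Counter()
    for component_ids in incident_tags.values():
        if len(component_ids) > 1:
            counts["multiple_components"] += 1
        if any(cid not in known_component_ids for cid in component_ids):
            counts["invalid_component"] += 1
    return counts
```

An alert could then fire whenever either counter is non-zero for a run.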

08:07:08 START RequestId: 16734de3-feae-11e8-a178-21265e0de132 Version: $LATEST
08:07:09 Found 1 pagerduty incidents
08:07:09 For pagerduty incident PA42BA3
08:07:09 It is tagged component 76k9j8n4y3zt
08:07:09 Found 0 statuspage incidents
08:07:10 pagerstatus - DEBUG - Caught exception for <function handle_webhook at 0x7f3f22fb8510>
08:07:10 Traceback (most recent call last):
08:07:10   File "/var/task/chalice/app.py", line 726, in _get_view_function_response
08:07:10     response = view_function(**function_args)
08:07:10   File "/var/task/app.py", line 29, in handle_webhook
08:07:10     sync(pagerduty_key)
08:07:10   File "/var/task/app.py", line 50, in sync
08:07:10     statuspage.open_incident(component)
08:07:10   File "/var/task/chalicelib/statuspage.py", line 103, in open_incident
08:07:10     name, body = _render_incident_text(component_id, template["name"], template["body"])
08:07:10   File "/var/task/chalicelib/statuspage.py", line 68, in _render_incident_text
08:07:10     component_name = _component_ids_to_names()[component_id]
08:07:10 KeyError: '76k9j8n4y3zt'
08:07:10 END RequestId: 16734de3-feae-11e8-a178-21265e0de132
08:07:10 REPORT RequestId: 16734de3-feae-11e8-a178-21265e0de132 Duration: 1325.84 ms Billed Duration: 1400 ms Memory Size: 128 MB Max Memory Used: 30 MB