mozilla-services / pagerstatus

A service to automatically update Statuspage.io based on Pagerduty incidents
Apache License 2.0

Notice: We have stopped using StatusPage and thus Pagerstatus is no longer maintained

What is Pagerstatus?

Pagerstatus is a service to automatically update Atlassian Statuspage based on Pagerduty incidents.

There are a number of frustrating aspects to Statuspage’s built-in Pagerduty integration.

Pagerstatus solves these problems!

Service Design

Pagerstatus is written using Chalice, a Python "serverless" framework from AWS. It is deployed as a Lambda function behind API Gateway.

The basic logic is straightforward. When a webhook is received from Pagerduty, Pagerstatus iterates over each message in the payload. Then, if any Pagerduty incidents were acknowledged or resolved, it performs a sync:

  1. Fetch all open incidents from Statuspage. Ignore any that do not have a string in the body denoting they were created by this tool.
  2. Fetch all open incidents from Pagerduty. Ignore any that do not have a component tag.
  3. If there are any components in Statuspage that are not in Pagerduty, look up their incidents and close them.
  4. If there are any components in Pagerduty that are not in Statuspage, create incidents for them.
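Steps 3 and 4 amount to two set differences over component IDs. The following is a minimal sketch, not the Pagerstatus source: the function names `needs_sync` and `plan_sync` are hypothetical, the component IDs are assumed to have been fetched already, and the `incident.resolve` event name is assumed by analogy with the `incident.acknowledge` event used elsewhere in this README.

```python
# Event types that trigger a sync. "incident.acknowledge" appears in this
# README's test payload; "incident.resolve" is an assumption.
SYNC_EVENTS = {"incident.acknowledge", "incident.resolve"}


def needs_sync(payload):
    """Return True if any message in a webhook payload should trigger a sync."""
    return any(m.get("event") in SYNC_EVENTS for m in payload.get("messages", []))


def plan_sync(statuspage_components, pagerduty_components):
    """Compare the component IDs that have open incidents on each side.

    Returns (to_close, to_create): components whose Statuspage incidents
    should be closed, and components that need a new Statuspage incident.
    """
    statuspage = set(statuspage_components)
    pagerduty = set(pagerduty_components)
    to_close = statuspage - pagerduty   # step 3: open in Statuspage only
    to_create = pagerduty - statuspage  # step 4: open in Pagerduty only
    return to_close, to_create
```

Because the plan is recomputed from the full state of both systems on every qualifying webhook, a missed or duplicated webhook is corrected by the next sync rather than leaving an incident stuck open.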

Deploy It

Before you begin, you'll need a few pieces of information

If you have multiple Pagerduty accounts, collect that for each one. You'll also need

Multiple Statuspage pages are not supported. As a workaround, you can deploy this service multiple times, once per page.

Clone this repository and make it your working directory.

Modify .chalice/config.json. Set STATUSPAGE_PAGE and STATUSPAGE_KEY to your page id and API key.

For each Pagerduty account, create a variable that begins with PD_ACCOUNT_ and ends with your account name in upper case. For instance, if your account name is hugops, create the variable PD_ACCOUNT_HUGOPS. Set the value of that variable to the corresponding API key.
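To illustrate the naming convention, here is a small sketch of how an account name could be mapped to its variable at runtime. The helper names `pagerduty_key` and `configured_accounts` are hypothetical, and the upper-casing rule is inferred from the hugops example above; Pagerstatus itself may read these variables differently.

```python
import os

PREFIX = "PD_ACCOUNT_"


def pagerduty_key(account, environ=os.environ):
    """Look up the API key for a Pagerduty account name such as "hugops"."""
    return environ[PREFIX + account.upper()]


def configured_accounts(environ=os.environ):
    """List the account names that have a PD_ACCOUNT_* variable set."""
    return sorted(
        name[len(PREFIX):].lower() for name in environ if name.startswith(PREFIX)
    )
```

With `PD_ACCOUNT_HUGOPS` set, `pagerduty_key("hugops")` returns its value, and a request for an account with no matching variable raises `KeyError`.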

Ensure your AWS credentials are configured correctly and install Chalice.

Run chalice deploy.

Note the URL it shows. That's where you can access Pagerstatus. Request it and you should see the response ["Hello from Pagerstatus"]. Now test again with curl -XPOST -H "Content-Type: application/json" --data '{"messages":[{"event":"incident.acknowledge"}]}' yoururl/pdaccount, replacing yoururl with the URL from the previous step and pdaccount with the name of one of your Pagerduty accounts from earlier, e.g. hugops. You should get the response ["Performed sync"].
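If you would rather script this check than use curl, the same POST can be built with Python's standard library. This is only a sketch: `build_sync_request` and `send` are hypothetical helper names, and the URL you pass in is whatever chalice deploy printed for you.

```python
import json
import urllib.request


def build_sync_request(base_url, account):
    """Build the same POST request as the curl command above, without sending it."""
    body = json.dumps({"messages": [{"event": "incident.acknowledge"}]}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/" + account,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_sync_request(yoururl, "hugops"))` against a working deployment should return the same ["Performed sync"] response described above.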

If you get errors in either of those tests, look at the CloudWatch logs for the Lambda function that Chalice deployed.

Use It

Configuring Statuspage

No special configuration is needed in Statuspage. Just create your components and note their IDs for use in tags later.

Configuring Pagerduty

For each Pagerduty service that you route alerts to, you must add a new Generic V2 Webhook extension. The URL of the extension is the URL printed by chalice deploy, followed by the name of the Pagerduty account that contains the service you are configuring, e.g. https://qxea58oupc.execute-api.us-west-2.amazonaws.com/hugops

Configuring your monitors (alerts)

Pagerstatus has been tested to work with Datadog and Pingdom. In the examples below, replace component-id with the ID of a Statuspage component.

In Datadog, tag each monitor with the form component:component-id.

In Pingdom, tag each check with the form component_component-id.

For emails, add a string with the form Component: component-id somewhere in the body.
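The three forms above differ only in their separator, so one way to recognise them is with a pair of regular expressions. This is an illustrative sketch rather than the parsing Pagerstatus actually performs: the function names are hypothetical, and component IDs are assumed to consist only of letters and digits.

```python
import re

# Matches Datadog-style "component:ID" and Pingdom-style "component_ID" tags.
# Assumption: component IDs contain only letters and digits.
TAG_PATTERN = re.compile(r"^component[:_]([A-Za-z0-9]+)$")

# Matches "Component: ID" anywhere in an email body.
EMAIL_PATTERN = re.compile(r"Component:\s*([A-Za-z0-9]+)")


def component_from_tag(tag):
    """Return the component ID from a monitor or check tag, or None."""
    match = TAG_PATTERN.match(tag)
    return match.group(1) if match else None


def component_from_email(body):
    """Return the first component ID found in an email body, or None."""
    match = EMAIL_PATTERN.search(body)
    return match.group(1) if match else None
```

Tags that do not follow either form, such as `env:prod`, yield `None`, which corresponds to the rule that Pagerduty incidents without a component tag are ignored.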