ministryofjustice / cloud-platform

Documentation on the MoJ cloud platform
MIT License

List of all services alerting in low level alerts #125

Closed: ojayx closed this issue 6 years ago

ojayx commented 6 years ago

Background

An audit of the #low-priority-alarms Slack channel is required. In the past, alarms have failed to reach the Slack channel because they were incorrectly configured in PagerDuty or were missing from PagerDuty or CloudWatch. Additionally, some alarms are unnecessary and should not alert at all.

Aim

The aim of this ticket is to create a list of the services that are alerting in the #low-priority-alarms Slack channel.

DOD

Note: Alerts come into the channel via PagerDuty and AWS CloudWatch.

ale-novo commented 6 years ago

The issue description was improved.

ale-novo commented 6 years ago

I've listed all the alarms posted to the #low-priority-alarms Slack channel over the last 30 days, filtered them by name and removed duplicates. This is the list of unique alarms:

AlarmName: ci-prod-monitoring | network-out-high-alarm
AlarmName: graphite-staging-monitoring | CPU-high-threshold-alarm
AlarmName: graphite-staging-monitoring | network-out-high-alarm
AlarmName: peoplefinder-prod-monitoring | network-in-high-alarm
AlarmName: peoplefinder-prod-monitoring | network-out-high-alarm
AlarmName: pvbpublic-staging-monitoring | elb-500-error-alarm
AlarmName: pvbpublic-staging-monitoring | http-500-error-alarm
AlarmName: pvbpublic-staging-monitoring | unhealthy-hosts-alarm
AlarmName: pvb-staging-monitoring | unhealthy-hosts-alarm
AlarmName: sensu-prod-monitoring | CPU-high-threshold-alarm
AlarmName: sentry-prod-monitoring | elb-500-error-alarm
AlarmName: sentry-prod-monitoring | http-400-error-alarm
AlarmName: sentry-staging-monitoring | network-out-high-alarm
Triggered #62409: DOWN alert: Judicial Office Blog (judicialoffice.blogs.justice.gov.uk) is DOWN
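The filter-and-de-duplicate step described above can be sketched in Python. This is a minimal sketch, assuming the Slack messages have already been exported as plain-text lines and that each CloudWatch alarm appears in the `AlarmName: <stack> | <alarm>` form shown in the list; the function name `unique_alarms` and the regular expression are hypothetical, not part of the platform's tooling. Messages in other formats, such as the Pingdom-style `DOWN alert` line, are skipped and would need to be reviewed separately.

```python
import re

# Hypothetical pattern, assumed from the list above:
# "AlarmName: <stack-name> | <alarm-name>"
ALARM_RE = re.compile(r"AlarmName:\s*(?P<stack>\S+)\s*\|\s*(?P<alarm>\S+)")


def unique_alarms(messages):
    """Return the sorted, de-duplicated (stack, alarm) pairs found in messages."""
    seen = set()
    for message in messages:
        match = ALARM_RE.search(message)
        if match:  # skip messages that are not CloudWatch alarm notifications
            seen.add((match.group("stack"), match.group("alarm")))
    return sorted(seen)


if __name__ == "__main__":
    sample = [
        "AlarmName: sentry-prod-monitoring | elb-500-error-alarm",
        "AlarmName: sentry-prod-monitoring | elb-500-error-alarm",
        "AlarmName: ci-prod-monitoring | network-out-high-alarm",
        "Triggered #62409: DOWN alert: Judicial Office Blog is DOWN",
    ]
    for stack, alarm in unique_alarms(sample):
        print(f"{stack} | {alarm}")
```

Using a set of `(stack, alarm)` tuples removes duplicates regardless of how often an alarm fired in the 30-day window, and sorting gives a stable list that can be compared between audits.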

ale-novo commented 6 years ago

A team decision was made to remove staging CloudWatch alerts from the #low-priority-alarms Slack channel.
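To see which of the alarms above this decision affects, the stack names can be split by environment. The sketch below is hypothetical: it assumes every monitoring stack follows the `<service>-<environment>-monitoring` naming pattern seen in this thread, and `environment_of` and `split_by_environment` are illustrative helpers, not existing tooling.

```python
def environment_of(stack_name):
    """Extract the environment from an assumed '<service>-<environment>-monitoring' name.

    Returns None when the name does not follow that convention.
    """
    parts = stack_name.split("-")
    if len(parts) < 3 or parts[-1] != "monitoring":
        return None
    return parts[-2]


def split_by_environment(alarms, excluded_env="staging"):
    """Split (stack, alarm) pairs into those kept in the channel and those excluded."""
    kept, excluded = [], []
    for stack, alarm in alarms:
        target = excluded if environment_of(stack) == excluded_env else kept
        target.append((stack, alarm))
    return kept, excluded


if __name__ == "__main__":
    alarms = [
        ("pvb-staging-monitoring", "unhealthy-hosts-alarm"),
        ("sensu-prod-monitoring", "CPU-high-threshold-alarm"),
    ]
    kept, excluded = split_by_environment(alarms)
    print("keep:", kept)
    print("remove:", excluded)
```

Names that do not match the pattern are treated as non-staging and kept, which errs on the side of keeping an alert rather than silently dropping it. Multi-word service names such as `correspondence-tool-staging-monitoring` still resolve to `staging`, because the environment is read from the second-to-last segment.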

ale-novo commented 6 years ago

The following CloudFormation stacks were updated:

correspondence-tool-staging-monitoring 2017-12-07 10:05:35 UTC+0000 UPDATE_COMPLETE MoJ Application Log Monitoring

correspondence-staff-staging-monitoring 2017-12-07 09:51:21 UTC+0000 UPDATE_COMPLETE MoJ Application Log Monitoring

pvbpublic-staging-monitoring 2017-12-05 12:44:58 UTC+0000 UPDATE_COMPLETE MoJ Application Log Monitoring

cait-staging-monitoring 2017-11-13 10:29:15 UTC+0000 UPDATE_COMPLETE MoJ Application Log Monitoring

postcodeinfo-staging-monitoring 2017-10-25 10:27:19 UTC+0100 UPDATE_COMPLETE MoJ Application Log Monitoring

sentry-staging-monitoring 2017-10-24 17:23:41 UTC+0100 UPDATE_COMPLETE MoJ Application Log Monitoring

mps-staging-monitoring 2017-10-24 14:16:27 UTC+0100 UPDATE_COMPLETE MoJ Application Log Monitoring

graphite-staging-monitoring 2017-10-23 16:22:50 UTC+0100 UPDATE_COMPLETE MoJ Application Log Monitoring

pvb-staging-monitoring 2017-10-23 14:48:31 UTC+0100 UPDATE_COMPLETE MoJ Application Log Monitoring

sensu-staging-monitoring 2017-10-17 13:02:28 UTC+0100 UPDATE_COMPLETE MoJ Application Log Monitoring

ale-novo commented 6 years ago

PagerDuty alerts were reviewed.

Unsupported apps were removed from alerting in the #low-priority-alarms Slack channel:

IRAT - Pingdom (Low Priority)

ale-novo commented 6 years ago

This is now complete.