This PR provides alerting for a failure to ingest the latest rules. This is done through the usage of custom CloudWatch metrics for the rules not being found. If no rules are found consistently for three minutes, an alert will trigger, indicating that there is an issue. This is likely to happen when either the rules are missing from the S3 bucket, or there is an S3 outage of some kind.
How to test
Deploy the changes to code (ensure the alarm is created, which it is not normally for CODE), delete the rules from the S3 bucket, wait three minutes, observe the alarm has triggered.
How can we measure success?
We are notified when there is an issue with the Typerighter rule ingestion, allowing us to resolve potential issues quickly.
What does this change?
This PR provides alerting for a failure to ingest the latest rules. This is done through the usage of custom CloudWatch metrics for the rules not being found. If no rules are found consistently for three minutes, an alert will trigger, indicating that there is an issue. This is likely to happen when either the rules are missing from the S3 bucket, or there is an S3 outage of some kind.
How to test
Deploy the changes to code (ensure the alarm is created, which it is not normally for CODE), delete the rules from the S3 bucket, wait three minutes, observe the alarm has triggered.
How can we measure success?
We are notified when there is an issue with the Typerighter rule ingestion, allowing us to resolve potential issues quickly.