The metric that drives the alert is unchanged: the 95th percentile of request time over a 2-minute window.
What changes is when the alert fires:
Before: as soon as the average of the metric over the last 20 minutes exceeded 500ms. (This could fire immediately when a single request was very slow and traffic was low, especially on staging.)
After: when the latest measurement stays above 500ms for a pending period of 8 minutes. This should make the alert less sensitive to traffic spikes.
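To make the difference concrete, here is a rough Python sketch of the two trigger conditions. It is not the actual alert rule: the function names, the one-sample-per-minute series, and the use of `None` for minutes without traffic are assumptions made for illustration only.

```python
from statistics import mean

THRESHOLD_MS = 500  # threshold from the description above


def fires_before(samples, window=20):
    """Old behaviour (sketch): fire as soon as the mean of the samples
    available in the last `window` minutes exceeds the threshold.
    `samples` holds one p95 value per minute, or None when there was no traffic."""
    for i in range(len(samples)):
        recent = [v for v in samples[max(0, i - window + 1): i + 1] if v is not None]
        if recent and mean(recent) > THRESHOLD_MS:
            return i  # minute at which the alert would fire
    return None


def fires_after(samples, pending=8):
    """New behaviour (sketch): fire only once the latest value has stayed
    above the threshold for `pending` consecutive minutes.
    Assumption: a minute without data resets the pending period."""
    streak = 0
    for i, value in enumerate(samples):
        streak = streak + 1 if value is not None and value > THRESHOLD_MS else 0
        if streak >= pending:
            return i
    return None


# Low traffic with one very slow request: only the old rule fires.
low_traffic = [None] * 5 + [3000] + [None] * 10
# Sustained slowdown: both rules fire, the new one after 8 slow minutes.
sustained = [100] * 5 + [800] * 10

print(fires_before(low_traffic), fires_after(low_traffic))
print(fires_before(sustained), fires_after(sustained))
```

In this sketch, the isolated slow request on an otherwise idle series trips the old condition right away but never satisfies the pending period, while a sustained slowdown still triggers the new condition.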
Checklist
[x] I added the related issue(s) id in the related issues section (if any)
if not, delete the related issues section
[x] I described my changes and my decisions in the PR description
[x] I read the development guidelines of the CONTRIBUTING.md
[x] The tests pass and have been updated if relevant