tournesol-app / tournesol

Free and open source code of the https://tournesol.app platform. Meet the community on Discord https://discord.gg/WvcSG55Bf3

[infra] update alert rule about slow requests with pending period #1992

Closed: amatissart closed this 5 months ago

amatissart commented 5 months ago

Description

The metric that triggers the alert is unchanged: the 95th percentile of request time over a 2-minute window. What changes is when the alert fires:

Before: the alert fired as soon as the average of the metric over the last 20 minutes was above 500 ms. It could fire immediately when a single request was very slow and traffic was low, especially on staging.

After: the alert fires only when the latest value of the metric has stayed above 500 ms for a pending period of 8 minutes. This should make it less sensitive to traffic spikes.
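To illustrate the difference between the two conditions, here is a simplified Python model of both rules. It is a sketch under stated assumptions, not the actual alert configuration used by Tournesol. It assumes one evaluation per minute, where each sample is the p95 of the 2-minute window ending at that minute and `None` means the window had no request. It also assumes that the old average ignores empty windows and that "no data" never counts as breaching. The helper names (`p95`, `old_rule_fires`, `new_rule_states`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence

THRESHOLD_S = 0.5    # 500 ms threshold from the alert rule
AVG_WINDOW_MIN = 20  # "before" rule: average over the last 20 minutes
PENDING_MIN = 8      # "after" rule: pending period in minutes


def p95(durations: Sequence[float]) -> Optional[float]:
    """Nearest-rank 95th percentile of request durations in one 2-minute
    window (assumption: nearest-rank method); None if there was no request."""
    if not durations:
        return None
    ordered = sorted(durations)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def old_rule_fires(series: Sequence[Optional[float]], minute: int) -> bool:
    """Before: fire as soon as the average of the non-empty samples from the
    last 20 minutes is above the threshold (assumption: empty windows are
    skipped, so one slow request dominates when traffic is low)."""
    start = max(0, minute - AVG_WINDOW_MIN + 1)
    window = [v for v in series[start:minute + 1] if v is not None]
    return bool(window) and sum(window) / len(window) > THRESHOLD_S


def new_rule_states(series: Sequence[Optional[float]]) -> List[str]:
    """After: the latest sample must stay above the threshold for the whole
    pending period before the alert moves from Pending to Firing."""
    states: List[str] = []
    pending_since: Optional[int] = None
    for minute, value in enumerate(series):
        if value is not None and value > THRESHOLD_S:
            if pending_since is None:
                pending_since = minute  # condition just became true
            if minute - pending_since >= PENDING_MIN:
                states.append("Firing")
            else:
                states.append("Pending")
        else:
            pending_since = None  # condition broken: reset the pending period
            states.append("Normal")
    return states


# Hypothetical low-traffic staging: a single 2-second request shows up in
# the windows evaluated at minutes 3 and 4, with no other traffic.
staging = [None, None, None, 2.0, 2.0] + [None] * 15
print(old_rule_fires(staging, 3))            # True: fires immediately
print("Firing" in new_rule_states(staging))  # False: only briefly Pending

# Hypothetical sustained slowness: p95 stays at 800 ms.
slow = [0.8] * 12
print(new_rule_states(slow).index("Firing"))  # 8: fires after 8 minutes
```

In this model, the old rule reacts to a single slow request on an otherwise idle instance. The pending period only lets the alert fire after the slowness has lasted 8 minutes without interruption.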

Checklist