opensearch-project / alerting

📟 Get notified when your data meets certain conditions by setting up monitors, alerts, and notifications
https://opensearch.org/docs/latest/monitoring-plugins/alerting/index/
Apache License 2.0

[FEATURE] Actions triggered only after a specific number of occurrences #1424

Open spapadop opened 7 months ago

spapadop commented 7 months ago

Is your feature request related to a problem? I have a "per cluster metrics" monitor that runs GET _cluster/health every minute and has 2 triggers:

  1. Cluster is red, checking ctx.results[0].status == "red"
  2. Cluster is yellow, checking ctx.results[0].status == "yellow"

As a result, whenever the cluster goes red or yellow, even for one minute, the alarm is triggered. That makes sense for the "red" status, as there is no acceptable "red" state, but clusters pass through "yellow" periods as part of normal activity (e.g., initialising a new shard, increasing the number of replicas). Consequently, many unnecessary yellow-state alarms are triggered.

What solution would you like? I would like an extra configuration item at the trigger "action" level that lets me specify that the yellow action should raise an alarm only after the trigger has evaluated to "true" 30 consecutive times. In other words, if a cluster is "yellow" for more than 30 minutes, then something is off.
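To make the requested behaviour concrete, here is a minimal sketch of a consecutive-occurrence gate in Python. It is not part of the alerting plugin: the class name ConsecutiveTriggerGate, the required setting and the observe method are all hypothetical names chosen for illustration. The sketch assumes the gate stays open on every evaluation once the streak has reached the threshold, and closes again as soon as the condition is false once.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveTriggerGate:
    """Hypothetical helper: allows an action only after `required`
    consecutive evaluations in which the trigger condition was true."""

    required: int    # hypothetical setting, e.g. 30 for a monitor that runs every minute
    streak: int = 0  # number of consecutive true evaluations seen so far

    def observe(self, condition_met: bool) -> bool:
        """Record one trigger evaluation and report whether the action may run."""
        if condition_met:
            self.streak += 1
        else:
            # A single false evaluation resets the count.
            self.streak = 0
        return self.streak >= self.required


# Usage: one evaluation per monitor run, with a threshold of 3 to keep the example short.
gate = ConsecutiveTriggerGate(required=3)
statuses = ["yellow", "yellow", "green", "yellow", "yellow", "yellow", "yellow"]
decisions = [gate.observe(status == "yellow") for status in statuses]
```

In this run the gate opens only on the sixth and seventh evaluations, because the "green" result in the third position resets the streak.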

What alternatives have you considered? Having two monitors (one for red cluster health running every minute, one for yellow cluster health running every 30 minutes) would statistically reduce the number of alarms, but it's not a good solution, since a short yellow period that happens to coincide with a run would still raise an alarm. Also, chaining monitors to ensure that the cluster is yellow and no shard is initialising could be explored, but that is unnecessarily complex IMO.
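The "statistically reduce" point can be illustrated with a rough calculation. The sketch below assumes a single uninterrupted yellow period and a monitor whose run times are uniformly random relative to the start of that period. The function name alarm_probability is hypothetical and the model is a simplification, not a description of how the plugin schedules monitors.

```python
def alarm_probability(yellow_minutes: float, interval_minutes: float = 30.0) -> float:
    """Chance that at least one monitor run lands inside a yellow period.

    Assumes one continuous yellow period and a run phase that is uniformly
    random relative to it (a simplifying assumption for illustration).
    """
    if yellow_minutes < 0 or interval_minutes <= 0:
        raise ValueError("durations must be non-negative and the interval positive")
    return min(yellow_minutes / interval_minutes, 1.0)


# A 3-minute yellow blip is still caught by a 30-minute monitor part of the time,
# whereas a rule requiring 30 consecutive one-minute evaluations would never fire for it.
short_blip = alarm_probability(3)
long_outage = alarm_probability(45)
```

Under these assumptions the 30-minute monitor makes transient alarms less frequent but does not remove them, which is why it only helps statistically.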