Is your feature request related to a problem?
I have a "per cluster metrics" monitor that runs GET _cluster/health every minute and has 2 triggers:
Cluster is red, checking ctx.results[0].status == "red"
Cluster is yellow, checking ctx.results[0].status == "yellow"
As a result, whenever the cluster goes to red or yellow status (even for one minute), the alarm is triggered. That makes sense for the "red" status, as there is no acceptable "red" state, but clusters go through some "yellow" periods as part of normal activity (e.g., initialising a new shard, increasing the number of replicas). Consequently, many unnecessary yellow state alarms are triggered.
What solution would you like?
I would like an extra configuration item at the trigger "action" level that lets me specify that the yellow action should only trigger an alarm once the condition has evaluated to "true" 30 consecutive times. In other words, if a cluster is "yellow" for more than 30 minutes, then something is off.
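To make the requested behaviour concrete, here is a minimal sketch of the "N consecutive evaluations" logic in Python. It is only an illustration of the idea, not the plugin's API: the names `ConsecutiveTrigger`, `required`, and `simulate` are hypothetical, and the sketch assumes the monitor runs once per minute, so each evaluation stands for one minute.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveTrigger:
    """Hypothetical helper: fires only after `required` consecutive true evaluations."""

    required: int      # e.g. 30 for "yellow for 30 evaluations in a row"
    streak: int = 0    # number of consecutive true evaluations seen so far

    def evaluate(self, condition: bool) -> bool:
        # A single false evaluation resets the streak to zero.
        self.streak = self.streak + 1 if condition else 0
        # Keeps returning True on every later evaluation while the streak holds.
        return self.streak >= self.required


def simulate(statuses, required):
    """Evaluate a sequence of cluster statuses, one per monitor run (assumed one per minute)."""
    trigger = ConsecutiveTrigger(required=required)
    return [trigger.evaluate(status == "yellow") for status in statuses]


# A short yellow blip never fires; a sustained yellow period does.
blip = ["green", "yellow", "yellow", "green"]
sustained = ["yellow"] * 30
print(any(simulate(blip, required=30)))
print(simulate(sustained, required=30)[-1])
```

In this sketch the action would fire on the 30th consecutive "yellow" evaluation and on each one after it until the cluster leaves the yellow state. Whether it should fire once or repeatedly is a separate design choice.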
What alternatives have you considered?
Having two monitors (one for red cluster health running every minute, one for yellow cluster health running every 30 minutes) would statistically reduce the number of alarms, but it's not a good solution. Also, chaining monitors to ensure that the cluster is yellow and no shard is initialising could be explored, but it's unnecessarily complex IMO.