moira-alert / moira

Realtime Alerting for Graphite and Prometheus
MIT License

Notify when count of metrics in trigger changes #379

Open beevee opened 5 years ago

beevee commented 5 years ago

We are stepping into the new world of orchestrated clouds. Long-lived, pretty-named hosts are gone, replaced by short-lived containers with random characters as host names.

Yesterday we had a pattern DevOps.*.cpu.percent that matched:

Today it matches a lot more metrics:

This poses two problems with Moira:

  1. False notifications, obviously. This can be solved with the NODATA→DEL setting, but then you lose NODATA detection.
  2. Piles of unused, never-deleted keys in Redis. This does not apply to remote triggers.

A solution for (1): let the user set a desired count of active metrics in a trigger. Notify the user if there are fewer active metrics than expected. A metric is considered active if it has datapoints inside the TTL window (the same criterion as for the NODATA state).

hv7214 commented 4 years ago

What I understood:

  1. Nowadays containers are short-lived, i.e. they keep getting created and destroyed/restarted.
  2. This leads to a bulk of metrics flipping NODATA -> OK -> NODATA in Redis, which is of course false information.
  3. So create a counter that tracks the number of NODATA metrics, which in turn gives the number of active metrics (total metrics - counter).
  4. The user can set a threshold metric count (for a particular trigger) through the UI.
  5. Notify the user if counter > curr_active_metrics.
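
The counter described in points 3 and 4 could be maintained incrementally whenever a metric changes state, instead of rescanning all metrics. A minimal Go sketch, assuming a hypothetical `nodataCounter` type and only two states (real Moira metrics have more, such as WARN and ERROR, which this sketch ignores):

```go
package main

import "fmt"

// State is a simplified metric state used only for this sketch.
type State string

const (
	OK     State = "OK"
	NODATA State = "NODATA"
)

// nodataCounter is a hypothetical per-trigger counter.
type nodataCounter struct {
	total  int // all metrics known to the trigger
	nodata int // metrics currently in NODATA
}

// transition updates the counter when a metric moves from one state to another.
func (c *nodataCounter) transition(from, to State) {
	if from != NODATA && to == NODATA {
		c.nodata++
	} else if from == NODATA && to != NODATA {
		c.nodata--
	}
}

// active follows the formula from the comment: total metrics - counter.
func (c *nodataCounter) active() int {
	return c.total - c.nodata
}

func main() {
	c := &nodataCounter{total: 3}
	c.transition(OK, NODATA) // a container went away
	c.transition(OK, NODATA) // another one went away
	c.transition(NODATA, OK) // one came back
	fmt.Println(c.active()) // prints "2"
}
```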

beevee commented 4 years ago

@hv7214 I imagined something like this:

When a user sets NODATA == DEL for a trigger, allow the user to specify the desired number of active (i.e. not deleted) metrics. Notify when there are more or fewer metrics than expected.
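The check described above is two-sided: both too few and too many active metrics are reportable. A minimal Go sketch, assuming hypothetical names `checkCount` and `CountStatus`; with NODATA == DEL, deleted metrics simply drop out of the `active` count passed in:

```go
package main

import "fmt"

// CountStatus classifies the active metric count against the desired amount.
type CountStatus int

const (
	CountExpected CountStatus = iota
	CountTooFew
	CountTooMany
)

// String makes CountStatus print readably via fmt.
func (s CountStatus) String() string {
	switch s {
	case CountTooFew:
		return "too few"
	case CountTooMany:
		return "too many"
	default:
		return "expected"
	}
}

// checkCount compares the number of active (not deleted) metrics with the
// desired amount configured for the trigger (a hypothetical setting).
func checkCount(active, desired int) CountStatus {
	switch {
	case active < desired:
		return CountTooFew
	case active > desired:
		return CountTooMany
	default:
		return CountExpected
	}
}

func main() {
	fmt.Println(checkCount(2, 3), checkCount(3, 3), checkCount(5, 3))
	// prints "too few expected too many"
}
```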

hv7214 commented 4 years ago

@beevee I want some more details: should the notification be sent once each time the metric count crosses the set level, or every 24 hours while the count is lower or higher than expected?

beevee commented 4 years ago

@hv7214 I think once.
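Notifying once per crossing, rather than repeatedly, amounts to edge-triggering on the count status. A minimal Go sketch, assuming a hypothetical `countWatcher` that keeps the previous status in memory (a real implementation would presumably need to persist it between trigger checks):

```go
package main

import "fmt"

// status classifies the active count against the desired count.
func status(active, desired int) string {
	switch {
	case active < desired:
		return "too few"
	case active > desired:
		return "too many"
	default:
		return "expected"
	}
}

// countWatcher is hypothetical: it remembers the last status so a
// notification is produced only when the status changes.
type countWatcher struct {
	desired int
	last    string // "" until the first check
}

// observe returns a message and true only when the status differs from
// the previous check; repeated checks in the same status stay silent.
func (w *countWatcher) observe(active int) (string, bool) {
	s := status(active, w.desired)
	if s == w.last {
		return "", false
	}
	prev := w.last
	w.last = s
	if prev == "" && s == "expected" {
		return "", false // first check and the count is fine: stay quiet
	}
	return fmt.Sprintf("active metrics: %d, desired: %d (%s)", active, w.desired, s), true
}

func main() {
	w := &countWatcher{desired: 3}
	for _, active := range []int{3, 2, 2, 2, 3, 4} {
		if msg, ok := w.observe(active); ok {
			fmt.Println(msg)
		}
	}
	// prints:
	// active metrics: 2, desired: 3 (too few)
	// active metrics: 3, desired: 3 (expected)
	// active metrics: 4, desired: 3 (too many)
}
```

Treating the return to the expected count as a crossing too, so the user learns the trigger recovered, is a design choice of this sketch rather than something decided in the thread.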