sensu / sensu-go

Simple. Scalable. Multi-cloud monitoring.
https://sensu.io
MIT License

Check flap detection #382

Closed: portertech closed 7 years ago

portertech commented 7 years ago

Sensu 1.X provides check flap detection, using the same algorithm as Nagios. Sensu stores the 21 most recent exit statuses for every check and uses them to calculate a weighted "total state change %" value, with more recent exit statuses carrying more weight. The "total state change %" is then compared against two configurable thresholds: the high threshold determines whether a check enters a flapping state, and the low threshold determines whether a check exits a flapping state.
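To make the calculation concrete, here is a minimal Go sketch of a weighted "total state change %" over the 21 most recent exit statuses. The function name `totalStateChange`, the linear weights (roughly 0.8 for the oldest transition rising to 1.2 for the newest, following the Nagios-style scheme) and the integer arithmetic are assumptions made for illustration. They are not copied from the linked Ruby code, which should be treated as the reference.

```go
package main

// historySize is the number of exit statuses kept per check, per the issue text.
const historySize = 21

// totalStateChange returns a weighted "total state change %" for a check
// history ordered oldest to newest. It returns 0 until a full history exists.
//
// Assumption: the transition into position i (1..20) is weighted
// 0.80 + 0.02*i, so newer changes count more than older ones. Weights are
// kept in hundredths so the arithmetic stays in integers.
func totalStateChange(history []int) int {
	if len(history) < historySize {
		return 0
	}
	// Only the most recent historySize statuses are considered.
	recent := history[len(history)-historySize:]

	weighted := 0 // sum of transition weights, in hundredths
	for i := 1; i < len(recent); i++ {
		if recent[i] != recent[i-1] {
			weighted += 80 + 2*i
		}
	}
	// (weighted/100) / 20 possible transitions * 100 percent == weighted/20,
	// truncated to an integer.
	return weighted / 20
}
```

Under these assumed weights, a history that never changes scores 0, a single state change at the newest position scores 6, and the same change at the oldest position scores 4. A history that alternates on every result scores 101, so the value is not strictly capped at 100.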

The limited Sensu 1.X check flap detection documentation can be found at: https://sensuapp.org/docs/latest/reference/checks.html#check-attributes under "low_flap_threshold" and "high_flap_threshold".

The Sensu 1.X "total state change %" calculation Ruby code and YARDOC can be found at: https://github.com/sensu/sensu/blob/b27e506875e466f2192e00c4da8dfd6344819309/lib/sensu/server/process.rb#L380-L415

The Sensu 1.X flap detection Ruby code and YARDOC can be found at: https://github.com/sensu/sensu/blob/b27e506875e466f2192e00c4da8dfd6344819309/lib/sensu/server/process.rb#L417-L453

Sensu 2.X could implement the total state change % calculation and flap detection in eventd.
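As a rough illustration of the threshold comparison eventd would need, the sketch below applies the two thresholds with hysteresis. The function name `isFlapping`, its parameters, and the exact boundary comparisons (`>=` to enter, `>` to stay) are assumptions for this example. It is not an existing sensu-go API, and the linked 1.X Ruby code remains the reference for the real behavior.

```go
package main

// isFlapping decides whether a check should be considered flapping.
//
// wasFlapping is the state recorded for the previous event, stateChange is
// the check's current "total state change %", and low and high are the
// configured flap thresholds. All names here are hypothetical.
func isFlapping(wasFlapping bool, stateChange, low, high int) bool {
	if wasFlapping {
		// A check that is already flapping keeps flapping until its
		// state change falls to the low threshold or below (assumed boundary).
		return stateChange > low
	}
	// A check that is not flapping starts flapping once its state change
	// reaches the high threshold (assumed boundary).
	return stateChange >= high
}
```

Using two thresholds instead of one keeps a check whose state change hovers near a single cutoff from switching in and out of the flapping state on every result.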

grepory commented 7 years ago

Further work: figure out a way to better represent flapping checks, but that is out of scope for this ticket.

grepory commented 7 years ago

Make sure that we have at least as much test coverage as 1.x.

portertech commented 7 years ago

The 1.X spec: https://github.com/sensu/sensu/blob/master/spec/server/process_spec.rb#L206-L253

palourde commented 7 years ago

CoS

Tasks

portertech commented 7 years ago

1201