shinken-solutions / shinken

Flexible and scalable monitoring framework
http://www.shinken-monitoring.org
GNU Affero General Public License v3.0

Enh: Force problem/impact evaluation #2012

Closed geektophe closed 3 years ago

geektophe commented 3 years ago

When the scheduler saves the retention data, only each object's direct state is saved (last check time, state, state type, ...), not the problem/impact related attributes (the associated data structure is too complex).

This means that when the retention data is reloaded, the objects' direct state is restored, but the problem/impact related attributes are not: they fall back to their default values. They are only recalculated when a new check result arrives for a given host/service. Additionally, a Brok is only emitted if the objects have dependencies. This means that the broker isn't aware of the attribute change until a state change is triggered.

This patch adds the ability to force problem/impact evaluation on all the objects after the retention data has been reloaded. In that case, a Brok should not be emitted, as the broker will request initial Broks soon after; send_brok is there precisely to control Brok emission in that case.
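To illustrate the idea, here is a minimal sketch, not the code from this patch. The `Item` class, the `reprocess_problem_impact` function and the problem/impact rule it applies are all hypothetical simplifications. Only the `send_brok` name comes from the description above, and its exact form in Shinken is assumed here to be a boolean argument.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a host or service (not Shinken's real class)."""
    name: str
    state: str = "UP"                             # direct state, kept in retention
    parents: list = field(default_factory=list)   # items this one depends on
    is_problem: bool = False                      # derived, not kept in retention
    is_impact: bool = False                       # derived, not kept in retention


def reprocess_problem_impact(items, send_brok=False):
    """Recompute the derived flags from the restored direct states.

    Returns the (hypothetical) update messages that would go to the broker.
    The list stays empty when send_brok is False.
    """
    broks = []
    # Derived flags come back at their defaults after a reload; reset them anyway.
    for item in items:
        item.is_problem = False
        item.is_impact = False
    for item in items:
        if item.state == "UP":
            continue
        # Simplified rule (an assumption): a failing item with a failing
        # parent is an impact, any other failing item is a problem.
        if any(parent.state != "UP" for parent in item.parents):
            item.is_impact = True
        else:
            item.is_problem = True
        if send_brok:
            broks.append(("update_status", item.name))
    return broks


# State as it would look right after retention data is reloaded.
router = Item("router", state="DOWN")
web = Item("web", state="DOWN", parents=[router])
db = Item("db", parents=[router])

# After a reload the broker asks for initial Broks anyway, so none are sent.
broks = reprocess_problem_impact([router, web, db], send_brok=False)
```

In this sketch, skipping emission after a reload avoids sending the broker updates that its upcoming request for initial Broks would duplicate.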

This feature is disabled by default. It can be enabled by setting the global parameter enable_problem_impacts_states_reprocessing to 1, in addition to the enable_problem_impacts_states_change parameter, in the Shinken main configuration file:

enable_problem_impacts_states_change=1
enable_problem_impacts_states_reprocessing=1

Some factorization has also been made in the retention data restoration routines to ease further maintenance.

coveralls commented 3 years ago

Coverage increased (+0.04%) to 27.768% when pulling f995b89dd3aad5c510c940d17a086dc51a42c22e on geektophe:enh_force_problem_impact_reprocessing into fc4a0fe237fb9a8bd9a5dc247480024a134bb401 on naparuba:master.

geektophe commented 3 years ago

@naparuba could this be of interest in Enterprise? Did you handle this issue differently?