naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0

enable_flap_detection=0 seems to be ignored #434

Open lausser opened 1 year ago

lausser commented 1 year ago

I have a naemon 1.4.1 (OMD 5.21.20230729-labs-edition) where I disabled flap detection globally. But there is one service which is still shown as flapping (also in the logfile with [1690795047] SERVICE NOTIFICATION SUPPRESSED: snclient;os_windows_svc_check_Server;Notification blocked because the object is currently flapping.)

In the retention data one can clearly see the two different settings:

info {
created=1690813918
version=1.4.1
}
program {
modified_host_attributes=0
modified_service_attributes=0
...
enable_flap_detection=0
...
}

service {
host_name=snclient
service_description=os_windows_svc_check_Server
modified_attributes=2
...
is_flapping=1
percent_state_change=0.00
...

Also Thruk shows the flapping symbol. Shouldn't everything be 0 when enable_flap_detection in the naemon.cfg was disabled?
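A rough way to spot this mismatch across a whole retention file is sketched below. It assumes the "name { key=value ... }" block layout shown in the excerpt above; the function names parse_retention and stale_flapping_services are hypothetical helpers, not part of Naemon, and the sample text only reuses the values quoted in this report.

```python
import re

# Sample built from the values quoted in the report; "..." marks elided lines.
SAMPLE = """\
program {
modified_host_attributes=0
enable_flap_detection=0
...
}
service {
host_name=snclient
service_description=os_windows_svc_check_Server
is_flapping=1
percent_state_change=0.00
...
}
"""

def parse_retention(text):
    """Parse 'name { key=value ... }' blocks into (name, dict) pairs."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            match = re.match(r"^(\w+)\s*\{$", line)
            if match:
                current = (match.group(1), {})
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[1][key] = value
    return blocks

def stale_flapping_services(blocks):
    """List services still flagged as flapping while the global switch is off."""
    program = next((data for name, data in blocks if name == "program"), {})
    if program.get("enable_flap_detection") != "0":
        return []
    return [
        (data.get("host_name"), data.get("service_description"))
        for name, data in blocks
        if name == "service" and data.get("is_flapping") == "1"
    ]

if __name__ == "__main__":
    for host, service in stale_flapping_services(parse_retention(SAMPLE)):
        print(f"{host};{service} is still flagged as flapping")
```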

nook24 commented 1 year ago

Sounds like a bug to me

sni commented 1 year ago

Disabling enable_flap_detection does not reset all existing flapping flags. It simply prevents new flapping flags from being set. But I agree, it should not prevent notifications from being sent out.

jacobbaungard commented 1 year ago

Isn't the flapping flag removed on the next check execution? If not, that would probably be sensible imo.

sni commented 1 year ago

I don't think this is done already, but yes, that might be the way to go.

sni commented 7 months ago

Thinking about it, simply resetting the flag might not be a good idea. The global enable_flap_detection flag can be (temporarily) changed by an external command. So this would result in lots of "starting to flap" notifications again.

One way would be to check the global flag at least in the notifications logic. Better than nothing, but this would still show the host/service as flapping in the UI unless the UI checks the global flag as well.

Naemon simply cannot predict whether enable_flap_detection is disabled just temporarily or permanently.
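The "check the global flag in the notifications logic" option could look roughly like the sketch below. It is written in Python for brevity rather than in Naemon's C, and the Service class and flapping_blocks_notification function are hypothetical names used only for illustration, not Naemon's actual code.

```python
from dataclasses import dataclass

@dataclass
class Service:
    """Hypothetical stand-in for a monitored object."""
    description: str
    is_flapping: bool = False

def flapping_blocks_notification(svc, global_flap_detection_enabled):
    """Return True only when the flapping flag should suppress a notification."""
    if not global_flap_detection_enabled:
        # A flag left over from before the switch was turned off is ignored,
        # but deliberately not reset.
        return False
    return svc.is_flapping

if __name__ == "__main__":
    svc = Service("os_windows_svc_check_Server", is_flapping=True)
    for enabled in (True, False):
        blocked = flapping_blocks_notification(svc, enabled)
        print(f"global flap detection={enabled}: notification blocked={blocked}")
```

Because the flag itself is left untouched, re-enabling flap detection through an external command would not trigger a new "starting to flap" notification, but a UI that reads only is_flapping would still show the object as flapping.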

sni commented 7 months ago

Besides the quick fix from #452, we could think about slowly aging out the state history when the host/service flapping flag is set, even if flap detection is disabled. For example, each time a check result arrives, update the state history used for flap detection with an OK until the list is cleared (while silently skipping the flapping stop notification).
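A minimal sketch of that aging-out idea, again in Python rather than Naemon's C. The history length of 21 entries, the STATE_OK constant, and the function name age_out_flapping are assumptions made for the example; the real flap detection code computes a weighted percent state change, which this sketch does not model.

```python
from collections import deque

STATE_OK = 0
HISTORY_LEN = 21  # assumed size of the flap detection state history

def age_out_flapping(history, is_flapping):
    """Run once per check result while flap detection is globally disabled.

    Pushes an OK into the history in place of the real state and returns the
    new value of the flapping flag. The flag clears silently (no flapping stop
    notification) once only OK entries remain.
    """
    if not is_flapping:
        return False
    history.append(STATE_OK)  # maxlen makes the deque drop its oldest entry
    return any(state != STATE_OK for state in history)

if __name__ == "__main__":
    # A history alternating between OK (0) and CRITICAL (2).
    history = deque([0, 2] * 10 + [0], maxlen=HISTORY_LEN)
    flapping = True
    results = 0
    while flapping:
        flapping = age_out_flapping(history, flapping)
        results += 1
    print(f"flag cleared after {results} check results")
```

With this approach a short, temporary disable leaves the flag in place, while a long-lasting one lets it expire on its own after roughly one history length of check results.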