Closed · alexstaz closed this issue 3 years ago
That is a known issue with Icinga2, but it should be fixed by now. Are you using the latest version?
Sounds like the issue fixed in e0e8ea1e2d651d6ff22d4b9ec29426d0a8865575.
I have tested with lmd 1.9.0, 1.9.5 and HEAD. The Icinga release is 2.12.3. All peers have the Icinga2 flag (flags = ['Icinga2']). Any ideas? I have enabled debug logging, but it doesn't help much.
No idea so far. I mean, there are known issues with Icinga2, but workarounds should be in place, as already mentioned. I would have to set up a test environment to see if I can reproduce this somehow.
Hello @sni, we did some more testing, and if we deactivate the full update it's OK. So I think it's not the incremental update that breaks, but the full update. We changed FullUpdateInterval = 600 to FullUpdateInterval = 0. I hope this helps.
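For anyone hitting the same symptom, the workaround described above corresponds to this lmd.ini fragment (a sketch; the surrounding sections of the config are assumed, and 600 was the value we had before):

```ini
# Disabling the periodic full resync works around the corruption;
# incremental updates alone kept the cache consistent in this setup.
# Previous value: FullUpdateInterval = 600
FullUpdateInterval = 0
```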
Thanks, that certainly helps with identifying the root cause.
Please try again, it should be better now.
Thanks a lot. We have been running it for 24 hours now and everything seems fine.
Hello,
We use LMD to query 1 Nagios and 6 Icinga 2 systems. It seems the incremental status update causes some internal corruption of the cache. We see that some services get their status and all other fields corrupted with values from another service: plugin_output, status, etc. For example: [host_name,display_name,plugin_output] ["xxx-yyy-bo02","HYCU Last Backup","Updates: 3 critical, 2 optional"]
If we wait some time (for a full update) or restart, we get the correct information back: ["xxx-yyy-bo02","HYCU Last Backup","Status=OK, Compliancy=GREEN, Date=2021-02-25T07:04:05.355000"]
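To spot the mismatched rows, one can poll the affected columns through LMD's Livestatus listener and compare snapshots over time. A minimal sketch, assuming LMD is configured with a TCP Listen address (the host and port below are placeholders, not from the report):

```python
import json
import socket


def build_query(columns):
    """Build a Livestatus GET query for the services table with JSON output."""
    return (
        "GET services\n"
        f"Columns: {' '.join(columns)}\n"
        "OutputFormat: json\n"
        "ResponseHeader: fixed16\n\n"
    )


def parse_rows(payload):
    """Parse the JSON body of a Livestatus response into a list of rows."""
    return json.loads(payload)


def query_lmd(host, port, columns, timeout=10.0):
    """Send the query to an LMD Livestatus listener and return parsed rows.

    host/port are assumptions; point this at your lmd 'Listen' address.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_query(columns).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)
        data = b""
        while chunk := sock.recv(4096):
            data += chunk
    # fixed16 header: 3-digit status code, padded body length, newline (16 bytes)
    status = int(data[:3])
    if status != 200:
        raise RuntimeError(f"Livestatus error {status}: {data[16:].decode()}")
    return parse_rows(data[16:].decode("utf-8"))
```

Polling `query_lmd("localhost", 6557, ["host_name", "display_name", "plugin_output"])` every few seconds and diffing the results makes the transient corruption visible without waiting for a manual spot check.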
It happens more with systems that respond with some delay (we have around 250 ms of latency between the LMD server and Icinga). We have never seen it on a Nagios system. Also, the system where we see the problem most often is in High Availability (2 Icinga connections declared in LMD).
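For context, such an HA peer is declared in LMD as a single connection with multiple source addresses, so LMD fails over between the two Icinga 2 endpoints. A hypothetical lmd.ini sketch (names, ids and addresses are placeholders):

```ini
[[Connections]]
name   = "icinga-ha"
id     = "icinga_ha_1"
# Two Icinga 2 livestatus endpoints for the same HA zone;
# LMD switches to the second source if the first is unreachable.
source = ["icinga-node1.example.com:6557", "icinga-node2.example.com:6557"]
```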
I tried to look at the code, but it is quite difficult for me. If you need more information, please tell me.
Thanks a lot.