adoom42 closed this issue 2 years ago
Great investigation, thanks for that. A few remarks.
The error you see occurs because a newline somehow slipped into the query when updating services, e.g.:
GET services
ResponseHeader: fixed16
OutputFormat: json
Columns: host_name description...
KeepAlive: on
Filter: last_check >= 1637073927
Filter: last_check < 1637073937
And: 2
Filter: is_executing = 1
Or: 2
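To illustrate why a stray newline matters: a Livestatus request is terminated by an empty line, so anything after an accidental blank line would be read as the start of a new request. The sketch below is only an illustration of that splitting, not LMD's or Livestatus's actual parser. The position of the blank line in `broken` and the helper names `split_requests` and `first_words` are assumptions made for the example.

```python
def split_requests(raw: str) -> list:
    """Split a raw stream into requests, ending each one at an empty line."""
    requests = []
    current = []
    for line in raw.split("\n"):
        if line == "":
            # An empty line terminates the request collected so far.
            if current:
                requests.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        requests.append("\n".join(current))
    return requests


def first_words(raw: str) -> list:
    """Return the first word of each request, i.e. what is read as the method."""
    return [req.split(" ", 1)[0] for req in split_requests(raw)]


# Hypothetical query: the blank line before the last filter is an assumption.
broken = (
    "GET services\n"
    "ResponseHeader: fixed16\n"
    "KeepAlive: on\n"
    "Filter: last_check >= 1637073927\n"
    "Filter: last_check < 1637073937\n"
    "And: 2\n"
    "\n"
    "Filter: is_executing = 1\n"
    "Or: 2\n"
    "\n"
)

print(first_words(broken))  # ['GET', 'Filter:']
```

With KeepAlive on, the second chunk would arrive on the same connection as if it were a request of its own, which would be consistent with an "Invalid request method" complaint about "Filter: is_executing".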
I'll see where this comes from.
Could you try the latest LMD? (It could also be extracted from tomorrow's OMD nightly build.)
I grabbed the lmd binary from omd-4.41.20211118-labs-edition-rhel7.x86_64.rpm and it worked without any errors. Thanks for the super-quick fix.
I installed Thruk & LMD a couple years ago and it has been working very well. Recently I upgraded Thruk to use the new API support and that went without a hitch. I encountered some problems when upgrading LMD.
For background, all servers run RHEL 7 or CentOS 7 with recent patches. The Thruk server connects to multiple Nagios servers at remote sites using stunnel (set up following https://www.thruk.org/documentation/install.html#_tls-livestatus). All Nagios instances are 4.4.6 and use check-mk-livestatus-1.4.0p31 (RPMs from the EPEL repo).
Before upgrading Thruk & LMD:
After upgrading Thruk & before upgrading LMD:
I thought the new version of Thruk might require a newer version of LMD; that's part of what led me to attempt upgrading LMD.
After upgrading LMD:
So it seems that upgrading LMD resolved the "not implemented op: 7" problem, but introduced a new "Filter: is_executing" problem. I captured some livestatus queries that Thruk sends to Nagios and they run fine when I execute them manually. I'm stumped as to why they all succeed when run by hand but sometimes fail when run by Thruk. The stunnel connections are fine (they use the exact same settings from the Thruk documentation).
As an exercise in trial & error, I tried all versions of LMD that shipped with major OMD releases between 2.7.0 and 4.4.0. It's pretty clear that the errors are tied to the versions, since they come & go cleanly when the software is upgraded/downgraded.
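For replaying captured queries by hand in a way that stays close to what goes over the wire, a small script can send the exact bytes and read the fixed16 response header (3-digit status, a space, an 11-character length field, a newline). This is a minimal sketch under those assumptions. The helper names `parse_fixed16`, `recv_exact` and `run_query` are made up for illustration, and the host and port in the commented usage are placeholders.

```python
import socket


def parse_fixed16(header: bytes):
    """Parse a 16-byte fixed16 header into (status, body_length)."""
    if len(header) != 16:
        raise ValueError(f"expected 16 bytes, got {len(header)}")
    text = header.decode("ascii")
    status = int(text[0:3])           # bytes 1-3: status code
    length = int(text[4:15].strip())  # bytes 5-15: space-padded body length
    return status, length


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes from the socket or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        buf += chunk
    return buf


def run_query(sock, query: str):
    """Send the query bytes unchanged and return (status, body)."""
    # The caller supplies the terminating blank line, so a captured query
    # (including any stray newline) is replayed exactly as captured.
    # The query must include "ResponseHeader: fixed16" for this to work.
    sock.sendall(query.encode("utf-8"))
    status, length = parse_fixed16(recv_exact(sock, 16))
    body = recv_exact(sock, length)
    return status, body.decode("utf-8")


# Hypothetical usage against a local stunnel endpoint (placeholder address):
# with socket.create_connection(("127.0.0.1", 6557)) as s:
#     print(run_query(s, "GET status\nResponseHeader: fixed16\n\n"))
```

A status other than 200 in the header then carries the server's error text in the body, which may be easier to correlate with livestatus.log than a hand-typed query.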
Note that LMD 1.9.2 didn't work at all. The stunnel connections showed "peer is down: tls: server selected unsupported protocol version 301" errors. Upgrading or downgrading fixed that problem; only 1.9.2 was affected.
My guess is that the "Invalid request method" problem noted in livestatus.log started with LMD 2.x, since that was a major version bump.
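On the "unsupported protocol version 301" message: assuming the number is the TLS wire version printed in hexadecimal (LMD is written in Go, and that appears to be how Go's TLS library reports it), it can be decoded with a small lookup. The function name `describe_tls_version` is hypothetical. The version codes themselves are the standard ones.

```python
# Standard SSL/TLS protocol version codes as they appear on the wire.
TLS_VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def describe_tls_version(code: str) -> str:
    """Interpret a hex version string such as '301' from the error message."""
    value = int(code, 16)
    return TLS_VERSIONS.get(value, f"unknown (0x{value:04x})")


print(describe_tls_version("301"))  # TLS 1.0
```

If that reading is right, the remote stunnel endpoint offered TLS 1.0 and that particular LMD build refused it, which would point to a difference in the minimum accepted TLS version rather than to the stunnel settings.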
I looked into upgrading the check-mk-livestatus Nagios module, but it appears that it isn't distributed independently anymore (only available as source code or bundled with the main checkmk release). I also tried replacing check-mk-livestatus with naemon-livestatus, but Nagios wouldn't load the module: "nagios[117094]: Error: Module '/usr/lib64/naemon/naemon-livestatus/livestatus.so' is using an old or unspecified version of the event broker API. Module will be unloaded." While I could probably replace the whole Nagios app with Naemon, that would be a lot of work which I'd rather not tackle at this time.
Do you know what's going on or what other steps can be taken to gather more info? I'd like to get LMD 2.0.3 working with my Nagios 4.4.6 instances.
Thanks.