Open pvdputte opened 1 year ago
So, for the record, this is a working combination.
# dpkg -l |grep -e naemon -e gearm
ii gearman-job-server 1.1.19.1+ds-2+b2 amd64 Job server for the Gearman distributed job queue
ii gearman-tools 1.1.19.1+ds-2+b2 amd64 Tools for the Gearman distributed job queue
ii libgearman8:amd64 1.1.19.1+ds-2+b2 amd64 Library providing Gearman client and worker functions
ii libnaemon:amd64 1.4.1-1 amd64 Library for Naemon - common data files
ii mod-gearman-module 5.1.0 amd64 Event broker module to distribute service checks.
ii mod-gearman-tools 5.1.2-1 amd64 Tools for mod-gearman
ii naemon-core 1.4.1-1 amd64 host/service/network monitoring and management system
ii naemon-livestatus 1.4.1-1 amd64 contains the Naemon livestatus eventbroker module
Same on Debian 12 'bookworm' with gearman-job-server 1.1.20 by the way (which is where I noticed the issue first).
I did some manual compiles. 5.1 was still fine for me, 5.1.1 was broken. The problem starts right after this commit:
https://github.com/sni/mod_gearman/commit/87e22207c67e1e3305362b46e183a3130e8925ae
Does it fail immediately?
Yes. 0 jobs arrive in gearmand.
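One way to confirm that no jobs arrive is to ask gearmand for its queue listing directly. The following is a minimal sketch, assuming gearmand listens on its default port 4730 and answers the plain-text admin command `status` with one tab-separated line per queue (name, jobs total, jobs running, available workers), terminated by a line containing only a dot. The function names `parse_status` and `fetch_status` are made up for this illustration.

```python
import socket


def parse_status(text):
    """Parse the reply to gearmand's text admin command `status`.

    Each line holds four tab-separated fields: queue name, total jobs,
    running jobs, available workers. A lone "." ends the reply.
    """
    queues = {}
    for line in text.splitlines():
        if line == ".":
            break
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip anything that is not a queue line
        name, total, running, workers = fields
        queues[name] = {
            "total": int(total),
            "running": int(running),
            "workers": int(workers),
        }
    return queues


def fetch_status(host="localhost", port=4730, timeout=5.0):
    """Connect to gearmand, send `status`, and return the parsed queues.

    host and port are assumptions; adjust them to your setup.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        # Read until the terminating "." line or until the server closes.
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data.decode("utf-8", "replace"))
```

Calling `fetch_status()` on the gearmand host and printing the result shows which queues exist and how many jobs each one holds. Queues that are missing, or whose totals stay at 0 while naemon is scheduling checks, would match the symptom described here.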
FYI I did a reinstall on Debian 12 today using
ii naemon-core 1.4.2-1 amd64 host/service/network monitoring and management system
ii mod-gearman-module 5.1.2-1 amd64 Event broker module to distribute service checks.
and I got the same result.
I had been keeping an old copy of the 5.1 binary as a workaround, but now I'll have to recompile it for the new v6 event broker API:
Error: Module '/usr/lib/x86_64-linux-gnu/mod_gearman/mod_gearman_naemon.o' is using an incompatible version (v5) of the event broker API (current version: v6). Module will be unloaded.
Or should I look into something else that could be wrongly configured on my system, as I find it hard to believe no-one else has been using mod-gearman-module on Debian the last few months. :thinking:
Are those packages from OBS? I assumed they automatically rebuild if there is an update in naemon-core.
I'm using
$ cat /etc/apt/sources.list.d/naemon-stable.list
deb [signed-by=/etc/apt/trusted.gpg.d/naemon.asc] http://download.opensuse.org/repositories/home:/naemon/Debian_12/ ./
apt show confirms it.
I will look into this. There was a new naemon release a few days ago; I assume that's the issue here. NEB modules have to be rebuilt against that version. I will trigger a rebuild for mod-gearman.
The build is fine as far as I can tell: there's no NEB error when starting naemon after installing. But using those packages still results in the problem described in this issue: check jobs don't arrive in gearmand.
I only get that NEB v6 error when I try to replace /usr/lib/x86_64-linux-gnu/mod_gearman/mod_gearman_naemon.o with an older 5.1 version that still worked for me on Debian 12 with gearman-job-server 1.1.20+ds-1.
In other words: the issue is still the same with naemon 1.4.2 and the 5.1.2-1 version of mod-gearman-module on Debian 12, fetched from download.opensuse.org/repositories/home:/naemon/Debian_12
At least for me :slightly_smiling_face:
I find it hard to believe no-one else has been using mod-gearman-module on Debian
I use it on several Debian 12 machines, but I am using it with OMD-Labs, which comes with its own gearmand. I have had no issues so far.
I'm currently digging a bit deeper as I found that a plain vanilla install of all packages results in a working configuration.
I'm still a bit at a loss though, as in the failing environment I now have identical naemon-core and mod-gearman-module packages with identical
/etc/naemon/naemon.cfg
/etc/naemon/module-conf.d/mod-gearman.cfg
/etc/mod-gearman/module.conf
but no fix. I'll update this issue as soon as I find out more.
Turns out it's because I've been running gearmand for ages with --listen=0.0.0.0 to let remote mod-gearman-worker instances connect.
So I had no reason to suspect the gearmand setup, which I had not touched for a long time, when mod-gearman-module suddenly stopped working properly with version 5.1.2 / this commit: https://github.com/sni/mod_gearman/commit/87e22207c67e1e3305362b46e183a3130e8925ae
I.e. these errors appeared out of nowhere:
sending job to gearmand failed: gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available
But by default in the Debian gearman-job-server package, gearmand is started with --listen=localhost, and when I revert to that, version 5.1.2 starts working properly again.
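Since the behaviour differs depending on the --listen value, it can help to check which addresses actually accept connections on the gearmand port. The following is a minimal sketch, assuming the default port 4730. The helper name `probe` is made up for this illustration, and the sketch only tests TCP reachability; it does not explain why 5.1.2 behaves differently from 5.1.0.

```python
import socket


def probe(host, port=4730, timeout=2.0):
    """Try a TCP connect to every address that `host` resolves to.

    Returns a list of (address, reachable) tuples, one per distinct
    resolved address, so IPv4 and IPv6 results show up separately.
    """
    results = []
    seen = set()
    for family, _type, _proto, _name, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        address = sockaddr[0]
        if address in seen:
            continue
        seen.add(address)
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            reachable = True
        except OSError:
            # Covers refused connections, timeouts, unsupported families.
            reachable = False
        results.append((address, reachable))
    return results
```

Running `probe("localhost")` and `probe("127.0.0.1")` on the naemon host, once with each --listen setting, shows whether every address behind the configured server name is reachable.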
What had me puzzled is that the check_results queue was created and had naemon in it. But the host/notification/service queues etc. were never created because that part of mod-gearman-module 5.1.2 apparently no longer works if gearmand listens on 0.0.0.0 instead of localhost.
5.1.0 still worked with gearmand on 0.0.0.0 though. :shrug:
Anyway, the fix for me is to just remove the --listen=... option; without it, gearmand listens on everything by default.
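For anyone who wants to script that change, here is a minimal sketch of dropping the --listen option from an option string. It assumes the options are available as a single shell-style string; where that string lives depends on how your installation starts gearmand. The function name `strip_listen` is made up for this illustration.

```python
import shlex


def strip_listen(params):
    """Return `params` with any --listen option (and its value) removed.

    Handles both `--listen=ADDR` and `--listen ADDR`; everything else
    is passed through unchanged.
    """
    kept = []
    tokens = iter(shlex.split(params))
    for token in tokens:
        if token.startswith("--listen="):
            continue
        if token == "--listen":
            next(tokens, None)  # also drop the separate address argument
            continue
        kept.append(token)
    return shlex.join(kept)
```

For example, `strip_listen("--listen=0.0.0.0 --port=4730")` returns `"--port=4730"`. Review the result before writing it back and restart gearmand afterwards.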
Sorry to bother you, maybe this helps someone else later :slightly_smiling_face:
Hi,
Could there be a regression between 5.1.0 and 5.1.2 in submitting jobs to gearmand?
I'm using a clean Debian 11 install with the packages from OBS (as you recommend). This is from /var/log/mod-gearman/mod_gearman_neb.log with debug=4:
By chance I still had the 5.1.0 .deb in the /var/cache/apt of a different testing VM (note: this deb is from the labs.consol.de repository, as I only recently switched; not sure if that matters).
If I dpkg -i the 5.1.0 one without changing my conffiles, it works again: