sni / mod_gearman

Distribute Naemon Host/Service Checks & Eventhandler with Gearman Queues. Host/Servicegroups affinity included.
http://www.mod-gearman.org
GNU General Public License v3.0

Strange check_latency when using mod_gearman #178

Closed dirtyren closed 1 week ago

dirtyren commented 1 week ago

I've noticed that naemon is calculating strange check latency values when using the mod_gearman worker.

Without mod_gearman

check_latency=0.001
check_latency=0.008
check_latency=0.007
check_latency=0.000
check_latency=0.000
check_latency=0.000
check_latency=0.000
check_latency=0.000
check_latency=0.000
check_latency=0.000

With mod_gearman

check_latency=0.000
check_latency=1731356107.142
check_latency=1731356067.148
check_latency=1731356104.795
check_latency=1731356064.350
check_latency=1731356104.802
check_latency=1731356065.959
check_latency=1731356106.099
check_latency=27.026
check_latency=1731356104.912

I already removed the retention file to reset all latencies, but it still calculates these strange numbers. I am using the latest version from the naemon site for RHEL9.

Any ideas what could be causing this? Tks.
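One possible reading, not confirmed anywhere in this thread: the large values in the "With mod_gearman" sample are in the range of Unix epoch timestamps, which is what a latency calculation would produce if one of the two times it subtracts were zero. The sketch below only checks that reading against a few of the posted values. The cutoff `EPOCH_LIKE` and the helper names are assumptions made for illustration and are not part of naemon or mod_gearman.

```python
from datetime import datetime, timezone

# A few values copied from the "With mod_gearman" sample above.
SAMPLE = (
    "check_latency=0.000 check_latency=1731356107.142 "
    "check_latency=27.026 check_latency=1731356104.912"
)

# Assumed cutoff: anything above 1e9 seconds (2001-09-09 as an epoch time)
# cannot be a plausible scheduling delay.
EPOCH_LIKE = 1_000_000_000


def parse_latencies(text):
    """Extract the float after every check_latency= token."""
    return [
        float(token.split("=", 1)[1])
        for token in text.split()
        if token.startswith("check_latency=")
    ]


def epoch_like(values):
    """Return (value, UTC datetime) pairs for values that look like epoch times."""
    return [
        (value, datetime.fromtimestamp(value, tz=timezone.utc))
        for value in values
        if value >= EPOCH_LIKE
    ]


if __name__ == "__main__":
    for value, when in epoch_like(parse_latencies(SAMPLE)):
        print(f"{value:.3f} -> {when.isoformat()}")
```

If the decoded dates match the day the checks ran, the value is most likely a raw timestamp rather than a real delay.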

sni commented 1 week ago

Which versions of naemon and mod-gearman are these?

dirtyren commented 1 week ago

The versions are these https://download.opensuse.org/repositories/home:/naemon/AlmaLinux_9/x86_64/

dirtyren commented 1 week ago

I just reran the test with and without mod_gearman, and the problem is confirmed on my installation.

With mod_gearman:

servicestatus {
    host_name=neoson
    servicedescription=Disk/
    modified_attributes=2
    check_command=system-discovery-snmp-storage!'/'!80!90
    check_period=24x7
    notification_period=24x7
    check_interval=5.000000
    retry_interval=1.000000
    event_handler=
    has_been_checked=1
    check_execution_time=0.032
    check_latency=1731420589.121
    ...

Without mod_gearman:

servicestatus {
    host_name=neoson
    servicedescription=Disk/
    modified_attributes=2
    check_command=system-discovery-snmp-storage!'/'!80!90
    check_period=24x7
    notification_period=24x7
    check_interval=5.000000
    retry_interval=1.000000
    event_handler=
    has_been_checked=1
    check_execution_time=0.028
    check_latency=0.001
    check_type=0
    ...
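To compare runs like the two above without reading the status file by eye, a small scanner can list every service whose latency is implausible. This is a minimal sketch that assumes the file holds one `key=value` pair per line inside `servicestatus { ... }` blocks. The one-day `limit` and the function names are hypothetical choices for illustration.

```python
def parse_status_blocks(text, block_type="servicestatus"):
    """Return one dict per `block_type { ... }` block (one key=value per line assumed)."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == f"{block_type} {{":
            current = {}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


def suspicious_latencies(blocks, limit=86400.0):
    """Return (host_name, check_latency) for blocks whose latency exceeds `limit` seconds."""
    return [
        (block.get("host_name"), float(block["check_latency"]))
        for block in blocks
        if float(block.get("check_latency", 0)) > limit
    ]


if __name__ == "__main__":
    # Hypothetical two-block sample shaped like the excerpts in this thread.
    sample = """
servicestatus {
    host_name=neoson
    check_latency=1731420589.121
    }
servicestatus {
    host_name=neoson
    check_latency=0.001
    }
"""
    print(suspicious_latencies(parse_status_blocks(sample)))
```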

dirtyren commented 1 week ago

I am investigating the issue. Something strange is happening with this test check:

This is the check. When run as opuser, the same user that naemon and mod_gearman run as, it works:

/usr/local/opmon/libexec/opservices/op-snmp-storage -h 127.0.0.1:161 -i 1 -t 2 -r 2 -f '/' -w 80 -c 90
/ disk usage 51.69 % (19.1GB/37.0GB) |usage=51.69%;80.00;90.00;0;100
[opuser@opmon-install-ol9 ~]$

But inside mod_gearman_worker I get this error:

output=CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: opmon10-ol9-php8-sid)\n[/usr/local/opmon/libexec/opservices/op-snmp-storage: error while loading shared libraries: libmemcached.so.11: cannot open shared object file: No such file or directory]
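The message above says the dynamic loader could not find libmemcached.so.11 when the worker started the plugin, even though the same command works from an interactive shell. One way to narrow this down is to run `ldd` on the plugin under a reduced environment and list the libraries that fail to resolve. The sketch below assumes the worker runs with a smaller environment than the login shell, which the thread does not confirm. The helper names and the minimal `PATH` are hypothetical.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def run_ldd(binary, env):
    """Run ldd on `binary` with exactly the environment given in `env`."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, env=env, check=False
    )
    return result.stdout


if __name__ == "__main__":
    plugin = "/usr/local/opmon/libexec/opservices/op-snmp-storage"
    # Assumed minimal daemon-like environment without LD_LIBRARY_PATH.
    bare_env = {"PATH": "/usr/bin:/bin"}
    print(missing_libraries(run_ldd(plugin, bare_env)))
```

If the list is empty in the shell environment but not in the reduced one, the difference between the two environments is the place to look.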

Running processes

[opuser@opmon-install-ol9 ~]$ ps -ef | grep mod_gearman
opuser 2801853 1 0 Nov11 ? 00:00:08 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630044 2801853 0 15:03 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630096 2801853 0 15:04 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630212 2801853 0 15:04 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630213 2801853 0 15:04 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630214 2801853 0 15:04 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630215 2801853 0 15:04 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker.conf --pidfile=/run/mod-gearman-worker/mod-gearman-worker.pid
opuser 3630305 3629190 0 15:04 pts/0 00:00:00 grep --color=auto mod_gearman
[opuser@opmon-install-ol9 ~]$ ps -ef | grep naemon
opuser 3628203 3628202 0 15:01 ? 00:00:00 /usr/local/opmon/bin/opmon --worker /usr/local/opmon/var/rw/naemon.qh
opuser 3628204 3628202 0 15:01 ? 00:00:00 /usr/local/opmon/bin/opmon --worker /usr/local/opmon/var/rw/naemon.qh
opuser 3628205 3628202 0 15:01 ? 00:00:00 /usr/local/opmon/bin/opmon --worker /usr/local/opmon/var/rw/naemon.qh
opuser 3628206 3628202 0 15:01 ? 00:00:00 /usr/local/opmon/bin/opmon --worker /usr/local/opmon/var/rw/naemon.qh
opuser 3630370 3629190 0 15:04 pts/0 00:00:00 grep --color=auto naemon
[
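The listing shows the worker parent (PID 2801853) was started on Nov11, so its environment may differ from the current login shell. On Linux the environment a process was started with can be read from `/proc/<pid>/environ` by the same user or by root. The sketch below compares a few variables between a worker and the current shell. The choice of variables to compare is an assumption, and the function names are hypothetical.

```python
import os


def parse_environ(raw):
    """Decode the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, value = entry.split(b"=", 1)
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_env(pid):
    """Read the start-up environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())


def diff_env(worker_env, shell_env, keys=("LD_LIBRARY_PATH", "PATH")):
    """Return {key: (worker_value, shell_value)} for keys that differ."""
    return {
        key: (worker_env.get(key), shell_env.get(key))
        for key in keys
        if worker_env.get(key) != shell_env.get(key)
    }


if __name__ == "__main__":
    worker_pid = 2801853  # parent worker PID from the ps output above
    print(diff_env(read_process_env(worker_pid), dict(os.environ)))
```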

dirtyren commented 1 week ago

The same happens with the Go version of mod_gearman worker

[root@opmon-install-ol9 mod_gearman]# ps -ef | grep mod
opuser 3633073 3632567 0 15:09 pts/0 00:00:00 /usr/bin/mod_gearman_worker-go --config=/etc/mod_gearman/worker.conf --pidfile=/tmp/mod-gearman-worker.pid

Tks.

sni commented 1 week ago

I tried recent naemon and mod-gearman versions and cannot reproduce this.

dirtyren commented 1 week ago

Tks @sni, with the Go version of the mod_gearman worker the check_latency problem does not occur:

servicestatus {
    host_name=neoson
    servicedescription=Disk/
    modified_attributes=2
    check_command=system-discovery-snmp-storage!'/'!80!90
    check_period=24x7
    notification_period=24x7
    check_interval=5.000000
    retry_interval=1.000000
    event_handler=
    has_been_checked=1
    check_execution_time=0.033
    check_latency=0.002

sni commented 1 week ago

Great, the C worker is deprecated anyway... I'll close this then.