sni / mod_gearman

Distribute Naemon Host/Service Checks & Eventhandler with Gearman Queues. Host/Servicegroups affinity included.
http://www.mod-gearman.org
GNU General Public License v3.0

Installing on Debian 9 (stretch) #123

Closed PitoneMaledetto closed 6 years ago

PitoneMaledetto commented 7 years ago

Hi all, I am running Nagios 4.3.2 and I would like to run mod_gearman. The documentation advises installing the Debian packages when possible, so that was my first approach, but the version in the repos is quite old (Source Package: mod-gearman (1.5.5-1)). I got in contact with Sven via Twitter and he advised me to get the latest from this Git repo, but that installation requires naemon, which I am not planning to run. So I had a dig around and found https://mod-gearman.org/download/v3.0.5/debian9/amd64/ from which I downloaded and installed everything. In order to install those packages I had to create the user naemon, since it is hardcoded.

When I first tried to run the mod-gearman-worker, it complained that the libgearman.so.6 library was missing, since I had libgearman.so.7 installed (the Debian 9 default). I symlinked libgearman.so.6 -> libgearman.so.7.0.1.

Now this is how it runs:

    mod-gearman-worker.service - LSB: Control the mod-gearman worker daemon
       Loaded: loaded (/etc/init.d/mod-gearman-worker; generated; vendor preset: enabled)
       Active: active (running) since Sat 2017-07-29 19:59:02 BST; 2min 26s ago
         Docs: man:systemd-sysv-generator(8)
      Process: 5504 ExecStop=/etc/init.d/mod-gearman-worker stop (code=exited, status=0/SUCCESS)
      Process: 5541 ExecStart=/etc/init.d/mod-gearman-worker start (code=exited, status=0/SUCCESS)
        Tasks: 7 (limit: 4915)
       CGroup: /system.slice/mod-gearman-worker.service
               ├─5548 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid
               ├─5589 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid
               ├─5591 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid
               ├─5592 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid
               ├─5593 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid
               ├─5594 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid
               └─5595 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid

    Jul 29 19:59:02 dev-gb-nagios-01 systemd[1]: Starting LSB: Control the mod-gearman worker daemon...
    Jul 29 19:59:02 dev-gb-nagios-01 mod-gearman-worker[5541]: Starting : mod_gearman_worker.
    Jul 29 19:59:02 dev-gb-nagios-01 systemd[1]: Started LSB: Control the mod-gearman worker daemon.

My concern is that since it runs as naemon, will it work for Nagios? If not, what can I do? Thanks

PitoneMaledetto commented 7 years ago

Had to set use_retained_scheduling_info=0 in Nagios, since I kept getting orphaned statuses as if the queue 'host' was not being checked. Moreover, when I run gearman_top, everything is under Jobs Waiting and nothing is running.

UPDATE: that did not work. I still get (host check orphaned, is the mod-gearman worker on queue 'host' running?), maybe because everything is in the Waiting queue. It is strange, though, since all the services are coming back fine (maybe Nagios is running them itself, bypassing the mod-gearman worker?).

PitoneMaledetto commented 7 years ago

Everything is orphaned:

    [1501428212] Warning: The check of host 'localhost' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
    [1501428222] HOST ALERT: localhost;DOWN;SOFT;1;(host check orphaned, is the mod-gearman worker on queue 'host' running?)
    [1501428272] Warning: The check of service 'Current Load' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427305; next_check=1501427590). I'm scheduling an immediate check of the service...
    [1501428333] Warning: The check of service 'Current Users' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427462; next_check=1501427628). I'm scheduling an immediate check of the service...
    [1501428392] Warning: The check of service 'HTTP' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427500; next_check=1501427665). I'm scheduling an immediate check of the service...
    [1501428392] Warning: The check of service 'PING' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427537; next_check=1501427703). I'm scheduling an immediate check of the service...
    [1501428452] Warning: The check of service 'Root Partition' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427456; next_check=1501427740). I'm scheduling an immediate check of the service...
    [1501428452] Warning: The check of service 'SSH' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427540; next_check=1501427778). I'm scheduling an immediate check of the service...
    [1501428513] Warning: The check of service 'Swap Usage' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427530; next_check=1501427815). I'm scheduling an immediate check of the service...
    [1501428573] Warning: The check of service 'Total Processes' on host 'localhost' looks like it was orphaned (results never came back; last_check=1501427305; next_check=1501427853). I'm scheduling an immediate check of the service...
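
For anyone triaging a log like this, a small script can summarize which service checks were orphaned and how far past their `next_check` time they were when Nagios noticed. The following is only a sketch: the regular expression is written against the message wording shown in this thread, and the function name `orphaned_services` is made up for illustration.

```python
import re

# Pattern written against the orphan warnings quoted in this thread;
# other Nagios versions may word the message differently (assumption).
ORPHAN_RE = re.compile(
    r"\[(?P<logged>\d+)\] Warning: The check of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' looks like it was orphaned "
    r"\(results never came back; last_check=(?P<last>\d+); next_check=(?P<next>\d+)\)"
)


def orphaned_services(log_text):
    """Return (host, service, seconds_overdue) for each orphaned service check."""
    rows = []
    for match in ORPHAN_RE.finditer(log_text):
        # Seconds between the scheduled next_check and the time the warning was logged.
        overdue = int(match["logged"]) - int(match["next"])
        rows.append((match["host"], match["service"], overdue))
    return rows


# Two lines copied from the log above.
sample = (
    "[1501428272] Warning: The check of service 'Current Load' on host 'localhost' "
    "looks like it was orphaned (results never came back; last_check=1501427305; "
    "next_check=1501427590). I'm scheduling an immediate check of the service...\n"
    "[1501428333] Warning: The check of service 'Current Users' on host 'localhost' "
    "looks like it was orphaned (results never came back; last_check=1501427462; "
    "next_check=1501427628). I'm scheduling an immediate check of the service...\n"
)

for host, service, overdue in orphaned_services(sample):
    print(f"{host}/{service}: {overdue}s past next_check")
```

Host-check orphan warnings use a different wording (no `last_check`/`next_check` fields), so this sketch deliberately skips them.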

sni commented 7 years ago

If everything orphans, then make sure the worker is running fine. You can check that with gearman_top.
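
As a rough illustration of what that check amounts to: gearmand answers the plain-text `status` command on its port with one tab-separated line per queue (name, total jobs, running jobs, available workers), terminated by a line containing a single dot. The sketch below parses such a reply and lists queues that have waiting jobs but no workers. The helper names, the sample reply, and treating "waiting" as total minus running are assumptions for illustration, not taken from gearman_top itself.

```python
import socket


def parse_status(text):
    """Parse the reply to gearmand's text 'status' command into a dict per queue."""
    queues = {}
    for line in text.splitlines():
        if line == ".":  # a lone dot terminates the reply
            break
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # skip anything that is not a status row
        name = parts[0]
        total, running, workers = (int(p) for p in parts[1:])
        queues[name] = {
            "waiting": total - running,  # assumption: waiting = total - running
            "running": running,
            "workers": workers,
        }
    return queues


def stalled_queues(queues):
    """Queues with jobs waiting and no worker registered to pick them up."""
    return sorted(n for n, q in queues.items() if q["waiting"] > 0 and q["workers"] == 0)


def fetch_status(host="localhost", port=4730, timeout=5.0):
    """Ask a running gearmand for its status (4730 is the port used in this thread)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace")


# Hypothetical reply: jobs queued for 'host' and 'service', but no workers attached.
sample = "host\t1\t0\t0\nservice\t8\t0\t0\ncheck_results\t0\t0\t1\n.\n"
print(stalled_queues(parse_status(sample)))
```

Against a live server, `stalled_queues(parse_status(fetch_status()))` would be the equivalent call; a non-empty result corresponds to the "jobs waiting, no workers" picture.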

PitoneMaledetto commented 7 years ago

My configuration:

Debian 9
Nagios Core Version 4.3.2
libgearman.so.7.0.1, symlinked like so: `ln -s libgearman.so.7.0.1 libgearman.so.6` (otherwise "lib not found" error)
gearman job server v1.0.6

nagios.cfg:
`broker_module=/usr/lib/mod_gearman/mod_gearman_nagios4.o config=/etc/mod-gearman/module.conf`

mod-gearman module and worker configuration: module.conf and worker.conf were left at their defaults; the only change was to insert the password key in both.

Packages installed for mod-gearman:
mod-gearman-module_3.0.5_debian9_amd64.deb
mod-gearman-tools_3.0.5_debian9_amd64.deb
mod-gearman-worker_3.0.5_debian9_amd64.deb

From mod_gearman_worker.log

    [2017-08-08 17:25:56][5168][INFO ] mod_gearman worker daemon started with pid 5168
    [2017-08-08 17:27:57][5168][INFO ] no checks in 2minutes, restarting all workers
    [2017-08-08 17:29:58][5168][INFO ] no checks in 2minutes, restarting all workers
    [2017-08-08 17:31:59][5168][INFO ] no checks in 2minutes, restarting all workers
    [2017-08-08 17:34:00][5168][INFO ] no checks in 2minutes, restarting all workers
    [2017-08-08 17:36:01][5168][INFO ] no checks in 2minutes, restarting all workers

From nagios.log at startup:

    [1502208362] mod_gearman: initialized version 3.0.5 (libgearman 1.0.6)
    [1502208362] Event broker module '/usr/lib/mod_gearman/mod_gearman_nagios4.o' initialized successfully.

then:

    [1502209981] Warning: The check of service 'PING' on host 'localhost' looks like it was orphaned (results never came back; last_check=1502208238; next_check=1502209261). I'm scheduling an immediate check of the service...
    [1502209981] Warning: The check of service 'Root Partition' on host 'localhost' looks like it was orphaned (results never came back; last_check=1502208274; next_check=1502209261). I'm scheduling an immediate check of the service...
    [1502210042] Warning: The check of service 'Swap Usage' on host 'localhost' looks like it was orphaned (results never came back; last_check=1502208349; next_check=1502209321). I'm scheduling an immediate check of the service...

From syslog

    8238; next_check=1502209261). I'm scheduling an immediate check of the service...
    Aug 8 17:33:02 dev-gb-nagios-01 nagios: Warning: The check of service 'Root Partition' on host 'localhost' looks like it was orphaned (results never came back; last_check=1502208274; next_check=1502209261). I'm scheduling an immediate check of the service...
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming timedevents.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming systemcommands.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming servicechecks.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming hostchecks.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming eventhandlers.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming externalcommands.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming notifications.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming contactnotifications.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming contactnotificationmethods.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming logentries.
    Aug 8 17:33:52 dev-gb-nagios-01 ndo2db: Trimming acknowledgements.
    Aug 8 17:34:02 dev-gb-nagios-01 nagios: Warning: The check of service 'Swap Usage' on host 'localhost' looks like it was orphaned (results never came back; last_check=1502208349; next_check=1502209321). I'm scheduling an immediate check of the service...

From gearman_top (image attached): [image: gearman_top]

Things are ticking along and seem to run; the values in the columns are constantly changing.

Strangely, in the Nagios web console only the host status is DOWN, with the following Status Information: (host check orphaned, is the mod-gearman worker on queue 'host' running?)

All the service checks are returning fine, but maybe not via the module; Nagios may be running them itself.

Any help would be greatly appreciated.

sni commented 7 years ago

> libgearman.so.7.0.1 sym linked like so ln -s libgearman.so.7.0.1 libgearman.so.6 (otherwise lib not found error) gearman job server v1.0.6

Never do that. It's a miracle that it even starts. You need to install libgearman and gearmand from the labs repository as well: https://labs.consol.de/repo/stable/debian/dists/stretch/main/binary-amd64/
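
To see which shared libraries a binary actually expects before installing anything, `ldd` reports unresolved ones as `=> not found`. A minimal sketch of collecting those names follows; the helper names and the sample `ldd` output are hypothetical, and only the worker path and the libgearman.so.6 name come from this thread.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path):
    """Run ldd on a binary and list its unresolved libraries (requires ldd on PATH)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


# Hypothetical ldd output for a worker built against libgearman.so.6
# on a system that only ships libgearman.so.7.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffc0a5f2000)\n"
    "\tlibgearman.so.6 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3a1c000000)\n"
)
print(missing_libraries(sample))
```

On the machine itself, `check_binary("/usr/sbin/mod_gearman_worker")` would be the corresponding call. An entry in the result means a package providing that exact library version is needed; a symlink to a different major version only hides the mismatch.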

PitoneMaledetto commented 7 years ago

I have gearmand v 0.33 and libgearman.so.6.0.0. Now my queue is empty and everything is orphaned. [image: gearman_top]

Nagios log does not log orphaned statuses anymore but still no queues.

Is it because in nagios.cfg I state: broker_module=/usr/lib/mod_gearman/mod_gearman_nagios4.o config=/etc/mod-gearman/module.conf

instead of for example: broker_module=.../mod_gearman_naemon.o server=localhost:4730 eventhandler=yes services=yes hosts=yes config=.../module.conf

But doesn't the module take all its parameters from its configuration file anyway?

Does it matter that I created the user naemon just to install these packages: mod-gearman-module_3.0.5_debian9_amd64.deb mod-gearman-tools_3.0.5_debian9_amd64.deb mod-gearman-worker_3.0.5_debian9_amd64.d

or is there a way to install using nagios as a user? I don't think that's the problem anyway. Thanks

PitoneMaledetto commented 7 years ago

Is it because the workers are running as naemon? naemon 9993 1 0 20:43 ? 00:00:00 /usr/sbin/mod_gearman_worker -d --config=/etc/mod-gearman/worker.conf --pidfile=/var/run/mod-gearman/worker.pid

PitoneMaledetto commented 7 years ago

Started again from scratch and installed all of the following:

    -rw-r--r-- 1 root root  94868 Mar 27 23:18 gearman-job-server_0.33-6_debian9_amd64.deb
    -rw-r--r-- 1 root root  46668 Mar 27 23:18 gearman-tools_0.33-6_debian9_amd64.deb
    -rw-r--r-- 1 root root  49538 Mar 27 23:18 libgearman7_0.33-6_debian9_amd64.deb
    -rw-r--r-- 1 root root  23112 Mar 27 23:18 libgearman-dev_0.33-6_debian9_amd64.deb
    -rw-r--r-- 1 root root 284814 Jun 27 10:38 mod-gearman-module_3.0.4_debian9_amd64.deb
    -rw-r--r-- 1 root root  55162 Jun 27 10:38 mod-gearman-tools_3.0.4_debian9_amd64.deb
    -rw-r--r-- 1 root root  55374 Jun 27 10:38 mod-gearman-worker_3.0.4_debian9_amd64.deb

Still no joy....

PitoneMaledetto commented 7 years ago

Is mod-gearman now orphaned? Is this the right place to have support?

sni commented 7 years ago

What does your setup look like right now? From the previous posts it could be anything. Maybe it's better to start fresh without any symlink hacks and workarounds. Or have a look at OMD-Labs, which comes with Naemon and Mod-Gearman already preconfigured.

PitoneMaledetto commented 7 years ago

[image: screenshot from 2017-10-03 13-18-33]

Jobs waiting, no workers available, no jobs running. I am baffled.

PitoneMaledetto commented 7 years ago

Now I get this, am I good to go?

    host_name=localhost
    core_start_time=1507044578.0
    start_time=1507044578.736922
    finish_time=1507044578.738929
    return_code=0
    exited_ok=1
    source=Mod-Gearman Worker @ dev-nagios-01
    service_description=Swap Usage
    output=SWAP OK - 100% free (2047 MB out of 2047 MB) |swap=2047MB;0;0;0;2047\n

    [2017-10-03 16:29:38][4322][TRACE] add_job_to_queue(check_results, (null), 2, 1, 1, 1)
    [2017-10-03 16:29:38][4322][TRACE] 307 --->host_name=localhost
    core_start_time=1507044578.0
    start_time=1507044578.736922
    finish_time=1507044578.738929
    return_code=0
    exited_ok=1
    source=Mod-Gearman Worker @ dev-nagios-01
    service_description=Swap Usage
    output=SWAP OK - 100% free (2047 MB out of 2047 MB) |swap=2047MB;0;0;0;2047\n
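
One way to read a result like this is to split it into fields and look at `return_code` and `exited_ok`. The sketch below assumes the payload carries one `key=value` pair per line (as the field names and the trailing `\n` suggest); the function names are made up for illustration, and the sample is the result posted above.

```python
def parse_check_result(payload):
    """Split a 'key=value per line' check result into a dict of strings."""
    result = {}
    for line in payload.splitlines():
        # Split on the first '=' only, so values such as the perfdata
        # in 'output' (which contains '=') stay intact.
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value
    return result


def looks_ok(result):
    """True if the plugin exited normally with return code 0 (OK)."""
    return result.get("exited_ok") == "1" and result.get("return_code") == "0"


# The result posted above, assuming one field per line.
sample = "\n".join([
    "host_name=localhost",
    "core_start_time=1507044578.0",
    "start_time=1507044578.736922",
    "finish_time=1507044578.738929",
    "return_code=0",
    "exited_ok=1",
    "source=Mod-Gearman Worker @ dev-nagios-01",
    "service_description=Swap Usage",
    "output=SWAP OK - 100% free (2047 MB out of 2047 MB) |swap=2047MB;0;0;0;2047\\n",
])

result = parse_check_result(sample)
print(result["source"], "->", result["service_description"], "ok:", looks_ok(result))
```

Here the `source` field names the Mod-Gearman worker, which suggests this particular check was executed by a worker rather than by Nagios itself.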

PitoneMaledetto commented 6 years ago

I have tried to install mod-gearman on Debian 9.2 with Nagios 4.3.4:

- From source
- From Debian packages
- From unofficial packages provided by mod-gearman
- A mix of the above

I have re-compiled with the Nagios 4 headers, but nothing seems to work for me. I have decided to drop mod-gearman because I have invested hours in this without success. To be honest, the documentation is mixed, some of it old or unsupported, and even this support channel is not properly managed. I appreciate that it is a free resource, so thank you for your efforts.