chifac08 closed this issue 2 years ago.
Which OMD version is that? The latest release no longer uses this worker (only the NEB module). The worker has been rewritten from scratch here: https://github.com/ConSol/mod-gearman-worker-go/ and is also enabled by default in OMD (at least since 3.20).
Version: omd-3.2. I am aware that there is a new mod_gearman worker written in Go, but I prefer the C version because our monitoring scripts were written for it. I managed to eliminate the behavior with the method mentioned above, as long as the service is not terminated by a SIGALRM.
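When a worker finishes or exits unexpectedly, there is a short window in which Linux lists the child process as a zombie (shown as <defunct> by ps): the child has already exited, but its parent has not yet reaped it with wait()/waitpid(). This happens after the child exits and before the method clean_worker_exit is called.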
Let me show you a short demonstration:
Wait until all workers are up and running, then start a simple bash script to watch for zombies:
COUNTER=0; while true; do ps ax | grep -v grep | grep defunct; if [[ $? -eq 0 ]]; then echo $COUNTER; ((COUNTER++)); fi; sleep 10; done
As you can see, a lot of worker processes show up marked as "defunct".
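To reproduce that window deterministically without running real workers, here is a minimal, Linux-specific sketch (not part of mod_gearman; the function names zombie_demo and proc_state are hypothetical). It forks a child that exits immediately, uses waitid() with WNOWAIT to wait for the exit without reaping, reads the child's state from /proc/<pid>/stat (expected 'Z'), and then reaps it with waitpid(), after which the /proc entry is gone.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the state letter of process `pid` from /proc/<pid>/stat
 * ('Z' = zombie), or '?' if the entry cannot be read (for example
 * because the process has already been reaped). Linux-specific. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Format is "pid (comm) S ..."; comm may contain spaces, so
     * look for the last ')' and take the character after the space. */
    char *p = strrchr(buf, ')');
    return (p != NULL && p[1] == ' ' && p[2] != '\0') ? p[2] : '?';
}

/* Fork a child that exits immediately and record its state as seen by
 * the parent before and after reaping. Returns 0 on success, -1 on error. */
static int zombie_demo(char *before, char *after)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                      /* child: exit right away */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT).
     * This is the window in which ps shows it as <defunct>. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    *before = proc_state(pid);

    /* Reaping the child removes the zombie entry from the process table. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    *after = proc_state(pid);
    return 0;
}
```

The zombie persists for exactly as long as the parent delays its wait call; nothing else in the system cleans it up while the parent is alive.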
I know that you stop all children when you call the clean_exit method, but that does not help when a single child exits on its own.
I suggest implementing the following method in worker.c:
Of course, we also have to install a signal handler that catches the SIGCHLD signal from the child.
In the method make_new_child:
signal(SIGCHLD, child_exit);
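Since the body of the proposed method is not shown here, the following is only a minimal sketch of what a SIGCHLD handler named child_exit could look like; it is not the patch itself. The helper install_child_handler and the counter reaped_children are hypothetical. The handler loops over waitpid(-1, NULL, WNOHANG) because several child exits can be merged into a single SIGCHLD delivery, and it uses sigaction() instead of signal() because the latter has historically had inconsistent semantics across systems.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped by the handler (illustration only). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every child that has already exited.
 * WNOHANG keeps the handler from blocking on children still running.
 * waitpid() is async-signal-safe; errno is saved and restored because
 * waitpid() may overwrite it while the interrupted code still needs it. */
static void child_exit(int sig)
{
    (void)sig;
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install child_exit for SIGCHLD. SA_RESTART restarts interrupted system
 * calls; SA_NOCLDSTOP suppresses SIGCHLD for merely stopped children. */
static int install_child_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = child_exit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Installing the handler once in the parent before forking workers is sufficient. One caveat: if other code in the same process calls waitpid() on specific children (for example during a shutdown routine), those calls may fail with ECHILD once this handler has already reaped them, so such code would need to tolerate that.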
Each zombie only lives for a few seconds before it is reaped, so we do not need to worry about resources or running out of process IDs, but I would be delighted if you could fix it because my monitoring software complains about it.
If you need any further information, feel free to contact me!
Thanks!