Closed josecelano closed 5 months ago
A zombie process, also known as a defunct process, is a state that occurs in a Unix-like operating system when a process finishes execution, but its entry remains in the process table. In simpler terms, it's a process that has completed its execution but still has an entry in the process table because its parent process hasn't yet retrieved its exit status.
When a process finishes its execution, it typically sends an exit status to its parent process, indicating its completion. The parent process is then responsible for reading this exit status via system calls like `wait()` or `waitpid()`. Once the parent process retrieves the exit status, the zombie process is removed from the process table, and its resources are released.
However, if the parent process fails to retrieve the exit status of its child processes (perhaps because it's busy with other tasks or has terminated without cleaning up its child processes), the child process enters a zombie state. In this state, the process table entry remains, but the process itself is essentially defunct; it occupies virtually no system resources, except for its entry in the process table.
Zombie processes are usually harmless by themselves and don't consume significant system resources. However, having too many zombie processes can indicate a problem with process management, such as a bug in the parent process or a resource exhaustion issue. Therefore, while individual zombie processes are not a cause for concern, a large number of them may require investigation and remediation.
ChatGPT
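To make the explanation above concrete, here is a minimal Python sketch (Linux only, since it reads `/proc`) of a child becoming a zombie until its parent calls `waitpid()`. The helper names `process_state`, `spawn_child_that_exits` and `wait_until_zombie` are made up for this illustration; this is not how the tracker or the healthcheck binaries are implemented.

```python
import os
import time


def process_state(pid: int) -> str:
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before reading the state field.
    return data.rsplit(")", 1)[1].split()[0]


def spawn_child_that_exits() -> int:
    """Fork a child that exits immediately; the parent does NOT wait for it here."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate at once
    return pid


def wait_until_zombie(pid: int, timeout: float = 2.0) -> str:
    """Poll the child's state until it is 'Z' (zombie) or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)
    return state


if __name__ == "__main__":
    child = spawn_child_that_exits()
    # The child has exited but has not been reaped, so it should show as a zombie.
    print("state before waitpid:", wait_until_zombie(child))
    # Reading the exit status removes the entry from the process table.
    os.waitpid(child, 0)
    print("still in /proc after waitpid:", os.path.exists(f"/proc/{child}"))
```

If the parent never reaches the `os.waitpid()` call, the child stays in the `Z` state for as long as the parent is alive, which looks like what we are seeing here.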
I think we should check that the healthcheck binaries end correctly in all cases. However, it looks like, in this case, the reason could be that the parent process "fails to retrieve the exit status".
Relates to: https://github.com/torrust/torrust-demo/issues/1
I'm trying to fix this issue on the live demo server. The tracker container restarts every 2 hours because of the healthcheck. I'm still trying to figure out what is happening. However, I've noticed a lot of zombie processes. This may or may not be related to the periodic restart.
Some minutes after restarting the server, you see a lot of zombie processes.
This is the server 3 hours after restarting the tracker container:
As you can see, the tracker is unhealthy. Running `top` gives you this output:
There are 87 zombie processes, but I've seen more in other cases. That output is from a point where the server is already too busy swapping. Before reaching that point, you get an output like this:
You can see how zombie processes have increased.
I have also listed the zombie processes:
Those processes are children of the main Torrust tracker, index and index-gui processes.
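For anyone who wants to reproduce the listing, this is a rough Python sketch that groups zombie PIDs by parent PID by reading `/proc/<pid>/stat`. It assumes Linux; `parse_stat` and `zombies_by_parent` are names I made up for this sketch, not part of any Torrust tooling.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_stat(stat_text: str) -> Tuple[int, str, str, int]:
    """Extract (pid, command name, state, parent pid) from a /proc/<pid>/stat line."""
    # Format: "<pid> (<comm>) <state> <ppid> ...". The command name may itself
    # contain spaces or parentheses, so split on the LAST ')'.
    head, tail = stat_text.rsplit(")", 1)
    pid_text, comm = head.split("(", 1)
    fields = tail.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root: str = "/proc") -> Dict[int, List[int]]:
    """Map each parent pid to the list of its children that are in state 'Z'."""
    result: Dict[int, List[int]] = defaultdict(list)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                pid, _comm, state, ppid = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process disappeared or is not readable
        if state == "Z":
            result[ppid].append(pid)
    return dict(result)


if __name__ == "__main__":
    for parent, children in sorted(zombies_by_parent().items()):
        print(f"parent {parent}: {len(children)} zombie(s) -> {sorted(children)}")
```

Grouping by parent is the useful part: the parent is the process that is not calling `wait()`, so that is where the fix has to go.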
These are the parent processes:
In the past, we had a similar problem:
And it was solved by adding timeouts. That could be the reason for the healthcheck zombies, but I have no idea what the problem is with the Node webserver (`495006 494983 /nodejs/bin/node /app/.output/server/index.mjs`). I guess the webserver is launching threads to handle requests, but they are not finishing correctly.
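As an illustration of the timeout idea, here is a small Python sketch of running a command with a timeout so that a stuck child is killed and reaped instead of lingering. `run_with_timeout` is a hypothetical helper, and this is only a sketch of the general pattern, not the actual fix we applied in the past.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it was killed after timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so the direct child has already been reaped at this point.
        return None


if __name__ == "__main__":
    # A command that finishes in time: its exit code is returned.
    print(run_with_timeout([sys.executable, "-c", "raise SystemExit(3)"], 5.0))
    # A command that hangs: it is killed after the timeout and None is returned.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

Note that this only covers the direct child. If the child spawns its own children, they are not killed by the timeout and would need separate handling.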