Closed shamefulCake1 closed 1 month ago
Thanks. I've always thought this would be possible but it has never been reported in 25 years. There is a place between two lines of code at the top of the event loop where a signal would be lost if it arrived (and there's a comment there to that effect). But it's a very small time window (between handling signals and starting `select`).
Are you able to give me specific instructions to reproduce it? What exactly do you mean by "fails to start". Presumably, it must start in order to become a process that can become a zombie. Does it start and exit quickly? It would really help if you could provide a complete command and the resulting process details (like `ps` or `pstree` output perhaps).
If it's what I think it is, I can fix it by giving the event loop a timeout to give it a chance to respond to a `SIGCHLD`. I haven't done that because it's extra CPU cycles that would mostly be a waste. But if I can test it, and the fix works, then I'll know I'm right. Actually, I should use `pselect`, which is standard.
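For illustration, the race described above (a `SIGCHLD` arriving between the signal check and `select`) and the `pselect` approach might be sketched as follows. This is only a minimal sketch and not daemon's actual event loop: `setup_sigchld`, `wait_once`, and `got_sigchld` are hypothetical names, and a single watched descriptor is assumed. The idea is to keep `SIGCHLD` blocked everywhere except inside `pselect`, which installs the unblocked mask atomically for the duration of the wait, so a signal that arrives early stays pending and interrupts the call with `EINTR` instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag set by the handler; not taken from daemon's source. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/*
 * Install the handler and block SIGCHLD for normal execution.
 * On return, *wait_mask holds the previous mask with SIGCHLD removed,
 * i.e. the mask to hand to pselect(). Returns 0 on success, -1 on error.
 */
int setup_sigchld(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, wait_mask) == -1)
        return -1;
    sigdelset(wait_mask, SIGCHLD);
    return 0;
}

/*
 * One iteration of a hypothetical event loop watching a single fd.
 * Returns 1 if a SIGCHLD was seen, 0 otherwise, -1 on a real error.
 */
int wait_once(int fd, const sigset_t *wait_mask)
{
    fd_set rfds;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* SIGCHLD is unblocked only while pselect() is waiting. */
    n = pselect(fd + 1, &rfds, NULL, NULL, NULL, wait_mask);
    if (n == -1 && errno != EINTR)
        return -1;

    if (got_sigchld) {
        got_sigchld = 0;
        return 1;
    }
    return 0;
}
```

With this arrangement no timeout is needed, since a `SIGCHLD` that is already pending when `pselect` is entered causes it to return immediately.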
But I really need to be able to reproduce this so I'm not guessing what the reason is. The fact that you are showing two daemon processes (rather than daemon and the client process) makes me wonder if it is something else.
1. > What exactly do you mean by "fails to start". Presumably, it must start in order to become a process that can become a zombie. Does it start and exit quickly?

   When `command` is a non-existent path and a few arguments.
2. When I created this issue I didn't yet realize that it might have been due to the sanitizer rather than the program itself.
So, I compiled `daemon` with the leak sanitizer, which somehow (I don't think using a thread) spawns `llvm-symbolizer` as a child process, and in my case the parent `daemon` had a zombie `daemon` child and the `llvm-symbolizer` child. Presumably `llvm-symbolizer` wouldn't exit until `daemon` reaps all its children, and `daemon` wouldn't reap the child `daemon` until it is the only remaining child, or something like that.
But maybe this is an issue with `llvm-symbolizer`, and not with `daemon`, so maybe this issue should just be closed.
OK. Thanks.
I have seen this when the target service fails to start from the beginning.
The process tree would look like:
And that is it. Parent daemon never ends up successfully `waitpid`'ing the child.
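For reference, reaping is usually done in a loop, because several children exiting close together can result in a single `SIGCHLD` delivery. The following is a minimal sketch of that general pattern, not daemon's code; `reap_exited_children` and `spawn_exiting_child` are hypothetical helpers introduced only for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child that exits immediately. */
pid_t spawn_exiting_child(void)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(0);
    return pid; /* child's pid in the parent, -1 if fork failed */
}

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped by this call.
 */
int reap_exited_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {
            reaped++;       /* one zombie collected, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;       /* interrupted by a signal, retry */
        break;              /* 0: children still running; -1: no children left */
    }
    return reaped;
}
```

Calling `reap_exited_children()` each time the event loop notices a `SIGCHLD` collects any child that has exited, whichever one it is, so one long-lived child does not prevent another from being reaped.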