Closed johnsoncodehk closed 6 years ago
I found the real reason.
`child.kill()` works fine, but the child process creates grandchild processes that I can't control, so when the child process is killed, the grandchild processes are adopted by init and kept alive forever.
I plan to solve this problem in one of three directions:

I don't currently know how to implement options 1 and 2, so I will go with option 3. If anybody has any ideas, please tell me. Thank you.
What you describe is standard practice for UNIX - not just Linux - where the parent process ends prior to that of the child process. If the parent process terminates first, then the `init` job (PID 1) will inherit all such orphaned processes.

Zombie processes are the opposite, where a child process ends before its parent, and exist only as an entry in the process table so that some job (namely the parent) can ascertain the child's exit status. That status is acknowledged (read) during the `wait`; typical behaviour conducted during a `join`. Zombies will persist if the parent does not acknowledge their ending - this is the fork/join model for managing child process creation. Independent child processes are created through fork/exec on UNIX, accomplishing the action of a spawn as practised in other OSes.
It sounds like you have a combination of both of the above occurring! Could your child processes be forking grandchildren too (thus distorting the lifecycle of the child)? Alternatively, should your multi-threaded child-process work be conducted with daemon threads?
Perhaps some of the clustered child management work in npm modules like adios might be helpful for your review?
Thank you for your help!

Adios seems to be for establishing IPC channels between cluster workers, so my guess is that it is not applicable to my situation. To make the problem clearer, I will explain my project. This is my project link, currently only a demo: https://johnsoncodehk.github.io/codesearch/npm/
My project is a function search engine that finds functions that match the conditions based on input and output.
The search engine filters out impossible function types and then creates a child process to run all the possible functions - including all the functions of the `child_process` module itself.
The problem arises when searching for functions with the specified parameter and return types: `fork` (or a similar function) of `child_process` enters the run list, and when it is executed in the child process, a grandchild process is created. When the child process finishes executing and is removed, the grandchild process is orphaned (adopted by init), and `process.kill()` cannot remove it.
PS: I can't simply filter out the `child_process` module to solve this problem, because I can't prevent other npm packages from using `child_process` to create child processes - unless I filter every npm package that creates them.
One more point: if the parameters happen to be valid, code running in the child process may even create a process at the same level as the parent process, which is then inherited by init. (I can't take care of this case.)
Fundamentally, it sounds like your problem concerns how you are choosing to decompose the problem (hierarchically), but then performing only the 'job' management at the top-level (root) of the hierarchy. But, with each level of the hierarchy capable of creating new sub-processes/tasks, effectively each node in the process tree you create has to manage the jobs under it - that is, you don't know a priori how many stages you will/must create in the task tree? Consequently, your approach to abruptly ending that task by killing it is too draconian.
In decomposing your problem, yes, you are refining the work to be done, but each stage then has to accomplish a defined amount of work. As a consequence of doing that work, if it results in creating a further set of sub-tasks, then you should actually distribute the burden of managing that created set of work. Another option where the child actually has work delegated by the parent is that the parent must await the answer from the child(ren) before its own task is complete? In this case, it must listen and wait until the child completes its labours, and then the parent process lifecycle is actually longer than that of the child it fork/spawned (no orphaned child jobs).
At the top level, if you are detecting the child has completed its labour, if you kill it, you also kill its role as being the manager for any tasks it created... and so on. That is, each stage of the decomposed hierarchy must become a worker and sub-task manager... If you explicitly kill the subtask/child, then you also are killing its ability to manage all the subtasks too.
(Do you actually need to `kill` the child process? Surely, when it knows the task is complete and there is no further work to be done, the process can end naturally (it doesn't need to be killed/stopped by an outside agency)... except where it must now act as a manager for any other work created by it. And it must be a task manager for as long as there are sub-tasks not yet ended - still a known, finite amount of work that it can track to completion.)
But, fundamentally, this is a choice on how to tackle your problem. Instead of decomposing the work into a hierarchy of sub-tasks (and processes!), you could instead have a controller/master that farms work out to a pool of workers... These are strategic choices concerning how to solve the problem presented. They can be tackled in multiple different approaches, each with its own merits.
Yes, this project can use a better hierarchy and I will plan to improve it.
Although the main cause is that I kill subtasks at arbitrary times, there are two reasons why I have to kill them:

So when a function in a child process runs for more than 10 milliseconds, I kill the child process, skip that function, and create a new child process to execute the remaining functions.
The best way seems to be to create a container to run it, so I don't have to worry about the child process doing anything dangerous, such as calling `process.kill(1)` to try to kill the parent process.
Is this addressed and resolved?
Not yet... I may not be able to fix it, should I close this issue?
Thanks for the quick revert. Let me review and see if I can provide some suggestions.
how about:

YIELD

In this way, there is no zombie in the whole system. Of course, this probably shifts the potential issue from parent/child to child/grandchild - i.e., if the grandchild spawns further child processes, and so on. But if your application is strictly chained 3 levels deep, this will be a pragmatic approach, I guess.
In other words, I need to check whether the child process has any grandchild processes and kill them. Your suggestion may be feasible; I will try it. Thank you.
I used `child_process.fork` to generate some child processes and kill them with `child.kill()` when the work is done. My system always throws errors after running for a while. After some troubleshooting, I found that the longer my system runs, the more zombie processes are generated. The PPIDs of these processes are all 1, because `child.kill()` does not release the process from memory, it just stops the process.

I read the documentation and found that this is a mechanism of the Linux system: the killed process's parent needs to call `wait`/`waitpid` to wait for the process to finish, but `wait`/`waitpid` do not exist in `child_process` or Node.

I found a lot of information, but I didn't find a way forward. I guess this is a wide-ranging problem and the solution should not be difficult. Have I missed something?