nodejs / help

What is the correct way to kill a child process? #1389

Closed johnsoncodehk closed 6 years ago

johnsoncodehk commented 6 years ago

I used child_process.fork to create some child processes and killed them with child.kill() when their work was done. After running for a while, my system always starts throwing errors. After some troubleshooting, I found that the longer my system runs, the more zombie processes there are. The ppid of each of these processes is 1, because child.kill() does not release the process from memory; it just stops the process.

I read the documentation and found that this is how Linux works: after a process is killed, its parent has to call wait/waitpid to collect it. However, wait/waitpid is not available in child_process or anywhere else in Node.

I found a lot of information, but nothing that worked for me. I suspect this is a common problem, and the solution should not be difficult. Have I missed something?
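For reference, here is a minimal sketch that is not from the thread and assumes a POSIX system. It kills a direct child and then waits for Node's `'exit'` event. Node (through libuv) collects the exit status of the children it spawns itself, so once `'exit'` has fired, the direct child is no longer in the process table. The `killAndWait` helper name is hypothetical.

```javascript
const { spawn } = require('child_process');

// Start a child that never exits on its own, send it a signal, and resolve
// once Node reports that the child has exited.
function killAndWait(signal = 'SIGTERM') {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000)']);
    child.once('error', reject);
    // 'exit' is emitted after libuv has collected the child's exit status,
    // so this direct child is not left behind as a zombie.
    child.once('exit', (code, sig) => resolve({ pid: child.pid, code, signal: sig }));
    child.kill(signal);
  });
}

killAndWait().then(({ pid, code, signal }) => {
  console.log(`child ${pid} exited: code=${code} signal=${signal}`);
});
```

A child killed by a signal reports `code` as `null` and `signal` as the signal name. This only covers direct children, though; processes that the child itself starts are not reaped by this parent.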

johnsoncodehk commented 6 years ago

I found the real reason.

child.kill() works fine. The problem is that the child process creates a grandchild process that I can't control. When the child process is killed, init adopts the grandchild process, and the grandchild is kept forever.

I plan to solve this problem in three directions:

  1. Limit the permissions of the child process so that it cannot create a grandchild process.
  2. Dynamically create a container to run the child process, allow the child process to create grandchild processes inside the container, and delete the container when the work is completed. (It may be difficult to pass data.)
  3. Record all processes when the program starts, and periodically kill any process that is not in that start list.

I don't currently know how to implement option 1 or option 2, so I will go with option 3. If anybody has any ideas, please tell me. Thank you.
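On POSIX systems, one commonly used alternative to option 3 is to start each child as the leader of its own process group with `detached: true`. You can then signal the whole group by passing a negative PID to `process.kill()`. This technique is swapped in here; nobody proposed it in this thread. The names `spawnInGroup` and `killGroup` are hypothetical, and the example assumes `sh` and `sleep` are available.

```javascript
const { spawn } = require('child_process');

// POSIX-only sketch: with detached: true the child becomes the leader of a new
// process group, so its group ID equals child.pid and grandchildren inherit it.
function spawnInGroup(command, args) {
  return spawn(command, args, { detached: true, stdio: 'ignore' });
}

// A negative PID makes kill(2) signal every process in that group.
function killGroup(child, signal = 'SIGKILL') {
  try {
    process.kill(-child.pid, signal);
  } catch (err) {
    if (err.code !== 'ESRCH') throw err; // ESRCH: the group is already gone
  }
}

// Example: a shell that starts a background grandchild and waits for it.
const child = spawnInGroup('sh', ['-c', 'sleep 30 & wait']);
const exited = new Promise((resolve) => {
  child.once('exit', (code, signal) => resolve({ code, signal }));
});

setTimeout(() => killGroup(child), 200);
exited.then(({ code, signal }) => {
  console.log(`group leader exited: code=${code} signal=${signal}`);
});
```

This does not apply on Windows. Any grandchild that deliberately leaves the group, for example by calling setsid() or by being spawned detached itself, will escape the signal.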

shellberg commented 6 years ago

What you describe is standard UNIX behaviour (not just Linux) when a parent process ends before its child process. If the parent process terminates first, the init process (PID 1) inherits all such orphaned processes.

Zombie processes are the opposite case: the child process ends before its parent. The child then exists only as an entry in the process table so that its parent can collect its exit status. That status is what the parent reads during the wait, which typically happens during a join. Zombies persist if the parent never acknowledges that they have ended; this is the fork/join model for managing child process creation. On UNIX, independent child processes are created through fork/exec, which accomplishes what a spawn does on other operating systems.
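Here is a sketch of the fork/join idea in Node terms, assuming a POSIX system; `runAndJoin` is a hypothetical name. The parent starts several children and then "joins" them by waiting for each `'exit'` event. That event carries the exit code (or terminating signal) that `wait()` reports to a C program.

```javascript
const { spawn } = require('child_process');

// "Fork": start one child per requested exit code.
// "Join": wait until every child has exited and collect its exit status.
function runAndJoin(exitCodes) {
  const children = exitCodes.map((code) =>
    spawn(process.execPath, ['-e', `process.exit(${code})`])
  );
  return Promise.all(
    children.map(
      (child) =>
        new Promise((resolve, reject) => {
          child.once('error', reject);
          child.once('exit', (code, signal) => resolve({ pid: child.pid, code, signal }));
        })
    )
  );
}

runAndJoin([0, 1, 2]).then((results) => {
  for (const { pid, code } of results) console.log(`child ${pid} exited with code ${code}`);
});
```

The parent outlives every child it started here, so none of them is orphaned. Node collects each exit status as it arrives, so none of them lingers as a zombie either.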

It sounds like both of the above are happening! Could your child processes be forking grandchildren too (thus distorting the lifecycle of the child)? Alternatively, should your multi-threaded child process work be done with daemon threads?

Perhaps the clustered child-process management in npm modules like adios would be worth reviewing.

johnsoncodehk commented 6 years ago

Thank you for your help!

Adios seems to be used for establishing IPC channels between cluster workers, so I guess it does not apply to my situation. To make the problem clearer, let me explain my project. Here is the link (currently only a demo): https://johnsoncodehk.github.io/codesearch/npm/

My project is a function search engine that finds functions matching a given input and output. The search engine filters out impossible function types and then creates a child process to run every candidate function, including all functions of the child_process module. The problem comes from searches for a given parameter type and return type: fork (or a similar function) of child_process ends up in the run list. When it runs in the child process, it creates a grandchild process. When the child process finishes and is deleted, the grandchild process becomes an orphaned process, and process.kill() cannot delete it.

PS: I can't solve this problem simply by filtering out the child_process module. Other npm packages can also use child_process to create child processes, so I would have to filter out every npm package that creates child processes.

johnsoncodehk commented 6 years ago

One more point: with the right parameters, code running in the child process could even create a process at the same level as the parent process, one that is inherited by init. (I can't handle this case.)

shellberg commented 6 years ago

Fundamentally, your problem seems to concern how you decompose the work (hierarchically) while performing job management only at the top level (root) of the hierarchy. Each level of the hierarchy can create new sub-processes/tasks, so every node in the process tree effectively has to manage the jobs under it. You also don't know a priori how many levels the task tree will have. Consequently, ending a task abruptly by killing it is too draconian.

In decomposing your problem, you are refining the work to be done, but each stage then has to accomplish a defined amount of work. If doing that work creates a further set of sub-tasks, the burden of managing that work should be distributed as well. When the parent delegates work to the child, the parent must await the answer from its child(ren) before its own task is complete. In that case, the parent listens and waits until the child completes its labours. The parent's lifecycle is then longer than that of the child it forked/spawned, so there are no orphaned child jobs.

At the top level, suppose you detect that a child has completed its labour and kill it. You then also kill its role as the manager of any tasks it created, and so on down the tree. Each stage of the decomposed hierarchy must therefore be both a worker and a sub-task manager. (Do you actually need to kill the child process? When it knows its task is complete and there is no further work to do, the process can end naturally; no outside agency needs to kill or stop it. The exception is where it must act as a manager for work it created. It must remain a task manager for as long as any of its sub-tasks has not ended, which is still a known, finite amount of work that it can track to completion.)
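Here is a minimal sketch of "each node manages the tasks it created", assuming a POSIX system. `MANAGER_SOURCE`, `runManager` and `isAlive` are hypothetical names. The top-level process does not kill the middle process's subtree from outside. Instead, it asks the middle process to stop with SIGTERM, and the middle process stops and waits for its own child before exiting. Nothing is left for init to adopt.

```javascript
const { spawn } = require('child_process');

// Hypothetical middle-level "manager": it starts its own sub-task (a grandchild
// of the top-level process) and, when asked to stop, stops that sub-task and
// waits for it to exit before exiting itself.
const MANAGER_SOURCE = `
const { spawn } = require('child_process');
const subtask = spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000)'], { stdio: 'ignore' });
process.send({ grandchildPid: subtask.pid });
process.on('SIGTERM', () => {
  subtask.once('exit', () => process.exit(0));
  subtask.kill('SIGTERM');
});`;

function runManager() {
  return new Promise((resolve, reject) => {
    const manager = spawn(process.execPath, ['-e', MANAGER_SOURCE], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    manager.once('error', reject);
    manager.once('message', ({ grandchildPid }) => {
      // Ask the manager to shut down its own subtree, then wait for it.
      manager.once('exit', (code) => resolve({ code, grandchildPid }));
      manager.kill('SIGTERM');
    });
  });
}

// Signal 0 only checks whether a process with this PID exists.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // EPERM: it exists but we may not signal it
  }
}

runManager().then(({ code, grandchildPid }) => {
  console.log(`manager exited with ${code}; grandchild alive: ${isAlive(grandchildPid)}`);
});
```

This only works for cooperative code. A child stuck in a busy loop never gets to run its SIGTERM handler, because the handler needs the event loop.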

But fundamentally, this is a choice about how to tackle your problem. Instead of decomposing the work into a hierarchy of sub-tasks (and processes!), you could have a controller/master that farms work out to a pool of workers. These are strategic choices about how to solve the problem, and each approach has its own merits.
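Here is a sketch of the controller/worker-pool alternative, assuming Node child processes that talk over an IPC channel. `WORKER_SOURCE` (a worker that just squares numbers) and `runPool` are hypothetical. When no work is left, the controller closes each worker's IPC channel, and the worker exits on its own instead of being killed.

```javascript
const { spawn } = require('child_process');

// Hypothetical worker body: squares each number it receives over IPC.
const WORKER_SOURCE = "process.on('message', (n) => process.send(n * n));";

// Controller: farms tasks out to a fixed pool of long-lived workers and
// collects the results in task order.
function runPool(tasks, poolSize) {
  return new Promise((resolve, reject) => {
    if (tasks.length === 0) return resolve([]);
    const results = new Array(tasks.length);
    let next = 0;
    let done = 0;

    const dispatch = (worker) => {
      if (next >= tasks.length) {
        worker.disconnect(); // no work left: close IPC so the worker exits naturally
        return;
      }
      const index = next++;
      worker.once('message', (value) => {
        results[index] = value;
        done += 1;
        if (done === tasks.length) resolve(results);
        dispatch(worker); // hand this worker its next task (or let it go)
      });
      worker.send(tasks[index]);
    };

    for (let i = 0; i < Math.min(poolSize, tasks.length); i++) {
      const worker = spawn(process.execPath, ['-e', WORKER_SOURCE], {
        stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
      });
      worker.once('error', reject);
      dispatch(worker);
    }
  });
}

runPool([1, 2, 3, 4, 5], 2).then((results) => console.log(results));
```

This call resolves to `[1, 4, 9, 16, 25]`. The sketch does not handle a worker crashing mid-task. A real pool would need to detect that and requeue the task.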

johnsoncodehk commented 6 years ago

Yes, this project could use a better hierarchy, and I plan to improve it.

The root cause is that I kill subtasks at arbitrary times, but there are two reasons why I have to kill them:

  1. Some functions take too long to run.
  2. Some functions make the child process stop working in a way that try/catch cannot catch.

So when a function in the child process runs for more than 10 milliseconds, I kill the child process, skip that function, and create a new child process to execute the remaining functions.
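The timeout strategy could look roughly like this sketch, which assumes a POSIX system. `WORKER_SOURCE`, `startWorker`, `runWithTimeouts` and the `'hang'` job are hypothetical. Starting a Node process usually takes tens of milliseconds, so with a 10 ms budget the sketch waits for a "ready" message before starting each job's clock.

```javascript
const { spawn } = require('child_process');

// Hypothetical worker: announces it is ready, then runs each job it receives.
// The job named 'hang' simulates a function that never returns.
const WORKER_SOURCE = `
process.send({ ready: true });
process.on('message', (job) => {
  if (job === 'hang') for (;;) {}
  process.send({ job, ok: true });
});`;

// Resolve once the worker's "ready" message arrives, so that Node's startup
// time is not counted against the first job's time budget.
function startWorker() {
  const worker = spawn(process.execPath, ['-e', WORKER_SOURCE], {
    stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
  });
  return new Promise((resolve, reject) => {
    worker.once('error', reject);
    worker.once('message', () => resolve(worker));
  });
}

// Run jobs one at a time. If a job exceeds timeoutMs, kill the worker, record
// the job as skipped, and continue with the remaining jobs in a fresh worker.
async function runWithTimeouts(jobs, timeoutMs) {
  const results = [];
  let worker = await startWorker();
  for (const job of jobs) {
    const outcome = await new Promise((resolve) => {
      const timer = setTimeout(() => resolve({ job, ok: false, reason: 'timeout' }), timeoutMs);
      worker.once('message', (msg) => {
        clearTimeout(timer);
        resolve(msg);
      });
      worker.send(job);
    });
    if (!outcome.ok) {
      worker.removeAllListeners('message'); // drop the timed-out job's listener
      worker.kill('SIGKILL');
      worker = await startWorker();
    }
    results.push(outcome);
  }
  worker.disconnect(); // let the last worker exit on its own
  return results;
}

runWithTimeouts(['a', 'hang', 'b'], 500).then((results) => console.log(results));
```

`worker.kill('SIGKILL')` only reaches the worker itself. Any grandchildren the worker started would still need a process-group kill or similar.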

The best approach seems to be running it in a container. Then I don't have to worry about anything the child process does, such as calling process.kill(1) to try to kill the parent process.

gireeshpunathil commented 6 years ago

Is this addressed and resolved?

johnsoncodehk commented 6 years ago

Not yet... I may not be able to fix it. Should I close this issue?

gireeshpunathil commented 6 years ago

Thanks for the quick reply. Let me review this and see if I can provide some suggestions.

gireeshpunathil commented 6 years ago

How about:

In this way, there are no zombies anywhere in the system. Of course, this probably shifts the potential issue from parent-child to child-grandchild, i.e. if the grandchild spawns further child processes, and so on. But if your application is chained strictly to 3 levels, I guess this will be a pragmatic approach.

johnsoncodehk commented 6 years ago

In other words, I need to check whether the child process has a grandchild process and kill it. Your suggestion may be feasible; I will try it. Thank you.