josdejong / workerpool

Offload tasks to a pool of workers on node.js and in the browser
Apache License 2.0

child_process.spawn() or child_process.exec() #389

Closed Zirafnik closed 1 year ago

Zirafnik commented 1 year ago

Currently this library only supports the creation of node child processes through the use of child_process.fork().

I ran into a situation where I need to do extensive file processing on my web server. Initially I was looking for solutions in the Node.js space, but found that the same could be accomplished faster and more efficiently with external binaries through the command line (shell).

So, now instead of spawning a new node process (.fork()), I need to execute shell commands, which run the desired binaries. This can be done with .spawn() or .exec().
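For reference, a minimal sketch of the two calls, using only Node's built-in `child_process` module. The helper names `runWithExec` and `runWithSpawn` are made up for illustration, and the commands (`echo`, and the node binary via `process.execPath`) are stand-ins for the real external binaries.

```javascript
const { exec, spawn } = require('child_process');

// exec(): runs a command string in a shell and buffers the whole output.
function runWithExec(command) {
  return new Promise((resolve, reject) => {
    exec(command, (error, stdout) => {
      if (error) return reject(error);
      resolve(stdout.trim());
    });
  });
}

// spawn(): starts the binary directly (no shell) and streams its output.
function runWithSpawn(file, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(file, args);
    let output = '';
    child.stdout.on('data', (chunk) => { output += chunk; });
    child.on('error', reject); // e.g. the binary could not be started
    child.on('close', (code) => {
      if (code === 0) resolve(output.trim());
      else reject(new Error(`exited with code ${code}`));
    });
  });
}

// Stand-in commands; replace with the actual binary and arguments.
runWithExec('echo hello').then((out) => console.log('exec:', out));
runWithSpawn(process.execPath, ['-e', "process.stdout.write('hello')"])
  .then((out) => console.log('spawn:', out));
```

`exec` is convenient for short commands with small outputs, while `spawn` is usually the better fit for long-running tools or large outputs because nothing is buffered unless you do it yourself.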

Workerpool currently does not support these child_process methods, so I cannot create a pool of exec workers, waiting for their shell commands.

The workaround is to create a fork child, which then spawns exec children of its own inside the provided function. However, I do not fully understand the implications of this (what happens on errors, or if the fork child dies unexpectedly, ...), and it feels very hacky. Additionally, it forces the server to spawn more child processes than necessary, which means more threads become occupied.

Ex.:

1x fork child -> spawns 3x exec children => 4 new child processes created === 4 threads

...

3x fork children -> spawn 9x exec children => 12 new child processes created === 12 threads

Instead, we could spawn a pool of the necessary exec children directly and skip the fork processes altogether. For example: spawn a pool of 3 workers for each exec command => 9 workers, thus avoiding the 3 unnecessary fork workers, which are only used to kick off the execs.
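To make the idea concrete, here is a rough sketch of what running exec children directly behind a queue could look like, without any fork workers in between. This is not workerpool's API: `createExecPool`, `maxConcurrent`, and `stats` are hypothetical names, and the node binary (`process.execPath`) stands in for a real external tool. It only limits how many child processes are alive at once. It does not keep processes around for re-use.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper, not part of workerpool: runs external commands
// directly, with at most `maxConcurrent` child processes alive at once.
function createExecPool(maxConcurrent) {
  const queue = []; // tasks waiting for a free slot
  let running = 0;  // number of child processes currently alive

  function next() {
    if (running >= maxConcurrent || queue.length === 0) return;
    const { file, args, resolve, reject } = queue.shift();
    running++;
    execFile(file, args, (error, stdout) => {
      running--;
      if (error) reject(error);
      else resolve(stdout);
      next(); // a slot is free again, start the next queued task
    });
  }

  return {
    // Queue a command; resolves with its stdout once it has run.
    exec(file, args = []) {
      return new Promise((resolve, reject) => {
        queue.push({ file, args, resolve, reject });
        next();
      });
    },
    // Snapshot of the pool state, mainly useful for debugging.
    stats: () => ({ running, pending: queue.length }),
  };
}

// Usage: five jobs, never more than three child processes at a time.
const pool = createExecPool(3);
const jobs = [1, 2, 3, 4, 5].map((n) =>
  pool.exec(process.execPath, ['-e', `console.log(${n} * 2)`]));
Promise.all(jobs).then((outputs) => console.log(outputs.map(Number)));
```

Error handling here is deliberately thin (a failed command just rejects its promise). Timeouts, cancellation, and worker termination are exactly the parts where re-using workerpool's existing logic would help.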

Furthermore, the exec children cannot be re-used. Each time a fork worker gets a task, it would run the provided function, which would first create the 3 exec children (expensive) and then kill them. So for each task you would have to re-create the 3 exec child processes.

I have not looked at the codebase to understand whether this would be hard to implement, but I imagine most of the code would stay the same (error handling, queue consumption, etc.); only the way workers are spawned and the input type would be different, along with some options.

P.S.: The same logic applies if you are using worker_threads instead of child_processes to kick off the exec child_processes. As far as I am aware, the main difference is that worker_threads live inside the same process as the main thread, so a fatal crash in one of them (for example in native code) can take the whole process down with it (undesirable).

Related: https://github.com/josdejong/workerpool/issues/261

josdejong commented 1 year ago

Thanks for your input. Let's continue the discussion in #261 and think through what would be required to support spawn and exec, OK?