Currently this library only supports the creation of node child processes through the use of child_process.fork().
I ran into a situation where I needed to do extensive file processing on my web server. Initially I was looking for solutions in the Node.js space, but found that the same could be accomplished faster and more efficiently with external binaries through the command line (shell).
So, now instead of spawning a new node process (.fork()), I need to execute shell commands, which run the desired binaries. This can be done with .spawn() or .exec().
Workerpool currently does not support these child_process methods, so I cannot create a pool of exec workers, waiting for their shell commands.
The workaround is creating a fork child, which then spawns exec children of its own inside the provided function. I, however, do not fully understand the implications of this (what happens if there are errors, or if the fork child dies unexpectedly, ...), and it feels very hacky. Additionally, it forces the server to spawn more child processes than necessary, meaning more threads become occupied.
Ex.:
1x fork child -> spawns 3x exec children => 4 new child processes created === 4 threads
...
3x fork children -> spawn 9x exec children => 12 new child processes created === 12 threads
Instead we could have just spawned a pool of necessary exec children directly, avoiding the spawning of fork processes. For example: spawn a pool of 3 workers for each exec command => 9 workers, thus avoiding the 3 unnecessary fork workers, which are just used to kick off the execs.
Furthermore, the exec children cannot be re-used. Each time a fork worker gets a task, it would run the provided function, which would first create the 3 exec children (expensive) and then kill them. So for each task you would have to re-create the 3 exec child processes.
I have not looked at the codebase to judge how hard this would be to implement, but I imagine most of the code would stay the same (error handling, queue consumption, etc.); only the spawning logic and input type would differ, along with some options.
P.S.: The same logic applies if you are using worker_threads instead of child_processes to kick off the exec child_processes. As far as I am aware, the main difference is that worker_threads share memory with the main process, so if one crashes it can take the main thread down with it (undesirable).
Related: https://github.com/josdejong/workerpool/issues/261