josdejong / workerpool

Offload tasks to a pool of workers on node.js and in the browser
Apache License 2.0
2.1k stars 146 forks

Support workerData as a pool option for worker threads? #65

Open ianwalter opened 5 years ago

ianwalter commented 5 years ago

It would be cool to have a way to make general data available on worker initialization instead of just through worker function parameters. I can submit a PR.

Ref: https://nodejs.org/api/worker_threads.html#worker_threads_worker_workerdata

josdejong commented 5 years ago

Thanks for your suggestion @ianwalter. Do you have a concrete use case for this idea?

It could be interesting, though I would also love to keep the API the same for web workers, child_process, and worker threads, so let's think this through.

ianwalter commented 5 years ago

@josdejong The use case that made me think about this was wanting to set a log level for all of the workers to use based on the configuration parsed by the main thread.

I can understand wanting to keep the API the same, but I'm not sure how to do that or whether it would be worth it. For my project, I'm all in on worker threads and not trying to support the child process fallback.
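For readers unfamiliar with the Node.js feature referenced above, here is a minimal sketch of workerData using plain worker_threads, without workerpool. The helper name readLogLevelInWorker and the logLevel field are made up for illustration; only Worker, workerData, and parentPort come from Node.js itself.

```javascript
const { Worker } = require('worker_threads');

// Worker source, evaluated inline so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // workerData is a clone of the value passed to the Worker constructor,
  // available as soon as the worker starts, before any task is sent.
  parentPort.postMessage(workerData.logLevel);
`;

// Hypothetical helper: starts one worker with the given log level and
// resolves with the level the worker actually saw.
function readLogLevelInWorker(logLevel) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { logLevel },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

readLogLevelInWorker('debug').then((level) => {
  console.log('worker sees log level:', level);
});
```

The point of the sketch is that the configuration reaches the worker at initialization rather than as a parameter of each task, which is what this issue asks the pool to support.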

josdejong commented 5 years ago

Thanks for your explanation. Something like setting a log level could be relevant for browsers too, so maybe we can dig a bit deeper and think about a solution that will work in any environment.

Thinking aloud here: Maybe a hook like onWorkerCreated which allows you to perform some action after a worker is created, like in your case invoke a method setLogLevel on the worker or something like that.
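A rough sketch of what such a hook could look like, written against plain worker_threads rather than workerpool. The names createWorkers, configureAll, and the setLogLevel message shape are assumptions made for illustration, not an existing workerpool API.

```javascript
const { Worker } = require('worker_threads');

// Worker that keeps a log level and acknowledges changes to it.
const workerSource = `
  const { parentPort } = require('worker_threads');
  let logLevel = 'info';
  parentPort.on('message', (msg) => {
    if (msg.type === 'setLogLevel') {
      logLevel = msg.level;
      parentPort.postMessage({ type: 'ack', level: logLevel });
    }
  });
`;

// Hypothetical helper: creates `count` workers and calls the hook once
// for each worker right after it is created.
function createWorkers(count, onWorkerCreated) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    const worker = new Worker(workerSource, { eval: true });
    onWorkerCreated(worker);
    workers.push(worker);
  }
  return workers;
}

// Uses the hook to push the same log level to every new worker, then
// resolves with the levels the workers acknowledged.
function configureAll(count, level) {
  const acks = [];
  const workers = createWorkers(count, (worker) => {
    acks.push(new Promise((resolve, reject) => {
      worker.once('message', (msg) => resolve(msg.level));
      worker.once('error', reject);
    }));
    worker.postMessage({ type: 'setLogLevel', level });
  });
  // Terminate the workers once all acknowledgements are in.
  return Promise.all(acks).finally(() =>
    Promise.all(workers.map((w) => w.terminate()))
  );
}

configureAll(2, 'warn').then((levels) => console.log(levels));
```

Because the hook only relies on message passing, the same idea could in principle carry over to web workers and child processes, which is the portability concern raised above.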

sbrl commented 4 years ago

Hey there! This would be a really useful feature to have.

A workercreated event (e.g. pool.on("workercreated")) would be cool, but without a way to execute a function specifically on the worker that was just created, it would be of limited help.

I suggest allowing an extra state object to be passed in when the pool is created. For example:

// master
let pool = workerpool.pool({
    // Other options go here
    state_info: { foo: 5, bar: "some_string" }
});

// worker
console.log("I have state info:", workerpool.state_info);

Context: For my PhD I am handling a large dataset. I need to parallelise its processing, but to process it I need to pass in a reference 2D array that's complicated and potentially computationally expensive to initialise. To this end, I want to initialise it once on the master and then pass it to all workers via (immutable) shared state.
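One way this use case might be approached with plain worker_threads is to flatten the 2D array into a SharedArrayBuffer and hand it to each worker through workerData, so the data is built once and not copied per worker. This is only a sketch under assumptions: it applies to worker threads (not child processes), the helper names buildSharedGrid and sumRowInWorker are invented here, and immutability is a convention the workers follow by only reading, not something the buffer enforces.

```javascript
const { Worker } = require('worker_threads');

// Build the (potentially expensive) reference grid once on the main thread,
// stored row-major in shared memory.
function buildSharedGrid(rows, cols) {
  const buffer = new SharedArrayBuffer(
    rows * cols * Float64Array.BYTES_PER_ELEMENT
  );
  const grid = new Float64Array(buffer);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      grid[r * cols + c] = r * 10 + c; // placeholder values
    }
  }
  return buffer;
}

// Worker that only reads from the shared grid.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { buffer, cols, row } = workerData;
  const grid = new Float64Array(buffer); // a view on the same memory, no copy
  let sum = 0;
  for (let c = 0; c < cols; c++) sum += grid[row * cols + c];
  parentPort.postMessage(sum);
`;

// Hypothetical helper: sums one row of the shared grid inside a worker.
function sumRowInWorker(buffer, cols, row) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { buffer, cols, row },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const shared = buildSharedGrid(3, 4);
sumRowInWorker(shared, 4, 1).then((sum) => console.log('row 1 sum:', sum));
```

A plain (non-shared) array passed through workerData would also work, but it would be cloned for every worker, which may matter for a large dataset.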

josdejong commented 4 years ago

Thanks for your input, that is a very simple and elegant approach @sbrl !

I have to double check whether it's possible to expose the state directly as a property workerpool.state_info (or simply workerpool.state), or whether we need a getter for it like workerpool.getState().

sbrl commented 4 years ago

Thanks, @josdejong! Either would be great if possible.

I think it's probably a good thing to encourage immutable shared state in particular, since lots of bugs can arise from having mutable shared state that's modified by multiple processes at the same time.

josdejong commented 4 years ago

> I think it's probably a good thing to encourage immutable shared state in particular, since lots of bugs can arise from having mutable shared state that's modified by multiple processes at the same time.

Totally agree!

Anyone interested in implementing this feature?

adrfantini commented 3 years ago

I saw that there are two available undocumented options, forkArgs and forkOpts, both passed to child_process.fork when the worker is created. Maybe these can be used to pass simple string data to the workers? As I understand it, this could be just a horrible workaround.
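To illustrate the mechanism this workaround would rely on, here is a sketch that calls child_process.fork directly and passes a JSON string as a command-line argument. It does not use workerpool; the helper name roundTripThroughArgv and the temporary child script are assumptions made so the example is self-contained.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Child script: parses the JSON string from argv and echoes it back.
const childSource = `
  const config = JSON.parse(process.argv[2]);
  process.send(config);
`;

// Hypothetical helper: forks a child with the config serialised into its
// arguments and resolves with what the child parsed.
function roundTripThroughArgv(config) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-args-'));
  const script = path.join(dir, 'child.js');
  fs.writeFileSync(script, childSource);
  return new Promise((resolve, reject) => {
    // The second parameter of fork() is the list of string arguments.
    const child = fork(script, [JSON.stringify(config)]);
    child.once('message', resolve);
    child.once('error', reject);
    // Remove the temporary script once the child has exited.
    child.once('exit', () => fs.rmSync(dir, { recursive: true, force: true }));
  });
}

roundTripThroughArgv({ logLevel: 'debug' }).then((config) => {
  console.log('child received:', config);
});
```

Arguments are limited to strings and to the operating system's command-line length limits, so this suits small configuration values rather than large datasets.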

sbrl commented 3 years ago

> pass simple string data

If only a string can be passed, I recommend automatic serialisation to JSON (though IIRC, when sending an object to a different process/thread in Node, it will be serialised automatically, potentially with higher performance, but I haven't tested that).

shtse8 commented 2 months ago

Please consider this feature. My worker needs to use the same registration to resolve components, but to build the registration I need the program options (CLI options). So the main thread needs to pass this data to the worker thread before any operations run.