wilk / microjob

A tiny wrapper for turning Node.js worker threads into easy-to-use routines for heavy CPU loads.
https://wilk.github.io/microjob/
MIT License

Persistent context #47

Open darky opened 5 years ago

darky commented 5 years ago

I need the ability to pass some context once up front, so that it's then always available in the worker pool. For example, a CPU-intensive geo task: checking whether points lie inside polygons. The polygons are so heavy that serialising and deserialising them on every call is expensive. It would be better to pass them once up front:

await job(() => {
  // ... job body ...
}, {persistentCtx: {polygons: [/* many, many polygons */]}});

And then it's always accessible on every job execution:

await job(() => {
  polygons // still accessible here
}, {data: {point: [12.3434, 56.3434]}});
manuel-di-iorio commented 5 years ago

Related issue: https://github.com/wilk/microjob/issues/42

darky commented 5 years ago

PR: https://github.com/wilk/microjob/pull/48

wilk commented 5 years ago

@darky Thanks for this issue!

Well, let me check if I got it right: you need a global bucket shared between worker threads, to avoid multiple massive serialisations/deserialisations, correct? You could do this yourself with a SharedArrayBuffer (shared memory). However, yes, it could be a useful feature to embed in microjob.

Anyway, your PR moves the serialisation/deserialisation problem from the user to the core (see https://github.com/darky/microjob/commit/67c21aec41ec0ddc3903d6f28cfaae490e41fc95#diff-c9253097723f89dd4716748fab2e00cdR108). Every time the user invokes job, the whole persistentCtx gets serialised, sent via postMessage, and then deserialised in the worker thread. I think a good solution could be to pass a global shared context through an external facade, convert it to a SharedArrayBuffer, and then convert it back with a proper getter inside the worker. I wouldn't use the job interface to define a global context: it's ambiguous.
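The suggested approach could be sketched roughly like this (a minimal sketch, not microjob's actual API; `ctxToShared` and `sharedToCtx` are hypothetical helper names):

```javascript
// Minimal sketch (not microjob's API): serialise a context ONCE into a
// SharedArrayBuffer, so worker threads can read it without per-job copies.
function ctxToShared(ctx) {
  const bytes = new TextEncoder().encode(JSON.stringify(ctx));
  const sab = new SharedArrayBuffer(bytes.byteLength);
  new Uint8Array(sab).set(bytes);
  return sab; // posting this to a worker shares memory instead of copying it
}

function sharedToCtx(sab) {
  // Copy out of shared memory before decoding, since some engines reject
  // SharedArrayBuffer-backed views in TextDecoder.decode().
  const copy = new Uint8Array(sab.byteLength);
  copy.set(new Uint8Array(sab));
  return JSON.parse(new TextDecoder().decode(copy));
}

// On the main thread you would do something like:
//   const worker = new Worker('./task.js', { workerData: ctxToShared(ctx) });
// and inside the worker:
//   const ctx = sharedToCtx(workerData);
```

Note this still decodes on access inside the worker; it only avoids re-sending the payload on every job. A true zero-copy layout would require fixed-shape typed arrays.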

darky commented 5 years ago

> Every time the user invokes job, the whole persistentCtx gets serialised and sent via postMessage and then deserialised from the worker thread.

It happens only once, the first time; after that it's always available via https://github.com/darky/microjob/commit/67c21aec41ec0ddc3903d6f28cfaae490e41fc95#diff-5bfbc2def8d97c3939b537c3f6f31b3eR3

> I think a good solution could be to pass a global shared context from an external facade, convert it to a SharedArrayBuffer and then convert it back with a proper getter from the worker.

Can you please provide a small example? You could also close #42 with that example :)

darky commented 5 years ago

> I wouldn't use the job interface to define a global context: it's ambiguous.

Yep, I agree. Maybe it would be better to use the start function for this purpose?
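For flat numeric data such as polygon coordinates, the decode step can be avoided entirely by laying the numbers out in a Float64Array over a SharedArrayBuffer, so workers read points in place. A sketch under that assumption (`packPoints` and `readPoint` are made-up names, not part of microjob):

```javascript
// Hedged sketch: zero-copy layout for point coordinates.
// A Float64Array view over a SharedArrayBuffer can be read from any
// worker thread without serialisation or deserialisation.
function packPoints(points /* [[x, y], ...] */) {
  const sab = new SharedArrayBuffer(points.length * 2 * 8); // 8 bytes per float64
  const view = new Float64Array(sab);
  points.forEach(([x, y], i) => {
    view[2 * i] = x;
    view[2 * i + 1] = y;
  });
  return sab;
}

// Inside a worker, wrap the same buffer without copying:
function readPoint(sab, i) {
  const view = new Float64Array(sab);
  return [view[2 * i], view[2 * i + 1]];
}
```

The trade-off is a fixed layout: this works well for coordinate arrays, but arbitrary nested objects would still need some form of encoding.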

r3wt commented 4 years ago

> Yep, I agree. Maybe it would be better to use the start function for this purpose?

In this scenario, would persistentCtx be mutable (from within a job for example)?

I have a bit of a weird use case.

Ideally I'd like to be able to do the following:

Unfortunately the serialization cost is too high without persistent state, and mutable state would be advantageous; otherwise I'd have to stop and start a new worker pool every time I need to update the dataset.
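On the mutability question: a SharedArrayBuffer is mutable in place, and Atomics makes concurrent updates safe, so in principle a shared dataset could be updated without restarting the pool. A minimal single-threaded sketch of the underlying primitive (not microjob's API):

```javascript
// Minimal sketch: in-place mutation of shared memory with Atomics.
// A worker holding a view over the same SharedArrayBuffer would observe
// this update without any message passing or pool restart.
const sab = new SharedArrayBuffer(4);
const version = new Int32Array(sab);    // e.g. a dataset version counter
Atomics.add(version, 0, 1);             // thread-safe increment
console.log(Atomics.load(version, 0));  // prints 1 in this single-thread demo
```

Coordinating readers around an update (e.g. bumping a version counter and having workers re-read) would still be up to the application.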

darky commented 4 years ago

@r3wt The #48 PR may satisfy your needs regarding mutation