josdejong / workerpool

Offload tasks to a pool of workers on node.js and in the browser
Apache License 2.0

Sharing streams with workers from the main thread #347

Closed: makivlach closed this issue 2 years ago

makivlach commented 2 years ago

This is more of a question than an issue, but I wonder whether there is a way to share streams from the main thread with workers, or whether each worker should manage its own streams.

My current implementation is that each worker has its own open stream (e.g., to write to a file).

josdejong commented 2 years ago

I don't think that is possible. Also, I think you cannot reuse a single stream multiple times, so you would have to open a new stream anyway? Or am I overlooking something?

> My current implementation is that each worker has its own open stream (e.g., to write to a file).

That makes sense to me. Do you have the feeling it's not a good solution? If so why?

makivlach commented 2 years ago

I have a loop over 31,400 XML files that I need to process, saving the output to 8 files. I can't know beforehand which of those files to write into until I have processed the data from each XML file. So, to write into those files concurrently, each worker has 8 open streams, one to each output file.
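For illustration, here is a minimal TypeScript sketch of the per-worker setup described above, written as one possible variation: each worker keeps at most 8 output streams and opens a stream only the first time a processed XML file turns out to belong to that output file. `SinkCache`, `handleInput`, `classify` and the `Sink` interface are hypothetical names for this sketch, not part of workerpool. In a workerpool worker script, the cache would live at module level so it persists across the tasks that worker runs, and `open` could be something like `(i) => fs.createWriteStream(outputPaths[i], { flags: "a" })`, where `outputPaths` is also an assumption.

```typescript
// Minimal writable interface; a Node.js fs.WriteStream satisfies it.
interface Sink {
  write(chunk: string): unknown;
}

// Per-worker cache of output streams (hypothetical helper).
// A stream is opened lazily, the first time some processed input
// turns out to belong to that output file.
class SinkCache {
  private readonly sinks = new Map<number, Sink>();
  private readonly count: number;
  private readonly open: (index: number) => Sink;

  constructor(count: number, open: (index: number) => Sink) {
    this.count = count;
    this.open = open;
  }

  // Returns the sink for output file `index`, opening it on first request.
  get(index: number): Sink {
    if (!Number.isInteger(index) || index < 0 || index >= this.count) {
      throw new RangeError(`no output file with index ${index}`);
    }
    let sink = this.sinks.get(index);
    if (sink === undefined) {
      sink = this.open(index);
      this.sinks.set(index, sink);
    }
    return sink;
  }

  // Number of sinks this worker has actually opened so far.
  get openCount(): number {
    return this.sinks.size;
  }
}

// Hypothetical task body: process one input first, and only then
// decide which output file the result belongs to.
function handleInput(
  input: string,
  classify: (input: string) => { target: number; text: string },
  cache: SinkCache,
): number {
  const { target, text } = classify(input);
  cache.get(target).write(text);
  return target;
}
```

Because every worker owns its own cache, a pool of N workers can still hold up to 8 × N open streams, which is the overhead being discussed in this thread.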

There is probably nothing seriously wrong with that. It just seemed wasteful when I was programming it; I am probably just too used to Golang channels. Now that I think about it, though, it would probably get bottlenecked anyway if a single set of 8 streams were used for all workers.

josdejong commented 2 years ago

Thanks for your explanation. Don't you run into problems when multiple workers write to the same file simultaneously? I would think you should open only one stream per file. Node.js does not block the main thread when reading or writing files with its asynchronous APIs. Workers add the most value when executing CPU-heavy operations that would otherwise block the main thread. Maybe it would work to have the main thread do all the reading and writing of files, and use the workers only for the heavy processing. Not sure, though, whether that makes sense for your use case.
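The suggestion above, where only the main thread touches the output files, could look roughly like the following TypeScript sketch (an assumption about how it might be wired up, not code from the thread). Workers only parse and classify; each task returns a result naming its target file, and the main thread, which owns exactly one stream per output file, does all the appending. `processAll`, `TaskResult`, `Sink` and the field names are hypothetical. With workerpool, `runTask` could wrap `pool.exec`, e.g. `(input) => pool.exec("processXml", [input])`, where `processXml` is a hypothetical method registered in the worker script via `workerpool.worker(...)`.

```typescript
// Result of processing one input file in a worker (hypothetical shape):
// `target` is the index of the output file, `text` the data to append.
interface TaskResult {
  target: number;
  text: string;
}

// Minimal writable interface; a Node.js fs.WriteStream satisfies it.
interface Sink {
  write(chunk: string): unknown;
}

// Dispatches every input through `runTask` (for example a wrapper around
// workerpool's pool.exec) and performs all writes on the calling thread,
// so each output file has exactly one writer however many workers run.
// Resolves to the number of results written to each output file.
async function processAll(
  inputs: string[],
  runTask: (input: string) => Promise<TaskResult>,
  sinks: Sink[],
): Promise<number[]> {
  const counts: number[] = sinks.map(() => 0);
  await Promise.all(
    inputs.map(async (input) => {
      const { target, text } = await runTask(input);
      if (!Number.isInteger(target) || target < 0 || target >= sinks.length) {
        throw new RangeError(`no output file with index ${target}`);
      }
      sinks[target].write(text);
      counts[target] += 1;
    }),
  );
  return counts;
}
```

This sketch ignores back-pressure (the boolean returned by a stream's `write()`), which is often acceptable for many small writes but worth revisiting if memory use grows. Once `processAll` resolves, the streams still need to be closed with `end()` and the pool shut down with `pool.terminate()`.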

makivlach commented 2 years ago

Haha, it seemed a bit off to me too! 😁 Unfortunately, I think this is my only option for my use case: I need to work on multiple files simultaneously to get the most out of the processor, because most of the files I need to work on are small in size but huge in number. I don't think writing should be a problem, since all the streams should just be appending to the files. I'll take a further look into it tomorrow. Thank you for your help so far!
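On the append question: on POSIX systems, a file opened in append mode (`flags: "a"` for `fs.createWriteStream`) has each write atomically positioned at the current end of the file, so concurrent appenders on a local filesystem do not overwrite each other's data. What is not guaranteed is how one writer's data is ordered relative to another's, so a record spread over several `write()` calls can end up interleaved with another worker's output. A minimal TypeScript sketch of one mitigation, assuming the output is line-oriented: collect everything produced from one XML file, group it by target file, and emit a single chunk per target. `batchByTarget` and `OutputLine` are hypothetical names for this sketch.

```typescript
// One line of output produced while processing a single XML file.
// `target` is the index of the output file it belongs to (hypothetical shape).
interface OutputLine {
  target: number;
  text: string;
}

// Groups all lines produced from one input by target file and joins each
// group into a single newline-terminated chunk, so every output file gets
// at most one write() per input file instead of one per line.
function batchByTarget(lines: OutputLine[]): Map<number, string> {
  const groups = new Map<number, string[]>();
  for (const { target, text } of lines) {
    const group = groups.get(target);
    if (group === undefined) {
      groups.set(target, [text]);
    } else {
      group.push(text);
    }
  }
  const chunks = new Map<number, string>();
  for (const [target, texts] of groups) {
    chunks.set(target, texts.map((t) => `${t}\n`).join(""));
  }
  return chunks;
}
```

Each entry in the returned map would then be written with one `write(chunk)` call on the append-mode stream for that target.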

josdejong commented 2 years ago

👍 Sounds like a fun challenge, at least. I'll close this issue now.