nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

workers: initial implementation #2133

Closed petkaantonov closed 7 years ago

petkaantonov commented 8 years ago

https://github.com/nodejs/io.js/pull/1159 retargeted to master

trevnorris commented 8 years ago

@TheAlphaNerd Being that one of the reasons this wasn't accepted is because it's too large a change on its own, could it be helpful to you to break up the changes into more discrete commits while rebasing?

JamesMGreene commented 8 years ago

WebWorkers, yes please. :+1:

petkaantonov commented 8 years ago

I believe there are 2 separate parts. The first part is all the stuff outside WorkerContext.cc/h/js; that has been reviewed. The second part is WorkerContext.cc/h/js and the new tests, which have not been reviewed much at all (except for the internal worker.js file). The second part cannot really be broken down.

petkaantonov commented 8 years ago

The first part also contains a ton of changes that are not really changes at all, just extraction of some class into a header file and an implementation file. Those should be easy to separate into different commits. The meat of the PR isn't actually very large.

petkaantonov commented 8 years ago

Is it acceptable to separate the commits in the PR rather than making separate PRs?

trevnorris commented 8 years ago

@petkaantonov That will help, but it would be more helpful to have them in separate PRs, since discussion will be more focused, and any concerns about one part of the implementation won't prevent the other parts from landing.

petkaantonov commented 8 years ago

@TheAlphaNerd I have successfully rebased now.

dherman commented 8 years ago

@petkaantonov Sorry if I'm intruding but I wanted to ask if this PR supports (or, if not, if it could support) a per-worker variant of node::AtExit? I am working on a codebase that needs the ability to allocate some data associated with an isolate and then delete that data when the isolate is torn down.

Hope this isn't the wrong place to ask, but feel free to send me elsewhere if so!

fbender commented 8 years ago

@Dave: It does not seem to be in the Web Worker spec, so in the interest of consistency, I'd vote against it. However, you may want to talk to the WHATWG since this seems to be an interesting idea.

Workaround: send your own shutdown event for nominal shutdown and listen for error events for off-nominal shutdowns. Also, use polling messages and clean up after the worker has not responded for X polls after Y seconds.


petkaantonov commented 8 years ago

@dherman Yes, I don't see why not.

@fbender This already has several features that are not in the Web Worker spec. It is merely inspired by the Web Worker API; it doesn't implement the same spec.

dherman commented 8 years ago

@fbender Sorry if I wasn't clear -- I'm talking about a C++ API for addons, not a JS API.

dherman commented 8 years ago

@petkaantonov Thanks for being open to this -- I've written up a proposed API for per-isolate exit hooks. Not sure if that's the best place to propose this API but I'll see what feedback I get.

petkaantonov commented 8 years ago

Changes are now in separate commits

kzc commented 8 years ago

Thanks for the clarification on that part, but even in a 1:1 producer/consumer scenario with a single child worker, what prevents the race I described above from occurring and executing events out of order in the parent/owner? (Pardon my accidental non-inline response.)

petkaantonov commented 8 years ago

If the last slot in the primary is refilled in the course of processing the primary PopFront while loop

The primary queue is atomic: either the producer sees a full queue and puts the item in the secondary queue, or the consumer will see the inserted item on its next loop iteration.

kzc commented 8 years ago

The primary queue is atomic

Each individual primary queue PushBack and PopFront is atomic. But that doesn't prevent the race condition outlined above involving the backup queue.

petkaantonov commented 8 years ago

You are right, but it's not a race condition or synchronization problem; it's just everyday faulty program logic.

For instance: the primary queue is full, the secondary queue has some items, and only now does consumer processing start. The processing frees up primary queue slots, so further messages from the producer will be placed on the primary queue even though they should only be delivered after the secondary queue's contents.
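The faulty interleaving described here can be reproduced with a single-threaded simulation (plain arrays stand in for the lock-free queues; the capacity of 2 and the message values are arbitrary):

```javascript
// Bounded primary queue (capacity 2) with an unbounded secondary overflow queue.
const CAPACITY = 2;
const primary = [];
const secondary = [];

function produce(msg) {
  // Producer: fall back to the secondary queue when the primary is full.
  if (primary.length < CAPACITY) primary.push(msg);
  else secondary.push(msg);
}

produce(1); produce(2); // primary is now full: [1, 2]
produce(3);             // overflows into secondary: [3]

const received = [];
// Consumer drains the primary queue first...
while (primary.length) {
  received.push(primary.shift());
  // ...but mid-drain a slot frees up and the producer races in:
  if (received.length === 1) produce(4); // lands in primary, not secondary
}
// Only then does the consumer drain the secondary queue.
while (secondary.length) received.push(secondary.shift());

console.log(received); // [1, 2, 4, 3] -- message 4 overtook message 3
```

Even though every individual queue operation is atomic, the two-queue protocol as a whole delivers message 4 before message 3.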

petkaantonov commented 8 years ago

Perhaps there should just be a primary queue with a larger size, making the producer's postMessage block until the consumer can consume. With the queue size configurable by applications, they can tune it to their workload.
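A single bounded queue with producer-side backpressure could look roughly like this (an async sketch where awaiting a promise stands in for a thread blocking; the class name, capacity, and API are illustrative, not the PR's actual design):

```javascript
// A bounded async queue: push() resolves only once a slot is available,
// modeling a postMessage that blocks until the consumer catches up.
class BoundedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.waitingProducers = []; // resolvers for blocked push() calls
    this.waitingConsumers = []; // resolvers for blocked pop() calls
  }

  async push(item) {
    if (this.items.length >= this.capacity) {
      await new Promise((resolve) => this.waitingProducers.push(resolve));
    }
    this.items.push(item);
    const wakeConsumer = this.waitingConsumers.shift();
    if (wakeConsumer) wakeConsumer();
  }

  async pop() {
    if (this.items.length === 0) {
      await new Promise((resolve) => this.waitingConsumers.push(resolve));
    }
    const item = this.items.shift();
    const wakeProducer = this.waitingProducers.shift();
    if (wakeProducer) wakeProducer();
    return item;
  }
}

async function demo() {
  const q = new BoundedQueue(2);
  const received = [];
  const producer = (async () => {
    for (let i = 1; i <= 5; i++) await q.push(i); // stalls when full
  })();
  const consumer = (async () => {
    for (let i = 0; i < 5; i++) received.push(await q.pop());
  })();
  await Promise.all([producer, consumer]);
  return received;
}

demo().then((r) => console.log(r)); // [1, 2, 3, 4, 5] -- ordering preserved
```

With a single queue there is no overflow path, so FIFO ordering holds by construction; the cost is that a fast producer is throttled to the consumer's pace.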

petkaantonov commented 8 years ago

Damn, this appears to be a thing https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem

kzc commented 8 years ago

Synchronizing two things without a mutual lock is a very difficult problem. I only examined your code to see how you tried to solve it.

Perhaps there should just be a primary queue with a larger size, making the producer's postMessage block until the consumer can consume

That can work. The devil is in the details.

matheusmoreira commented 8 years ago

Hello,

I'd like to know the current status of this feature. I'm confused by the pull request's comment history.

From what I understood, it is a complex implementation, and thus should be split into smaller change sets. I'm not sure where all the work is happening, if it is happening, or if and how I can help. All references I know of to a Node.js worker implementation lead back to this discussion.

spion commented 8 years ago

Based on the tempo, I'd estimate it's safe to say it's "not going to happen"

edit: Apparently it is. I remain sceptical for now.

piranna commented 8 years ago

It's a shame, the concept was really beautiful... :'-(

mitar commented 8 years ago

@piranna: Check out: https://github.com/audreyt/node-webworker-threads

bnoordhuis commented 8 years ago

Getting this merged in one shape or another is on my todo list for v7. In fact, I hope to have the first patches ready for review later this week.

Caveat emptor: "this" should be understood as "this feature", not necessarily "this exact pull request".

matheusmoreira commented 8 years ago

@bnoordhuis: I'm happy to hear that! When the new patches are ready, where shall we head for updates on the feature's progress? Will there be a new pull request or issue?

Fishrock123 commented 8 years ago

@bnoordhuis could you make a meta-tracking issue for these like I have done with the stdio things? e.g. https://github.com/nodejs/node/issues/6980

matheusmoreira commented 8 years ago

Any updates on the patches or the meta-tracking issue?

I'm also wondering if there will be any limitations placed on code executing in a worker thread.

basarat commented 8 years ago

What kinds of objects can or can't be communicated through the event loop messaging system

Only JSON-serializable values can be communicated.

Are there any functions or operations which can't be used?

No functions can be passed around. At least not reliably (.toString is not the right solution).
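The JSON limitation is easy to demonstrate in plain Node, with no worker API involved:

```javascript
// JSON round-tripping silently drops anything that isn't plain data.
const message = {
  n: 42,
  fn: () => 'hello',  // functions are not JSON-serializable
  when: new Date(0),  // Dates degrade to ISO strings
  gone: undefined,    // undefined disappears entirely
};

const roundTripped = JSON.parse(JSON.stringify(message));

console.log(roundTripped);         // { n: 42, when: '1970-01-01T00:00:00.000Z' }
console.log('fn' in roundTripped); // false -- the function is gone
```

Note that the loss is silent: no error is thrown, the non-serializable properties simply vanish, which is why `.toString()` tricks for shipping functions across the boundary are unreliable.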

matheusmoreira commented 8 years ago

@basarat commented:

No functions can be passed around. At least not reliably (.toString is not the right solution).

I realize that using JSON as the underlying serialization method prevents passing functions as data. Issue #6300 is relevant.

What I meant to ask is whether there will be anything you can do in Node's event loop thread that you can't do in a worker thread. I'm wondering what subset of Node.js functionality will be usable within a worker.

For example, will worker threads:

- Have their own event loops?
- Be able to start other workers?
- Be able to spawn a child process?
- Be able to require modules and native addons?

With child_process it is clear what will happen: it is a full, independent Node.js instance and works just like one; the only real limitation is the interprocess communication mechanism. The semantics of threads in Node.js don't seem so clear to me.

bnoordhuis commented 8 years ago

For example, will worker threads:

Have their own event loops? Be able to start other workers? Be able to spawn a child process? Be able to require modules

Yes.

and native addons?

Maybe, it's complicated. Add-ons would have to opt-in. The mechanism already kind of exists (NODE_MODULE_CONTEXT_AWARE) but no existing add-ons (to my knowledge) use it.

dead-claudia commented 8 years ago

@bnoordhuis I'd suspect that if worker (or any kind of threading) support makes it into core, several add-ons will start using it.

hax commented 8 years ago

@isiahmeadows Workers should be in core to support transferring ArrayBuffers or other resources in Node. But I'm not sure whether this PR implements these features.

siriux commented 7 years ago

@bnoordhuis Is there any place to follow your work on integrating webworkers for v7?

HyeonuPark commented 7 years ago

Now that the Atomics and SharedArrayBuffer APIs have reached stage 2 of the TC39 process and V8 is implementing them, I think Node.js should have thread APIs in some form as a core module, to support shared memory correctly.

@bnoordhuis have you checked that shared memory API? Would it be possible with your implementation?
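SharedArrayBuffer and Atomics did eventually ship in V8 and Node. A minimal single-threaded demonstration of the API surface (no worker is spawned here, so nothing is actually shared across threads):

```javascript
// A SharedArrayBuffer can back typed arrays in several threads at once;
// Atomics provides race-free reads, writes, and read-modify-write operations.
const sab = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const counter = new Int32Array(sab);

Atomics.store(counter, 0, 10);          // atomic write
Atomics.add(counter, 0, 5);             // atomic increment (returns the old value)
const value = Atomics.load(counter, 0); // atomic read

console.log(value); // 15
```

In a real multi-threaded setup the `sab` would be posted to a worker, and both sides would operate on their own `Int32Array` views over the same memory.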

HyeonuPark commented 7 years ago

@bnoordhuis have you checked that sharedmem api? Can it be possible with your implementation?

I just wonder why I used # instead of @ :P

dead-claudia commented 7 years ago

I'll point out that this is a really old PR, and it would be easiest to rewrite it from scratch again.


bnoordhuis commented 7 years ago

I've been working on and off on a rewrite of this pull request but after discussion with other collaborators I've come to the conclusion that multi-threading support adds too many new failure modes for not enough benefit.

The primary motivation for the rewrite was improved IPC performance but I'm fairly confident by now that we can also accomplish that using more traditional means like shared memory and more efficient serialization.

I'll go ahead and close this. Thanks for your hard work, Petka.

GnorTech commented 7 years ago

For those who want to write Node.js code in multithread program: NW.js implemented this by enabling Node.js in Web Workers: https://nwjs.io/blog/v0.18.4/

pemrouz commented 7 years ago

The primary motivation for the rewrite was improved IPC performance but I'm fairly confident by now that we can also accomplish that using more traditional means like shared memory and more efficient serialization.

Hi @bnoordhuis. Can I ask what the latest plan/thinking is for shared memory in Node (i.e. implement workers, or somehow allow transferring SharedArrayBuffers with cluster, or different API altogether)? The latest version seems to have SharedArrayBuffer (and Atomics), but there is no way to currently use this iiuc? Also, what would be the best the way to help out with this?

addaleax commented 7 years ago

I've come to the conclusion that multi-threading support adds too many new failure modes for not enough benefit.

Also… could you mention what exactly it is that you have discarded? Multi-threading with the full Node API available in each thread, or something more lightweight like a WebWorkers-style API?

rsp commented 7 years ago

@addaleax I tried to summarize the state of this issue, as well as the different types of concurrency and their pros and cons in the context of Node, and I also kept posting updates about this pull request (mostly thanks to comments from @matheusmoreira - thanks for that) in this answer on Stack Overflow:

Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?

If anything is incorrect or outdated please let me know.

bnoordhuis commented 7 years ago

Can I ask what the latest plan/thinking is for shared memory in Node (i.e. implement workers, or somehow allow transferring SharedArrayBuffers with cluster, or different API altogether)?

Shared memory is not my primary focus right now, reducing the overhead of serializing/deserializing is. I ran a lot of benchmarks and in most non-contrived cases the overhead of converting to and from JSON is significantly greater (as in 70/30 or 80/20 splits) than sending it to another process.

Once I get the overhead of serdes down, I'm going to look into shared memory support. It's a minefield of platform-specific quirks and limitations so it's probably going to take a while to get it merged in libuv and iron out the bugs. If you want to help out, this is probably a good place to start.

V8 5.5 or 5.6 will make it a lot easier to do efficient serdes so that's what I'm currently waiting for.

could you mention what exactly it is that you have discarded? Multi-threading with the full Node API available in each thread, or something more lightweight like a WebWorkers-style API?

The former, the node-with-threads approach. WebWorkers-style parallelism is still an option and not terribly hard to implement but I didn't see a point in pursuing that in core, there are already add-ons that do.

dead-claudia commented 7 years ago

@bnoordhuis

The former, the node-with-threads approach. WebWorkers-style parallelism is still an option and not terribly hard to implement but I didn't see a point in pursuing that in core, there are already add-ons that do.

That'd be useful, except none of the modules I've seen using true threads (instead of processes) actually supports require in any way, which makes it much harder to scale. (More specifically, they currently can't, because there's no way to atomically modify the require-related caches from different threads. That logic would have to move into C++ land for it to be possible, thanks to V8's lack of thread safety.)

ronkorving commented 7 years ago

@bnoordhuis

V8 5.5 or 5.6 will make it a lot easier to do efficient serdes so that's what I'm currently waiting for.

Out of curiosity, could you elaborate as to why this is?

addaleax commented 7 years ago

V8 5.5 or 5.6 will make it a lot easier to do efficient serdes so that's what I'm currently waiting for.

Out of curiosity, could you elaborate as to why this is?

I’m pretty sure Ben is referring to the added ValueSerializer and ValueDeserializer classes

ronkorving commented 7 years ago

@addaleax Ah nice, a serializer that doesn't use JSON strings?

Sidenote: something cluster's messaging could make use of too, I imagine (would be nice, as it's quite slow now imho).

addaleax commented 7 years ago

Ah nice, a serializer that doesn't use JSON strings?

I mean, I haven’t used it myself yet, but that’s sure what it sounds like. :smile:

Something cluster's messaging could make use of too I imagine (would be nice as it's quite slow now imho).

Yeah, I had that thought, too. But as far as the current slowness is concerned: https://github.com/nodejs/node/pull/10557 seems to fix quite a bit of that. :)

NawarA commented 7 years ago

Is this still being worked on?

jasnell commented 7 years ago

@NawarA ... not at this time. If someone wanted to volunteer to pick it up, I think that would be welcome, but it's quite a task and there would be much to do.