nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

workers: initial implementation #2133

Closed petkaantonov closed 7 years ago

petkaantonov commented 8 years ago

https://github.com/nodejs/io.js/pull/1159 retargeted to master

alubbe commented 8 years ago

Awesome, thank you for picking this up!

petkaantonov commented 8 years ago

Implemented process.threadId, which is always 0 for the main thread and > 0 for workers.

Implemented a data option where you can pass initial data to the worker (process.env cannot be used since it's process-wide). The passed data is available in process.workerData inside a worker. This is needed for running fs and network tests in parallel when using workers.

Implemented an eval option, a boolean that you can set to true if you want the first argument to be evaluated as code rather than loaded as a file.

Fishrock123 commented 8 years ago

@petkaantonov "io.js master merged the required libuv fix regarding to closing stdio handles on Windows"

Could you provide a link to the libuv issue up there? :)

piscisaureus commented 8 years ago

io.js master merged the required libuv fix regarding to closing stdio handles on Windows

That happened: https://github.com/libuv/libuv/compare/60e515d...c619f37. I believe a libuv release is also imminent.

petkaantonov commented 8 years ago

@piscisaureus yeah, the task means that the current deps/uv in master doesn't yet contain the changes.

petkaantonov commented 8 years ago

@kzc The issue you reported was actually known all along, as noted in this comment:

// Deleting WorkerContexts in response to their notification signals
// will cause use-after-free inside libuv. So the final `delete this`
// call must be made somewhere else

"somewhere else" means queuing the delete this call asynchronously on the main thread event loop. And of course this fails when the owner thread == main thread.

A significantly simpler solution (without this problem) now occurs to me: WorkerContexts to be deleted would be pushed to a global cleanup queue that is drained in between event loop iterations on the main event loop.
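
For illustration, a minimal standalone sketch of that cleanup-queue pattern, built directly on libuv primitives (all names here are invented for the example and are not taken from the actual implementation), could look like this:

#include <uv.h>
#include <vector>

struct WorkerCtx {  // stand-in for the real WorkerContext
  uv_thread_t thread;
};

static uv_mutex_t cleanup_mutex;
static std::vector<WorkerCtx*> cleanup_queue;

// Called from a worker thread once its own half of the teardown is done.
static void QueueForCleanup(WorkerCtx* ctx) {
  uv_mutex_lock(&cleanup_mutex);
  cleanup_queue.push_back(ctx);
  uv_mutex_unlock(&cleanup_mutex);
}

// Called on the main thread in between event loop turns, so no libuv
// callback can still be holding a pointer to a context being deleted.
static void DrainCleanupQueue() {
  std::vector<WorkerCtx*> pending;
  uv_mutex_lock(&cleanup_mutex);
  pending.swap(cleanup_queue);
  uv_mutex_unlock(&cleanup_mutex);
  for (WorkerCtx* ctx : pending) {
    uv_thread_join(&ctx->thread);  // the worker thread has already exited
    delete ctx;
  }
}

int main() {
  uv_mutex_init(&cleanup_mutex);
  uv_loop_t* loop = uv_default_loop();
  // Run the main loop one turn at a time, draining the queue between turns.
  while (uv_run(loop, UV_RUN_ONCE) != 0)
    DrainCleanupQueue();
  DrainCleanupQueue();
  uv_mutex_destroy(&cleanup_mutex);
  return 0;
}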

kzc commented 8 years ago

@petkaantonov - it's been some time since I looked at your thread worker code, but I vaguely recall it already had a cleanup queue that was intended to be processed asynchronously. The problem was a nested event loop handler within a dispose function. Nesting event loops is something that really should be avoided - it creates complexity, and it is difficult to reason about the ordering of events and the correctness of a solution.

petkaantonov commented 8 years ago

@kzc btw did you want to know the worker's threadId from the worker object on its owner thread as well? As in worker.threadId?

And yeah I'll change process.threadId to process.tid for better symmetry with process.pid :)

kzc commented 8 years ago

For my needs just having process.threadId is sufficient.

This brings up a good point - in your implementation, can a given worker instance potentially be scheduled on different threads during its lifetime, or are workers always pinned to a specific thread? If not pinned, the worker instance could benefit from having a unique worker id (never reused for the lifetime of the process across all workers), which is different from a threadId.

petkaantonov commented 8 years ago

A worker is exclusively tied to a thread. I am not sure what benefit there would be from being able to schedule it on different threads; it would be very complex to implement, as you would need to facilitate the ownership transfer of a V8 isolate and so on.

However tying a worker to a specific CPU core will be possible if/when libuv merges https://github.com/libuv/libuv/pull/280.

petkaantonov commented 8 years ago

The use-after-free and nested event loops should be fixed now

kzc commented 8 years ago

@petkaantonov - Just curious... instead of posting delete tasks to the main thread with QueueWorkerContextCleanup() and CleanupWorkerContexts(), why don't you delete the WorkerContext at the end of WorkerContext::RunWorkerThread() when the worker thread's event loop is guaranteed to have finished?

void WorkerContext::RunWorkerThread(void* arg) {
  WorkerContext* worker = static_cast<WorkerContext*>(arg);
  worker->Run();
  delete worker;
}

petkaantonov commented 8 years ago

After Run() completes, only the state belonging to the worker thread has been disposed. The context is still pending owner-side disposal at that point.

ronkorving commented 8 years ago

Very cool stuff, but is this not going to be solved/replaced by Vats "some day"? I'm probably missing something, but I hope this isn't overlooked.

petkaantonov commented 8 years ago

You seem to imply that all strawman proposals will eventually be implemented but that is not the case.

ronkorving commented 8 years ago

I just assume a lot, out of ignorance :)

kzc commented 8 years ago

@petkaantonov - I tested your "Fix use-after-free" patch on Linux with valgrind as per the instructions here. It appears to work correctly.

You may consider getting rid of the async WorkerContext reaper on the main thread and adopting something like this instead which I think is easier to understand and should put less of a burden on the main thread since it would no longer have to poll the WorkerContext queue:

void WorkerContext::RunWorkerThread(void* arg) {
  WorkerContext* worker = static_cast<WorkerContext*>(arg);
  worker->Run();
  ...wait on a libuv condition variable signalled by 
     owner thread at end of WorkerContext::Dispose()...
  delete worker;
}
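
(For concreteness, here is a rough standalone sketch of the proposed handshake using libuv's condition-variable API; all names are invented for illustration and this is not code from the PR. Whether this ordering is actually safe in node's case is exactly what the follow-up comments debate.)

#include <uv.h>

struct Ctx {  // stand-in for the real WorkerContext
  uv_mutex_t mutex;
  uv_cond_t cond;
  bool owner_done;
};

static Ctx* CreateCtx() {
  Ctx* ctx = new Ctx();
  uv_mutex_init(&ctx->mutex);
  uv_cond_init(&ctx->cond);
  ctx->owner_done = false;
  return ctx;
}

// Worker thread body: after its own event loop has finished, wait until the
// owner thread has completed its half of the teardown, then free the context.
static void RunWorkerThread(void* arg) {
  Ctx* ctx = static_cast<Ctx*>(arg);
  // ... the equivalent of worker->Run() would go here ...
  uv_mutex_lock(&ctx->mutex);
  while (!ctx->owner_done)                 // guard against spurious wakeups
    uv_cond_wait(&ctx->cond, &ctx->mutex);
  uv_mutex_unlock(&ctx->mutex);
  uv_mutex_destroy(&ctx->mutex);
  uv_cond_destroy(&ctx->cond);
  delete ctx;                              // worker thread is now the sole owner
}

// Intended as the last owner-side action (end of Dispose()): hand ownership
// of the context over to the worker thread.
static void SignalOwnerDone(Ctx* ctx) {
  uv_mutex_lock(&ctx->mutex);
  ctx->owner_done = true;
  uv_cond_signal(&ctx->cond);
  uv_mutex_unlock(&ctx->mutex);
}

int main() {
  Ctx* ctx = CreateCtx();
  uv_thread_t worker;
  uv_thread_create(&worker, RunWorkerThread, ctx);
  SignalOwnerDone(ctx);    // after this, only the worker thread touches ctx
  uv_thread_join(&worker);
  return 0;
}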

Unfortunately the Mac OSX BADF/select problem mentioned in the last PR still exists. I think it's a libuv issue. There's also an unrelated Linux issue outlined below.

Using the latest workers implementation as of 1e0b6b1fd5fc93986d056798f47804d0a15a9bec and this patch:

--- a/test/workers/test-crypto.js
+++ b/test/workers/test-crypto.js
@@ -33,3 +33,3 @@ var tests = [

-var parallelism = 4;
+var parallelism = 8;
 var testsPerThread = Math.ceil(tests.length / parallelism);

running this command repeatedly:

./iojs --experimental-workers test/workers/test-crypto.js

on a 4 core Linux VM it experiences this error roughly once per 50 runs:

/opt/iojs-workers-implementation/test/common.js:484
  throw e;
        ^
Error: Running test/parallel/test-crypto-stream.js inside worker failed:
AssertionError: false == true
    at Decipheriv.end (/opt/iojs-workers-implementation/test/parallel/test-crypto-stream.js:52:5)
    at Decipheriv.<anonymous> (/opt/iojs-workers-implementation/test/common.js:371:15)
    at emitOne (events.js:82:20)
    at Decipheriv.emit (events.js:169:7)
    at done (_stream_transform.js:178:19)
    at _stream_transform.js:119:9
    at Decipheriv.Cipher._flush (crypto.js:160:5)
    at Decipheriv.<anonymous> (_stream_transform.js:118:12)
    at Decipheriv.g (events.js:260:16)
    at emitNone (events.js:67:13)
    at Worker.<anonymous> (/opt/iojs-workers-implementation/test/common.js:477:14)
    at emitOne (events.js:77:13)
    at Worker.emit (events.js:169:7)
    at onerror (worker.js:61:18)
    at WorkerBinding.workerContext._onmessage (worker.js:75:16)

on a 4 core Mac it experiences these errors roughly once per 20 runs:

 /opt/iojs-workers-implementation/test/common.js:484
   throw e;
         ^
 Error: Running test/parallel/test-crypto-hmac.js inside worker failed:
 Error: EBADF: bad file descriptor, close
     at Error (native)
     at Object.fs.closeSync (fs.js:518:18)
     at Object.fs.readFileSync (fs.js:445:21)
     at Object.Module._extensions..js (module.js:447:20)
     at Module.load (module.js:355:32)
     at Function.Module._load (module.js:310:12)
     at Function.Module.runMain (module.js:471:10)
     at process._runMain (node.js:68:18)
     at Worker.<anonymous> (/opt/iojs-workers-implementation/test/common.js:477:14)
     at emitOne (events.js:77:13)
     at Worker.emit (events.js:169:7)
     at onerror (worker.js:61:18)
     at WorkerBinding.workerContext._onmessage (worker.js:75:16)
 (node) crypto.createCredentials is deprecated. Use tls.createSecureContext instead.
 <Buffer 0c 1e e9 6b 67 d3 29 f7 94 26 87 51 bb 05 53 3f>
 Assertion failed: (r == 1), function uv__stream_osx_interrupt_select, file ../deps/uv/src/unix/stream.c, line 127.
 Abort trap: 6

Ignore the deprecation lines - they are of no consequence to this issue.

evanlucas commented 8 years ago

With it being behind a flag, I'm guessing we are punting on the docs for now?

kzc commented 8 years ago

@petkaantonov This is more of a meta question - What's your sense of the number of non-lock-protected global/static variables in native modules and methods in the worker threads source tree? Something like that could account for the spurious failures in the worker thread tests. Just scanning the sources in src/*.cc I see a few mutable, non-lock-protected globals. And any third-party native module used on a worker thread would have to be thread-safe as well. The code in this ecosystem was originally developed without regard to thread safety by design. Should a mechanism or policy be developed to say which node modules are thread-safe and as such could be used on a worker thread?

dead-claudia commented 8 years ago

I made a thread on v8-users yesterday about this, in case you all are interested.

petkaantonov commented 8 years ago

You may consider getting rid of the async WorkerContext reaper on the main thread and adopting something like this instead which I think is easier to understand and should put less of a burden on the main thread since it would no longer have to poll the WorkerContext queue:

void WorkerContext::RunWorkerThread(void* arg) {
  WorkerContext* worker = static_cast<WorkerContext*>(arg);
  worker->Run();
  ...wait on a libuv condition variable signalled by 
     owner thread at end of WorkerContext::Dispose()...
  delete worker;
}

This doesn't help. Consider that after signaling the condition variable, the owner thread is suspended. Meanwhile the worker thread continues and deletes the WorkerContext, as the condition was signaled. After this the owner thread is scheduled again and continues; when Dispose returns, control goes back inside libuv, where the use-after-free now happens because the WorkerContext has been deleted.

There should be nothing like a "burden on main thread" from the cleanup queue, as it's virtually always empty and only checked in between fully executed event loop "turns".

This is more of a meta question - What's your sense of the number of non-lock-protected global/static variables in native modules and methods in the worker threads source tree? Something like that could account for the spurious failures in the worker thread tests. Just scanning the sources in src/*.cc I see a few mutable, non-lock-protected globals.

For this PR I have only checked all dependencies (./deps) and some critical global statics like those in node.cc

petkaantonov commented 8 years ago

When the crypto test fails the message is "Error: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve"

kzc commented 8 years ago

After this the owner thread is scheduled again and continues; when Dispose returns, control goes back inside libuv, where the use-after-free now happens because the WorkerContext has been deleted

That's not the case. If the signal is the last action in Dispose on the owner thread as proposed, there are no more actions to perform on the owner thread - just a return. Just as the owner thread cannot do anything with the WorkerContext at this point in the current implementation, after the WorkerContext instance is passed via node::QueueWorkerContextCleanup(this) to the reaper on the main thread.

There should be nothing like a "burden on main thread" from the cleanup queue, as it's virtually always empty and only checked in between fully executed event loop "turns".

The queue may be empty the majority of the time but the mutex is continually being locked/unlocked on the main thread after every uv_run(env->event_loop(), UV_RUN_ONCE) which can have adverse effects on the CPU cache lines and stall a CPU core. Polling is generally something that should be avoided. The signal proposal avoids this polling.

But it's not a showstopper. It's just an efficiency issue.

For this PR I have only checked all dependencies (./deps) and some critical global statics like those in node.cc

I wonder if package.json should have a field to mark whether a given module is thread-safe, so that only modules deemed thread-safe could be used on a worker thread.

petkaantonov commented 8 years ago

That's not the case. If the signal is the last action in Dispose on the owner thread as proposed, there are no more actions to perform on the owner thread - just a return.

Dispose is the last action, yes, but it is called from libuv, and afterwards control returns to libuv, where libuv will still use the worker context object. That's why a direct delete this inside Dispose won't work. Signaling inside Dispose for some other thread to do delete this is exactly the same as doing delete this directly and won't work.

The queue may be empty the majority of the time but the mutex is continually being locked/unlocked on the main thread after every uv_run(env->event_loop(), UV_RUN_ONCE) which can have adverse effects on the CPU cache lines and stall a CPU core. Polling is generally something that should be avoided. The signal proposal avoids this polling.

The mutex is also virtually always uncontested.

piranna commented 8 years ago

I wonder if package.json should have a field to mark whether a given module is thread-safe, so that only modules deemed thread-safe could be used on a worker thread.

This could only be an issue for compiled modules, not for pure JavaScript ones. I think this is something more closely related to node-gyp...

petkaantonov commented 8 years ago

Third-party native modules cannot be loaded inside a worker. Later, a NODE_THREAD_SAFE_MODULE macro could be introduced.

petkaantonov commented 8 years ago

@kzc the crypto issue appears to be fixed by https://github.com/petkaantonov/io.js/commit/0d10ae6de9326c5caa34fd20dd013b381737ec9a

The issue seemed to be that the ECDH routines did not clear errors on return, which caused a stale error to pop up in the wrong place. I speculate this only appears with workers because the normal parallel runner always creates a fresh process to run each test in (so the OpenSSL error stack is always clean for each test).

Normally, when I ran the following with parallelism=8:

asd=0; while ./out/Release/iojs --experimental-workers ./test/workers/test-crypto.js > /dev/null 2>&1; do let "asd++"; echo $asd; done

the loop almost always stopped at fewer than 50 iterations, and the highest was 84. With the fix I got to 200 iterations, so I am assuming it works based on that.

kzc commented 8 years ago

Signaling inside Dispose for some other thread to do delete this is exactly the same as doing delete this directly and won't work

Getting the worker thread to delete the WorkerContext after its uv_run loop is done and after waiting for the signal from the owner thread guarantees that there will not be an issue. In fact, your present implementation is making the exact same assumption once you call QueueWorkerContextCleanup() - that another thread will delete it - immediately or some time in the future.

The original delete this crash happened because the WorkerContext class has a NotificationChannel data member that has a uv_async_t that initiated the callback and it was trying to delete the rug out from under itself. That is not the case with the proposal. The worker thread run loop would be over at that point and that async struct is not being used any longer by any thread and can safely be deleted.

Since we both think we're right there's no point debating this. Later when the workers branch is merged to master I'll put together a patch to demonstrate it.

The mutex is also virtually always uncontested

Even so, polling is not desirable, nor necessary here.

Third-party native modules cannot be loaded inside a worker.

What about indirectly via require?

kzc commented 8 years ago

The issue seemed to be that the ECDH routines did not clear errors on return, which caused a stale error to pop up in the wrong place

@petkaantonov - well done.

I guess the worker threads implementation will have a number of similar issues and single-threaded assumptions to sort out once it's merged into master. We'll probably have to keep the --experimental-workers flag in place for a year or so to shake these things out.

Just the spurious Mac OSX BADF and select assert failures to deal with now.

petkaantonov commented 8 years ago

The original delete this crash happened because the WorkerContext class has a NotificationChannel data member that has a uv_async_t that initiated the callback and it was trying to delete the rug out from under itself. That is not the case with the proposal. The worker thread run loop would be over at that point and that async struct is not being used any longer by any thread and can safely be deleted.

There is a wrong assumption there - the uv_async_t for owner notifications is on the owner's event loop, not on the worker's. Especially in the case where owner thread is the main thread, the loop is never over.
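
(For background, the libuv constraint in play here is that a handle such as a uv_async_t has to be closed with uv_close() and its memory released only in the close callback; freeing it from inside its own callback, or while its loop can still touch it, is a use-after-free. A minimal, self-contained illustration, unrelated to the actual worker code:)

#include <uv.h>

struct Notifier {
  uv_async_t async;   // must stay valid until on_close has run
};

static void on_close(uv_handle_t* handle) {
  Notifier* n = static_cast<Notifier*>(handle->data);
  delete n;           // safe: the loop no longer references the handle
}

static void on_notify(uv_async_t* async) {
  // Deleting `async` (or its containing struct) right here would be a
  // use-after-free once control returns to libuv. Request closure instead:
  uv_close(reinterpret_cast<uv_handle_t*>(async), on_close);
}

int main() {
  uv_loop_t* loop = uv_default_loop();
  Notifier* n = new Notifier();
  uv_async_init(loop, &n->async, on_notify);
  n->async.data = n;
  uv_async_send(&n->async);       // trigger the callback once
  uv_run(loop, UV_RUN_DEFAULT);   // exits after the handle is closed
  return 0;
}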

petkaantonov commented 8 years ago

What about indirectly via require?

One ultimately needs to call process.dlopen to load third-party native modules, but it's not defined inside workers.

Just the spurious Mac OSX BADF and select assert failures to deal with now.

At least the assertion failure seems to be a libuv bug

kzc commented 8 years ago

There is a wrong assumption there - the uv_async_t for owner notifications is on the owner's event loop, not on the worker's. Especially in the case where owner thread is the main thread, the loop is never over.

Okay, I didn't realize that. I see what you mean now.

kzc commented 8 years ago

With parallelism = 8, here are the last two valgrind errors on Linux in our favorite test, test/workers/test-crypto.js. These simple mismatched free/delete errors may not be specific to the workers work, but it would be nice to see this test run cleanly.

valgrind --freelist-vol=250000000 --malloc-fill=0xda --free-fill=0xde \
    ./iojs --experimental-workers test/workers/test-crypto.js

(1)

==13566== Thread 8:
==13566== Mismatched free() / delete / delete []
==13566==    at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13566==    by 0xA8FD9F: node::smalloc::CallbackInfo::DisposeNoAllocation(v8::Isolate*) (smalloc.cc:97)
==13566==    by 0xA8D8C1: node::PersistentHandleCleanup::VisitPersistentHandle(v8::Persistent<v8::Value, v8::NonCopyablePersistentTraits<v8::Value> >*, unsigned short) (persistent-handle-cleanup.cc:26)
==13566==    by 0x7D94BA: v8::VisitorAdapter::VisitEmbedderReference(v8::internal::Object**, unsigned short) (api.cc:6900)
==13566==    by 0x8A8A1B: v8::internal::GlobalHandles::IterateAllRootsWithClassIds(v8::internal::ObjectVisitor*) (global-handles.cc:910)
==13566==    by 0x7E2E1F: v8::Isolate::VisitHandlesWithClassIds(v8::PersistentHandleVisitor*) (api.cc:6912)
==13566==    by 0xA69E38: node::Environment::~Environment() (env-inl.h:217)
==13566==    by 0xA9D1DA: node::WorkerContext::DisposeWorker(node::WorkerContext::TerminationKind) (env-inl.h:260)
==13566==    by 0xA9D2C4: node::WorkerContext::LoopEnded() (worker.cc:326)
==13566==    by 0xA9D574: node::WorkerContext::Run() (worker.cc:654)
==13566==    by 0xACBC67: uv__thread_start (thread.c:49)
==13566==    by 0x5A6B181: start_thread (pthread_create.c:312)
==13566==  Address 0x10671360 is 0 bytes inside a block of size 16 alloc'd
==13566==    at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13566==    by 0xA9FD24: node::crypto::CipherBase::GetAuthTag(char**, unsigned int*) const (node_crypto.cc:2879)
==13566==    by 0xAA81A2: node::crypto::CipherBase::GetAuthTag(v8::FunctionCallbackInfo<v8::Value> const&) (node_crypto.cc:2892)
==13566==    by 0x7EEB4E: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (arguments.cc:33)
==13566==    by 0x80416B: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>&) (builtins.cc:1077)
==13566==    by 0x8043B4: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) (builtins.cc:1100)

(2)

==13566== Thread 18:
==13566== Mismatched free() / delete / delete []
==13566==    at 0x4C2C83C: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13566==    by 0xAAAB4F: node::crypto::Certificate::ExportChallenge(v8::FunctionCallbackInfo<v8::Value> const&) (node_crypto.cc:5187)
==13566==    by 0x7EEB4E: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (arguments.cc:33)
==13566==    by 0x80416B: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>&) (builtins.cc:1077)
==13566==    by 0x8043B4: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) (builtins.cc:1100)
==13566==  Address 0x15aefe70 is 0 bytes inside a block of size 37 alloc'd
==13566==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13566==    by 0x741EAF: CRYPTO_malloc (mem.c:342)
==13566==    by 0x6E27AF: ASN1_STRING_set (asn1_lib.c:376)
==13566==    by 0x6E9D60: asn1_ex_c2i (tasn_dec.c:960)
==13566==    by 0x6EA150: asn1_d2i_ex_primitive (tasn_dec.c:831)
==13566==    by 0x6EA331: ASN1_item_ex_d2i (tasn_dec.c:237)
==13566==    by 0x6EABAC: asn1_template_noexp_d2i (tasn_dec.c:691)
==13566==    by 0x6EAD45: asn1_template_ex_d2i (tasn_dec.c:579)
==13566==    by 0x6EA784: ASN1_item_ex_d2i (tasn_dec.c:443)
==13566==    by 0x6EABAC: asn1_template_noexp_d2i (tasn_dec.c:691)
==13566==    by 0x6EAD45: asn1_template_ex_d2i (tasn_dec.c:579)
==13566==    by 0x6EA784: ASN1_item_ex_d2i (tasn_dec.c:443)

petkaantonov commented 8 years ago

There are also well over a thousand memory leaks if you run with valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./out/Release/iojs --experimental-workers test/workers/test-crypto.js

kzc commented 8 years ago

Workers aside, no doubt Node itself could benefit from extensive valgrind testing. Of the thousand leaks only the "definitely leaked" ones are interesting, and even then most tend to be secondary leaks stemming from a few true leaks.

pashoo2 commented 8 years ago

Does it support the structured clone algorithm instead of JSON parsing when passing objects between processes?

pashoo2 commented 8 years ago

Or is it still not supported, as you have said before?

Globegitter commented 8 years ago

This is exciting, looking forward to playing around with this behind an experimental flag.

piranna commented 8 years ago

This is exciting, looking forward to playing around with this behind an experimental flag.

Pseudo-offtopic: is there any way to enable some experimental flags by default at compile time? I would like to test this on [NodeOS](https://github.com/NodeOS/NodeOS), but I don't have any way there to pass flags to the node.js command... :-/

heavyk commented 8 years ago

@piranna just change this to true https://github.com/nodejs/io.js/pull/2133/files#diff-cd53544f44aab2c697bcd7b6a57f23ccR142

piranna commented 8 years ago

@piranna just change this to true https://github.com/nodejs/io.js/pull/2133/files#diff-cd53544f44aab2c697bcd7b6a57f23ccR142

Oh cool, thank you! :-D Maybe this could be set as a compile-time option too? If not, it would be easy to add an #ifdef :-)

Morgul commented 8 years ago

Is there any documentation on this new API? I couldn't find any in the PR.

I'm highly interested in the ability to use workers, and was curious about getting a feel for how this will work (other than attempting to read the unit tests).

Fishrock123 commented 8 years ago

Is there any documentation on this new API? I couldn't find any in the PR.

It would initially land behind a flag, so there wouldn't be any until it becomes fully public API.

mikeal commented 8 years ago

It would initially land behind a flag, so there wouldn't be any until it becomes fully public API.

We should still document it, just not in the main API docs. We have a whole new docs project for this kinda stuff.

alubbe commented 8 years ago

Since 4.0 is just around the corner, is there anything besides docs that's keeping us from merging this to give it some production usage?

Fishrock123 commented 8 years ago

@alubbe due to how this touches large parts of the code base that deal with handling V8 Isolates, we're being very cautious about this.

alubbe commented 8 years ago

Right, my question is more geared towards how we can help - what's left to tackle? I've lost track of the status.

dead-claudia commented 8 years ago

@petkaantonov Status?

petkaantonov commented 8 years ago

@Fishrock123 What do you mean? This code has minimal effect unless you enable the flag.

@impinball Waiting for node to use a libuv that has the Windows stdio closing bug fixed; this is in the OP.

petkaantonov commented 8 years ago

Rebased and noticed that the libuv Windows fix has been integrated, so this is good to go.