nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
104.76k stars 28.3k forks source link

Async Hooks - Initial Implementation #12892

Closed AndreasMadsen closed 7 years ago

AndreasMadsen commented 7 years ago

Rebase of #11883 - async_hooks initial implementation. original text below.

I think we would all like to get async_hooks into node-v8. As I see it the roadmap is:

  1. merge this PR
  2. add documentation for the createHook API (not the embedder API) (https://github.com/thlorenz/node/tree/trevnorris-async-wrap-eps-impl+docs)
  3. add promise support – @matthewloring have already implemented this. (https://github.com/matthewloring/node/commit/dd178336a60c2698619b96a62267bd528c09ae0d)
  4. add C++ Embedder API. – Somebody else should do this, I'm a slow at v8/C++. At this point node-v8 could be released without introducing breaking changes.
  5. update NAN to use the C++ Embedder API.
  6. add/document JavaScript Embedder API.

edited by @addaleax: added links

It is possible that 3. and 4. could be delayed to after the node-v8 release. In particular, I'm just not sure if the new MakeCallback function would be ABI breaking. See https://github.com/nodejs/node-eps/blob/master/006-asynchooks-api.md#native-api


Checklist
Affected core subsystem(s)

async_wrap, async_hooks and a little bit of everything else

This PR supersedes my other PR for async_hooks support. The implementation has been cleaned up and @thlorenz has helped create much better tests (though they may need to be changed from their current location) and which accounts for ~3300 lines of the change. Documentation does exist, but needs to be cleaned up some before it's included. The code is ready for initial review.

For anyone reviewing, I carefully went through and divided all the changes into logical commits that should all pass. So while reviewing feel free to use each commit individually. Though GitHub fails at displaying them in the correct order.

The one aspect of this PR that I don't like but have gone through hell trying to get around to no avail is the usage of initTriggerId(). It's basically a way to propagate the "triggerId" to a constructor. The code comments explain some of those difficulties, but I hope someone can come up with something better.

@nodejs/ctc

Refs: https://github.com/nodejs/node/pull/8531

CI: https://ci.nodejs.org/job/node-test-commit/8503/

AndreasMadsen commented 7 years ago

It compiles, lets see how many test failures there are :)

CI: https://ci.nodejs.org/job/node-test-pull-request/7941/

AndreasMadsen commented 7 years ago

I will post some updates on the fixes I had to make, which were not caught by git rebase:

edit: a "not" was missing in the text, the above fixes were not caught by git rebase, there were also a lot git rebase did catch but they were mostly trivial.

AndreasMadsen commented 7 years ago

The linter and test runner now passes on my Mac. Trying CI again: https://ci.nodejs.org/job/node-test-pull-request/7942/

AndreasMadsen commented 7 years ago

Some CI failures, mostly I think the tests are just too strict. I will try and look into it, but I don't have easy access to any of these OSs.


aix

No test results ?


arm

async-hooks/test-timerwrap.setTimeout.js

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: Checking invocations at stage "t1: when third timer fired":
    Never called "destroy", but expected 1 invocation(s).
    at checkHook (/home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/test/async-hooks/hook-checks.js:45:7)
    at Array.forEach (native)
    at checkInvocations (/home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/test/async-hooks/hook-checks.js:28:44)
    at Timeout.onthirdTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/test/async-hooks/test-timerwrap.setTimeout.js:61:3)
    at ontimeout (timers.js:455:14)
    at tryOnTimeout (timers.js:306:5)
    at Timer.listOnTimeout (timers.js:266:5)


Linux: centos5-32 and centos5-64

test/async-hooks/test-pipeconnectwrap.js

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: 3 pipe wraps created when client connected
    at Socket.onconnect (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/async-hooks/test-pipeconnectwrap.js:63:10)
    at Socket.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/common/index.js:504:15)
    at Object.onceWrapper (events.js:312:19)
    at emitNone (events.js:105:13)
    at Socket.emit (events.js:207:7)
    at PipeConnectWrap.afterConnect [as oncomplete] (net.js:1131:10)

OS X

No test results, I suspect something is wrong with the CI.


Windows

parallel/test-async-wrap-getasyncid.js

(node:4008) [DEP0064] DeprecationWarning: tls.createSecurePair() is deprecated. Please use tls.Socket instead.
c:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\parallel\test-async-wrap-getasyncid.js:176
      throw new Error(`write failed: ${process.binding('uv').errname(err)}`);
      ^

Error: write failed: EPIPE
    at TCPConnectWrap.req.oncomplete.common.mustCall (c:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\parallel\test-async-wrap-getasyncid.js:176:13)
    at TCPConnectWrap.oncomplete (c:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\common\index.js:504:15)

parallel\test-ttywrap-invalid-fd.js

assert.js:557
    throw actual;
    ^

Error: EBADF: bad file descriptor, uv_tty_init
    at new WriteStream (tty.js:73:13)
    at assert.throws (c:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\parallel\test-ttywrap-invalid-fd.js:19:3)
    at _tryBlock (assert.js:514:5)
    at _throws (assert.js:535:12)
    at Function.throws (assert.js:565:3)
    at Object.<anonymous> (c:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\parallel\test-ttywrap-invalid-fd.js:13:8)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
refack commented 7 years ago

Some CI failures, mostly I think the tests are just too strict. I will try and look into it, but I don't have easy access to any of these OSs.

@AndreasMadsen I'll take a look at the Windows fails.

addaleax commented 7 years ago

Looks like there are AIX results now:

async-hooks/test-pipewrap

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: Checking invocations at stage "pipe wrap when sleep.spawn was called":
     Called "before" 3 time(s), but expected 2 invocation(s).
    at checkHook (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/async-hooks/hook-checks.js:51:14)
    at Array.forEach (native)
    at checkInvocations (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/async-hooks/hook-checks.js:28:44)
    at forEach (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/async-hooks/test-pipewrap.js:76:5)
    at Array.forEach (native)
    at process.onexit (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/async-hooks/test-pipewrap.js:73:27)
    at emitOne (events.js:120:20)
    at process.emit (events.js:210:7)
AndreasMadsen commented 7 years ago

@refack Awesome! I'm pretty sure the EBADF: bad file descriptor, uv_tty_init is simply because the exact error message depends on the OS, so the test regex just need to be relaxed a bit.

Fishrock123 commented 7 years ago

@AndreasMadsen Thanks for helping to push this forward.

Either PR still doesn't handle JS timers properly... I will try to get a patch out today. 🤞

AndreasMadsen commented 7 years ago

@Fishrock123 Thanks, I was just about to ask if you would look at the timers. If you could look at the ARM timer failure too that would be awesome. The timer stuff is beyond me.


I've looked at the CertOS 5 failure, I tested node on Scientific Linux v6 (I don't have CertOS 5) and it works fine, so I suspect it is highly OS specific. My best guess is that there is a race condition, where the client onconnect is called before the server onconnection because of some OS quirks. So I'm trying a fix with that assumption. CI: https://ci.nodejs.org/job/node-test-pull-request/7943/

PS: Is there a way to run the pull-request test runner on a small subset of the machines, in this case the CertOS 5 machines?

edit:

I think it "fixed" the error, but now there is a new further in the test script:

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: Checking invocations at stage "pipe1, client connected":
    Never called "after", but expected 1 invocation(s).
    at checkHook (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/async-hooks/hook-checks.js:45:7)
    at Array.forEach (native)
    at checkInvocations (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/async-hooks/hook-checks.js:28:44)
    at maybeOnconnect (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/async-hooks/test-pipeconnectwrap.js:80:3)
    at Server.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/async-hooks/test-pipeconnectwrap.js:21:3)
    at Server.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-32/test/common/index.js:504:15)
    at emitOne (events.js:115:13)
    at Server.emit (events.js:210:7)
    at Pipe.onconnection (net.js:1545:8)

new CI with a possible fix: https://ci.nodejs.org/job/node-test-pull-request/7944/

AndreasMadsen commented 7 years ago

@Fishrock123 could you push the updated dprof to https://github.com/AndreasMadsen/dprof/pull/19 – that might be useful for debugging the "Checking invocations" errors.

Fishrock123 commented 7 years ago

@AndreasMadsen I'm not sure you can use just one machine but you can start a linux-only run: https://ci.nodejs.org/job/node-test-commit-linux/build?delay=0sec

AndreasMadsen commented 7 years ago

Hurray 🎉 I guessed correct.

There are some new unrelated testfailures on linux, so some of the test are a bit unstable.


linux: fedora22

async-hooks/test-querywrap

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: Checking invocations at stage "when process exits":
    Never called "destroy", but expected 1 invocation(s).
    at checkHook (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora22/test/async-hooks/hook-checks.js:45:7)
    at Array.forEach (native)
    at checkInvocations (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora22/test/async-hooks/hook-checks.js:28:44)
    at process.onexit (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora22/test/async-hooks/test-querywrap.js:38:3)
    at emitOne (events.js:120:20)
    at process.emit (events.js:210:7)

edit: I'm not sure about this one, the destroy hook not emitting on exit has to be due to the GC not properly completing. The test appear to be unstable because of the delay-hack at: https://github.com/AndreasMadsen/node/blob/async-wrap2/test/async-hooks/test-querywrap.js#L24

@addaleax do you have a suggestion on how to fix this, should we just delay the test more.


linux: ubuntu1604

async-hooks/test-timerwrap.setTimeout

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: Checking invocations at stage "t1: when third timer fired":
    Never called "destroy", but expected 1 invocation(s).
    at checkHook (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/test/async-hooks/hook-checks.js:45:7)
    at Array.forEach (native)
    at checkInvocations (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/test/async-hooks/hook-checks.js:28:44)
    at Timeout.onthirdTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/test/async-hooks/test-timerwrap.setTimeout.js:61:3)
    at ontimeout (timers.js:455:14)
    at tryOnTimeout (timers.js:306:5)
    at Timer.listOnTimeout (timers.js:266:5)

@Fishrock123 this suggests the timer error isn't specific to ARM linux OS's but is just unstable.

refack commented 7 years ago

I have good new and bad news: Good news: test-ttywrap-invalid-fd.js is indeed a platform dependent error message

Bad new, with inspector attached it goes boom:

Debugger listening on port 57009.
Debugger attached.
C:\bin\dev\node\node.exe: src\tty_wrap.cc:130: Assertion `(fd) >= (0)' failed.
AndreasMadsen commented 7 years ago

@refack ./node --inspect-brk test/parallel/test-ttywrap-invalid-fd.js works on mac, I hope you find a solution.

AndreasMadsen commented 7 years ago

Trying to fix the AIX problem. We can also check the Windows fix from @refack.

CI: https://ci.nodejs.org/job/node-test-pull-request/7946/

refack commented 7 years ago

@only solved the easy one (parallel\test-ttywrap-invalid-fd.js)

AndreasMadsen commented 7 years ago

Rerunning CI, as I made a silly error: https://ci.nodejs.org/job/node-test-pull-request/7947/

AndreasMadsen commented 7 years ago

@refack looks like you forgot to include , uv_tty_init in the non-windows error regex. I have fixed it and updated your commit (force push).

refack commented 7 years ago

Sorry, tried to dual test it...

AndreasMadsen commented 7 years ago

@refack now we have both made a silly error :)

refack commented 7 years ago

🙏 praise the CI

AndreasMadsen commented 7 years ago

Rerunning CI: https://ci.nodejs.org/job/node-test-pull-request/7948/

I think I managed to stop the previous two.

AndreasMadsen commented 7 years ago

Definitely progress on the AIX failure. The AIX passed, but it introduced a failure on ubuntu1604_docker_alpine34-64. I think it is just a bit unstable because the amount of IO events can vary. I'm trying a little more flexible assert.

assert.js:92
  throw new AssertionError({
  ^

AssertionError [ERR_ASSERTION]: Checking invocations at stage "pipe wrap when sleep.spawn was called":
     Called "before" 4 time(s), but expected 3 invocation(s).
    at checkHook (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604_docker_alpine34-64/test/async-hooks/hook-checks.js:51:14)
    at Array.forEach (native)
    at checkInvocations (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604_docker_alpine34-64/test/async-hooks/hook-checks.js:28:44)
    at process.onexit (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604_docker_alpine34-64/test/async-hooks/test-pipewrap.js:80:3)
    at emitOne (events.js:120:20)
    at process.emit (events.js:210:7)

CI: https://ci.nodejs.org/job/node-test-pull-request/7949/

AndreasMadsen commented 7 years ago

Hurray, 🎉 . The AIX failure is now fixed.

This leaves:

@Fishrock123 and @refack: please tell us if plans change, or if I misunderstood something.

AndreasMadsen commented 7 years ago

@nodejs/platform-arm will it be possible to add IPv6 support the pi{1,2}-raspbian-wheezy machines? Otherwise we will have to create specialized tests for when they are used, which I would really like to avoid.

Some background: In #4600 it caused a lot of issues, see: https://github.com/nodejs/node/pull/4600#issuecomment-184355370. After a lot of debugging we found that it was because IPv6 support was missing see: https://github.com/nodejs/node/issues/5327

addaleax commented 7 years ago

I think it’s a good thing we test Node on some single-stack machines. @nodejs/testing might have some tricks available, but otherwise skipping the affected tests or modifying them to run differently on IPv4-only machines sounds fine to me.

refack commented 7 years ago

@refack: please tell us if plans change, or if I misunderstood something.

I'm digging... (I need a good debugger that works with ninja builds, since Visual Studio is sssllllllooooowwwww)

Fishrock123 commented 7 years ago

@AndreasMadsen I believe I have fixed https://github.com/AndreasMadsen/dprof/pull/19.

AndreasMadsen commented 7 years ago

I think it’s a good thing we test Node on some single-stack machines.

@addaleax I'm not sure I understand. Will enabling IPv6 make this impossible? If so, I think we should skip them, #4600 took 12 days extra because of this. Possibly it could be done faster now that we understand the cause, but there are also many more test failures. Alternatively, would you be able to do this?

edit: if we skip them we will still test on pi3-raspbian-wheezy.

I'm digging... (I need a good debugger that works with ninja builds, since Visual Studio is sssllllllooooowwwww)

@refack Hmm, that does sound annoying. I'm not sure what we can do. I assume you have given console.log() and cout a fair try.

@AndreasMadsen I believe I have fixed AndreasMadsen/dprof#19.

@Fishrock123 Cool.

AndreasMadsen commented 7 years ago

@refack I gave the windows write failed: EPIPE error a quick look. It is a tcp connection to 0.0.0.0:${common.PORT} that writes hi and the shuts down. It then fails on the write. From what I can see there is no server listening on 0.0.0.0:${common.PORT}.

I suspect that it is possible to queue up the write on nix-platforms, but not on windows thus we get an EPIPE error. (Actually I'm surprised this works on nix in the first place).

Try and create a simple socket.end() tcp-server and see if that works.

refack commented 7 years ago

From what I can see there is no server listening on 0.0.0.0:${common.PORT}.

Yeah I've been checking around that (felt like I was doing something wrong since "it works on UNIX" 🤦‍♂️ )

AndreasMadsen commented 7 years ago

@refack regarding the src\tty_wrap.cc:130: Assertion (fd) >= (0)' failed. I think you should try a clean rebuild. There don't appear to be anything on line 130. If this does not work then try printing the fd before all CHECK_GE(fd, 0); to see which one is failing and work backward from there.

refack commented 7 years ago

@refack regarding the src\tty_wrap.cc:130: Assertion(fd) >= (0)' failed.I think you should try a clean rebuild. There don't appear to be anything on line 130. If this does not work then try printing thefdbefore allCHECK_GE(fd, 0);` to see which one is failing and work backward from there.

Ack. a clean build fixed it.

refack commented 7 years ago

@AndreasMadsen does adding nextTick to the zlib block like this:

{
  const Zlib = process.binding('zlib').Zlib;
  const constants = process.binding('constants').zlib;
  const zlib = new Zlib(constants.GZIP);
  process.nextTick(() => testInitialized(zlib, 'Zlib'));
}

makes sense?

otherwise I get: D:\code\node\out\Release\node.exe: c:\code\node\src\node_zlib.cc:107: Assertion `init_done_ && "close before init"' failed.

Or is that a bug?

addaleax commented 7 years ago

Will enabling IPv6 make this impossible?

I think so? I’m not an expert but it kind of reminds me of bugs like https://github.com/nodejs/node/issues/5588 where we just didn’t test certain network configurations (because they were hard to test).

Or is that a bug?

@refack Yes, looks like it – can you give a stack trace for that crash?

refack commented 7 years ago

@refack Yes, looks like it – can you give a stack trace for that crash?

No stack, just assertion fail :(

addaleax commented 7 years ago

@refack You should still be able to get a stack trace for it with a debugger?

refack commented 7 years ago

@addaleax Yeah 😢 see https://github.com/nodejs/node/pull/12892#issuecomment-300000889... In a few hours...

refack commented 7 years ago
    node.exe!node::`anonymous namespace'::ZCtx::Close() Line 107    C++
    node.exe!node::`anonymous namespace'::ZCtx::~ZCtx() Line 98 C++
    node.exe!node::`anonymous namespace'::ZCtx::`scalar deleting destructor'(unsigned int)  C++
    node.exe!node::BaseObject::WeakCallback<node::`anonymous namespace'::ZCtx>(const v8::WeakCallbackInfo<node::`anonymous namespace'::ZCtx> & data) Line 72    C++
    node.exe!v8::internal::GlobalHandles::PendingPhantomCallback::Invoke(v8::internal::Isolate * isolate) Line 1058 C++
    node.exe!v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool synchronous_second_pass) Line 1023   C++
    node.exe!v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector collector, const v8::GCCallbackFlags gc_callback_flags) Line 1078  C++
    node.exe!v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector collector, const v8::GCCallbackFlags gc_callback_flags) Line 1375  C++
    node.exe!v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector collector, v8::internal::GarbageCollectionReason gc_reason, const char * collector_reason, const v8::GCCallbackFlags gc_callback_flags) Line 1011    C++
    node.exe!v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace space, v8::internal::GarbageCollectionReason gc_reason, const v8::GCCallbackFlags callbackFlags) Line 686 C++
    node.exe!v8::internal::Factory::NewFixedArray(int size, v8::internal::PretenureFlag pretenure) Line 148 C++
    node.exe!v8::internal::String::CalculateLineEnds(v8::internal::Handle<v8::internal::String> src, bool include_ending_line) Line 10923   C++
    node.exe!v8::internal::Script::InitLineEnds(v8::internal::Handle<v8::internal::Script> script) Line 12972   C++
    node.exe!v8::internal::Script::GetPositionInfo(v8::internal::Handle<v8::internal::Script> script, int position, v8::internal::Script::PositionInfo * info, v8::internal::Script::OffsetFlag offset_flag) Line 12983 C++
    node.exe!v8::internal::Script::GetLineNumber(v8::internal::Handle<v8::internal::Script> script, int code_pos) Line 13112    C++
    node.exe!v8::internal::JSStackFrame::GetLineNumber() Line 443   C++
    node.exe!v8::internal::`anonymous namespace'::AppendFileLocation(v8::internal::Isolate * isolate, v8::internal::StackFrameBase * call_site, v8::internal::IncrementalStringBuilder * builder) Line 502  C++
    node.exe!v8::internal::JSStackFrame::ToString() Line 623    C++
    node.exe!v8::internal::ErrorUtils::FormatStackTrace(v8::internal::Isolate * isolate, v8::internal::Handle<v8::internal::JSObject> error, v8::internal::Handle<v8::internal::Object> raw_stack) Line 972 C++
    node.exe!v8::internal::Accessors::ErrorStackGetter(v8::Local<v8::Name> key, const v8::PropertyCallbackInfo<v8::Value> & info) Line 1190 C++

It is interesting, I'll try to find out what triggers the ErrorStackGetter

refack commented 7 years ago

Commented out all errors, still: c:\code\node\out\Release\node.exe: c:\code\node\src\node_zlib.cc:107: Assertion `init_done_ && "close before init"' failed.

>   node.exe!node::`anonymous namespace'::ZCtx::Close() Line 107    C++
    node.exe!node::`anonymous namespace'::ZCtx::~ZCtx() Line 98 C++
    node.exe!node::`anonymous namespace'::ZCtx::`scalar deleting destructor'(unsigned int)  C++
    node.exe!node::BaseObject::WeakCallback<node::`anonymous namespace'::ZCtx>(const v8::WeakCallbackInfo<node::`anonymous namespace'::ZCtx> & data) Line 72    C++
    node.exe!v8::internal::GlobalHandles::PendingPhantomCallback::Invoke(v8::internal::Isolate * isolate) Line 1058 C++
    node.exe!v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool synchronous_second_pass) Line 1023   C++
    node.exe!v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector collector, const v8::GCCallbackFlags gc_callback_flags) Line 1078  C++
    node.exe!v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector collector, const v8::GCCallbackFlags gc_callback_flags) Line 1375  C++
    node.exe!v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector collector, v8::internal::GarbageCollectionReason gc_reason, const char * collector_reason, const v8::GCCallbackFlags gc_callback_flags) Line 1011    C++
    node.exe!v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace space, v8::internal::GarbageCollectionReason gc_reason, const v8::GCCallbackFlags callbackFlags) Line 686 C++
    node.exe!v8::internal::Factory::NewFillerObject(int size, bool double_align, v8::internal::AllocationSpace space) Line 85   C++
    node.exe!v8::internal::__RT_impl_Runtime_AllocateInNewSpace(v8::internal::Arguments args, v8::internal::Isolate * isolate) Line 295 C++
    node.exe!v8::internal::Runtime_AllocateInNewSpace(int args_length, v8::internal::Object * * args_object, v8::internal::Isolate * isolate) Line 288  C++
    0000019121404208()  Unknown
    00007ff600000001()  Unknown
    000000a67f14db60()  Unknown
    00000294d7dbcdb0()  Unknown
    cccccccccccccccc()  Unknown
    000000a67f14db50()  Unknown
    node.exe!v8::base::LazyInstanceImpl<int,v8::base::StaticallyAllocatedInstanceTrait<int>,v8::internal::`anonymous namespace'::PerThreadAssertKeyConstructTrait,v8::base::ThreadSafeInitOnceTrait,v8::base::LeakyInstanceTrait<int> >::Init() Line 184    C++
AndreasMadsen commented 7 years ago

I think so? I’m not an expert but it kind of reminds me of bugs like #5588 where we just didn’t test certain network configurations (because they were hard to test).

@addaleax There is something I'm misunderstanding in your original text then. Could you elaborate on what the term single-stack machines means? I suspect that I was I'm misunderstanding.

addaleax commented 7 years ago

@refack Okay, thanks, that was really useful! One can make this reproduce by adding gc(); after the zlib block, then running the test with --expose-gc.

That makes sense, because the zlib bindings assume that .init() is always called on the handle, and for normal operations that’s true.

This is a lazy (but imho perfectly acceptable) fix:

diff --git a/test/parallel/test-async-wrap-getasyncid.js b/test/parallel/test-async-wrap-getasyncid.js
index 5ab6240ed25d..b84e31c290ec 100644
--- a/test/parallel/test-async-wrap-getasyncid.js
+++ b/test/parallel/test-async-wrap-getasyncid.js
@@ -113,9 +113,8 @@ if (common.hasCrypto) {

 {
-  const Zlib = process.binding('zlib').Zlib;
-  const constants = process.binding('constants').zlib;
-  testInitialized(new Zlib(constants.GZIP), 'Zlib');
+  const Gzip = require('zlib').Gzip;
+  testInitialized(new Gzip()._handle, 'Zlib');
 }
addaleax commented 7 years ago

@AndreasMadsen I was talking about machines that only have IPv4 support, sorry if that wasn’t clear. :)

AndreasMadsen commented 7 years ago

This is a lazy (but imho perfectly acceptable) fix:

I actually think that is a better test. Tests that are highly dependent on internal assumptions are always annoying when those assumptions change.

I was talking about machines that only have IPv4 support, sorry if that wasn’t clear. :)

Okay, now I understand the message. I was thinking of something completely different.

AndreasMadsen commented 7 years ago

Rebased. There was a minor conflict with e7c51454b0de0a6b33c15a4d77006a17ced4eeff.

refack commented 7 years ago

This is where I'm at with the TCPWrap:

{
  const stream_wrap = process.binding('stream_wrap');
  const tcp_wrap = process.binding('tcp_wrap');
  const server = net.createServer((socket) => {
    console.log('server.listen');
    socket.end('goodbye\n');
  }).listen(0, (c) => {
    const handle = new tcp_wrap.TCP();
    const req = new tcp_wrap.TCPConnectWrap();
    const sreq = new stream_wrap.ShutdownWrap();
    const wreq = new stream_wrap.WriteWrap();
    testInitialized(handle, 'TCP');
    testUninitialized(req, 'TCPConnectWrap');
    testUninitialized(sreq, 'ShutdownWrap');

    sreq.oncomplete = common.mustCall(() => {
      console.log('sreq.oncomplete');
      handle.close();
    });

    wreq.handle = handle;
    wreq.oncomplete = common.mustCall(() => {
      console.log('handle.shutdown(sreq)');
      handle.shutdown(sreq);
      testInitialized(sreq, 'ShutdownWrap');
    });
    wreq.async = true;

    console.log('server.listen', c);
    req.oncomplete = common.mustCall(() => {
      console.log('handle.shutdown(sreq)');
      const err = handle.writeLatin1String(wreq, 'hi');
      if (err)
        throw new Error(`write failed: ${process.binding('uv').errname(err)}`);
      testInitialized(wreq, 'WriteWrap');
    });
    console.log('pre handle.connect');
    net.connect(server.address(), (x) => {
      console.log('net.connect ', x);
      server.close();
    });
    req.address = server.address().address;
    req.port = server.address().port;
    handle.connect(req, req.address, req.port, (x) => {
      console.log('req.connect ', x);
    });
    testInitialized(req._handle, 'TCPConnectWrap');
  });
}

So now handle.connect(req, req.address, req.port) does throw EPIPE, but it doesn't work since testInitialized(req._handle, 'TCPConnectWrap'); Fails

If I comment out that assertion

handle.connect(req, req.address, req.port, (x) => {
      console.log('req.connect ', x);
});

Never actually connects, so req.oncomplete doesn't happen.

What am I doing wrong?

refack commented 7 years ago

P.S. the unwrapped connection works, stdout is:

server.listen undefined
pre handle.connect
net.connect  undefined
addaleax commented 7 years ago
handle.connect(req, req.address, req.port, (x) => {
      console.log('req.connect ', x);
});

Sorry, I’m probably missing context here, but this won’t work – handle.connect doesn’t take a callback, instead it calls req.oncomplete… does that help?

AndreasMadsen commented 7 years ago

@refack why are you connecting two sockets, first with net.connect and then with handle.connect? You are also closing the server in net.connect, I don't understand why. You should close the server in sreq.oncomplete.

and what @addaleax says is very important.

refack commented 7 years ago

Sorry, I’m probably missing context here, but this won’t work – handle.connect doesn’t take a callback, instead it calls req.oncomplete… does that help?

@refack why are you connecting two sockets, first with net.connect and then with handle.connect. You are also closing the server in net.connect, I don't understand why. You should close the server in sreq.oncomplete.

You're giving me good pointers. Current code is just a mish-mash of testing states.

The bottom line is

  1. testInitialized(req._handle, 'TCPConnectWrap'); fails with TypeError: Cannot read property 'getAsyncId' of undefined
  2. If I comment out that fail, req.oncomplete doesn't happen.

A reminder, before I added a real server, both would pass, but in the req.oncomplete we would get EPIPE on the write.