nodejs / nan

Native Abstractions for Node.js

EXC_BAD_ACCESS in node::EmitAsyncDestroy #772

Open markandrus opened 6 years ago

markandrus commented 6 years ago

Hi,

I'm occasionally hitting an error in the destructor for AsyncResource. Sorry, I don't have a minimal reproduction example yet; I've only started hitting this since migrating js-platform/node-webrtc to AsyncResource. I've got a graph of objects, some of them just ObjectWrap instances and some of them ObjectWrap + AsyncResource instances. The AsyncResource instances wrap libuv's uv_default_loop(). I call Ref when constructing these and Unref in the uv_close_cb (see the sketch after the graph). The graph looks like this:

RTCPeerConnection (ObjectWrap, AsyncResource)
  |
  |
  +--> RTCRtpReceiver (ObjectWrap)
         |
         |
         |    MediaStream (ObjectWrap)
         |      |
         |      v
         +--> MediaStreamTrack (ObjectWrap, AsyncResource)
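
To make this concrete, here is a minimal sketch of the pattern, with hypothetical names (Track instead of the actual node-webrtc classes): a class that is both ObjectWrap and AsyncResource, Ref'd once wrapped and Unref'd from the uv_close_cb.

#include <nan.h>
#include <uv.h>

// Hypothetical sketch, not node-webrtc's actual code.
class Track : public Nan::ObjectWrap, public Nan::AsyncResource {
 public:
  Track() : Nan::AsyncResource("Track") {
    // The instance owns a handle on the default loop.
    uv_async_init(uv_default_loop(), &async_, nullptr);
    async_.data = this;
  }

  static NAN_METHOD(New) {
    Track* track = new Track();
    track->Wrap(info.This());
    track->Ref();  // keep the wrapper alive while the libuv handle is open
    info.GetReturnValue().Set(info.This());
  }

  void Close() {
    uv_close(reinterpret_cast<uv_handle_t*>(&async_),
             [](uv_handle_t* handle) {
               // Unref in the uv_close_cb; once the last reference drops,
               // GC may collect the wrapper, running ~Track and with it
               // ~AsyncResource, which calls node::EmitAsyncDestroy.
               static_cast<Track*>(handle->data)->Unref();
             });
  }

 private:
  uv_async_t async_;
};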

In my tests, I'm only seeing the issue in ~MediaStreamTrack, and even then only occasionally. I'm also not seeing the destructor fire twice or anything like that. Interestingly, if I comment out the node::EmitAsyncDestroy call, I don't see crashes. I'm using Node v9.8.0.

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100009322 node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) + 18
node`node::EmitAsyncDestroy:
->  0x100009322 <+18>: movq   (%rax), %rax
    0x100009325 <+21>: movq   0x37(%rax), %rax
    0x100009329 <+25>: movq   0x10f(%rax), %rdi
    0x100009330 <+32>: movsd  -0x8(%rbp), %xmm0         ; xmm0 = mem[0],zero
Target 0: (node) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100009322 node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) + 18
    frame #1: 0x000000010491cbe0 wrtc.node`Nan::AsyncResource::~AsyncResource(this=0x0000000102a02e88) at nan.h:514
    frame #2: 0x00000001049536f6 wrtc.node`node_webrtc::MediaStreamTrack::~MediaStreamTrack(this=0x0000000102a02e70) at mediastreamtrack.cc:44
    frame #3: 0x00000001049539c5 wrtc.node`node_webrtc::MediaStreamTrack::~MediaStreamTrack(this=0x0000000102a02e70) at mediastreamtrack.cc:40
    frame #4: 0x0000000104953a39 wrtc.node`node_webrtc::MediaStreamTrack::~MediaStreamTrack(this=0x0000000102a02e70) at mediastreamtrack.cc:40
    frame #5: 0x0000000104923fbb wrtc.node`Nan::ObjectWrap::WeakCallback(info=0x00007fff5fbf5b48) at nan_object_wrap.h:126
    frame #6: 0x00000001004f3384 node`v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) + 180
    frame #7: 0x00000001004f36b1 node`v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 49
    frame #8: 0x0000000100509085 node`v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 2533
    frame #9: 0x000000010050800d node`v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) + 733
    frame #10: 0x00000001004c9c92 node`v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) + 82
    frame #11: 0x0000000100774125 node`v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) + 101
    frame #12: 0x00000c0326b842fd
    frame #13: 0x00000c0326c842ab
    frame #14: 0x00000c0326c3d196
    frame #15: 0x00000c0326c09cfc
    frame #16: 0x00000c0326b84239
    frame #17: 0x00000c0326b84101
    frame #18: 0x00000001004bf669 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::Object>, v8::internal::Execution::MessageHandling) + 633
    frame #19: 0x00000001004bf893 node`v8::internal::Execution::TryCall(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle<v8::internal::Object>*) + 227
    frame #20: 0x00000001005e1c9e node`v8::internal::Isolate::PromiseReactionJob(v8::internal::Handle<v8::internal::PromiseReactionJobInfo>, v8::internal::MaybeHandle<v8::internal::Object>*, v8::internal::MaybeHandle<v8::internal::Object>*) + 654
    frame #21: 0x00000001005e273c node`v8::internal::Isolate::RunMicrotasksInternal() + 1180
    frame #22: 0x00000001005e15e8 node`v8::internal::Isolate::RunMicrotasks() + 56
    frame #23: 0x000000010497d872 wrtc.node`node_webrtc::PromiseFulfillingEventLoop<node_webrtc::PeerConnection>::Run(this=0x0000000101ff7700) at promisefulfillingeventloop.h:34
    frame #24: 0x00000001049822c0 wrtc.node`node_webrtc::EventLoop<node_webrtc::PeerConnection>::EventLoop(this=0x0000000104982270, handle=0x0000000101ff77a0)::'lambda'(uv_async_s*)::operator()(uv_async_s*) const at eventloop.h:49
    frame #25: 0x0000000104982288 wrtc.node`node_webrtc::EventLoop<node_webrtc::PeerConnection>::EventLoop(handle=0x0000000101ff77a0)::'lambda'(uv_async_s*)::__invoke(uv_async_s*) at eventloop.h:47
    frame #26: 0x00000001019e77cb libuv.1.dylib`uv__async_io + 331
    frame #27: 0x00000001019fa41d libuv.1.dylib`uv__io_poll + 2061
    frame #28: 0x00000001019e7d43 libuv.1.dylib`uv_run + 371
    frame #29: 0x0000000100034b75 node`node::Start(v8::Isolate*, node::IsolateData*, int, char const* const*, int, char const* const*) + 661
    frame #30: 0x0000000100031e83 node`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 435
    frame #31: 0x0000000100031988 node`node::Start(int, char**) + 536
    frame #32: 0x00000001000012c4 node`start + 52

I'm opening this in case there's something obvious that I'm missing. The async hooks functionality is very new to me. If it's not obvious, I'll try to create a minimal repro.

Thanks, Mark

markandrus commented 6 years ago

I have a much smaller repro example: https://github.com/markandrus/nan-asyncresource-bug

kkoopa commented 6 years ago

@ofrobots Any idea what could be up with this?

markandrus commented 6 years ago

I built Node 9.8.0 locally and got a more detailed backtrace:

EDIT: It seems isolate->GetCurrentContext() returns a null context.

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000c3fc node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) [inlined] void* v8::internal::Internals::ReadEmbedderData<void*>(context=0x0000000000000000, index=32) at v8.h:9183 [opt]
   9180   V8_INLINE static T ReadEmbedderData(const v8::Context* context, int index) {
   9181     typedef internal::Object O;
   9182     typedef internal::Internals I;
-> 9183     O* ctx = *reinterpret_cast<O* const*>(context);
   9184     int embedder_data_offset = I::kContextHeaderSize +
   9185         (internal::kApiPointerSize * I::kContextEmbedderDataIndex);
   9186     O* embedder_data = I::ReadField<O*>(ctx, embedder_data_offset);
Target 0: (node) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010000c3fc node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) [inlined] void* v8::internal::Internals::ReadEmbedderData<void*>(context=0x0000000000000000, index=32) at v8.h:9183 [opt]
    frame #1: 0x000000010000c3fc node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) [inlined] v8::Context::GetAlignedPointerFromEmbedderData(this=0x0000000000000000, index=32) at v8.h:10331 [opt]
    frame #2: 0x000000010000c3fc node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) [inlined] node::Environment::GetCurrent(v8::Local<v8::Context>) at env-inl.h:289 [opt]
    frame #3: 0x000000010000c3fc node`node::EmitAsyncDestroy(v8::Isolate*, node::async_context) [inlined] node::Environment::GetCurrent(isolate=<unavailable>) at env-inl.h:284 [opt]
    frame #4: 0x000000010000c3f7 node`node::EmitAsyncDestroy(isolate=<unavailable>, asyncContext=(async_id = 288081, trigger_async_id = 0)) at async_wrap.cc:801 [opt]
    frame #5: 0x00000001022cf13b addon.node`addon::Foo::~Foo() at nan.h:513 [opt]
    frame #6: 0x00000001022cf124 addon.node`addon::Foo::~Foo() [inlined] addon::Foo::~Foo(this=0x0000000104c412d0) at foo.h:10 [opt]
    frame #7: 0x00000001022cf11f addon.node`addon::Foo::~Foo() [inlined] addon::Foo::~Foo(this=0x0000000104c412d0) at foo.h:10 [opt]
    frame #8: 0x00000001022cf11f addon.node`addon::Foo::~Foo(this=0x0000000104c412d0) at foo.h:10 [opt]
    frame #9: 0x00000001004e3fac node`v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) [inlined] v8::internal::GlobalHandles::PendingPhantomCallback::Invoke(v8::internal::Isolate*) at global-handles.cc:853 [opt]
    frame #10: 0x00000001004e3f7d node`v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(this=0x00000001024052e0, synchronous_second_pass=<unavailable>) at global-handles.cc:818 [opt]
    frame #11: 0x00000001004e42e1 node`v8::internal::GlobalHandles::PostGarbageCollectionProcessing(this=0x00000001024052e0, collector=SCAVENGER, gc_callback_flags=<unavailable>) at global-handles.cc:874 [opt]
    frame #12: 0x00000001004f9d9d node`v8::internal::Heap::PerformGarbageCollection(this=0x0000000103000c20, collector=<unavailable>, gc_callback_flags=kNoGCCallbackFlags) at heap.cc:1548 [opt]
    frame #13: 0x00000001004f8d87 node`v8::internal::Heap::CollectGarbage(this=0x0000000103000c20, space=<unavailable>, gc_reason=<unavailable>, gc_callback_flags=<unavailable>) at heap.cc:1169 [opt]
    frame #14: 0x00000001004bb572 node`v8::internal::Factory::NewFillerObject(this=0x0000000103000c00, size=376, double_align=<unavailable>, space=NEW_SPACE) at factory.cc:88 [opt]
    frame #15: 0x0000000100778c35 node`v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [inlined] v8::internal::__RT_impl_Runtime_AllocateInNewSpace(v8::internal::Arguments, v8::internal::Isolate*) at runtime-internal.cc:322 [opt]
    frame #16: 0x0000000100778bea node`v8::internal::Runtime_AllocateInNewSpace(args_length=<unavailable>, args_object=<unavailable>, isolate=0x0000000103000c00) at runtime-internal.cc:315 [opt]
    frame #17: 0x00000ecf4ad842fd
    frame #18: 0x00000ecf4ae37e43
    frame #19: 0x00000ecf4af08129
    frame #20: 0x00000ecf4ae38016
    frame #21: 0x00000ecf4ae09cfc
    frame #22: 0x00000ecf4ad84239
    frame #23: 0x00000ecf4ad84101
    frame #24: 0x00000001004b07ff node`v8::internal::(anonymous namespace)::Invoke(isolate=0x0000000103000cc8, is_construct=<unavailable>, target=<unavailable>, receiver=<unavailable>, argc=5, args=0x00007fff5fbf5de0, new_target=<unavailable>, message_handling=<unavailable>) at execution.cc:145 [opt]
    frame #25: 0x00000001004b0a05 node`v8::internal::Execution::TryCall(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle<v8::internal::Object>*) [inlined] v8::internal::(anonymous namespace)::CallInternal(callable=Handle<v8::internal::Object> @ r13, argc=5, argv=0x00007fff5fbf5de0, message_handling=kReport) at execution.cc:181 [opt]
    frame #26: 0x00000001004b0988 node`v8::internal::Execution::TryCall(isolate=0x0000000103000c00, callable=Handle<v8::internal::Object> @ r13, receiver=<unavailable>, argc=5, args=0x00007fff5fbf5de0, message_handling=kReport, exception_out=<unavailable>) at execution.cc:233 [opt]
    frame #27: 0x00000001005c763f node`v8::internal::Isolate::PromiseReactionJob(this=0x0000000103000c00, info=<unavailable>, result=0x00007fff5fbf5ee0, maybe_exception=0x00007fff5fbf5ee8) at isolate.cc:3436 [opt]
    frame #28: 0x00000001005c80b4 node`v8::internal::Isolate::RunMicrotasksInternal(this=0x0000000103000c00) at isolate.cc:3509 [opt]
    frame #29: 0x00000001005c6ffa node`v8::internal::Isolate::RunMicrotasks(this=0x0000000103000c00) at isolate.cc:3489 [opt]
    frame #30: 0x00000ecf4af06587
    frame #31: 0x00000ecf4af097c2
    frame #32: 0x00000ecf4ad84239
    frame #33: 0x00000ecf4ad84101
    frame #34: 0x00000001004b07ff node`v8::internal::(anonymous namespace)::Invoke(isolate=0x0000000103000cc8, is_construct=<unavailable>, target=<unavailable>, receiver=<unavailable>, argc=0, args=0x0000000000000000, new_target=<unavailable>, message_handling=<unavailable>) at execution.cc:145 [opt]
    frame #35: 0x00000001004b052c node`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [inlined] v8::internal::(anonymous namespace)::CallInternal(message_handling=kReport) at execution.cc:181 [opt]
    frame #36: 0x00000001004b04bd node`v8::internal::Execution::Call(isolate=0x0000000103000c00, callable=Handle<v8::internal::Object> @ r12, receiver=<unavailable>, argc=0, argv=0x0000000000000000) at execution.cc:191 [opt]
    frame #37: 0x0000000100177bf0 node`v8::Function::Call(this=0x0000000103058ce0, context=<unavailable>, recv=<unavailable>, argc=0, argv=0x0000000000000000) at api.cc:5385 [opt]
    frame #38: 0x0000000100025787 node`node::InternalCallbackScope::Close(this=0x00007fff5fbf6320) at node.cc:1316 [opt]
    frame #39: 0x000000010002592b node`node::InternalMakeCallback(env=0x00007fff5fbfea18, recv=<unavailable>, callback=<unavailable>, argc=<unavailable>, argv=0x00007fff5fbf6410, asyncContext=<unavailable>) at node.cc:1351 [opt]
    frame #40: 0x0000000100025ab9 node`node::MakeCallback(isolate=<unavailable>, recv=<unavailable>, callback=<unavailable>, argc=<unavailable>, argv=<unavailable>, asyncContext=<unavailable>) at node.cc:1405 [opt]
    frame #41: 0x00000001022ce077 addon.node`addon::Bar::Run() [inlined] Nan::AsyncResource::runInAsyncScope(this=<unavailable>, target=(val_ = 0x0000000103048628), method=<unavailable>, argc=1, argv=0x0000000103048620) at nan.h:551 [opt]
    frame #42: 0x00000001022ce04a addon.node`addon::Bar::Run(this=0x00000001051bfc80) at bar.cc:35 [opt]
    frame #43: 0x00000001008e8e96 node`uv__async_io(loop=0x0000000101583960, w=<unavailable>, events=<unavailable>) at async.c:118 [opt]
    frame #44: 0x00000001008f8aab node`uv__io_poll(loop=<unavailable>, timeout=<unavailable>) at kqueue.c:0 [opt]
    frame #45: 0x00000001008e9330 node`uv_run(loop=0x0000000101583960, mode=UV_RUN_DEFAULT) at core.c:368 [opt]
    frame #46: 0x00000001000318d2 node`node::Start(isolate=0x0000000103000c00, isolate_data=<unavailable>, argc=<unavailable>, argv=<unavailable>, exec_argc=0, exec_argv=0x0000000102403ed0) at node.cc:4751 [opt]
    frame #47: 0x000000010002f52e node`node::Start(event_loop=0x0000000101583960, argc=2, argv=0x0000000102404500, exec_argc=0, exec_argv=0x0000000102403ed0) at node.cc:4822 [opt]
    frame #48: 0x000000010002f1a4 node`node::Start(argc=<unavailable>, argv=0x0000000102404500) at node.cc:4879 [opt]
    frame #49: 0x0000000100001634 node`start + 52

ofrobots commented 6 years ago

I ran out of time today, and couldn't take a deeper look; but looking at just the stack, I don't think it is safe to 'do JS or VM operations' from a weak callback (i.e. from GC). GC callbacks do not guarantee that a V8 context is attached when the weak callback is called.

kkoopa commented 6 years ago

I thought something similar based on the stack trace, but then the documentation needs updating, since it cannot be used with ObjectWrap.

markandrus commented 6 years ago

@ofrobots

I ran out of time today, and couldn't take a deeper look; but looking at just the stack, I don't think it is safe to 'do JS or VM operations' from a weak callback (i.e. from GC). GC callbacks do not guarantee that a V8 context is attached when the weak callback is called.

So the weak callback here is one of the "pending phantom callbacks" invoked by DispatchPendingPhantomCallbacks, which ends up running the destructor for Foo? Since the context may not be attached, can we just check for that before calling EmitAsyncDestroy? E.g.:

--- nan.h   2018-05-16 10:24:36.000000000 -0700
+++ nan.h   2018-05-16 10:24:45.000000000 -0700
@@ -510,7 +510,9 @@
   ~AsyncResource() {
 #if NODE_MODULE_VERSION >= NODE_9_0_MODULE_VERSION
     v8::Isolate* isolate = v8::Isolate::GetCurrent();
-    node::EmitAsyncDestroy(isolate, context);
+    if (!isolate->GetCurrentContext().IsEmpty()) {
+      node::EmitAsyncDestroy(isolate, context);
+    }
 #endif
   }

In my testing, this seems to work, although I haven't checked whether it passes the existing nodejs/nan tests. EDIT: I don't see issues with the nodejs/nan tests; however, they weren't testing instances that are both ObjectWrap and AsyncResource anyway.

ofrobots commented 6 years ago

@markandrus that's probably not the right fix – it would work around the issue, but you would potentially leak if there is anything in the application that is keeping track of async resources. You're never calling the destroy hook on the resource in these cases.

My suggestion would be to defer this work to a microtask you post to run later. More elegant solutions might be possible if somehow you can decouple the async resource from the lifetime of your ObjectWrap instance.
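
As an illustration of the deferral idea, here is a rough sketch, assuming a hypothetical helper EmitDestroyLater (not part of nan or node) that a destructor could call instead of node::EmitAsyncDestroy directly; it copies the async_context to the heap and emits the destroy event from a microtask, which runs with a context attached:

#include <node.h>
#include <v8.h>

// Hypothetical helper: defer EmitAsyncDestroy out of the GC weak callback.
static void EmitDestroyLater(v8::Isolate* isolate,
                             node::async_context async_context) {
  // async_context is just a pair of doubles, so copying it is cheap.
  node::async_context* copy = new node::async_context(async_context);
  isolate->EnqueueMicrotask(
      [](void* data) {
        node::async_context* ctx = static_cast<node::async_context*>(data);
        // By the time microtasks run, a context is attached again.
        node::EmitAsyncDestroy(v8::Isolate::GetCurrent(), *ctx);
        delete ctx;
      },
      copy);
}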

@kkoopa

I thought something similar based on the stack trace, but then the documentation needs updating, since it cannot be used with ObjectWrap.

You're right. This might be a common gotcha people run into, and the documentation is worth updating; or perhaps it indicates that an abstraction is missing here. I'll try to do this once I have some bandwidth available to think it through. Feel free to go ahead if you have the time or motivation, however.

markandrus commented 6 years ago

@ofrobots thanks for your response. When you say

My suggestion would be to defer this work to a microtask you post to run later.

Do you mean the work of calling node::EmitAsyncDestroy in the AsyncResource destructor or the work of calling Unref on my ObjectWrap instance?

This might be a common gotcha people run into, and the documentation is worth updating, or perhaps this indicates that there might be a missing abstraction needed here.

I think others may run into this if/when they start heeding the deprecation warning on Nan::MakeCallback, since ObjectWrap + Nan::MakeCallback worked previously, but ObjectWrap + AsyncResource may not (at least if you use the Ref/Unref APIs).
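
For context, the migration in question looks roughly like this (a sketch with illustrative names; the receiver and callback come from wherever the addon stores them):

#include <nan.h>

// Before: the deprecated form, which gives async_hooks no async context.
void FireDeprecated(v8::Local<v8::Object> target,
                    v8::Local<v8::Function> callback) {
  Nan::MakeCallback(target, callback, 0, NULL);
}

// After: route the call through an AsyncResource so async_hooks sees
// proper init/before/after/destroy events for the resource.
void FireWithResource(Nan::AsyncResource* resource,
                      v8::Local<v8::Object> target,
                      v8::Local<v8::Function> callback) {
  resource->runInAsyncScope(target, callback, 0, NULL);
}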

More elegant solutions might be possible if somehow you can decouple the async resource from the lifetime of your ObjectWrap instance.

Hmm, I will think on it. Maybe I could have some AsyncResource* member on my ObjectWrap instance, and then delete it before calling Unref. It feels awkward, though.
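
Something like this rough sketch (illustrative names, not taken from the PR below): hold the AsyncResource by pointer and delete it from the uv_close_cb before Unref, so ~AsyncResource no longer runs inside the GC weak callback.

#include <nan.h>
#include <uv.h>

class Track : public Nan::ObjectWrap {
 public:
  Track() : resource_(new Nan::AsyncResource("Track")) {
    uv_async_init(uv_default_loop(), &async_, nullptr);
    async_.data = this;
  }

  void Close() {
    uv_close(reinterpret_cast<uv_handle_t*>(&async_),
             [](uv_handle_t* handle) {
               Track* self = static_cast<Track*>(handle->data);
               delete self->resource_;     // EmitAsyncDestroy fires here,
               self->resource_ = nullptr;  // outside of any GC callback
               self->Unref();              // now GC may collect the wrapper
             });
  }

 private:
  uv_async_t async_;
  Nan::AsyncResource* resource_;
};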

markandrus commented 6 years ago

I have a PR that demonstrates decoupling the ObjectWrap from the AsyncResource here: https://github.com/markandrus/nan-asyncresource-bug/pull/1