medooze / media-server-node

WebRTC Media Server for Node.js
MIT License

SegFault onDTLSStateChanged #91

Closed: dt-atkinson closed this issue 5 years ago

dt-atkinson commented 5 years ago

Hello again. I have found that when load testing my application I sometimes get a segfault. I managed to capture a stack trace with gdb:

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x0000000000b89249 in int v8::internal::BinarySearch<(v8::internal::SearchMode)1, v8::internal::DescriptorArray>(v8::internal::DescriptorArray*, v8::internal::Name*, int, int*) ()
(gdb) bt
#0  0x0000000000b89249 in int v8::internal::BinarySearch<(v8::internal::SearchMode)1, v8::internal::DescriptorArray>(v8::internal::DescriptorArray*, v8::internal::Name*, int, int*) ()
#1  0x0000000000fe4088 in v8::internal::LookupIterator::State v8::internal::LookupIterator::LookupInRegularHolder<false>(v8::internal::Map*, v8::internal::JSReceiver*) ()
#2  0x0000000000fe6792 in void v8::internal::LookupIterator::Start<false>() ()
#3  0x0000000000fe6e10 in v8::internal::LookupIterator::PropertyOrElement(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, bool*, v8::internal::LookupIterator::Configuration) ()
#4  0x000000000119383a in v8::internal::Runtime::GetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, bool*) ()
#5  0x0000000000b2bde1 in v8::Object::Get(v8::Local<v8::Context>, v8::Local<v8::Value>) ()
#6  0x0000000000b2bf21 in v8::Object::Get(v8::Local<v8::Value>) ()
#7  0x00007f9231a71840 in DTLSICETransportListener::onDTLSStateChanged(DTLSICETransport::DTLSState)::{lambda()#1}::operator()() const (__closure=<optimized out>) at ../src/media-server_wrap.cxx:2276
#8  std::_Function_handler<void (), DTLSICETransportListener::onDTLSStateChanged(DTLSICETransport::DTLSState)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
#9  0x00007f9231a6f253 in std::function<void ()>::operator()() const (this=0x7ffcdb9f1c00) at /usr/include/c++/7/bits/std_function.h:706
#10 MediaServer::async_cb_handler (handle=<optimized out>) at ../src/media-server_wrap.cxx:1830
#11 0x0000000000a6ee8f in uv__async_io (loop=0x264f860 <default_loop_struct>, w=<optimized out>, events=<optimized out>) at ../deps/uv/src/unix/async.c:118
#12 0x0000000000a80738 in uv__io_poll (loop=loop@entry=0x264f860 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:379
#13 0x0000000000a6f7cb in uv_run (loop=0x264f860 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:364
#14 0x0000000000904525 in node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&) ()
#15 0x000000000090272c in node::Start(int, char**) ()
#16 0x00007f9234641b97 in __libc_start_main (main=0x8bbb30 <main>, argc=2, argv=0x7ffcdb9f5f58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcdb9f5f48) at ../csu/libc-start.c:310
#17 0x00000000008bbc65 in _start ()
dt-atkinson commented 5 years ago

Presumably this is a race condition between the main thread and the endpoint/connection threads, maybe when it's freeing up memory?

dt-atkinson commented 5 years ago

I see you already have this commit addressing this, but maybe it hasn't covered all scenarios? https://github.com/medooze/media-server-node/commit/2c547554763e00a1a728a330d71183c85f3298e9

murillo128 commented 5 years ago

Yes, the problem is that the event is scheduled before the object is deleted but runs after the destructor (the v8 object is freed). I tried to solve it, but it seems it isn't working.

I'm out on vacation this week and will fix it late next week. Please revert to the version before the memory leak fix for now.
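
For reference, the failure mode described above can be reduced to a few lines. The following is only an illustrative sketch (DtlsListener and mainThreadQueue are invented for the example; it does not use the real wrapper, libuv or V8 types): a callback that captures the listener is queued for later execution on the main thread, the listener is deleted, and the queued callback then runs against freed memory, which is the same shape as frames #7 to #10 of the stack trace above, where a deferred std::function is run by MediaServer::async_cb_handler.

#include <functional>
#include <iostream>
#include <queue>
#include <string>

// Stands in for the uv_async queue that defers work to the Node main thread.
static std::queue<std::function<void()>> mainThreadQueue;

struct DtlsListener
{
	std::string state = "connected";

	void OnStateChanged()
	{
		// Scheduled now, executed later on the "main thread".
		mainThreadQueue.push([this]() {
			// If the listener was deleted before this runs, `state`
			// is read from freed memory (use-after-free).
			std::cout << "dtls state: " << state << std::endl;
		});
	}
};

int main()
{
	auto* listener = new DtlsListener();
	listener->OnStateChanged();        // event scheduled...
	delete listener;                   // ...object destroyed first...

	while (!mainThreadQueue.empty())
	{
		mainThreadQueue.front()();     // ...callback runs after the destructor
		mainThreadQueue.pop();
	}
	return 0;
}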


murillo128 commented 5 years ago

Could you try adding

this.transport.SetListener(null);
this.transport.SetSenderSideEstimatorListener(null);

here: https://github.com/medooze/media-server-node/blob/master/lib/Transport.js#L808
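
(Clearing both listeners there should detach the native transport from its JS callbacks before the wrapper is released, so a DTLS state change that is still queued on the transport's loop has nothing left to call back into when it is finally dispatched on the Node thread. It is a mitigation of the ordering problem rather than a fix for it.)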

dt-atkinson commented 5 years ago

It still seems to segfault in the same place. Sorry, I didn't mean to disrupt your holiday!

murillo128 commented 5 years ago

Could you check with the latest version? I fixed a segfault on a DTLS event that I was able to reproduce locally, but I'm not sure it is exactly the same one you are facing.

dt-atkinson commented 5 years ago

Welcome back! New stack trace :)

Thread 1 "node" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f1dcbe74801 in __GI_abort () at abort.c:79
#2  0x00007f1dcbebd897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f1dcbfeab9a "%s\n")
    at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f1dcbec490a in malloc_printerr (str=str@entry=0x7f1dcbfe8cba "corrupted double-linked list") at malloc.c:5350
#4  0x00007f1dcbec4ac4 in malloc_consolidate (av=av@entry=0x7f1dcc21fc40 <main_arena>) at malloc.c:4456
#5  0x00007f1dcbecc03b in _int_free (have_lock=0, p=<optimized out>, av=0x7f1dcc21fc40 <main_arena>) at malloc.c:4362
#6  __GI___libc_free (mem=0x54f4150) at malloc.c:3124
#7  0x00007f1dc929cc60 in moodycamel::ConcurrentQueueDefaultTraits::free (ptr=<optimized out>)
    at ../media-server/include/concurrentqueue.h:332
#8  moodycamel::ConcurrentQueue<EventLoop::SendBuffer, moodycamel::ConcurrentQueueDefaultTraits>::destroy_array<moodycamel::ConcurrentQueue<EventLoop::SendBuffer, moodycamel::ConcurrentQueueDefaultTraits>::Block> (count=<optimized out>, p=<optimized out>)
    at ../media-server/include/concurrentqueue.h:3514
#9  moodycamel::ConcurrentQueue<EventLoop::SendBuffer, moodycamel::ConcurrentQueueDefaultTraits>::~ConcurrentQueue (this=0x4972aa0, 
    __in_chrg=<optimized out>) at ../media-server/include/concurrentqueue.h:814
#10 EventLoop::~EventLoop (this=0x4972a58, __in_chrg=<optimized out>) at ../media-server/src/EventLoop.cpp:52
#11 0x00007f1dc92a9191 in RTPBundleTransport::~RTPBundleTransport (this=0x4972a40, __in_chrg=<optimized out>)
    at ../media-server/src/RTPBundleTransport.cpp:53
#12 0x00007f1dc925a361 in _wrap_delete_RTPBundleTransport (data=...) at ../src/media-server_wrap.cxx:12563
#13 0x0000000000ec3013 in v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) ()
#14 0x0000000000ec323a in v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags)
    ()
#15 0x0000000000effe3a in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#16 0x0000000000f00c64 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#17 0x0000000000f0143a in v8::internal::Heap::FinalizeIncrementalMarkingIfComplete(v8::internal::GarbageCollectionReason) ()
#18 0x0000000000f04b17 in v8::internal::IncrementalMarkingJob::Task::RunInternal() ()
#19 0x0000000000bf5d36 in v8::internal::CancelableTask::Run() ()
#20 0x0000000000980e74 in node::PerIsolatePlatformData::RunForegroundTask(std::unique_ptr<v8::Task, std::default_delete<v8::Task> >) ()
#21 0x0000000000982552 in node::PerIsolatePlatformData::FlushForegroundTasksInternal() ()
#22 0x0000000000a6ee8f in uv__async_io (loop=0x264f860 <default_loop_struct>, w=<optimized out>, events=<optimized out>)
    at ../deps/uv/src/unix/async.c:118
#23 0x0000000000a80738 in uv__io_poll (loop=loop@entry=0x264f860 <default_loop_struct>, timeout=156)
    at ../deps/uv/src/unix/linux-core.c:379
#24 0x0000000000a6f7cb in uv_run (loop=0x264f860 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:364
#25 0x0000000000904525 in node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&) ()
#26 0x000000000090272c in node::Start(int, char**) ()
#27 0x00007f1dcbe55b97 in __libc_start_main (main=0x8bbb30 <main>, argc=2, argv=0x7ffc1a7ea438, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc1a7ea428) at ../csu/libc-start.c:310
#28 0x00000000008bbc65 in _start ()
murillo128 commented 5 years ago

This one seems different; could you open a new issue instead?

Also, if it is a double free, it would be easy to catch with ASan. Could you comment out this line: https://github.com/medooze/media-server-node/blob/master/binding.gyp#L29 and run the node project again? You will probably need to preload the ASan lib:

LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/8/libasan.so node index.js

Also, please enable debug logs and send me the lines before the crash.
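
(With ASan preloaded, a double free or use-after-free should abort immediately with a report that includes the stack of the offending free or access plus the stacks where the block was originally allocated and previously freed, which should make it much easier to see which EventLoop/ConcurrentQueue buffer is being released twice.)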