versatica / mediasoup

Cutting Edge WebRTC Video Conferencing
https://mediasoup.org
ISC License

Mediasoup worker died, exiting in 2 seconds... #1392

Closed miroslavpejic85 closed 4 months ago

miroslavpejic85 commented 4 months ago

Bug Report

System Information and Environment:

For reference: mediasoup.discourse.group

Issue Description:

Since upgrading to mediasoup 3.14.5, our mediasoup workers have frequently terminated unexpectedly. Each crash is logged with the message "Mediasoup worker died, exiting in 2 seconds...".

Troubleshooting Steps:

Following the documentation, we obtained a core dump of the worker, which will be provided as an attachment for further analysis.

(gdb) bt
```bash
(gdb) bt
#0  0x000055604c64eb58 in RTC::TransportTuple::GetProtocol (this=0x556518627283) at ../../../include/RTC/TransportTuple.hpp:92
#1  0x000055604c766313 in RTC::WebRtcTransport::OnIceServerTupleRemoved (this=0x55604e65a2b0, tuple=0x556518627283) at ../../../src/RTC/WebRtcTransport.cpp:1183
#2  0x000055604c64ef97 in RTC::IceServer::OnTimer (this=0x55604e615820, timer=0x55604e67fa90) at ../../../src/RTC/IceServer.cpp:935
#3  0x000055604c5ff52e in TimerHandle::OnUvTimer (this=0x55604e67fa90) at ../../../src/handles/TimerHandle.cpp:162
#4  0x000055604c5fe87b in onTimer (handle=0x55604e692230) at ../../../src/handles/TimerHandle.cpp:13
#5  0x000055604cb49005 in uv__run_timers (loop=0x55604e53bff0) at ../../../subprojects/libuv-v1.48.0/src/timer.c:193
#6  0x000055604cb4ec72 in uv_run (loop=0x55604e53bff0, mode=UV_RUN_DEFAULT) at ../../../subprojects/libuv-v1.48.0/src/unix/core.c:466
#7  0x000055604c5cda83 in DepLibUV::RunLoop () at ../../../src/DepLibUV.cpp:98
#8  0x000055604c5e06ed in Worker::Worker (this=0x7fff8c7dd100, channel=0x55604e53c5d0) at ../../../src/Worker.cpp:56
#9  0x000055604c5c3414 in mediasoup_worker_run (argc=16, argv=0x7fff8c7dd348, version=0x7fff8c7dd200 "3.14.5", consumerChannelFd=3, producerChannelFd=4, channelReadFn=0x0, channelReadCtx=0x0, channelWriteFn=0x0, channelWriteCtx=0x0) at ../../../src/lib.cpp:142
#10 0x000055604c80ed5f in main (argc=16, argv=0x7fff8c7dd348) at ../../../src/main.cpp:25
```

(gdb) bt full
```bash
#0  0x000055604c64eb58 in RTC::TransportTuple::GetProtocol (this=0x556518627283) at ../../../include/RTC/TransportTuple.hpp:92
No locals.
#1  0x000055604c766313 in RTC::WebRtcTransport::OnIceServerTupleRemoved (this=0x55604e65a2b0, tuple=0x556518627283) at ../../../src/RTC/WebRtcTransport.cpp:1183
No locals.
#2  0x000055604c64ef97 in RTC::IceServer::OnTimer (this=0x55604e615820, timer=0x55604e67fa90) at ../../../src/RTC/IceServer.cpp:935
        storedTuple = 0x556518627283
        it =
        __for_range = std::__cxx11::list = {[0] = {hash = 15945316816845144064, udpSocket = 0x55604e6c2030, udpRemoteAddr = 0x55604e6d24a0, tcpConnection = 0x0, localAnnouncedAddress = "", udpRemoteAddrStorage = {ss_family = 2, __ss_padding = "\335I%)\273\036", '\000' , __ss_align = 0}, protocol = RTC::TransportTuple::Protocol::UDP}}
        __for_begin =
        __for_end = {hash = 1, udpSocket = 0x55604e6d2460, udpRemoteAddr = 0x55604e67fa90, tcpConnection = 0x0, localAnnouncedAddress = , udpRemoteAddrStorage = {ss_family = 1, __ss_padding = "\000\000\000\000\000\000xt\000\000\000\000\000\000\000\004\000\000\000\000\000\000\340YaN`U\000\000\003\000\000\000\000\000\000\000y\000\000\000\000\000\000\000", '\377' , "\003\000\000\000\002\000\000\000\001", '\000' , "a\000\000\000\000\000\000\000\260\326^N`U\000", __ss_align = 0}, protocol = (unknown: 0x80)}
        __FUNCTION__ = "OnTimer"
#3  0x000055604c5ff52e in TimerHandle::OnUvTimer (this=0x55604e67fa90) at ../../../src/handles/TimerHandle.cpp:162
No locals.
#4  0x000055604c5fe87b in onTimer (handle=0x55604e692230) at ../../../src/handles/TimerHandle.cpp:13
No locals.
#5  0x000055604cb49005 in uv__run_timers (loop=0x55604e53bff0) at ../../../subprojects/libuv-v1.48.0/src/timer.c:193
        heap_node = 0x55604e5fec88
        handle = 0x55604e692230
        queue_node = 0x55604e692298
        ready_queue = {next = 0x55604e629318, prev = 0x55604e721bd8}
#6  0x000055604cb4ec72 in uv_run (loop=0x55604e53bff0, mode=UV_RUN_DEFAULT) at ../../../subprojects/libuv-v1.48.0/src/unix/core.c:466
        timeout = 1
        r = 0
        can_sleep = 1
#7  0x000055604c5cda83 in DepLibUV::RunLoop () at ../../../src/DepLibUV.cpp:98
        __FUNCTION__ = "RunLoop"
        ret = 21856
#8  0x000055604c5e06ed in Worker::Worker (this=0x7fff8c7dd100, channel=0x55604e53c5d0) at ../../../src/Worker.cpp:56
No locals.
#9  0x000055604c5c3414 in mediasoup_worker_run (argc=16, argv=0x7fff8c7dd348, version=0x7fff8c7dd200 "3.14.5", consumerChannelFd=3, producerChannelFd=4, channelReadFn=0x0, channelReadCtx=0x0, channelWriteFn=0x0, channelWriteCtx=0x0) at ../../../src/lib.cpp:142
        worker = { = { = {_vptr.RequestHandler = 0x55604cf6bb10 }, = { _vptr.NotificationHandler = 0x55604cf6bb58 }, }, = {_vptr.Listener = 0x55604cf6bb80 }, = { _vptr.Listener = 0x55604cf6bba8 }, channel = 0x55604e53c5d0, signalHandle = 0x55604e5b3920, shared = 0x55604e5b2be0, mapWebRtcServers = {, std::allocator >, RTC::WebRtcServer*>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator, std::allocator > const, RTC::WebRtcServer*> > >> = {, std::allocator >, RTC::WebRtcServer*>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator, std::allocator > const, RTC::WebRtcServer*> > >> = { settings_ = {, std::allocator > const, RTC::WebRtcServer*> > >, absl::lts_20230802::integer_sequence, true>> = { = {}, > = { value = { = {}, control_ = 0x55604cd31d40 , slots_ = 0x0, capacity_ = 0, compressed_tuple_ = {, absl::lts_20230802::integer_sequence, true>> = { = {}, > = { value = 0}, > = { = {}, }, }, }}}, > = { = {}, }, > = { = {}, }, , std::allocator > const, RTC::WebRtcServer*> >, 3, true>> = {, std::allocator > const, RTC::WebRtcServer*> >> = {<__gnu_cxx::new_allocator, std::allocator > const, RTC::WebRtcServer*> >> = {}, }, }, }, }}, }, }, mapRouters = {, std::allocator >, RTC::Router*>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator, std::allocator > const, RTC::Router*> > >> = {, std::allocator >, RTC::Router*>, absl::lts_20230802::container_internal::StringHash, absl::lts_20230802::container_internal::StringEq, std::allocator, std::allocator > const, RTC::Router*> > >> = { settings_ = {, std::allocator > const, RTC::Router*> > >, absl::lts_20230802::integer_sequence, true>> = { = {}, > = { value = { = {}, control_ = 0x55604e6bef58, slots_ = 0x55604e6bef70, capacity_ = 3, compressed_tuple_ = {, absl::lts_20230802::integer_sequence, true>> = { = {}, > = { value = 2}, > = { = {}, }, }, }}}, > = { = {}, }, > = { = {}, }, , std::allocator > const, RTC::Router*> >, 3, true>> = {, std::allocator > const, RTC::Router*> >> = {<__gnu_cxx::new_allocator, std::allocator > const, RTC::Router*> >> = {}, }, }, }, }}, }, }, closed = false}
        channel = std::unique_ptr = {get() = 0x55604e53c5d0}
        __FUNCTION__ = "mediasoup_worker_run"
#10 0x000055604c80ed5f in main (argc=16, argv=0x7fff8c7dd348) at ../../../src/main.cpp:25
        __FUNCTION__ = "main"
        version = "3.14.5"
        statusCode = 0
```

ibc commented 4 months ago

As @snnz said in the forum, this looks like the culprit:

https://mediasoup.discourse.group/t/mediasoup-worker-died-exiting-in-2-seconds/6035/7

It looks like, after this commit, `IceServer::OnTimer` may end up calling `IceServer::RemoveTuple`, in the same way `IceServer::~IceServer` does.
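For readers unfamiliar with this class of bug, here is a minimal sketch of the general hazard: removing elements from a `std::list` while iterating over it, and using an element after it has been erased. It is not mediasoup code and not the fix from the PR. `Tuple`, `TupleStore`, and `RemoveExpired` are hypothetical names invented for illustration. The sketch assumes the safe pattern is to read what you need from the element first and then continue from the iterator returned by `std::list::erase`.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical stand-in for a stored ICE tuple (not RTC::TransportTuple).
struct Tuple
{
	std::uint64_t hash;
	bool expired;
};

// Hypothetical owner of a tuple list, loosely analogous to a class that
// prunes stale tuples from a timer callback.
class TupleStore
{
public:
	void Add(std::uint64_t hash, bool expired)
	{
		tuples.push_back({ hash, expired });
	}

	// Removes every expired tuple and returns the removed hashes in order.
	std::vector<std::uint64_t> RemoveExpired()
	{
		std::vector<std::uint64_t> removed;

		for (auto it = tuples.begin(); it != tuples.end();)
		{
			if (it->expired)
			{
				// Read the element BEFORE erasing it; afterwards any pointer
				// or reference to it would dangle.
				removed.push_back(it->hash);
				// erase() invalidates `it` and returns the next valid iterator.
				it = tuples.erase(it);
			}
			else
			{
				++it;
			}
		}

		return removed;
	}

	std::size_t Size() const
	{
		return tuples.size();
	}

private:
	std::list<Tuple> tuples;
};
```

With tuples 1 to 4 added, where 2 and 3 are expired, `RemoveExpired()` returns the hashes 2 and 3 and leaves two tuples in the store. Erasing inside a range-based `for` loop, or passing a pointer to an already erased element to a listener, would be undefined behavior instead. That would be consistent with the implausible `storedTuple` pointer and the garbage `__for_end` contents in the backtrace.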

I am on it.

ibc commented 4 months ago

@miroslavpejic85, PR here: https://github.com/versatica/mediasoup/pull/1393

However, I may need your help if possible. Let's follow up in the PR: https://github.com/versatica/mediasoup/pull/1393#issuecomment-2100882064