versatica / mediasoup

Cutting Edge WebRTC Video Conferencing
https://mediasoup.org
ISC License

Worker hits ASSERT condition in production #1316

Open gitamirp opened 8 months ago

gitamirp commented 8 months ago

The worker hits an assert in

https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L37

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8d1f851535 in __GI_abort () at abort.c:79
#2  0x000055814b9c3e7f in RTC::RateCalculator::Update(unsigned long, unsigned long) ()
#3  0x000055814ba754d1 in RTC::WebRtcTransport::SendRtpPacket(RTC::Consumer*, RTC::RtpPacket*, std::function<void (bool)> const*) ()
#4  0x000055814ba528d2 in RTC::Transport::OnConsumerSendRtpPacket(RTC::Consumer*, RTC::RtpPacket*) ()
#5  0x000055814ba116f0 in RTC::SimpleConsumer::SendRtpPacket(RTC::RtpPacket*, std::shared_ptr<RTC::RtpPacket>&) ()
#6  0x000055814b9d38c2 in RTC::Router::OnTransportProducerRtpPacketReceived(RTC::Transport*, RTC::Producer*, RTC::RtpPacket*) ()
#7  0x000055814ba510a0 in RTC::Transport::OnProducerRtpPacketReceived(RTC::Producer*, RTC::RtpPacket*) ()
#8  0x000055814b9bd28e in RTC::Producer::ReceiveRtpPacket(RTC::RtpPacket*) ()
#9  0x000055814ba3d326 in RTC::Transport::ReceiveRtpPacket(RTC::RtpPacket*) ()
#10 0x000055814ba7ee97 in RTC::WebRtcTransport::OnRtpDataReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#11 0x000055814ba808b3 in RTC::WebRtcTransport::OnPacketReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#12 0x000055814ba81192 in non-virtual thunk to RTC::WebRtcTransport::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
#13 0x000055814ba6a5d1 in RTC::UdpSocket::UserOnUdpDatagramReceived(unsigned char const*, unsigned long, sockaddr const*) ()
#14 0x000055814b9218ce in onRecv(uv_udp_s*, long, uv_buf_t const*, sockaddr const*, unsigned int) ()
#15 0x000055814bde4c5d in uv.udp_io ()
#16 0x000055814bde7fd6 in uv.io_poll ()
#17 0x000055814bddb6ae in uv_run ()
#18 0x000055814b8c410f in DepLibUV::RunLoop() ()
#19 0x000055814b8ed381 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) ()
#20 0x000055814b8be813 in mediasoup_worker_run ()
#21 0x000055814b8bc709 in main ()

jmillan commented 8 months ago

@gitamirp,

gitamirp commented 8 months ago

@jmillan thanks for looking into it. We are using version 3.12.12. Unfortunately, it happens in a production environment, so it's harder to get detailed logs or steps to reproduce (STR). It happens rarely, but I can try to gather more information about it.

nazar-pc commented 8 months ago

You should upgrade to the latest release first. 3.12.12 is very old, and the issue you are hitting might have been fixed long ago.

jmillan commented 8 months ago

Yes please, you're using quite an old version. Upgrade it as soon as you can.

Don't hesitate to reopen the issue if it comes up again in the future.

ibc commented 8 months ago

Honestly, there is no change since 3.12.12 that could have fixed this issue. Let me reopen it so we don't forget. I think we need a fuzzer test for this class.
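As a rough illustration of what such a fuzzer test could look like, the sketch below decodes raw fuzzer bytes into a sequence of `Update(size, nowMs)` calls with a non-decreasing clock. The `Update(size_t, uint64_t)` shape matches the signature in the stack trace. Everything else is an assumption made for illustration: the 3-byte record layout, the `DriveUpdates` helper, and the `RecordingCalculator` stand-in are hypothetical and are not part of mediasoup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: turns fuzzer bytes into Update(size, nowMs) calls.
// Assumed record layout (3 bytes): 2 bytes of size, 1 byte of time delta in ms.
// Trailing bytes that do not form a full record are ignored.
template<typename Calculator>
size_t DriveUpdates(Calculator& calculator, const uint8_t* data, size_t len, uint64_t startMs)
{
	assert(data != nullptr || len == 0);

	uint64_t nowMs = startMs;
	size_t calls   = 0;

	for (size_t i = 0; i + 3 <= len; i += 3)
	{
		const size_t size = (static_cast<size_t>(data[i]) << 8) | data[i + 1];

		// Deltas are unsigned, so the clock never goes backwards.
		nowMs += data[i + 2];

		calculator.Update(size, nowMs);
		++calls;
	}

	return calls;
}

// Hypothetical stand-in target so the sketch compiles on its own.
// A real harness would pass the class under test instead.
struct RecordingCalculator
{
	size_t totalBytes{ 0 };
	uint64_t lastNowMs{ 0 };

	void Update(size_t size, uint64_t nowMs)
	{
		this->totalBytes += size;
		this->lastNowMs = nowMs;
	}
};
```

In a libFuzzer-style harness, `DriveUpdates` would be called from `LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)` with the real calculator as the target, so that any input sequence reaching the assertion is saved as a reproducible crash case.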

gitamirp commented 8 months ago

Thanks. FWIW, there have been no significant changes to RateCalculator.cpp over the last 2 years.

https://github.com/versatica/mediasoup/commits/v3/worker/src/RTC/RateCalculator.cpp

I suspect it might occur when

https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L27

nowMs - this->newestItemStartTime >= this->itemSizeMs

and

https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L32

this->newestItemIndex >= this->windowItems

and

this->oldestItemIndex == 0

Logging an error and resetting the RateCalculator would probably be more appropriate error handling.
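The three conditions above can be reproduced in a small stand-alone model. This is only a sketch of the index bookkeeping: the field names are taken from the expressions quoted above, while the `WindowIndexModel` type, its `Advance` method, and the integer types are hypothetical. It deliberately omits whatever removal of expired items the real class performs, so it shows the state in which the indices collide, not why production reaches that state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the circular-window indices only.
// It is not the mediasoup implementation.
struct WindowIndexModel
{
	int32_t windowItems;
	uint64_t itemSizeMs;
	int32_t newestItemIndex{ -1 };
	int32_t oldestItemIndex{ -1 };
	uint64_t newestItemStartTime{ 0 };

	WindowIndexModel(int32_t windowItems, uint64_t itemSizeMs)
	  : windowItems(windowItems), itemSizeMs(itemSizeMs)
	{
		assert(windowItems > 0);
	}

	// Returns true when the newest index lands on the oldest one,
	// which is the state the suspected assertion guards against.
	bool Advance(uint64_t nowMs)
	{
		// Still inside the current item: no index change.
		if (this->newestItemIndex >= 0 && nowMs - this->newestItemStartTime < this->itemSizeMs)
		{
			return false;
		}

		this->newestItemIndex++;
		this->newestItemStartTime = nowMs;

		// Wrap around at the end of the window.
		if (this->newestItemIndex >= this->windowItems)
		{
			this->newestItemIndex = 0;
		}

		const bool overlaps =
		  this->oldestItemIndex != -1 && this->newestItemIndex == this->oldestItemIndex;

		// The first item also becomes the oldest one.
		if (this->oldestItemIndex < 0)
		{
			this->oldestItemIndex = this->newestItemIndex;
		}

		return overlaps;
	}
};
```

With `WindowIndexModel model(3, 10)`, calling `Advance` at 0, 10, and 20 ms fills indices 0 to 2. A further call at 30 ms wraps the newest index back to 0 while the oldest index is still 0, so `Advance` reports an overlap.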

ibc commented 8 months ago

> Logging an error and resetting the RateCalculator would probably be more appropriate error handling.

Defensive programming hides real bugs. We don't do that. If something should never happen then it must never happen, otherwise it's a bug that needs to be fixed.

nazar-pc commented 8 months ago

If there was a memory corruption somewhere, side-effects might be unexpected. This needs to be confirmed on latest version first.

gitamirp commented 8 months ago

> > Logging an error and resetting the RateCalculator would probably be more appropriate error handling.
>
> Defensive programming hides real bugs. We don't do that. If something should never happen then it must never happen, otherwise it's a bug that needs to be fixed.

also true

gitamirp commented 8 months ago

> If there was a memory corruption somewhere, side-effects might be unexpected. This needs to be confirmed on latest version first.

I am not sure how fast we can get these hosts onto the latest release. For now I will just patch it to suit our needs.

I'll let you know if I find more about this condition.

thanks

ibc commented 7 months ago

@gitamirp any news? I assumed you patched version 3.12.12 so it's not a problem for you anymore but perhaps you updated to latest version (without patching it)?

gitamirp commented 7 months ago

We patched 3.12.12 by using

MS_ASSERT( this->newestItemIndex != this->oldestItemIndex || this->oldestItemIndex == -1 || this->newestItemIndex, "newest index overlaps with the oldest one");

I will let you know if we run into this ASSERT again with this patch. If you are interested we can add some logs when

this->newestItemIndex == this->oldestItemIndex

I believe that in 3.12.12 both newest && oldest can be zero at the same time but I don't have logs to support it.
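To make it easier to reason about which states the patched condition still rejects, it can be written as a stand-alone predicate. This is a sketch: the function names and the `int32_t` index type are assumptions, and only the boolean expression is copied from the patch above. Read literally, the added `|| this->newestItemIndex` operand is true for any non-zero index, so the patched condition can only fail when both indices are 0. `IndicesCoincide` is a hypothetical trigger for the extra logging mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// The patched MS_ASSERT condition as a pure function.
// The last operand mirrors `|| this->newestItemIndex`, i.e. "non-zero".
inline bool PatchedConditionHolds(int32_t newestItemIndex, int32_t oldestItemIndex)
{
	return newestItemIndex != oldestItemIndex || oldestItemIndex == -1 || newestItemIndex != 0;
}

// Hypothetical logging trigger: both indices are set and point at the same item.
inline bool IndicesCoincide(int32_t newestItemIndex, int32_t oldestItemIndex)
{
	return oldestItemIndex != -1 && newestItemIndex == oldestItemIndex;
}
```

Under this reading, a collision at any non-zero index (for example both indices equal to 3) passes the patched assertion and would only be visible through the added logs.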

ibc commented 7 months ago

Yes please. Add those logs and comment here when you get something. We will investigate this next week. Too busy these days.

gitamirp commented 7 months ago

Sure, will do. Thanks.