versatica / mediasoup

Cutting Edge WebRTC Video Conferencing
https://mediasoup.org
ISC License
6.14k stars 1.12k forks source link

consumer NACK with the specified sequence number packet but the retransmission fails #906

Open mengbieting opened 2 years ago

mengbieting commented 2 years ago

Hello, I want to inquire about a problem that NACK is sent but no retransmission succeed.

The scenario is as follows: SDK pushes the stream to mediasoup server1, and mediasoup server2 receives this stream from mediasoup server1 and then restores the packet to a frame (mediasoup server2 as consumer and receive data through pipetransport).

When I set the packet loss rate to 10% (the rtt is small) in test enviroment, I found that mediasoup server2 would sometimes request a packet with a certain sequence number until it finally reached the maximum number of times, and mediasoup server1 received the NACK of this sequence number packet message, but the packet buffer of this sequence number was not found in the send buffer of the consumer.

I continued to debug it and found that the problem is:

Suppose mediasoup server1 received RTP packets with sequence numbers 1, 2, 3, 4, and then received RTX packets with original sequence numbers 5 (the RTP packets with sequence numbers greater than 5 have not been received now), and later received RTP packet with sequence number 7. Because the sequence is out of order, nackGenerator will generate the NACK message of the seq=6 packet (the packet with the sequence number 5 will not be required to retransmit because it has been marked in the recoverList), but when the RTX packet with the original sequence number 5 is received, this sequence number is not in the In the nackList, so the packet with sequence number 5 was discarded and was not forwarded to the consumer (mediasoup server2). The seq=6 packet was finally retransmitted to the mediasoup server2 successfully, but the seq=5 packet was discarded, even if the mediasoup server2 kept requesting retransmission, it could not be retransmitted.

image

I compared the processing logic of NackModule::OnReceivedPacket in nack_module.cc in webrtc, and found that in this case, webrtc just inserts the packet into the recoverList, does not require retransmission, but it does not discard the packet, and It is to continue to stuff data into the packetBuffer in RtpVideoStreamReceiver.

image

I would like to ask, what is the purpose of not forwarding this packet to consumers here? Is it better to return true here, thanks

jmillan commented 2 years ago

Thanks for the detailed observations.

Next week some of us won't be fully online, but we'll check at this as soon as possible.

mengbieting commented 2 years ago

thank you~

penguinol commented 2 years ago

@jmillan @mengbieting We also noticed this issue. libwebrtc uses rtx for bandwidth estimate, so rtx packets may arrived earlier than rtp packets. Espically when there is packets loss, we will insert the rtx packet into recovered list and drop it, and will never sending for this packet.

And there is another situaltion: Currently we will not send nack immediatley, but wait a while, we may receive rtx packet before sending nack. https://github.com/versatica/mediasoup/blob/bba2aa05d5e88f0ab1d46ea215caf7b22331b420/worker/src/RTC/NackGenerator.cpp#L65-L86 So maybe we should always return true here.

mengbieting commented 2 years ago

thanks for reply~ Do you mean when mediasoup receive rtx packet before sending nack, maybe we should return true in NackGenerator::ReceivePacket if isRecovered deal logic? because if the rtx packet is received before the nack is sent, the code will enter the processing logic of NackGenerator::ReceivePacket if isRecovered

image

penguinol commented 2 years ago

There are two situaltions:

  1. RTX seq num >= max RTP seq num. Seq num is not in nack list. will return false here: https://github.com/versatica/mediasoup/blob/bba2aa05d5e88f0ab1d46ea215caf7b22331b420/worker/src/RTC/NackGenerator.cpp#L124-L126

  2. RTX seq num < max RTP seq num. Seq num is in nack list, but has not been sent yet, will return false here https://github.com/versatica/mediasoup/blob/bba2aa05d5e88f0ab1d46ea215caf7b22331b420/worker/src/RTC/NackGenerator.cpp#L82-L85

Return true is a temporary way to fix the problem, but i'm not sure whether there is any side-effect. In generally, we don't want to send duplicate packets to receivers. Need more time to think about it.

penguinol commented 2 years ago

The first situaltion is not recoverable, because we have inserted the seq num to recoveredList, and will never send a nack for this packet again. The second situaltion is recoverable, because we will send nack later.

mengbieting commented 2 years ago

thanks, yes, the second situaltion is recoverable, even if the nack has not been sent yet, we will send the nack later. After receiving the rtx packet again, it will be recovered and forwarded to the consumer, consumer will not need to nack this packet all the time. But the first case is unrecoverable. In this case, if we do not recovered and forwarding the packet to the consumer, the consumer will keep nacking the packet until the maximum number of times, so it may be a better decision to restore the packet to the consumer in this case

HuHeng commented 2 years ago

why mediasoup received rtx who's origin seq number greater than the max RTP sequence number?

penguinol commented 2 years ago

Because libwebrtc use rtx for bandwidth estimate, it may send rtx packet without receiving NACK. Such as: Libwebrtc sent rtp packets with seq num 1, 2, 3 and then sent rtx packet with rtp seq num 3 for bandwidth estimate. Rtp packet with seq num 3 may lost beacuse of networker problem, but rtx packet with rtp seq num 3 arrived. So finally we got rtp packets with seq num 1, 2 and rtx packet with seq num 3.

penguinol commented 1 year ago

thanks for reply~ Do you mean when mediasoup receive rtx packet before sending nack, maybe we should return true in NackGenerator::ReceivePacket if isRecovered deal logic? because if the rtx packet is received before the nack is sent, the code will enter the processing logic of NackGenerator::ReceivePacket if isRecovered

image

I did some tests, if return True here, we may send duplicate rtp packets to the consumer.

mengbieting commented 1 year ago

Hi, the scenario you describe is when the rtx packet arrives before the rtp packet with the corresponding seq number, and the rtp packet is not really lost, in this case, the modification of return true will cause mediasoup to send duplicate rtp packet the consumer, right? I applied this modification to my project before, and found that the caton rate on the consumer side has decrease compared with the same conditions, but I didn't notice the problem that duplicate seq number packets may be sent by mediasoup, thank you point out this problem, if I have a better way to modify it, I will submit it to this issue again, thanks~

penguinol commented 1 year ago

There are two situations:

  1. rtp packet arrived after rtx packet because of out of order.
  2. libwebrtc may send serveral rtx packets with same rtp seq num to fill the bandwidth.

912 considered the second situation.

ibc commented 1 year ago

We don't forget about this, it's just that we are focused on some incoming huge changes and prefer to handle this PR after them.

Romantic-LiXuefeng commented 1 year ago

Because libwebrtc use rtx for bandwidth estimate, it may send rtx packet without receiving NACK. Such as: Libwebrtc sent rtp packets with seq num 1, 2, 3 and then sent rtx packet with rtp seq num 3 for bandwidth estimate. Rtp packet with seq num 3 may lost beacuse of networker problem, but rtx packet with rtp seq num 3 arrived. So finally we got rtp packets with seq num 1, 2 and rtx packet with seq num 3.

Such as: Libwebrtc sent rtp packets with seq num 1, 2, 3 and then sent rtx packet with rtp seq num 3 for bandwidth estimate.

Hello @penguinol , Does I miss something? In probe state, Pacer deliver the the media rtp packets as probe packet directly. only padding packets in rtx. What's the situaltion sent the rtx packet with seq_num 3?