paullouisageneau / libdatachannel

C/C++ WebRTC network library featuring Data Channels, Media Transport, and WebSockets
https://libdatachannel.org/
Mozilla Public License 2.0
1.68k stars 345 forks source link

SRTP protect error, status=10 #866

Open toschlog opened 1 year ago

toschlog commented 1 year ago

I'm seeing a fair number of exceptions thrown from rtc::Track->send() with the message "SRTP protect error, status=10". Does anyone know what those are?

When I get them, I get a lot of them for the same user. So I'm wondering if maybe that indicates a problem with the connection and I should disconnect and reconnect.

Thanks.

paullouisageneau commented 1 year ago

This error indicates a replay and means the RTP stream is inconsistent because sequence numbers are not strictly increasing as they should. Possibly, the RTP stream was restarted or replaced with another stream. If it happens, you need to send the new stream with a new SSRC, which requires renegotiating the track.

toschlog commented 1 year ago

When I get this error, I get a lot of them. Is it that one sequence number is out of order and then every send after that also generates an error? Is there a way I check the stream to verify that it's in an error state?

paullouisageneau commented 1 year ago

When I get this error, I get a lot of them. Is it that one sequence number is out of order and then every send after that also generates an error?

Yes that's right, typically the sequence number restarts back from a past number, then every following packet will have numbers in the past.

Is there a way I check the stream to verify that it's in an error state?

You can check the sequence numbers and you should see the discontinuity. This is normally a symptom of the stream source restarting, for instance if you blindly override the SSRC you should instead check that it did not change.

toschlog commented 1 year ago

I'm finally getting back to looking at this. @paullouisageneau can you explain what you mean by the following?

if you blindly override the SSRC you should instead check that it did not change.

I can see in our code that we are in fact writing the SSRC to every packet we sent. How do I check if it's changed?

Related to this, what's the difference between SSRC and mid? I wonder if our code is mixing up the two.

Thanks for all your help.

paullouisageneau commented 1 year ago

I can see in our code that we are in fact writing the SSRC to every packet we sent. How do I check if it's changed?

Just read the value was before writing the new SSRC, and check if it changed compared to the last packet.

Related to this, what's the difference between SSRC and mid? I wonder if our code is mixing up the two.

The mid is the media ID, it's a string which identifies a media entry within the description, in libdatachannel it maps to a track. The SSRC is a random 32-bit number which identifies the source of an RTP stream.

toschlog commented 1 year ago

I'm not sure if this is my problem or not, but I just realized that my code is not handling control packets like "sender report" and "receiver report".

My app is media server. So in the onMessage handler for Alice's track I send the audio data to Bob and Carol. But what should I be doing with the control packets? (And is any payloadType >= 192 a control packet?) Should my server be sending a receiver report in reply to a sender report? Can I just ignore the sender reports? Can I ignore all the control packets? (My app is audio-only, so I think that FIR & PRI shouldn't be an issue for me.)

paullouisageneau commented 1 year ago

So what you are doing is actually close to the media-sfu example, I though that you forwarded an RTP stream from an external source. In that case you must indeed not forward sender reports or RTCP messages in general.

You should instead set aRtcpReceivingSession media handler on the track that will handle RTCP for you and you'll get only RTP in onMessage: https://github.com/paullouisageneau/libdatachannel/blob/1f6f09bbb5457895e422fea2960260b6dbef7192/examples/media-sfu/main.cpp#L52

toschlog commented 1 year ago

I've added the RtcpReceivingSession handler and that seems to do the trick. It's awesome that libdatachannel handles that for me.

I see in the source that this code is handling packet types 200 & 201. Are any of the other RTCP types used in WebRTC? I'm looking at this list: https://www.iana.org/assignments/rtp-parameters/rtp-parameters.xhtml#rtp-parameters-4

toschlog commented 1 year ago

Unfortunately, fixing the RTCP handing didn't fix the "SRTP protect error".

I've added some debugging to libsrtp and found that the error is happing in srtp_rdbx_check in this code:

    if ((int)(bitvector_get_length(&rdbx->bitmask) - 1) + delta < 0) {
        /* if delta is lower than the bitmask, it's bad */
        return srtp_err_status_replay_old;
    }

When this code runs, bitvector_get_length(&rdbx->bitmask) is 1024 and delta ranges -2017 from -28062.

I guess this sounds like a sequence number issue, but I don't really know. I did verify that SSRC values are not changing unexpectedly. I will continue poking around in libsrtp.

paullouisageneau commented 1 year ago

Unfortunately, fixing the RTCP handing didn't fix the "SRTP protect error".

Yes, the original issue is probably independent from RTCP handling.

I guess this sounds like a sequence number issue, but I don't really know. I did verify that SSRC values are not changing unexpectedly. I will continue poking around in libsrtp.

The error indeed means the sequence number went in the past for some reason, and cause is probably not in libsrtp.

I asked to check the SSRC because I thought that you where forwarding an external source which could have restarted. However, if you forward a track from a peer, you probably always set the same SSRC. Are you sure the track you forward did not restart, for instance because the peer reconnected?

toschlog commented 1 year ago

Are you sure the track you forward did not restart, for instance because the peer reconnected?

I'm pretty sure. If the peer disconnects we destroy everything and reconnect, creating a new PeerConnection and new tracks.

I don't expect that there's a problem in libsrtp, but I was hoping to get some information what's being sent to libsrtp that's causing a problem.

paullouisageneau commented 1 year ago

I'm pretty sure. If the peer disconnects we destroy everything and reconnect, creating a new PeerConnection and new tracks.

When the source peer reconnects, do you resume forwarding the packets to destination peers? You must not do that, since sequence numbers and timestamps will be inconsistent.

toschlog commented 1 year ago

When the source peer reconnects, do you resume forwarding the packets to destination peers? You must not do that, since sequence numbers and timestamps will be inconsistent.

We do, but we're not using the sequence numbers from the source. We keeps a sequence number value for each track, and every time we send a packet to that track we increment the value and stuff it into the packet. So regardless of what we send, the sequence number for each packet should be one greater than the previous packet.

paullouisageneau commented 1 year ago

Why didn't you tell this earlier? I don't know exactly how this logic ends up triggering the error, but in any case you must not blindly resequence the stream like this as the connection to the source peer might lose, reorder, or even duplicate packets. You must use the source sequence numbers and shift them by a given offset to map what the destination expects. The same goes for timestamps.

toschlog commented 1 year ago

Ok, thanks. I can write some code to maintain sequence order.

I'm surprised, however, that this would result in errors. At worst I'd expect degraded audio.

toschlog commented 1 year ago

Do I need to be messing with the sequence number at all? I just (belatedly, I know) looked at the media-sfu example and it doesn't do anything.

What about when a client (either the server or the receiver) disconnects and reconnects? Do I need to maintain the sequence number when communication resumes? Or is it ok for there to be a discontinuity?

bmionescu commented 1 year ago

I don't know if this will be helpful to anyone, but I also ran into this error message. I was using code based on both the "client" and "media-sender" examples in this repo. In my case it was because I (purposely) kept the webrtc connection up but restarted the gstreamer command which streamed to a port, which is read by the modified example script I'm using. Mr. Ageneau mentions that restarting the RTP stream can lead to this error and that's exactly my case. I want to be able to restart the stream to the port, without restarting/renegotiating anything else.

Where that error comes from on a lower level: when the stream is restarted, the sequence number is randomized, as it should be, for security reasons according to wikipedia. Sometimes when I restart the stream, the new randomly generated sequence number is greater than the last sequence number I had before I stopped the stream. In that case the stream restarts without error. But more often than not (and eventually guaranteed right?) if I keep restarting the stream I'll get a case where the randomly generated sequence number is less than the last sequence number I had before I stopped the stream, and that's an error. As Mr. Ageneau says in this thread, sequence numbers should always increase.

What I did to solve it in my case: it was suggested in this thread that if the stream restarts, a new SSRC number should be used and renegotiated etc. but (and maybe this is conceptually a bad thing to do, but, it works for me) instead of that, I use rtp->setSeqNumber to reset the sequence number before it is sent to the track. So my script defines and updates the sequence number and always resets the sequence number before sending it forward. This allows me to restart the gstreamer command as many times as I want without renegotiating the webrtc connection or anything else.