w3c / webrtc-encoded-transform

WebRTC Encoded Transform
https://w3c.github.io/webrtc-encoded-transform/

expose RTP sequence numbers #166

Open fippo opened 1 year ago

fippo commented 1 year ago

Discussed at the October virtual interim (VI).
Action item: "wait for the use cases before proceeding"

Use case: https://w3c.github.io/webrtc-nv-use-cases/#auction Related requirement: N39 from here

PR #154 addresses this for incoming audio

alvestrand commented 1 year ago

Per a comment on the PR: we need to discuss whether to expose the long number (seqno + ROC, the rollover counter) or the short number (seqno only).

steely-glint commented 1 year ago

Some context - the short seqno in the RTP header wraps every 64k (65,536) packets. That's roughly every 22 mins with a 20ms audio ptime (65,536 x 20ms is about 1,311 seconds). At the wrap time some packets will appear (to the naive JavaScript coder) to be out of order, so they will need to detect the wrap and create and maintain a rollover counter to compensate (adding state). This is exactly the process that the SRTP stack has already carried out in order to decrypt the packets. (What's more, the decryption serves as a verification that out-of-order packets have been correctly handled - it fails if you get the index wrong.)
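
To make the "adding state" point concrete, here is a minimal sketch of the rollover tracking an application would have to do itself if only the short seqno were exposed. It follows the index-estimation idea from RFC 3711 (Appendix A), but the class name `SeqNoExtender` and its method are hypothetical and not part of any WebRTC API. Unlike the SRTP stack, it has no authentication step to confirm that its guess was right.

```typescript
// Hypothetical helper (not a WebRTC API): extends 16-bit RTP sequence
// numbers to a long index of the form ROC * 65536 + seq.
class SeqNoExtender {
  private roc = 0;         // rollover counter, maintained locally
  private highestSeq = -1; // highest 16-bit seqno seen so far, -1 before the first packet

  extend(seq: number): number {
    if (this.highestSeq < 0) {
      // First packet: assume no rollover has happened yet.
      this.highestSeq = seq;
      return seq;
    }
    // Guess which rollover period this packet belongs to.
    let v = this.roc;
    if (this.highestSeq < 0x8000) {
      // Just after a wrap: a very large seq is a late packet from the previous period.
      if (seq - this.highestSeq > 0x8000) v = this.roc - 1;
    } else {
      // Approaching a wrap: a very small seq means the counter has wrapped.
      if (this.highestSeq - 0x8000 > seq) v = this.roc + 1;
    }
    // Update state only when the packet moves the window forward.
    if (v === this.roc) {
      if (seq > this.highestSeq) this.highestSeq = seq;
    } else if (v === this.roc + 1) {
      this.roc = v;
      this.highestSeq = seq;
    }
    // Note: the result can be negative for packets older than the first one seen.
    return v * 65536 + seq;
  }
}
```

Feeding it 65534, 65535, 0, then a late 65535 gives a monotonic index across the wrap, and the late packet is placed before the wrap rather than after it.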

So if it is practical I recommend that we expose the long index - applications can always get the short seqno by ignoring the upper bits.
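
As an illustration of "ignoring the upper bits", a small sketch of splitting a long index back into its parts. The function name `splitIndex` is hypothetical, and it assumes the index is delivered as a plain JavaScript number. Division and modulo are used for both halves because the index can be up to 48 bits wide, and JavaScript shift operators only work on 32 bits.

```typescript
// Hypothetical helper: split a long index (ROC * 65536 + seq) into its parts.
// Assumes a non-negative integer index below 2^53.
function splitIndex(index: number): { roc: number; seq: number } {
  return {
    roc: Math.floor(index / 65536), // upper bits: rollover counter
    seq: index % 65536,             // lower 16 bits: the on-the-wire seqno
  };
}
```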

fippo commented 1 year ago

This is typically done only by the SRTP stack. The RTP stacks I know treat sequence numbers as 16-bit integers internally and do comparisons with wraparound (RTCP is the only exception here, since it actually carries the extended sequence number in the RR report blocks).
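
For reference, a comparison "with wraparound" on 16-bit sequence numbers is usually written along these lines. This is a sketch of the general serial-number-arithmetic technique, not code from any particular RTP stack, and the function name `seqNewer` is made up for the example.

```typescript
// Returns true if 16-bit sequence number a is newer than b, treating the
// number space as circular: a is newer when it is 1..32767 steps ahead of b.
function seqNewer(a: number, b: number): boolean {
  const diff = (a - b) & 0xffff; // forward distance from b to a, modulo 65536
  return diff !== 0 && diff < 0x8000;
}
```

With this, `seqNewer(0, 65535)` is true even though 0 is numerically smaller, which is the case a naive `a > b` comparison gets wrong at the wrap.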

steely-glint commented 1 year ago

Agreed, but WebRTC is SRTP only so the value is available (and more importantly checked) in the SRTP layer. This looks like one thing that RTCP got right ;-)

vitaly-castLabs commented 1 year ago

Do I understand it correctly that there's no way to detect dropped frames until this feature is supported? What I observe in my setup (H.264 video extracted from Encoded Transform and fed into MSE) is that if one or more frames get dropped, WebRTC requests a key frame from the sender, but in the meantime keeps pushing delta frames via Encoded Transform as if nothing happened, which obviously leads to ugly artifacts until the requested key frame arrives. I've looked into what I get from ET in Chrome to be able to spot dropped frames, but didn't find anything useful.

fippo commented 1 year ago

see #168 for the discussion on long vs short sequence numbers

@vitaly-castLabs the current PR only exposes this for audio where the definition is "easy". It won't help with detection of lost frames since H264 doesn't have the concept of a picture id that VPx has.

vitaly-castLabs commented 1 year ago

I don't think that's true. What about frame_num and pic_order_cnt_lsb in the slice header? (paragraph 7.3.3 in 14496-10:2020(E)). But anyway, it's not even relevant. What I'm looking for (maybe other people will find it useful too) is that every encoded frame (both audio and video) gets assigned a sequence number on the sending side and that this sequence number is exposed on the receiver as well. Honestly, I don't see any reason why this sequence number couldn't be taken care of by WebRTC in a codec-agnostic way.
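
To sketch what that would enable on the receiver: if each encoded frame carried a sender-assigned counter, gap detection would be a few lines. Nothing like this is exposed today according to this thread, so the `frameSeq` value and the `FrameGapDetector` class below are purely hypothetical, and the sketch assumes the counter increases by one per frame and does not wrap.

```typescript
// Hypothetical: frameSeq is an imagined per-frame counter assigned by the
// sender. It is NOT a field of the current Encoded Transform API.
class FrameGapDetector {
  private last: number | null = null; // highest frameSeq seen so far

  // Returns how many frames were skipped between the previous frame and this one.
  onFrame(frameSeq: number): number {
    if (this.last === null) {
      this.last = frameSeq;
      return 0;
    }
    if (frameSeq <= this.last) return 0; // duplicate or late frame: not a new gap
    const missing = frameSeq - this.last - 1;
    this.last = frameSeq;
    return missing;
  }
}
```

An application feeding frames into MSE could then hold back delta frames whenever `onFrame` returns a non-zero value, until the next key frame arrives.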