H264 temporal layers no longer detected due to removal of frame-marking RTP extension in libwebrtc (must implement dependency descriptor RTP extension)

ibc commented 5 months ago

The payload descriptor parser in H264.cpp relies on the frame-marking RTP extension to obtain the temporal layer of the payload (among other fields). However, libwebrtc (AKA Chrome) no longer enables or implements the frame-marking RTP extension. As a result, even if the client enables temporal layers in H264, mediasoup never detects them and treats all received H264 packets as belonging to temporal layer 0.
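
To make the failure mode concrete, here is a minimal sketch (not the actual H264.cpp code) of how a temporal layer id can be read from the first byte of the frame-marking extension as laid out in draft-ietf-avtext-framemarking (S, E, I, D, B flags followed by a 3-bit TID). The function name `GetTemporalLayerFromFrameMarking` is hypothetical. When the extension is absent, as with current Chrome, the only sensible fallback is layer 0:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only.
// `ext` points to the frame-marking extension value, or is nullptr when the
// received packet carries no such extension.
inline uint8_t GetTemporalLayerFromFrameMarking(const uint8_t* ext, size_t len)
{
	// No extension in the packet: nothing to read, so fall back to layer 0.
	if (ext == nullptr || len == 0)
	{
		return 0;
	}

	// First byte layout: S(1) E(1) I(1) D(1) B(1) TID(3).
	return ext[0] & 0x07;
}
```

With the extension missing from every packet, this kind of lookup always takes the fallback branch, which matches the behavior described above.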

This also affects the H264_SVC.cpp codec, which relies on the frame-marking RTP extension as well. In that case mediasoup treats all received H264_SVC packets as belonging to spatial layer 0 and temporal layer 0.

The RTP extensions offered by Chrome when using the H264 codec with temporal layers enabled (by passing a proper scalabilityMode value in the encodings) are the following:

a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:13 urn:3gpp:video-orientation
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:5 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/color-space
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:10 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:11 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=extmap:12 https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension
a=extmap:9 http://www.webrtc.org/experiments/rtp-hdrext/video-layers-allocation00
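
As a rough sketch of how one might check which extension ids are negotiated in an offer like the one above, the following C++17 helper scans `a=extmap:` lines for a given URI. The name `FindExtmapId` is an assumption made for this example and is not a mediasoup API. It only does an exact URI match and ignores extension attributes:

```cpp
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the extmap id negotiated for `uri`, if any.
inline std::optional<int> FindExtmapId(const std::string& sdp, const std::string& uri)
{
	static const std::string prefix = "a=extmap:";
	std::istringstream in(sdp);
	std::string line;

	while (std::getline(in, line))
	{
		// Tolerate CRLF line endings.
		if (!line.empty() && line.back() == '\r')
		{
			line.pop_back();
		}

		if (line.compare(0, prefix.size(), prefix) != 0)
		{
			continue;
		}

		const auto space = line.find(' ', prefix.size());

		if (space == std::string::npos)
		{
			continue;
		}

		// The id may be followed by "/direction"; std::stoi stops at the first
		// non-digit character.
		const std::string idStr = line.substr(prefix.size(), space - prefix.size());

		if (idStr.empty() || !std::isdigit(static_cast<unsigned char>(idStr[0])))
		{
			continue;
		}

		if (line.substr(space + 1) == uri)
		{
			return std::stoi(idStr);
		}
	}

	return std::nullopt;
}
```

Applied to the offer above, a lookup of the dependency descriptor URI would find id 12, while a lookup of a frame-marking URI would find nothing.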

Relevant RTP extensions are the following:

https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension

RTP Payload Format For AV1 extension "describes an RTP payload format for the AV1 video codec".

So clearly this is not valid here.

http://www.webrtc.org/experiments/rtp-hdrext/video-layers-allocation00

Video Layers Allocation extension "is for a video sender to provide information about the target bitrate, resolution and frame rate of each scalability layer in order to aid a selective forwarding middlebox to decide which layer to relay."

So this is NOT what we need, since it doesn't indicate which spatial/temporal layer the current packet belongs to. This is an RTP extension that tells the remote side how many spatial/temporal layers we are generating, the target bitrate of each layer and so on.

Conclusion

AFAIS there is literally no way to detect which spatial/temporal layer a received H264/H264_SVC payload belongs to.

Or perhaps we must parse the codec payload to extract its spatial/temporal layers? Note that we already do a very basic parsing of the H264/H264_SVC payload to detect keyframes.
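
For illustration only (this is not the mediasoup code referred to above), a minimal sketch of that kind of basic keyframe detection on an H264 RTP payload following RFC 6184. It assumes, as a simplification, that an IDR slice (NAL type 5) or an SPS (NAL type 7) marks a keyframe, and it only handles single NAL units, STAP-A and FU-A. The function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: IDR slice (5) or SPS (7) indicates a keyframe.
inline bool IsKeyFrameNalType(uint8_t nalType)
{
	return nalType == 5 || nalType == 7;
}

inline bool LooksLikeH264KeyFrame(const uint8_t* data, size_t len)
{
	if (data == nullptr || len < 1)
	{
		return false;
	}

	// NAL header: F(1) NRI(2) Type(5).
	const uint8_t nalType = data[0] & 0x1F;

	// Single NAL unit packet.
	if (nalType >= 1 && nalType <= 23)
	{
		return IsKeyFrameNalType(nalType);
	}

	// STAP-A: a sequence of (16-bit size, NAL unit) entries.
	if (nalType == 24)
	{
		size_t offset = 1;

		while (offset + 3 <= len)
		{
			const size_t naluSize = (static_cast<size_t>(data[offset]) << 8) | data[offset + 1];

			if (naluSize == 0)
			{
				return false; // malformed aggregation unit
			}

			if (IsKeyFrameNalType(data[offset + 2] & 0x1F))
			{
				return true;
			}

			offset += 2 + naluSize;
		}

		return false;
	}

	// FU-A: FU header is S(1) E(1) R(1) Type(5); only the first fragment counts.
	if (nalType == 28 && len >= 2)
	{
		const bool startBit = (data[1] & 0x80) != 0;

		return startBit && IsKeyFrameNalType(data[1] & 0x1F);
	}

	return false;
}
```

Note that nothing in these NAL headers of a plain (non-SVC) H264 stream carries a temporal layer id, which is why this approach does not directly solve the problem described in this issue.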

Lynnworld commented 5 months ago

Chrome uses [dependency-descriptor-rtp-header-extension](https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension) for H264 too.

ibc commented 5 months ago

Ok, so this issue is about implementing dependency descriptor RTP extension.

fippo commented 5 months ago

DD does transport the temporal layer and it is not codec dependent. GFD (generic frame descriptor) does too but is deprecated, see comments here
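
As a starting point, here is a hedged C++17 sketch of reading the three mandatory bytes of the dependency descriptor as defined in the AV1 RTP specification (start_of_frame, end_of_frame, a 6-bit frame_dependency_template_id and a 16-bit frame_number). The struct and function names are made up for this example. Note that the temporal id is not in these bytes directly: it has to be resolved by looking up the template id in the template dependency structure, which is carried in the extended fields (typically on keyframes) and is not parsed here:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical types and names, for illustration only.
struct DdMandatoryFields
{
	bool startOfFrame;
	bool endOfFrame;
	uint8_t templateId;   // frame_dependency_template_id (6 bits)
	uint16_t frameNumber; // frame_number (16 bits, big endian)
};

inline std::optional<DdMandatoryFields> ParseDdMandatoryFields(const uint8_t* ext, size_t len)
{
	// The mandatory descriptor fields take exactly 3 bytes.
	if (ext == nullptr || len < 3)
	{
		return std::nullopt;
	}

	DdMandatoryFields fields{};

	fields.startOfFrame = (ext[0] & 0x80) != 0;
	fields.endOfFrame   = (ext[0] & 0x40) != 0;
	fields.templateId   = ext[0] & 0x3F;
	fields.frameNumber  = static_cast<uint16_t>((ext[1] << 8) | ext[2]);

	return fields;
}
```

A full implementation would additionally need to cache the most recent template dependency structure per stream so that `templateId` can be mapped to a spatial and temporal layer.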

ibc commented 4 months ago

> DD does transport the temporal layer and it is not codec dependent. GFD (generic frame descriptor) does too but is deprecated, see comments here

What is GFD? The old frame-marking extension?

fippo commented 4 months ago

this one -- similar to DD, a bit older and easier to parse. But it did not even get a pseudo spec on the url

ibc commented 4 months ago

> this one -- similar to DD, a bit older and easier to parse. But it did not even get a pseudo spec on the url

So then we ignore it totally.
