Open ibc opened 5 months ago
Chrome uses [dependency-descriptor-rtp-header-extension](https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension) on H264 too.
OK, so this issue is about implementing the Dependency Descriptor (DD) RTP extension.
DD does transport the temporal layer and it is not codec dependent. GFD (generic frame descriptor) does too, but it is deprecated; see comments here
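As a rough illustration of why DD is codec independent, here is a minimal sketch that reads only the three mandatory bytes of the extension as laid out in the AV1 RTP spec (`start_of_frame`, `end_of_frame`, `frame_dependency_template_id`, `frame_number`). The struct and function names are hypothetical and are not mediasoup API. Note that the temporal layer is not in these bytes directly: it has to be looked up from the template identified by the template id, and the template structure travels in the extended fields of the descriptor, which this sketch does not parse.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Mandatory fields that start every Dependency Descriptor (3 bytes).
struct DdMandatoryFields
{
    bool startOfFrame;    // start_of_frame, 1 bit.
    bool endOfFrame;      // end_of_frame, 1 bit.
    uint8_t templateId;   // frame_dependency_template_id, 6 bits.
    uint16_t frameNumber; // frame_number, 16 bits, network byte order.
};

// Hypothetical helper (an assumption for illustration only).
// `data` must point to the extension value, i.e. past the one-byte or
// two-byte RTP header extension element header.
inline std::optional<DdMandatoryFields> ParseDdMandatoryFields(
  const uint8_t* data, size_t len)
{
    if (data == nullptr || len < 3)
    {
        return std::nullopt;
    }

    DdMandatoryFields fields{};

    fields.startOfFrame = (data[0] & 0x80) != 0;
    fields.endOfFrame   = (data[0] & 0x40) != 0;
    fields.templateId   = data[0] & 0x3F;
    fields.frameNumber  = static_cast<uint16_t>((data[1] << 8) | data[2]);

    return fields;
}
```

Nothing in this layout refers to H264, VP8 or AV1 specifics, which is why the same extension can accompany any codec.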
What is GFD? The old frame-marking extension?
`H264.cpp` payload descriptor parser relies on the frame-marking RTP extension to get information about the temporal layer of the payload (among other fields). However, libwebrtc (AKA Chrome) no longer enables/implements the frame-marking RTP extension. The result is that, even if the client enables temporal layers in H264, mediasoup will never detect them and will consider that all received H264 packets belong to temporal layer 0.

RTP extensions offered by Chrome when using the H264 codec with temporal layers enabled (by passing a proper `scalabilityMode` value to the `encodings`) are the following:

Relevant RTP extensions are the following:
https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension
RTP Payload Format For AV1 extension "describes an RTP payload format for the AV1 video codec".
So clearly this is not valid here.
http://www.webrtc.org/experiments/rtp-hdrext/video-layers-allocation00
Video Layers Allocation extension "is for a video sender to provide information about the target bitrate, resolution and frame rate of each scalability layer in order to aid a selective forwarding middlebox to decide which layer to relay."
So this is NOT what we need, since it doesn't indicate which spatial/temporal layer the current packet belongs to. This is an RTP extension to tell the remote side how many spatial/temporal layers we are generating, the target bitrate of each layer, and so on.
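To make the frame-marking dependency described earlier concrete, here is a minimal sketch of how a temporal layer id could be read from that extension, with the fallback to 0 when the extension is missing. It assumes the bit layout from the frame-marking draft (`S|E|I|D|B|TID`, with TID in the three low bits of the first byte). The function name is hypothetical and this is not the actual `H264.cpp` code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption for illustration only).
// `ext` points to the frame-marking extension value, or is nullptr when the
// sender did not include the extension in this packet.
inline uint8_t TemporalLayerFromFrameMarking(const uint8_t* ext, size_t len)
{
    // Without the extension there is nothing to read, so the packet ends up
    // in temporal layer 0. This is the situation described for Chrome.
    if (ext == nullptr || len < 1)
    {
        return 0;
    }

    // First byte layout: S(1) E(1) I(1) D(1) B(1) TID(3).
    return static_cast<uint8_t>(ext[0] & 0x07);
}
```

With a sender that never offers frame marking, every call takes the first branch, so temporal layers enabled on the client side stay invisible to the receiver.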
Conclusion
AFAIS there is literally no way to detect which spatial/temporal layer a received H264/H264_SVC payload belongs to.
Or perhaps we must parse the codec payload to parse its spatial/temporal layers? Note that we already do a very basic parsing of the H264/H264_SVC payload to detect keyframes: