Closed mickel8 closed 2 years ago
In AV1, it is possible for an encoder to generate multiple encodings within the same bitstream. AFAICT this capability is not supported in other codecs. Currently, the 'S' modes defined in AV1 are not supported within WebCodecs (or WebRTC).
VP8 does not support spatial scalability; the only way to get multiple resolutions with VP8 is simulcast. So the paragraph was trying to make clear that spatial modes such as "L3T3" are not possible with VP8.
The difference between simulcast and SVC can be seen in the dependency diagrams. A frame in a simulcast layer depends only on other frames in that same layer, whereas in SVC a higher layer can depend on a lower layer.
In WebCodecs, simulcast can be supported by creating another encoding, so any codec that WebCodecs can encode can support simulcast. Within WebCodecs, SVC is supported in VP8 (temporal only), H.264 (temporal only), and VP9 (temporal at the moment, with work on spatial scalability in progress).
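As a minimal sketch of what requesting temporal-only SVC looks like with WebCodecs (the codec string, dimensions, and bitrate below are illustrative assumptions, not values from this thread):

```javascript
// Sketch: a WebCodecs VideoEncoderConfig requesting temporal-only SVC.
// "L1T3" = 1 spatial layer, 3 temporal layers.
const config = {
  codec: "vp09.00.10.08",  // VP9 profile 0 (illustrative codec string)
  width: 1280,
  height: 720,
  bitrate: 1_000_000,      // 1 Mbps, illustrative
  scalabilityMode: "L1T3",
};
// In a browser you would then check support before creating the encoder:
//   const { supported } = await VideoEncoder.isConfigSupported(config);
```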
"L1T3" is used because it illustrates the use of simulcast along with temporal scalability. Each simulcast stream uses a different resolution, so there is no need to also employ spatial scalability; only temporal scalability is needed. For resources, you might consider looking at the AV1 bitstream and AV1 RTP payload specifications. I'd also recommend articles such as these:
https://webrtchacks.com/chrome-vp9-svc/
https://webrtc.ventures/2021/04/webrtclive-new-generation-realtime-streaming/
The difference between simulcast and SVC can be seen in the dependency diagrams. A frame in a simulcast layer depends only on other frames in that same layer, whereas in SVC a higher layer can depend on a lower layer.
This must mean that all SxTy modes refer to simulcast streams, not SVC streams with multiple spatial layers.
This must also mean that there is no possibility to send an SVC stream with at least two spatial layers using one RTP stream (one SSRC), because all LxTy scalability modes would mean sending spatial layers using separate RTP streams (different SSRCs).
Also, why can't we use S3T3 instead of L1T3 for the previous example? The only reason I can see is that VP8 RTP payload format doesn't support sending multiple encodings using one RTP stream.
P.S. Thank you for the links!
"This must mean that all SxTy modes refer to simulcast streams not SVC ones with multiple spatial layers."
[BA] SxTy means x simulcast streams, each of which has y temporal layers. You can see the difference between SxTy and LxTy in the dependency diagrams.
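The naming convention itself is mechanical, so the SxTy/LxTy distinction can be illustrated with a small parser (parseScalabilityMode is my own hypothetical helper, not part of any spec; suffixes such as "_KEY" are ignored here):

```javascript
// Hypothetical helper: interpret an SxTy / LxTy scalabilityMode string.
// 'S' = x independent simulcast encodings, each with y temporal layers.
// 'L' = x dependent SVC spatial layers, each with y temporal layers.
function parseScalabilityMode(mode) {
  const m = /^([SL])(\d+)T(\d+)/.exec(mode);
  if (m === null) throw new Error(`unrecognized scalabilityMode: ${mode}`);
  return {
    simulcast: m[1] === "S",              // true for 'S' modes
    spatialLayersOrStreams: Number(m[2]), // streams for 'S', layers for 'L'
    temporalLayers: Number(m[3]),
  };
}
```

So parseScalabilityMode("S3T3") describes three independent encodings, while parseScalabilityMode("L3T3") describes three spatial layers that depend on one another.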
"This must also mean that there is no possibility to send SVC stream with at least two spatial layers using one RTP stream"
[BA] That is not correct. VP9 and AV1 support spatial scalability and their RTP payload specs only support sending spatial SVC layers on a single RTP stream.
"all LxTy scalability modes mean sending spatial layers using separate RTP streams (different SSRCs)"
[BA] VP9 and AV1 codecs do not support sending spatial scalability layers on multiple SSRCs. It is possible to send simulcast layers on multiple SSRCs but that is not the same thing.
In general, it is important to distinguish an encoder bitstream from how this is packaged within RTP. The codec is defined in the codec specification. RTP packaging is defined within the RTP payload specification.
This distinction carries over to Web APIs. Within WebCodecs, scalabilityMode can be used to configure an encoder. How that encoding is subsequently packaged is up to the application and transport APIs. The packetization and serialization can be done in Web Assembly and transport could be done over WebRTC Data Channel or WebTransport. So RTP is not the only potential transport available to Web applications.
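As a sketch of the "packetization is up to the application" point: if encoded chunks travel over a data channel or WebTransport stream, the application must define its own framing. The timestamp-plus-length format below is entirely my own assumption, not any standard:

```javascript
// Hypothetical framing: prepend a 64-bit timestamp and a 32-bit payload
// length so an encoded chunk can be sent over a byte-oriented transport.
function frameChunk(timestampUs, payload) {
  const header = new DataView(new ArrayBuffer(12));
  header.setBigUint64(0, BigInt(timestampUs)); // microsecond timestamp
  header.setUint32(8, payload.byteLength);     // payload length in bytes
  const out = new Uint8Array(12 + payload.byteLength);
  out.set(new Uint8Array(header.buffer), 0);
  out.set(payload, 12);
  return out;
}
```

In a browser, the payload bytes could come from EncodedVideoChunk.copyTo(), and the framed result could be posted on an RTCDataChannel or a WebTransport stream.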
"Also, why can't we use S3T3 instead of L1T3 for the previous example? The only reason I can see is that VP8 RTP payload format doesn't support sending multiple encodings using one RTP stream."
[BA] The VP8 codec does not support 'S' modes. So you cannot ask a VP8 encoder to generate a single simulcast bitstream. You could of course have multiple VP8 encoders, each configured differently, generating their own bitstreams.
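A sketch of that "multiple encoders" approach (the resolutions and the use of "L1T3" are my own illustrative choices): since VP8 has no 'S' modes, each simulcast rendition gets its own config and, in a browser, its own VideoEncoder instance.

```javascript
// Sketch: VP8 simulcast via separate encoders, each temporal-only SVC.
const simulcastConfigs = [
  { codec: "vp8", width: 1280, height: 720, scalabilityMode: "L1T3" },
  { codec: "vp8", width: 640,  height: 360, scalabilityMode: "L1T3" },
  { codec: "vp8", width: 320,  height: 180, scalabilityMode: "L1T3" },
];
// In a browser, each config would back its own encoder:
//   const encoders = simulcastConfigs.map(
//     () => new VideoEncoder({ output: handleChunk, error: console.error }));
//   encoders.forEach((enc, i) => enc.configure(simulcastConfigs[i]));
```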
BTW, I would recommend looking at the WebCodecs specification. Playing with WebCodecs (which also supports temporal scalability) may help clarify some of your questions. A video with pointers to sample code is here.
Thanks a lot! I will definitely study all the resources you linked.
Hi, I've read this draft and have a few questions. Could you please answer them or link some resources where I can read more?
Could you please explain what you mean by writing
single stream simulcast
in Section 4.2 Negotiation? I always thought that simulcast means multiple encodings of the same stream sent as separate RTP streams. Does "single stream simulcast" mean that we can send multiple encodings of the same stream using one RTP stream, i.e. using one SSRC?
In Section 6, Scalability modes, there is
For example, VP8 [RFC6386] only supports temporal scalability (e.g. "L1T2", "L1T3");
But of course, you can send multiple spatial layers with VP8, each layer using a separate RTP stream, i.e. a different SSRC, so in my opinion that statement is incorrect.
What is the difference between simulcast and SVC? Is it possible that one codec supports both simulcast and SVC? For example, VP8 supports both simulcast (by multiple encodings of the same stream) and SVC (by temporal layers in each encoding).
Could you please explain why this example uses "L1T3" instead of "L3T3"?