w3c / webrtc-svc

W3C Scalable Video Coding (SVC) Extension for WebRTC
https://w3c.github.io/webrtc-svc/

[Question] Difference between SVC and Simulcast #60

Closed mickel8 closed 2 years ago

mickel8 commented 2 years ago

Hi, I've read this draft and have a few questions. Could you please answer them or link some resources where I can read more?

  1. Could you please explain what you mean by single stream simulcast in Section 4.2 Negotiation? I always thought that simulcast meant multiple encodings of the same stream sent as separate RTP streams. Does single stream simulcast mean that we can send multiple encodings of the same stream using one RTP stream, i.e. one SSRC?

  2. In Section 6. Scalability modes, there is

    For example, VP8 [RFC6386] only supports temporal scalability (e.g. "L1T2", "L1T3"); H.264/SVC [RFC6190],
    which supports both temporal and spatial scalability, only permits transport of simulcast on distinct SSRCs, 
    so that it does not support the "S" modes, where 
    multiple encodings are transported on a single RTP stream.

    But of course, you can send multiple spatial layers with VP8, each layer using a separate RTP stream, i.e. a different SSRC, so in my opinion the statement "For example, VP8 [RFC6386] only supports temporal scalability (e.g. "L1T2", "L1T3");" is incorrect.

  3. What is the difference between Simulcast and SVC? Is it possible that one codec supports both simulcast and SVC? For example, VP8 supports both simulcast (by multiple encodings of the same stream) and SVC (by temporal layers in each encoding)

  4. Could you please explain why this example uses L1T3 instead of L3T3?

    pc.addTransceiver(stream.getVideoTracks()[0], {
      direction: 'sendonly',
      sendEncodings: [
        {rid: 'q', scaleResolutionDownBy: 4.0, scalabilityMode: 'L1T3'},
        {rid: 'h', scaleResolutionDownBy: 2.0, scalabilityMode: 'L1T3'},
        {rid: 'f', scalabilityMode: 'L1T3'},
      ]
    });

aboba commented 2 years ago

  1. In AV1, it is possible for an encoder to generate multiple encodings within the same bitstream. AFAICT this capability is not supported in other codecs. Currently, the 'S' modes defined in AV1 are not supported within WebCodecs (or WebRTC).

  2. VP8 does not support spatial scalability, only simulcast. So the paragraph was trying to make clear that spatial modes such as L3T3 are not possible with VP8.

  3. The difference between simulcast and SVC can be seen in the dependency diagrams. A frame in a simulcast layer depends only on other frames in that same layer, whereas in SVC a higher layer can depend on a lower layer.
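The dependency rule above can be sketched in a few lines; a minimal illustration (the function name and shape are my own, not part of any spec), assuming spatial layers are numbered from 0 and that an 'L'-mode layer may reference the layer directly below it:

```javascript
// Which spatial layers a frame at layer `s` may reference, per mode prefix.
// 'S' modes: independent simulcast encodings, each references only itself.
// 'L' modes: SVC, layer s may also reference layer s - 1.
function referencedSpatialLayers(modePrefix, s) {
  if (modePrefix === 'S') return [s];
  if (modePrefix === 'L') return s > 0 ? [s, s - 1] : [s];
  throw new Error('unknown mode prefix: ' + modePrefix);
}
```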

In WebCodecs, simulcast can be supported by creating another encoding, so any codec that WebCodecs can encode can support simulcast. Within WebCodecs, SVC is supported in VP8 (temporal only), H.264 (temporal only), and VP9 (temporal at the moment, with work on spatial scalability in progress).

  4. The example uses L1T3 because it illustrates the use of simulcast along with temporal scalability. Each simulcast stream uses a different resolution, so there is no need to also employ spatial scalability; only temporal scalability is needed.

For resources, you might consider looking at the AV1 bitstream and AV1 RTP Payload specifications. I'd also recommend articles such as these:

https://webrtchacks.com/chrome-vp9-svc/
https://webrtc.ventures/2021/04/webrtclive-new-generation-realtime-streaming/

mickel8 commented 2 years ago

"The difference between simulcast and SVC can be seen in the dependency diagrams. A frame in a simulcast layer depends only on other frames in that same layer, whereas in SVC a higher layer can depend on a lower layer."

This must mean that all SxTy modes refer to simulcast streams, not SVC ones with multiple spatial layers.

This must also mean that there is no possibility to send an SVC stream with at least two spatial layers using one RTP stream (one SSRC), because:

  * all LxTy scalability modes mean sending spatial layers using separate RTP streams (different SSRCs)

Also, why can't we use S3T3 instead of L1T3 for the previous example? The only reason I can see is that the VP8 RTP payload format doesn't support sending multiple encodings using one RTP stream.

P.S. Thank you for the links!

aboba commented 2 years ago

"This must mean that all SxTy modes refer to simulcast streams not SVC ones with multiple spatial layers."

[BA] SxTy means x simulcast streams, each of which has y temporal layers. You can see the difference between SxTy and LxTy in the dependency diagrams.
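The naming convention can also be made concrete with a small parser; a hypothetical helper (not part of the WebRTC API), assuming the basic `L`/`S` prefix plus digit grammar of scalabilityMode strings, with optional `h` and `_KEY` suffixes:

```javascript
// Parse a scalabilityMode string such as 'L1T3', 'S3T3', or 'L3T3_KEY'
// into its spatial/temporal components.
function parseScalabilityMode(mode) {
  const m = /^(L|S)(\d+)T(\d+)(h?)(_KEY)?$/.exec(mode);
  if (!m) throw new Error('unrecognized scalabilityMode: ' + mode);
  return {
    spatialLayers: Number(m[2]),   // x in SxTy / LxTy
    temporalLayers: Number(m[3]),  // y in SxTy / LxTy
    // 'S' modes: independent simulcast encodings in one bitstream;
    // 'L' modes with x > 1: upper spatial layers depend on lower ones.
    interLayerDependency: m[1] === 'L' && Number(m[2]) > 1,
  };
}
```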

"This must also mean that there is no possibility to send SVC stream with at least two spatial layers using one RTP stream"

[BA] That is not correct. VP9 and AV1 support spatial scalability and their RTP payload specs only support sending spatial SVC layers on a single RTP stream.

"all LxTy scalability modes mean sending spatial layers using separate RTP streams (different SSRCs)"

[BA] VP9 and AV1 codecs do not support sending spatial scalability layers on multiple SSRCs. It is possible to send simulcast layers on multiple SSRCs but that is not the same thing.

In general, it is important to distinguish an encoder bitstream from how this is packaged within RTP. The codec is defined in the codec specification. RTP packaging is defined within the RTP payload specification.

This distinction carries over to Web APIs. Within WebCodecs, scalabilityMode can be used to configure an encoder. How that encoding is subsequently packaged is up to the application and transport APIs. The packetization and serialization can be done in Web Assembly and transport could be done over WebRTC Data Channel or WebTransport. So RTP is not the only potential transport available to Web applications.
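As one illustration of application-level packaging: WebCodecs hands the application encoded chunks, and the application decides the framing before handing bytes to a transport. A minimal sketch (the framing format here is invented for illustration, not from any payload spec), assuming a simple 4-byte length prefix as one might use over a WebRTC Data Channel or WebTransport stream:

```javascript
// Prepend a big-endian 4-byte length header to an encoded payload,
// so the receiver can delimit chunks on a byte stream.
function frameChunk(payload /* Uint8Array */) {
  const out = new Uint8Array(4 + payload.length);
  new DataView(out.buffer).setUint32(0, payload.length);
  out.set(payload, 4);
  return out;
}
```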

"Also, why can't we use S3T3 instead of L1T3 for the previous example? The only reason I can see is that VP8 RTP payload format doesn't support sending multiple encodings using one RTP stream."

[BA] The VP8 codec does not support 'S' modes. So you cannot ask a VP8 encoder to generate a single simulcast bitstream. You could of course have multiple VP8 encoders, each configured differently, generating their own bitstreams.
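The multiple-encoder approach might look like the following; a hedged sketch with illustrative resolutions and bitrates (in a browser, each config object would go to its own WebCodecs VideoEncoder):

```javascript
// Simulcast with VP8 via separate encoders, since VP8 has no 'S' modes:
// three independent configs, each with three temporal layers (L1T3).
const full = { codec: 'vp8', width: 1280, height: 720, bitrate: 1_500_000, scalabilityMode: 'L1T3' };
const half = { ...full, width: 640, height: 360, bitrate: 500_000 };
const quarter = { ...full, width: 320, height: 180, bitrate: 150_000 };
const simulcastConfigs = [quarter, half, full];
// In a browser, each config would be passed to its own encoder:
//   encoders[i].configure(simulcastConfigs[i]);
```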

BTW, I would recommend looking at the WebCodecs specification. Playing with WebCodecs (which also supports temporal scalability) may help clarify some of your questions. A video with pointers to sample code is here.

mickel8 commented 2 years ago

Thanks a lot! I will definitely study all the resources you linked.