fippo closed this issue 1 year ago
I would say that we already have a similar problem if we configure the rtp sender to use simulcast but then after negotiation the receiver is not able to do that. The application would have to query the updated parameters to be aware of the fact.
In that regard, scalability mode would work the same. You would set your svc mode based on your preferred codec, but you would have to query after the remote description is set to check if that codec is used and if the scalability mode is actually supported.
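A minimal sketch of that flow, assuming the webrtc-svc shape of sendEncodings and getParameters() (pc, track and answer are placeholders):

```js
// Request an SVC mode up front, then verify after negotiation which codecs
// were negotiated and which scalabilityMode was actually applied.
const transceiver = pc.addTransceiver(track, {
  sendEncodings: [{ scalabilityMode: 'L3T3' }]
});
// ... exchange offer/answer with the remote side ...
await pc.setRemoteDescription(answer);
const params = transceiver.sender.getParameters();
console.log(params.codecs.map(c => c.mimeType));   // negotiated codecs
console.log(params.encodings[0].scalabilityMode);  // mode that was actually set
```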
Also, I think there are two different scenarios:
Regarding the question about simulcast and SVC, just a minor clarification: I assume you mean simulcast and spatial SVC, as simulcast and temporal SVC would not have any issue being supported.
In the case of simulcast and spatial SVC, I think the proposed solution was not to allow both to be set simultaneously.
For example,
In the first case, L3T3 is set, and the negotiated codecs are VP8, VP9 (in that order). Should VP9 be chosen over VP8?
In the second case, L1T2 is set, but only H264 is negotiated (which doesn't support it in the current implementation).
There is even a third case, when the hardware encoder does not support the scalability mode, but the software one does. Should the scalability mode force the usage of the software encoder over the hardware one in this case?
I would say that we already have a similar problem if we configure the rtp sender to use simulcast but then after negotiation the receiver is not able to do that. The application would have to query the updated parameters to be aware of the fact.
True. That is probably worth pointing out in webrtc-pc (and should be asserted with a test). But you don't need to do this if the UA is granted some flexibility (while allowing enough control).
Also, I think there are two different scenarios: [...]
Agree. I also assume that client and server announce and negotiate scalability modes out of band (coming back to the above, why check after SDP negotiation if you can ignore it and set what is appropriate?)
There is a third scenario: if you do replaceTrack and replace a camera with screenshare, the libwebrtc implementation will behave very differently. If the UA is constrained to a single encoding via scalabilityMode, should there be an error?
True. That is probably worth pointing out in webrtc-pc (and should be asserted with a test)
Agree, this should be done.
But you don't need to do this if the UA is granted some flexibility (while allowing enough control)
I am worried that the algorithm to choose the best applicable scalability mode would not be deterministic enough, or would cause unnecessary complexity for implementations.
Also, in the SVC case, unlike simulcast, you can dynamically change it. So if the app is concerned about SVC modes being supported or not after the renegotiation, it could start without SVC enabled and set it after the remote description is set. That way it could check which codec was chosen, gather the scalability modes supported by that codec implementation (to differentiate the hw from the sw implementation we might need more work around this) and set a valid one (even getting a promise rejection if the mode is not supported).
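A sketch of that "start without SVC, opt in later" approach; the getCapabilities()-based check, the 'L3T3' value, and taking the first entry of the negotiated codec list are simplifying assumptions:

```js
// After the remote description is set, see which codec was negotiated,
// check whether this implementation advertises the desired mode for it,
// and only then try to apply it.
await pc.setRemoteDescription(answer);
const sender = transceiver.sender;
const negotiated = sender.getParameters().codecs[0]; // first negotiated codec
const codecCaps = RTCRtpSender.getCapabilities('video')
  .codecs.find(c => c.mimeType === negotiated.mimeType);
const wanted = 'L3T3';
if (codecCaps?.scalabilityModes?.includes(wanted)) {
  const params = sender.getParameters();
  params.encodings[0].scalabilityMode = wanted;
  try {
    await sender.setParameters(params);
  } catch (e) {
    // The promise rejects if this particular implementation
    // cannot apply the mode after all.
  }
}
```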
Agree. I also assume that client and server announce and negotiate scalability modes out of band (coming back to the above, why check after sdp negotiation if you can ignore it and set what is appropriate)
Agree, the typical use case would put the correct values and not worry about failure.
There is a third scenario: if you do replaceTrack and replace a camera with screenshare, the libwebrtc implementation will behave very differently. If the UA is constrained to a single encoding via scalabilityMode, should there be an error?
Isn't this an implementation issue of libwebrtc and not a spec one? I mean, the number of encodings and the scalabilityMode do not change if you do a replaceTrack.
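For illustration, a small sketch of that point (sender is the RTCRtpSender in question, screenTrack e.g. from getDisplayMedia()):

```js
// Swapping the source does not renegotiate: the encodings array and any
// scalabilityMode set on it stay exactly as they were.
const before = sender.getParameters().encodings;
await sender.replaceTrack(screenTrack);       // camera -> screenshare
const after = sender.getParameters().encodings;
console.log(before.length === after.length);  // true: same number of encodings
```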
Fippo said:
"Let's assume I set L1T3 in addTransceiver."
[BA] The spec envisages that the application will need to use getParameters() to determine whether the requested scalabilityMode was set. For example, in case 3, getParameters() would reveal that "L3T3" was not set. Exactly how this should be represented may potentially be impacted by https://github.com/w3c/webrtc-svc/issues/48. For example, would getParameters() return no scalabilityMode member, or should it return scalabilityMode: "L1T1"?
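For example, the two representations under discussion would surface to the application roughly like this (a sketch of the open question, not a settled answer; sender is the RTCRtpSender in question):

```js
// Inspect what actually got applied after negotiation.
const [encoding] = sender.getParameters().encodings;
if (!('scalabilityMode' in encoding)) {
  // Option A: the member is simply absent when the requested mode was not set.
} else if (encoding.scalabilityMode === 'L1T1') {
  // Option B: the UA reports an explicit single-layer fallback.
}
```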
@fippo Let me try to explain how the negotiation works (or at least what might be involved). Through discussions with Danil and Chris I've come to appreciate that it is somewhat more complex than I had initially thought.
In a conferencing scenario (where SVC is most likely to be of use), the SFM will typically be the Offerer and the browser the Answerer. In that situation, the SFM will Offer what codecs it can send or receive and the browser will Answer.
To figure out what codecs/modes the browser can send to the SFM, the browser can compute the intersection of RTCRtpSender.getCapabilities("video") and the SFM's simulated RTCRtpReceiver.getCapabilities("video"). Note that for a mode to be receivable by the SFM, the browser and SFM need to jointly support an RTP header extension that makes this possible (e.g. the browser can't send "L2T2_KEY_SHIFT" without negotiating the AV1 Dependency Descriptor).
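A sketch of that intersection; sfmRecvCaps is a hypothetical blob the SFM advertises out of band, shaped like RTCRtpReceiver.getCapabilities("video"), and the additional header-extension check (e.g. for the AV1 Dependency Descriptor) is left out:

```js
// For each codec the browser can send, keep only the scalability modes the
// SFM says it can receive for that codec.
function sendableModes(sfmRecvCaps) {
  const senderCaps = RTCRtpSender.getCapabilities('video');
  const result = new Map();
  for (const codec of senderCaps.codecs) {
    const remote = sfmRecvCaps.codecs.find(c => c.mimeType === codec.mimeType);
    if (!remote || !codec.scalabilityModes) continue;
    const modes = codec.scalabilityModes.filter(
      m => (remote.scalabilityModes || []).includes(m));
    if (modes.length) result.set(codec.mimeType, modes);
  }
  return result;
}
```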
The codecs/modes that the SFM can send to the browser are determined by the RTP header extensions that they jointly support, as well as by whether the browser supports referenceScaling for the hardware/software codec. For example, an AV1 hardware decoder might not support referenceScaling, in which case the browser won't be able to decode spatial scalability modes (but would still be able to decode temporal modes). So in that case (where the browser required hardware acceleration to obtain the required performance) the SFM couldn't send spatial scalability to the browser even if DD was negotiated.
For reference, Chromium currently supports the following scalabilityMode values for AV1: "L1T2", "L1T3", "L2T1", "L2T1h", "L2T1_KEY", "L2T2", "L2T2_KEY", "L2T2_KEY_SHIFT", "L3T1", "L3T3", "L3T3_KEY", "S2T1"
For VP8/VP9 ("profile-id=0" only), the following values are advertised: "L1T2", "L1T3"
Some implications of this:
There are situations in which the codec used for sending may differ from the codec used for receiving (e.g. AV1 decoding is further along than AV1 encoding).
If the browser wants to send modes that are only supported in AV1 (e.g. "S2T1" or "L3T3_KEY") then it needs to prefer AV1 in the Offer and Answer.
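For the second point, a sketch of preferring AV1 via setCodecPreferences (the capability source and filtering shown here are a simplification):

```js
// Reorder the codec list so AV1 is offered/answered first, then ask for an
// AV1-only mode on the sender.
const transceiver = pc.addTransceiver(track, {
  sendEncodings: [{ scalabilityMode: 'L3T3_KEY' }]
});
const { codecs } = RTCRtpReceiver.getCapabilities('video');
transceiver.setCodecPreferences([
  ...codecs.filter(c => c.mimeType === 'video/AV1'),
  ...codecs.filter(c => c.mimeType !== 'video/AV1')
]);
```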
We should wait until we have some experience with the code around this, so I don't think this is useful to spend time on at TPAC. Maybe in practice nobody is going to try mixed VP8 / VP9 on a single m-line anyway, and will instead use separate transceivers + setCodecPreferences.
I have a codec selection API proposal over at https://github.com/w3c/webrtc-extensions/issues/126. If we go with that, I think all of this becomes a non-issue, since you can decide the codec, scalabilityMode and active in a single API call, which means you don't need (underspecified!) fallback values to save you.
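A sketch of what that could look like; the per-encoding codec member shown here follows the direction of that proposal, not necessarily its final shape:

```js
// Choose codec, scalabilityMode and active together in one setParameters().
const av1 = RTCRtpSender.getCapabilities('video')
  .codecs.find(c => c.mimeType === 'video/AV1');
const params = sender.getParameters();
params.encodings[0].codec = av1;
params.encodings[0].scalabilityMode = 'L3T3';
params.encodings[0].active = true;
await sender.setParameters(params);
```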
We still need a fallback value for the default case, but we don't need to make the API more advanced for anything other than default picking.
I am going to close this issue, since the concerns are largely addressed by https://github.com/w3c/webrtc-extensions/issues/126.
I had some questions regarding negotiation. Let's assume I set L1T3 in addTransceiver.
Assuming I set L1T3 and VP8 gets negotiated: yay
Assuming I set L1T3 and VP9 gets negotiated: uhm?
Assuming I set L3T3 and VP8 gets negotiated: uhm?
It does (generally) not make sense for VP9 to do L1T3 (though there are use-cases for it) -- not touching renegotiation even. Would it make more sense to specify a list of scalability modes instead of a single one? For example, that list could be ["L3T3", "L1T3"].
In the case VP9 is negotiated, this would pick L3T3. If VP8 gets negotiated, this would determine that L3T3 does not make sense and fall back to L1T3 (three temporal layers, which as we all know users know and like).
(related, how does a sender say "I could do simulcast or SVC, pick your poison"?)
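For illustration, the fallback-list idea can also be approximated app-side today; pickScalabilityMode() below is a hypothetical helper, not a proposed API, and sender is the RTCRtpSender in question:

```js
// Pick the first mode from a preference list that this implementation
// advertises for the negotiated codec.
function pickScalabilityMode(preferred, mimeType) {
  const codec = RTCRtpSender.getCapabilities('video')
    .codecs.find(c => c.mimeType === mimeType);
  return preferred.find(m => codec?.scalabilityModes?.includes(m));
}

// Per the example: from ["L3T3", "L1T3"], pick "L3T3" if the negotiated codec
// supports it, otherwise fall back to "L1T3".
const negotiated = sender.getParameters().codecs[0];
const mode = pickScalabilityMode(['L3T3', 'L1T3'], negotiated.mimeType);
```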