xingri opened 11 months ago
It says "This helps the application recover faster from lossy network conditions." I think it would be more correct to state something like "This helps the video rendering to recover faster in lossy network conditions."
OTOH, this sentence is more of an explanation for why the requirement makes sense, and should perhaps not be part of the requirement itself?
Thanks @stefhak for the feedback. Will update the requirement shortly based on your advice.
Checking: Below this table, there's the statement "Experience: Microsoft's Xbox Cloud Gaming and NVIDIA's GeForce NOW are examples of this use case, with media transported using RTP or RTCDataChannel."
For which codecs and which platforms do we have experience with video loss recovery without a keyframe that has proved to be helpful?
(Not as a suggestion to add this to the PR, but because I wonder how well this works in practice)
@alvestrand For the evaluation, we are looking at the H.264 codec but did not specify the platform yet.
@stefhak Removed the sentence per your feedback.
@alvestrand I would like to share a little more context with you and the other WebRTC Working Group members.
FYI, NVIDIA GeForce NOW has two different implementations: one is native, using a custom protocol between client and server, and the other runs over the WebRTC protocol on the different browser implementations.
We have already evaluated the feasibility of continuous decoding on various native platforms, including Windows and Mac, through the custom protocol, using AV1, HEVC and H.264.
Now we are working on supporting this through the WebRTC protocol, which is why we proposed this requirement.
As we discussed previously in https://bugs.chromium.org/p/webrtc/issues/detail?id=15192, we concluded that the Dependency Descriptor could serve as the communication mechanism.
We would now like to reach consensus so that this effort can be applied in browsers in the near future.
@alvestrand We have some experience with H.264 LTR. If a base layer P-frame can't be repaired via NACK or FEC, LTR can be better than sending a keyframe. The simplest use case is 1-1, such as in gaming where there is only a single participant. For H.265, the RPSI provides an indication that the sender of the RPSI has received and decoded the LTR. For other codecs (e.g. VP8, VP9), RPSI is used as a positive acknowledgement.
However, the conferencing use case is more complicated. If there are a lot of new joiners (e.g. at the beginning of the meeting), repairing one participant can result in sending undecodable P-frames to new joiners who weren't present to receive the LTR. This would be worse than sending a keyframe, because it multiplies the number of participants experiencing a video quality problem. If you started off with 1 participant experiencing loss, and send a P-frame based on an LTR that 3 new joiners didn't receive/decode, now you have 3 participants with a problem! Not good.
So the SFU needs to figure out if participants can decode the new P-frame before engaging the LTR repair mechanism by forwarding an RPSI to the encoder. Because of the potential for multiplying video quality issues, the SFU needs to be extra careful. If the SFU does forward an RPSI to the encoder, the Dependency Descriptor (DD) on the new P-frame helps the SFU figure out whether the new P-frame's dependencies and chains have been sent to a participant. But just because dependencies and chains were sent by the SFU does not imply that they were received and decoded. At the beginning of a meeting where lots of people are joining, lots of keyframes are being sent and loss can be high. So you can't assume that an LTR that is sent is always received and decoded.
The LNTF RTCP feedback message provides information beyond what the SFU can glean from the DD: the last sequence number received/decoded. This indicates whether current participants have received and decoded the LTR. However, if the new joiner rate is high, by the time the new P-frame based on the LTR arrives at the SFU, there will be new joiners who won't have the LTR. So the SFU can decide not to use LTR for repair even if the LNTF info indicates that current participants will be able to decode the new P-frame.
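To make that decision logic concrete, here is a minimal TypeScript sketch of the kind of check described above; the types, the LNTF-derived fields and the join-rate threshold are illustrative assumptions, not an actual SFU implementation:

```typescript
// Illustrative sketch only: the types, the LNTF-derived fields and the
// join-rate threshold are assumptions made for this example, not a real SFU API.
interface ParticipantState {
  lastDecodedFrameId: number;  // derived from LNTF (or equivalent) feedback
  joinedAfterLtr: boolean;     // joined after the LTR frame was forwarded
}

function shouldForwardRpsi(
  ltrFrameId: number,                 // LTR referenced by the incoming RPSI
  participants: ParticipantState[],
  newJoinersPerSecond: number
): boolean {
  // If the join rate is high, new participants are likely to arrive before the
  // repaired P-frame does, so fall back to requesting a keyframe instead.
  const JOIN_RATE_LIMIT = 1; // illustrative threshold
  if (newJoinersPerSecond > JOIN_RATE_LIMIT) {
    return false;
  }
  // Every current participant must have received and decoded the LTR;
  // otherwise the repaired P-frame would be undecodable for them.
  return participants.every(
    p => !p.joinedAfterLtr && p.lastDecodedFrameId >= ltrFrameId
  );
}
```

A real SFU would of course also have to consider simulcast/SVC layer selection and feedback timing, which this sketch ignores.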
RFC 8834 Section 5.1.4 recommends that WebRTC implementations support RPSI, but in retrospect this seems simplistic. Implementing RPSI is just the start: WebRTC implementations will also need LNTF or an equivalent RTCP feedback mechanism, and of course the SFU will need to be modified to take the LNTF info, the DD and the new-join rate into account. And all this work will only help with codecs that support LTR (H.264, H.265, VVC); the AV1 RTP payload format doesn't support RPSI.
@aboba thanks for describing the complexities involved. I'm a bit slow to rev up, but I can't find the section in RFC 7742 that says "WebRTC implementations SHOULD support RPSI" - could you provide a pointer?
Also, in the case of a game server -> (a single) browser over WebRTC, is it correct to understand that what is now missing is only some way to reference specific frames (hence the discussion of using DD for this purpose)?
@stefhak Sorry, the reference is RFC 8834 Section 5.1.4 (corrected).
For gaming, support for RPSI and DD should be sufficient. The RPSI indicates the LTR to use to generate the new P-frame. DD will provide the dependencies and chains of the generated P-frame. The DD info might be useful in other situations, but for the purposes of LTR-based recovery in a single person conference, DD doesn't provide additional useful information beyond what is in the RPSI. If there is only a single participant (the one that sent the RPSI), there will only be a single dependency (the LTR) and the RPSI tells the server that the participant has received and decoded the LTR. So the RPSI provides all the info needed to forward the newly generated P-frame. However, in other situations (such as where the gaming server encodes SVC), DD could help the SFU decide whether to forward frames after a loss.
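To make the 1-1 flow concrete, a rough sketch of the server side follows; the GameEncoder interface and its methods are hypothetical stand-ins for whatever encoder interface the server actually exposes, not a real API:

```typescript
// Hypothetical 1-1 (game streaming) flow: GameEncoder and its methods are
// stand-ins for the server's encoder interface, not a real API.
interface GameEncoder {
  // Emit a P-frame predicted from the given long-term reference.
  encodeFromLongTermReference(ltrIndex: number): void;
  requestKeyFrame(): void;
}

function onRpsiFeedback(encoder: GameEncoder, ackedLtrIndex: number | null): void {
  if (ackedLtrIndex !== null) {
    // The single receiver has decoded this LTR, so a P-frame predicted from it
    // is decodable at that receiver; no keyframe is needed.
    encoder.encodeFromLongTermReference(ackedLtrIndex);
  } else {
    // No usable LTR has been acknowledged: fall back to a keyframe.
    encoder.requestKeyFrame();
  }
}
```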
Since the 1-1 case is so much simpler, we have seen RPSI implementations that only support that use case. However, for a general purpose implementation like libwebrtc, it is hard to argue for a simple implementation approach that will work in 1-1 but could degrade video quality in conferencing.
@aboba thanks a lot for the elaboration.
To some extent I agree that implementations should support the generic case. But looking at it from another angle: there is one identified case where a "simple implementation" is believed to work well. I think it would be really good if this could be supported to allow more experimentation, and then perhaps be re-done in a more generic way (if proven to add value). But the discussion on implementation in libwebrtc probably belongs in another venue.
Taking a step back, the requirement as such makes sense to me. How the API is defined, and how it is implemented is a later discussion IMHO.
@alvestrand & @aboba Do you think we have reached consensus on this requirement? If so, could we merge it? Otherwise, we would like to discuss this on the upcoming working group call on Jan 16.
@aboba Thanks again for allowing us to present this PR during the WG meeting today, and thanks for sharing your feedback during the call. I am more than happy to update this PR based on that feedback, but would you mind sharing it once again on this PR? That would make it easier for me to capture the same consensus you described today.
It seems like the essence of the requirement is to add LTR (or reference control) to the set of robustness technologies already supported: NACK/RTX, RED, FEC, SVC. As Erik Sprang noted at TPAC 2023, this requires major changes to the WebRTC (and possibly the WebCodecs) encoder APIs. See: https://docs.google.com/presentation/d/1FpCAlxvRuC0e52JrthMkx-ILklB5eHszbk8D3FIuSZ0/edit#slide=id.g2397bc70323_0_0
The actual desire for an LTR could be communicated without a new API surface (e.g. sent over a data channel between the peers). But if there is coincidentally a desire to control existing robustness mechanisms (e.g. custom NACK/RTX), here is a potential approach (still early in its development): https://github.com/w3c/webrtc-rtptransport
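As a minimal sketch of that "no new API surface" option, the repair request could be an application-defined message on a data channel; the message shape below is an assumption of this example, not part of any specification:

```typescript
// Sketch of signalling an LTR repair request over a data channel.
// The "ltr-repair-request" message shape is application-defined, i.e. an
// assumption of this example, not part of any WebRTC specification.
const pc = new RTCPeerConnection();
const control = pc.createDataChannel("media-control", { ordered: true });

function requestLtrRepair(lastDecodedFrameId: number): void {
  if (control.readyState === "open") {
    control.send(JSON.stringify({
      type: "ltr-repair-request",
      lastDecodedFrameId, // the frame the receiver can still predict from
    }));
  }
}
```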
The way I see it:
a) We have an agreed sub-use case "Game streaming" under the "Low latency streaming" use case of the document "WebRTC Extended Use Cases".
b) Current providers of game streaming that are using WebRTC have determined that continuing to decode after a frame loss (even though no key frame has arrived) improves the gamer's experience.
So to me it makes total sense, since it can be derived from this agreed (sub-)use case, to add a requirement phrased something like "The application must be able to control video decoding to continue even after a frame loss, without waiting for a key frame."
Exactly how to meet that requirement is a later question. Perhaps the approach pointed out by @aboba in https://github.com/w3c/webrtc-nv-use-cases/pull/129#issuecomment-2021621725 is a good one, and in one previous meeting it was pointed out that modifying libwebrtc to allow decoding to continue is pretty simple, but as said, how to meet the requirement is a later question IMHO.
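Purely for illustration, an API meeting the requirement might look something like the sketch below; the recoveryPolicy attribute and the extended receiver interface are hypothetical and do not exist in WebRTC today:

```typescript
// Purely hypothetical sketch to make the requirement concrete; neither the
// recoveryPolicy attribute nor this extended receiver interface exists in
// WebRTC today.
type LossRecoveryPolicy =
  | "wait-for-key-frame"   // today's behaviour: freeze until a keyframe arrives
  | "continue-decoding";   // keep decoding subsequent P-frames, accepting artifacts

interface HypotheticalRtpReceiver extends RTCRtpReceiver {
  recoveryPolicy: LossRecoveryPolicy;
}

function enableContinuousDecoding(receiver: RTCRtpReceiver): void {
  // Cast shown only to illustrate how an application might opt in.
  (receiver as HypotheticalRtpReceiver).recoveryPolicy = "continue-decoding";
}
```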
This issue was discussed at the WebRTC WG meeting on 26 March 2024 (PR #129: video decoding recovery after packet loss).
@aboba could you please review the updated message for this requirement?
The current formulation works for me. I would change "application" for "user-agent", but that is just a detail.
Partial fixes for #103