whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Interoperable <video> underflow #6359

Open chcunningham opened 3 years ago

chcunningham commented 3 years ago

@hober @jernoble @eric-carlson @jyavenard @padenot @gregwhitworth @mounirlamouri @dalecurtis

Playback underflow occurs when a <video> does not have media data after the currentTime. The spec says to set readyState = HAVE_CURRENT_DATA and fire the 'waiting' event.

I'd like to start a discussion about the subtleties and differences between UAs. Hopefully we can add clarity to the spec and perhaps give authors some new knobs to control the behavior.

What qualifies as underflow

A <video> may fail to have media data at the currentTime for at least two reasons:

  1. it failed to download the data in time (network underflow)
  2. it failed to decode the data in time (decoder underflow)

My read of the readyState description suggests that either could trigger changing the readyState and firing the 'waiting' event ("Media elements have a ready state, which describes to what degree they are ready to be rendered at the current playback position"). I notice that the seek algorithm does explicitly wait for having "decoded enough data" to emit "seeked". Perhaps readyState transitions to HAVE_FUTURE_DATA and HAVE_ENOUGH_DATA are intended to work in the same way.

For better or worse, Chromium's behavior for the last ~5 years has been to consider both sources of underflow as a trigger to change readyState. Sites have actually found this useful to understand the user experience (they can distinguish 1 from 2 by comparing currentTime to video.buffered).
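To make the currentTime vs. video.buffered comparison concrete, here is a rough sketch of how a site might classify a 'waiting' event. The helper names (`classifyUnderflow`, `makeRanges`) and the `minAhead` tolerance of 0.1s are assumptions made up for illustration, not anything from the spec or from Chromium. The idea: if there is still buffered data comfortably ahead of currentTime, the stall is probably decoder underflow; otherwise it is probably network underflow.

```typescript
// Structural stand-in for the DOM TimeRanges interface so the sketch can run
// outside a browser. A real video.buffered object satisfies this shape.
interface TimeRangesLike {
  readonly length: number;
  start(index: number): number;
  end(index: number): number;
}

type UnderflowKind = "network" | "decoder";

// Hypothetical helper. minAhead (seconds) is an assumed tolerance: how much
// buffered data must remain ahead of currentTime before we blame the decoder.
function classifyUnderflow(
  currentTime: number,
  buffered: TimeRangesLike,
  minAhead: number = 0.1
): UnderflowKind {
  for (let i = 0; i < buffered.length; i++) {
    if (currentTime >= buffered.start(i) && currentTime < buffered.end(i)) {
      const ahead = buffered.end(i) - currentTime;
      // Data exists past the playhead, so the download is not the bottleneck.
      return ahead > minAhead ? "decoder" : "network";
    }
  }
  // currentTime is not inside any buffered range at all.
  return "network";
}

// Hypothetical test helper that builds a TimeRangesLike from [start, end] pairs.
function makeRanges(ranges: Array<[number, number]>): TimeRangesLike {
  return {
    length: ranges.length,
    start: (i: number) => ranges[i][0],
    end: (i: number) => ranges[i][1],
  };
}

// In a page, this might be wired up roughly as:
//   video.addEventListener("waiting", () => {
//     console.log(classifyUnderflow(video.currentTime, video.buffered));
//   });
```

Note that this only works in a UA that fires 'waiting' for decoder underflow in the first place, which is exactly the behavior under discussion.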

Different UAs handle these cases differently

I know Chromium and Firefox differ in both cases. I can't easily check Safari (help appreciated).

For both network and decoder underflow, it often occurs that the video track will experience a transient underflow while the audio track continues to have media data. Rather than immediately stopping playback, users may prefer that audio continue to play while video is given a chance to catch up. Firefox and Chromium both do this, but the details differ.

Chromium will give the video track 3 seconds to rebuffer while audio continues to play. It does so for both decoder and network underflow.
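As a toy model of that policy (not Chromium's actual implementation, which I am not quoting here), the decision could be sketched like this. The function name, the parameter names, and treating the boundary as "strictly less than the grace period" are all assumptions for illustration; only the 3 second default comes from the description above.

```typescript
type UnderflowDecision = "keep-playing-audio" | "stall";

// Hypothetical model of a "video gets a grace period" policy.
// msSinceVideoUnderflow: how long the video track has been out of data.
// audioHasData: whether the audio track can still play at currentTime.
// graceMs: assumed configurable threshold, defaulting to the 3s described above.
function decideOnVideoUnderflow(
  msSinceVideoUnderflow: number,
  audioHasData: boolean,
  graceMs: number = 3000
): UnderflowDecision {
  // Without audio there is nothing to keep playing, so stall right away.
  if (!audioHasData) {
    return "stall";
  }
  // Audio continues while video is still inside its grace period.
  return msSinceVideoUnderflow < graceMs ? "keep-playing-audio" : "stall";
}
```

In this model, a UA that stops both tracks immediately would correspond to graceMs = 0.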

In my observation, Firefox will immediately stop both tracks in the case of network underflow in the video track. For decoder underflow, I recall hearing that Firefox attempts to catch up by skipping to the next keyframe (giving a slide show effect if the decoder can't catch up).

Here is a demonstration of network underflow that works in Chromium and Firefox. The demo uses MSE, withholding parts of the video buffer to simulate a network hiccup. https://oxidized-tide-seeder.glitch.me/

kixelated commented 3 years ago

Hey Chris, thanks for bringing this up!

Twitch is planning to utilize Chromium's underflow behavior. For low-latency media, we've found that it's a better user experience to temporarily forgo video while audio continues uninterrupted. This gives the player more time to recover from network congestion rather than stalling playback and subsequently increasing latency.

The 3 second threshold works well for our use-case, but this behavior should be configurable and documented at the very least. Alternatively, there could be another mechanism to let the developer control media synchronization?

jernoble commented 3 years ago

There's a couple of weird things about this test. In Firefox & Safari, the video track seems to start at 0.2s, and the audio exactly at 0.0s. Chrome, meanwhile, seems to think the video track starts at 0.766s. All browsers think the first appended video segment ends at 3.1s (or 3.0999s for Chrome). Looking at the file itself, it has an edit list which looks like it intends to shift the first sample back one frame-duration so both tracks start at 0s, but also has a sidx box indicating the first frame is available at 0.2s.

Firefox and Chrome allow the user to play through a 0.2s or 0.766s gap at the start of a video; both have a readyState of HAVE_ENOUGH_DATA. Safari treats this as an unbuffered range and stalls waiting for data to be appended, and has a readyState of HAVE_METADATA. I would argue Safari's behavior here is correct, since we literally do not have data for the current time available for decoding.

So that's something to keep in mind when interpreting results.

jernoble commented 3 years ago

Different UAs handle these cases differently

I know Chromium and Firefox differ in both cases. I can't easily check Safari (help appreciated).

Safari stops immediately upon encountering a buffered range gap. Here's what I get from the Underflow demo:

video.readyState HAVE_CURRENT_DATA
video.currentTime 3.093934117
videoSourceBuffer.buffered.end(0) 3.1

chcunningham commented 3 years ago

There's a couple of weird things about this test.

Apologies! I've since uploaded new files without edit lists and a matching start time of 0. The demo now works in Safari.

I filed an MSE spec issue to clarify expected behavior for the previous broken demo.

Safari stops immediately upon encountering a buffered range gap.

Thanks! What is Safari's behavior wrt decoder underflow?

Adding some player folks to give opinions on underflow behavior. @mwatson2 @gregwfreedman @joeyparrish @stevenrobertson

joeyparrish commented 3 years ago

Surprisingly, I have little opinion on the specifics of this.

Consistency would be nice, with reliable events that work the same across browsers. Shaka Player doesn't use waiting today, primarily because it was not reliable and consistent ~6 years ago and we never looked back.

So however we land on this spec-wise, it sounds like it will soon be a good time for Shaka Player to re-evaluate the landscape of underflow.

Incidentally, would it be helpful to see how this behavior looks on more non-desktop UAs? We could convert this demo into a JS test case we could run in the Shaka Player test lab and see what TVs, STBs, and game consoles do, too, if that's useful info.

jernoble commented 3 years ago

Safari stops immediately upon encountering a buffered range gap.

Thanks! What is Safari's behavior wrt decoder underflow?

The decoder will drop what it can in order to keep up with the media timeline, if dropping is allowed by the container. If it's a temporary hiccup/underrun, it will recover by "fast forwarding" & displaying decoded frames as quickly as it can to return to realtime. If, for example, playbackRate is set to a high value, and the decoder can't keep up, it'll switch to a keyframe-only mode.

Basically, the decoder's abilities will never affect the media timeline. We will make our best effort to play at the rate requested by the page, reporting dropped frames through the VideoPlaybackQuality API, and stall only when we literally do not have data available to decode.
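For pages that want to observe this, a rough sketch of turning two VideoPlaybackQuality snapshots into a dropped-frame ratio follows. `droppedVideoFrames` and `totalVideoFrames` are the real attribute names on the object returned by `video.getVideoPlaybackQuality()`; the `PlaybackQualityLike` interface and the `droppedFrameRatio` helper are assumptions for illustration so the code can run outside a browser.

```typescript
// Structural subset of VideoPlaybackQuality; the real object returned by
// video.getVideoPlaybackQuality() satisfies this shape.
interface PlaybackQualityLike {
  droppedVideoFrames: number;
  totalVideoFrames: number;
}

// Hypothetical helper: fraction of frames dropped between two snapshots.
// Returns 0 when no new frames were counted in the interval.
function droppedFrameRatio(
  prev: PlaybackQualityLike,
  curr: PlaybackQualityLike
): number {
  const total = curr.totalVideoFrames - prev.totalVideoFrames;
  if (total <= 0) {
    return 0;
  }
  const dropped = curr.droppedVideoFrames - prev.droppedVideoFrames;
  return dropped / total;
}

// In a page, snapshots might be sampled roughly like this:
//   let prev = video.getVideoPlaybackQuality();
//   setInterval(() => {
//     const curr = video.getVideoPlaybackQuality();
//     console.log(droppedFrameRatio(prev, curr));
//     prev = curr;
//   }, 1000);
```

A persistently high ratio while readyState stays at HAVE_ENOUGH_DATA would be one way for a site to infer decoder underflow in a UA that never stalls for it.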

chcunningham commented 3 years ago

@jernoble @padenot

I want to highlight the comments from Twitch above.

For low-latency media, we've found that it's a better user experience to temporarily forgo video while audio continues uninterrupted. This gives the player more time to recover from network congestion rather than stalling playback and subsequently increasing latency.

Do you find this persuasive? My first reaction is Chrome is weird to be the outlier here, but then it turns out to be pretty useful. I expect Safari and Firefox may not want to change their default behavior, but would you be open to exposing a <video> or MSE knob to opt-in to this behavior?

jernoble commented 3 years ago

There is an existing "MSE v2" feature request to allow websites to customize the behavior of players when encountering an underflow/gap, tracked by: https://github.com/w3c/media-source/issues/160. I think it would be worth raising this request there just to make sure it's included in the use cases for that issue.

@chcunningham said:

I expect Safari and Firefox may not want to change their default behavior, but would you be open to exposing a <video> or MSE knob to opt-in to this behavior?

I'd support adding this feature to MSE, but for Type 1 content in <video> elements, the sources are always (modulo HLS) going to be muxed anyway, and therefore it doesn't really seem useful there to allow audio playback to continue if video playback won't keep up. So I'd argue MSE is the best place to make this change.

chcunningham commented 3 years ago

That sounds good to me. @wolenetz FYI

kixelated commented 3 years ago

Twitch is going to start public experiments for Chrome viewers using this video underflow behavior. It lets us reduce the buffer size for lower latency with less downside, as video underflow is less invasive for the viewer compared to buffering. The reception internally has been great so far, and we would love to see this behavior added to MSE for other browsers too.