video-dev / media-ui-extensions

Extending the HTMLVideoElement API to support advanced player user-interface features
MIT License

Playback quality selection #1

Open heff opened 2 years ago

heff commented 2 years ago

Allow a user to select from a set of video quality levels/resolutions/renditions/bitrates/variants/representations.

Was hoping to start this with a PR, but some research and discussion will be helpful first.

Related conversation: https://github.com/whatwg/html/issues/562 from @dmlap

The proposed extension to VideoTrack seems promising.

partial interface VideoTrack {
  sequence<string> getAvailableRenditions();
  // promise resolves when change has taken effect
  Promise<void> setPreferredRendition(string rendition);
};

Something to solve for is "auto".
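
One way to think about "auto" is as the absence of a preference. The sketch below is only an illustration, not part of the proposal: `AUTO`, `FakeVideoTrack`, `preferredRendition` and `buildMenu` are hypothetical names, and the fake track just mimics the two proposed methods in memory so the menu logic can be tried outside a browser. Whether `setPreferredRendition` would actually accept an "auto" value is an open question.

```typescript
// Hypothetical sentinel meaning "let the player choose"; not part of the proposal.
const AUTO = "auto";

// In-memory stand-in for the two proposed VideoTrack methods.
class FakeVideoTrack {
  private renditions: string[];
  private preferred: string | null = null; // null means automatic selection

  constructor(renditions: string[]) {
    this.renditions = renditions.slice();
  }

  getAvailableRenditions(): string[] {
    return this.renditions.slice();
  }

  // Resolves once the (simulated) change has taken effect.
  async setPreferredRendition(rendition: string): Promise<void> {
    if (rendition !== AUTO && this.renditions.indexOf(rendition) === -1) {
      throw new RangeError(`Unknown rendition: ${rendition}`);
    }
    this.preferred = rendition === AUTO ? null : rendition;
  }

  // Hypothetical accessor, added here only so the state can be inspected.
  get preferredRendition(): string | null {
    return this.preferred;
  }
}

// Menu entries: "auto" first, then whatever the track reports.
function buildMenu(track: FakeVideoTrack): string[] {
  return [AUTO, ...track.getAvailableRenditions()];
}
```

With this shape, a menu handler would call `setPreferredRendition(AUTO)` to hand control back to the player's own adaptation logic.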

Ping @gkatsev @littlespex

gkatsev commented 2 years ago

Also wanted to mention https://github.com/videojs/videojs-contrib-quality-levels, which we wrote with the intent of eventually updating it to match a spec.

littlespex commented 2 years ago

Amending the VideoTrack API seems promising as it allows for each track to have multiple renditions. The two new functions would allow for a basic selection menu.

With the wide variety of bitrate ladders out in the wild, having getAvailableRenditions return a list of strings may be limiting. It may be necessary to return a Rendition object similar to videojs-contrib-quality-levels so that dimensions, bitrates and codecs can be used to generate the list of menu items.
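
As a rough illustration of why structured renditions help, the sketch below builds menu labels from dimensions and bitrate. The `RenditionInfo` shape, the function names, the label format and the assumption that `bitrate` is in bits per second are all hypothetical; they are not taken from videojs-contrib-quality-levels or from any spec.

```typescript
// Hypothetical shape, loosely following the fields discussed in this thread.
interface RenditionInfo {
  id: string;
  width: number;
  height: number;
  bitrate: number; // assumed to be bits per second
  codec: string;
}

// Build a human-readable label from the height and bitrate.
function renditionLabel(r: RenditionInfo): string {
  const mbps = (r.bitrate / 1e6).toFixed(1);
  return `${r.height}p (${mbps} Mbps)`;
}

// Sort highest quality first (height, then bitrate) and label each entry.
function menuItems(renditions: RenditionInfo[]): { id: string; label: string }[] {
  return renditions
    .slice()
    .sort((a, b) => b.height - a.height || b.bitrate - a.bitrate)
    .map((r) => ({ id: r.id, label: renditionLabel(r) }));
}
```

A plain list of strings would force the player to pick one label format for every UI; with objects, each UI can decide what to show.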

For more advanced use cases, it may also be necessary to dispatch events so that changes made to the renditions list can be reflected in the UI.

One possible alternative, though certainly not as simple, would be something similar to the existing text/audio/video track APIs:

partial interface VideoTrack {
  readonly attribute VideoRenditionList renditions;
};

interface VideoRenditionList : EventTarget {
  readonly attribute unsigned long length;
  getter VideoRendition (unsigned long index);
  VideoRendition? getRenditionById(DOMString id);
  readonly attribute long selectedIndex;

  attribute EventHandler onchange;
  attribute EventHandler onaddrendition;
  attribute EventHandler onremoverendition;
};

interface VideoRendition {
  readonly attribute DOMString id;
  readonly attribute unsigned long width;
  readonly attribute unsigned long height;
  readonly attribute unsigned long bitrate;
  readonly attribute DOMString codec;
  attribute boolean selected;
};
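
To get a feel for how a UI would consume such a list, here is a minimal in-memory model. It is a sketch under stated assumptions, not a polyfill: `VideoRenditionListModel`, `on`, `add`, `remove` and `select` are hypothetical names, a plain callback registry stands in for `EventTarget`, and treating `selectedIndex === -1` as "nothing explicitly selected" is an assumption the IDL does not spell out.

```typescript
// Hypothetical in-memory model of the proposed list; not a browser API.
interface VideoRenditionLike {
  id: string;
  width: number;
  height: number;
  bitrate: number;
}

type ListEvent = "change" | "addrendition" | "removerendition";

class VideoRenditionListModel {
  private items: VideoRenditionLike[] = [];
  private listeners: Record<ListEvent, Array<() => void>> = {
    change: [],
    addrendition: [],
    removerendition: [],
  };
  selectedIndex = -1; // assumption: -1 means no explicit selection

  get length(): number {
    return this.items.length;
  }

  on(type: ListEvent, listener: () => void): void {
    this.listeners[type].push(listener);
  }

  private emit(type: ListEvent): void {
    for (const listener of this.listeners[type]) listener();
  }

  getRenditionById(id: string): VideoRenditionLike | null {
    return this.items.find((r) => r.id === id) || null;
  }

  add(rendition: VideoRenditionLike): void {
    this.items.push(rendition);
    this.emit("addrendition");
  }

  remove(id: string): void {
    const index = this.items.findIndex((r) => r.id === id);
    if (index === -1) return;
    this.items.splice(index, 1);
    const removedSelected = index === this.selectedIndex;
    // Keep selectedIndex pointing at the same rendition after the splice.
    if (removedSelected) this.selectedIndex = -1;
    else if (index < this.selectedIndex) this.selectedIndex -= 1;
    this.emit("removerendition");
    if (removedSelected) this.emit("change");
  }

  select(index: number): void {
    if (index < -1 || index >= this.items.length) {
      throw new RangeError(`Index out of range: ${index}`);
    }
    if (index === this.selectedIndex) return;
    this.selectedIndex = index;
    this.emit("change");
  }
}
```

A quality menu would re-render on `addrendition` and `removerendition`, and update its checked item on `change`.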

heff commented 2 years ago

We're stepping into this with media-chrome, and it looks like @luwes has already done work on a version.

How should mixed audio/video renditions (.ts HLS) be handled in an API like this? Should the assumption be that if there are no audio renditions then there are only mixed media renditions? Or should the rendition list not be media type specific, with a rendition type field that can be video/audio/mixed? I think I remember a proposal from @wilaw somewhere with those options.

luwes commented 2 years ago

Yes, good food for thought. @cjpillsbury also brought this up when we discussed my draft implementation.

Maybe it'd be easier to not have to patch the Video/AudioTrack APIs for browsers other than Safari.

It would be more like:

partial interface HTMLMediaElement {
  readonly attribute RenditionList renditions;
};

interface RenditionList : EventTarget {
  readonly attribute unsigned long length;
  getter Rendition (unsigned long index);
  Rendition? getRenditionById(DOMString id);
  readonly attribute long selectedIndex;

  attribute EventHandler onchange;
  attribute EventHandler onaddrendition;
  attribute EventHandler onremoverendition;
};

enum RenditionType { "video", "audio", "mixed" };

interface Rendition {
  readonly attribute DOMString trackId;
  readonly attribute RenditionType type;

  readonly attribute DOMString id;
  readonly attribute unsigned long width;
  readonly attribute unsigned long height;
  readonly attribute unsigned long bitrate;
  readonly attribute DOMString codec;
  attribute boolean selected;
};
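
With a single combined list, the UI has to do the filtering itself. The sketch below shows what that might look like; `CombinedRendition`, `videoMenuRenditions` and `groupByTrack` are hypothetical names, and the decision to show muxed renditions in the video quality menu is an assumption, not something settled in this thread.

```typescript
type RenditionKind = "video" | "audio" | "mixed";

// Hypothetical subset of the Rendition fields sketched above.
interface CombinedRendition {
  id: string;
  trackId: string;
  type: RenditionKind;
  bitrate: number;
}

// Renditions a video quality menu would show: video-only plus muxed entries.
function videoMenuRenditions(all: CombinedRendition[]): CombinedRendition[] {
  return all.filter((r) => r.type === "video" || r.type === "mixed");
}

// Group renditions by the track they belong to.
function groupByTrack(all: CombinedRendition[]): Map<string, CombinedRendition[]> {
  const groups = new Map<string, CombinedRendition[]>();
  for (const r of all) {
    const group = groups.get(r.trackId) || [];
    group.push(r);
    groups.set(r.trackId, group);
  }
  return groups;
}
```

The `trackId` lookup is the cost of the flat list: the per-track association that a `VideoTrack.renditions` design gives for free has to be rebuilt by every consumer.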

heff commented 2 years ago

The multiple video tracks use case is one to consider here. On one hand, it means you'll still end up identifying which video track a rendition belongs to. On the other hand, I question how much we can rely on the native VideoTracks list to actually represent the multiple video tracks in an adaptive manifest. If it doesn't, then that makes it more complicated to extend the native VideoTracks in the (maybe rare) use case of multiple video tracks with multiple renditions each. Anybody have experience with that or want to test it?

cjpillsbury commented 2 years ago

There is poor (none that I know of) support of "alternate video" in native playback for browser/browser-like envs (and players generally). However, there is decent support for "alternate audio".

gkatsev commented 2 years ago

I think there are two things to consider here:

  1. what's the easiest and best API to have for something like media-chrome
  2. what's the best API that we can propose to the W3C/WHATWG to get it into the standards.

For an API we can use today, not having to extend Audio/Video Tracks is definitely nice, but I think such an API is less likely to get accepted into the relevant specs. In addition, I don't think it really matters if a rendition is muxed content. From a user's perspective, it doesn't matter if the audio is available in the same segment as the video or if it was downloaded from a separate segment.

I think that adding a RenditionList to Audio and Video Tracks, similar to what @littlespex proposed above, is better than a combined RenditionList. In the majority case, since alternative video tracks aren't very common, you'd end up with a single Video Track, which has the specified renditions on it. Additionally, you'd have one or more AudioTracks, potentially with their own renditions. In the case of muxed content, you'd have a track show up under both AudioTrack and VideoTrack. This is how Safari currently implements things, where for media, including mp4, you get video.videoTracks[0] pointing at the video portion and video.audioTracks[0] pointing at the audio portion. You can then separately turn off audio and video with video.audioTracks[0].enabled = false and video.videoTracks[0].selected = false. The way AudioTracks and VideoTracks are defined is that you could theoretically enable multiple audio tracks at the same time, but not multiple video tracks. This is why videojs-contrib-quality-levels uses enabled on the renditions list, so that you could have multiple enabled, rather than only selecting one.
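
The difference between `enabled` and `selected` can be sketched with plain objects. `AudioTrackLike`, `VideoTrackLike`, `enableAudio` and `selectVideo` are hypothetical stand-ins, not the real DOM types; they only model the rule that several audio tracks may be enabled at once while at most one video track is selected.

```typescript
// Simplified stand-ins for the native track objects; not the real DOM types.
interface AudioTrackLike {
  id: string;
  enabled: boolean;
}

interface VideoTrackLike {
  id: string;
  selected: boolean;
}

// Audio: enabling one track leaves the others untouched, so several can be on.
function enableAudio(tracks: AudioTrackLike[], id: string): void {
  for (const t of tracks) {
    if (t.id === id) t.enabled = true;
  }
}

// Video: selecting one track deselects every other track.
function selectVideo(tracks: VideoTrackLike[], id: string): void {
  if (!tracks.some((t) => t.id === id)) return; // unknown id: leave state as is
  for (const t of tracks) {
    t.selected = t.id === id;
  }
}
```

A renditions list that uses `enabled` follows the audio model: the UI can enable a subset and let the player adapt within it, rather than pinning exactly one rendition.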

The tricky part of a renditions API is likely supporting everything that DASH allows. DASH is tricky here because you can have different renditions per period and potentially different audio tracks per video track. Maybe supporting all permutations that DASH allows should be a non-goal. HLS is simpler because it doesn't allow you multiple renditions per audio track.

cjpillsbury commented 2 years ago

@gkatsev

HLS is simpler because it doesn't allow you multiple renditions per audio track.

I'd be careful here. Folks definitely use EXT-X-MEDIA:TYPE=AUDIO to provide multiple encodings/"renditions" of "the same" audio content, and Safari will represent them as a single AudioTrack. For example, Apple's official test stream https://devstreaming-cdn.apple.com/videos/streaming/examples/bipbop_adv_example_hevc/master.m3u8 includes:

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="a1",NAME="English",LANGUAGE="en-US",AUTOSELECT=YES,DEFAULT=YES,CHANNELS="2",URI="a1/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="a2",NAME="English",LANGUAGE="en-US",AUTOSELECT=YES,DEFAULT=YES,CHANNELS="6",URI="a2/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="a3",NAME="English",LANGUAGE="en-US",AUTOSELECT=YES,DEFAULT=YES,CHANNELS="6",URI="a3/prog_index.m3u8"

(note the shared NAME and LANGUAGE but the differences in e.g. CHANNELS, and also in the encoded content itself)

and when playing in Safari, you'll get:

[Screenshot, 2022-09-01 8:20:50 AM]

(aka a single AudioTrack).

Additionally, no "Languages" control menu is added to the controls, since there is only one "track".

Compare to this example https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/hls.m3u8 which includes:

#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-eng-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="stream_5",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-deu-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="de",NAME="stream_4",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-ita-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="it",NAME="stream_8",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-fra-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="fr",NAME="stream_7",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-spa-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="es",NAME="stream_9",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-eng-0384k-aac-6c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="stream_6",CHANNELS="6"

Note that they all share the same GROUP-ID but vary with e.g. LANGUAGE and NAME (though there are two en playlists, which still have different NAMEs). Here's what you get when playing in Safari:

[Screenshot, 2022-09-01 8:29:29 AM]

And here's what shows up in the automatically added "Language" control menu:

[Screenshot, 2022-09-01 8:49:48 AM]

Finally, here's what happens when I create a local version of the multivariant playlist where the two English EXT-X-MEDIA playlists share the same NAME. Playlist tags:

#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-eng-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="English",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-deu-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="de",NAME="German",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-ita-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="it",NAME="Italian",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-fra-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="fr",NAME="French",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-spa-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="es",NAME="Spanish",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-eng-0384k-aac-6c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="English",CHANNELS="6"

Safari's audioTracks:

[Screenshot, 2022-09-01 8:53:11 AM]

(note there's only one en AudioTrack now)

"Languages" control menu:

[Screenshot, 2022-09-01 8:54:45 AM]

All this is to say that Safari will treat multiple audio playlists as different tracks or as the same track depending on details in their attributes.
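
The observations above can be approximated in code. The sketch below parses EXT-X-MEDIA tags and groups audio playlists by LANGUAGE and NAME. That grouping key is only a guess that happens to match the three examples in this comment; it is not documented Safari or AVFoundation behavior, and `parseMediaTag` and `groupAsTracks` are hypothetical helper names.

```typescript
// Parse the attribute list of one EXT-X-MEDIA tag into a key/value map.
function parseMediaTag(line: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const body = line.slice(line.indexOf(":") + 1);
  // Each attribute is KEY=value, where value is either quoted or bare.
  const pattern = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    attrs[match[1]] = match[2].replace(/^"|"$/g, "");
  }
  return attrs;
}

// Assumption: audio playlists sharing LANGUAGE and NAME surface as one track.
// Returns a map from "LANGUAGE|NAME" to the URIs collapsed into that track.
function groupAsTracks(lines: string[]): Map<string, string[]> {
  const tracks = new Map<string, string[]>();
  for (const line of lines) {
    const a = parseMediaTag(line);
    if (a["TYPE"] !== "AUDIO") continue;
    const key = `${a["LANGUAGE"]}|${a["NAME"]}`;
    const uris = tracks.get(key) || [];
    uris.push(a["URI"] || "");
    tracks.set(key, uris);
  }
  return tracks;
}
```

Under this guess, the Apple stream's three English playlists collapse into one entry, while the Shaka stream's two English playlists stay separate until they are given the same NAME.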

gkatsev commented 2 years ago

I'd be careful here. Folks definitely use EXT-X-MEDIA:TYPE=AUDIO to provide multiple encodings/"renditions" of "the same" audio content, and Safari will represent them as a single AudioTrack. For example, Apple's official test stream https://devstreaming-cdn.apple.com/videos/streaming/examples/bipbop_adv_example_hevc/master.m3u8 includes:

However, this will still only match a specific audio track to a specific set of video renditions. The audio renditions won't be switching independently of the video renditions here, which is specifically what I was calling out; maybe it wasn't clear enough.

I tested locally and as far as I can tell, Safari is ignoring the second English track (I just named all the tracks English).

cjpillsbury commented 2 years ago

"ignoring" may be wrong here. iirc AVFoundation/AVPlayer (which Safari HLS playback is built on top of) will do some filtering based on support (6 channels being relevant here) but will also use ABR switching, similar to video playlists, for multiple audio playlists with "similar relevant features". It just isn't exposed in the browser.

gkatsev commented 2 years ago

Yeah, maybe it selects one from the available options and sticks with it. Either way, it seems simplified compared to what you can do in DASH.

gkatsev commented 2 years ago

If it does ABR the audio renditions, I couldn't get it to happen. But maybe my test wasn't great.

cjpillsbury commented 2 years ago

We don't have a good test stream. We'd want all the same container format & codec & channels with matching names & languages but notably different bitrates (including a stupidly large bitrate). We'd also likely want only one EXT-X-STREAM-INF to avoid the dance of video ABR switching vs. (potential) audio ABR switching.

cjpillsbury commented 2 years ago

Or maybe someone with more knowledge of how this works under the hood will chime in 🤞