Open heff opened 3 years ago
Also wanted to mention https://github.com/videojs/videojs-contrib-quality-levels which we wrote with the intention that it could later be updated to match a spec.
Amending the VideoTrack API seems promising, as it allows each track to have multiple renditions. The two new functions would allow for a basic selection menu.
With the wide variety of bitrate ladders out in the wild, having getAvailableRenditions return a list of strings may be limiting. It may be necessary to return a Rendition object similar to videojs-contrib-quality-levels so that dimensions, bitrates and codecs can be used to generate the list of menu items.
For more advanced use cases, it may also be necessary to dispatch events so that changes made to the renditions list can be reflected in the UI:
- change event, to catch changes made by things like the streaming library's ABR algorithm. For example, some menus have Auto checked, and a separate indicator showing which rendition is currently being rendered.
- add/remove events. Multi-period DASH manifests are allowed to have different numbers of Representations per Period.
One possible alternative, though certainly not as simple, would be something similar to the existing text/audio/video track APIs:
partial interface VideoTrack {
  readonly attribute VideoRenditionList renditions;
};
interface VideoRenditionList : EventTarget {
  readonly attribute unsigned long length;
  getter VideoRendition (unsigned long index);
  VideoRendition? getRenditionById(DOMString id);
  readonly attribute long selectedIndex;
  attribute EventHandler onchange;
  attribute EventHandler onaddrendition;
  attribute EventHandler onremoverendition;
};
interface VideoRendition {
  readonly attribute DOMString id;
  readonly attribute unsigned long width;
  readonly attribute unsigned long height;
  readonly attribute unsigned long bitrate;
  readonly attribute unsigned long codec;
  attribute boolean selected;
};
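To make the menu use case concrete, here is a minimal TypeScript sketch of how a player UI might turn rendition objects into menu items, including an Auto entry and a separate "currently rendering" indicator. The VideoRenditionLike and MenuItem types, the label format, and the assumption that bitrate is in bits per second are all illustrative and not part of any spec or library.

```typescript
// Hypothetical shape mirroring a subset of the proposed VideoRendition interface.
interface VideoRenditionLike {
  id: string;
  width: number;
  height: number;
  bitrate: number; // assumed to be bits per second
  selected: boolean;
}

// Hypothetical menu model: "checked" is the user's choice,
// "active" marks the rendition currently being rendered.
interface MenuItem {
  id: string;
  label: string;
  checked: boolean;
  active: boolean;
}

function renditionLabel(r: VideoRenditionLike): string {
  const mbps = (r.bitrate / 1e6).toFixed(1);
  return `${r.height}p (${mbps} Mbps)`;
}

function buildMenu(renditions: VideoRenditionLike[], auto: boolean): MenuItem[] {
  // Highest bitrate first, without mutating the input list.
  const sorted = [...renditions].sort((a, b) => b.bitrate - a.bitrate);
  const items: MenuItem[] = sorted.map((r) => ({
    id: r.id,
    label: renditionLabel(r),
    checked: !auto && r.selected,
    active: r.selected,
  }));
  return [{ id: "auto", label: "Auto", checked: auto, active: false }, ...items];
}

const menu = buildMenu(
  [
    { id: "a", width: 640, height: 360, bitrate: 800000, selected: false },
    { id: "b", width: 1280, height: 720, bitrate: 2500000, selected: true },
  ],
  true,
);
```

With Auto on, the Auto item is checked while the 720p item is only marked active, which is the two-indicator behavior described above. A UI would rebuild this list whenever a change or add/remove event fires.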
We're stepping into this with media-chrome, and it looks like @luwes has already done work on a version.
How should mixed audio/video renditions (.ts HLS) be handled in an API like this? Should the assumption be that if there are no audio renditions, then there are only mixed media renditions? Or should the rendition list not be media type specific, with a rendition type field that can be video/audio/mixed? I think I remember a proposal from @wilaw somewhere with those options.
Yes, good food for thought. @cjpillsbury also brought this up when we discussed my draft implementation.
Maybe it'd be easier to not have to patch the Video/AudioTrack APIs for browsers other than Safari.
The API would be more like:
partial interface HTMLMediaElement {
  readonly attribute RenditionList renditions;
};
interface RenditionList : EventTarget {
  readonly attribute unsigned long length;
  getter Rendition (unsigned long index);
  Rendition? getRenditionById(DOMString id);
  readonly attribute long selectedIndex;
  attribute EventHandler onchange;
  attribute EventHandler onaddrendition;
  attribute EventHandler onremoverendition;
};
enum RenditionType { "video", "audio", "mixed" };
interface Rendition {
  readonly attribute DOMString trackId;
  readonly attribute RenditionType type;
  readonly attribute DOMString id;
  readonly attribute unsigned long width;
  readonly attribute unsigned long height;
  readonly attribute unsigned long bitrate;
  readonly attribute unsigned long codec;
  attribute boolean selected;
};
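As a rough illustration of what consumers of a combined list would have to do, the TypeScript sketch below filters renditions by type and groups them by trackId. RenditionLike is a hypothetical plain-object stand-in for the Rendition interface above, and the helper names and sample data are made up for the example.

```typescript
type RenditionType = "video" | "audio" | "mixed";

// Hypothetical plain-object stand-in for the proposed Rendition interface.
interface RenditionLike {
  trackId: string;
  type: RenditionType;
  id: string;
  bitrate: number;
}

// A video quality menu would likely want video-only and muxed renditions.
function videoMenuRenditions(all: RenditionLike[]): RenditionLike[] {
  return all.filter((r) => r.type === "video" || r.type === "mixed");
}

// With multiple tracks, callers still need to work out which track
// each rendition belongs to.
function groupByTrack(all: RenditionLike[]): Map<string, RenditionLike[]> {
  const groups = new Map<string, RenditionLike[]>();
  for (const r of all) {
    const list = groups.get(r.trackId) ?? [];
    list.push(r);
    groups.set(r.trackId, list);
  }
  return groups;
}

const sampleRenditions: RenditionLike[] = [
  { trackId: "v1", type: "video", id: "v1-720", bitrate: 2500000 },
  { trackId: "v1", type: "video", id: "v1-360", bitrate: 800000 },
  { trackId: "a1", type: "audio", id: "a1-128", bitrate: 128000 },
];

const forVideoMenu = videoMenuRenditions(sampleRenditions);
const byTrack = groupByTrack(sampleRenditions);
```

The extra filtering and grouping step is the cost of a flat list: per-track lists would hand the UI each group directly.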
The multiple video tracks use case is one to consider here. On one hand, it means you'll still end up identifying which video track a rendition belongs to. On the other hand, I question how much we can rely on the native VideoTracks list to actually represent the multiple video tracks in an adaptive manifest. If it doesn't, then that makes it more complicated to extend the native VideoTracks in the (maybe rare) use case of multiple video tracks with multiple renditions each. Anybody have experience with that or want to test it?
There is poor (none that I know of) support of "alternate video" in native playback for browser/browser-like envs (and players generally). However, there is decent support for "alternate audio".
I think there are two things to consider here:
For an API we can use today, not having to extend Audio/Video Tracks is definitely nice, but I think such an API is less likely to get accepted into the relevant specs. In addition, I don't think it really matters if a rendition is muxed content. From a user's perspective, it doesn't matter if the audio is available in the same segment as the video or if it was downloaded from a separate segment.
I think that adding a RenditionList to Audio and Video Tracks, similar to what @littlespex proposed above, is better than a combined RenditionList. In the majority case, since alternative video tracks aren't very common, you'd end up with a single VideoTrack, which has the specified renditions on it. Additionally, you'd have one or more AudioTracks, potentially with their own renditions. In the case of muxed content, you'd have a track show up under both AudioTrack and VideoTrack.
This is how Safari currently implements things, where for media, including mp4, you get video.videoTracks[0] pointing at the video portion and video.audioTracks[0] pointing at the audio portion.
You can then separately turn off audio and video with video.audioTracks[0].enabled = false and video.videoTracks[0].selected = false.
The way AudioTracks and VideoTracks are defined is that you could theoretically enable multiple audio tracks at the same time, but not multiple video tracks. This is why videojs-contrib-quality-levels uses enabled on the renditions list, so that you could have multiple enabled, rather than only selecting one.
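To illustrate why an enabled flag composes well with ABR, here is a small TypeScript sketch in which enabled renditions form the candidate set and a simple selector picks among them. QualityLevelLike and pickRendition are hypothetical names, and the selection rule (highest enabled bitrate that fits the bandwidth estimate, else the lowest enabled) is an assumption for illustration, not the algorithm of videojs-contrib-quality-levels or any streaming library.

```typescript
// Hypothetical rendition shape with an "enabled" flag instead of "selected".
interface QualityLevelLike {
  id: string;
  bitrate: number; // assumed to be bits per second
  enabled: boolean;
}

// Pick the highest-bitrate enabled level that fits the estimate;
// fall back to the lowest enabled level if none fits.
function pickRendition(
  levels: QualityLevelLike[],
  estimatedBps: number,
): QualityLevelLike | undefined {
  const candidates = levels
    .filter((l) => l.enabled)
    .sort((a, b) => a.bitrate - b.bitrate);
  if (candidates.length === 0) return undefined;
  let choice = candidates[0];
  for (const level of candidates) {
    if (level.bitrate <= estimatedBps) choice = level;
  }
  return choice;
}

const levels: QualityLevelLike[] = [
  { id: "low", bitrate: 500000, enabled: true },
  { id: "mid", bitrate: 1500000, enabled: true },
  { id: "high", bitrate: 4000000, enabled: true },
];

const pickAllEnabled = pickRendition(levels, 5000000);

// A user caps quality by disabling the top level; ABR keeps working below it.
levels[2].enabled = false;
const pickCapped = pickRendition(levels, 5000000);
const pickLowBandwidth = pickRendition(levels, 100000);
```

Under this model, "Auto" is simply every level enabled, and a manual pick is exactly one level enabled.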
The tricky part of a renditions API is likely supporting everything that DASH allows. DASH is tricky here because you can have different renditions per period and potentially different audio tracks per video track. Maybe supporting all permutations that DASH allows should be a non-goal. HLS is simpler because it doesn't allow you multiple renditions per audio track.
@gkatsev
HLS is simpler because it doesn't allow you multiple renditions per audio track.
I'd be careful here. Folks definitely use EXT-X-MEDIA:TYPE=AUDIO to provide multiple encodings/"renditions" of "the same" audio content, and Safari will represent them as a single AudioTrack. For example, Apple's official test stream https://devstreaming-cdn.apple.com/videos/streaming/examples/bipbop_adv_example_hevc/master.m3u8 includes:
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="a1",NAME="English",LANGUAGE="en-US",AUTOSELECT=YES,DEFAULT=YES,CHANNELS="2",URI="a1/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="a2",NAME="English",LANGUAGE="en-US",AUTOSELECT=YES,DEFAULT=YES,CHANNELS="6",URI="a2/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="a3",NAME="English",LANGUAGE="en-US",AUTOSELECT=YES,DEFAULT=YES,CHANNELS="6",URI="a3/prog_index.m3u8"
(note the shared NAME and LANGUAGE but the differences in e.g. CHANNELS, and also in the encoded content itself), and when playing in Safari, you'll get a single AudioTrack.
Additionally, no "Languages" control menu is added to the controls, since there is only one "track".
Compare to this example https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/hls.m3u8 which includes:
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-eng-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="stream_5",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-deu-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="de",NAME="stream_4",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-ita-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="it",NAME="stream_8",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-fra-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="fr",NAME="stream_7",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-spa-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="es",NAME="stream_9",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-eng-0384k-aac-6c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="stream_6",CHANNELS="6"
Note that they all share the same GROUP-ID but vary with e.g. LANGUAGE and NAME (though there are two en playlists, which still have different NAMEs). Here's what you get when playing in Safari:
And here's what shows up in the automatically added "Language" control menu:
Finally, here's what happens when I create a local version of the multivariant playlist where the two English EXT-X-MEDIA playlists share the same NAME. Playlist tags:
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-eng-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="English",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-deu-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="de",NAME="German",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-ita-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="it",NAME="Italian",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-fra-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="fr",NAME="French",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-spa-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="es",NAME="Spanish",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-MEDIA:TYPE=AUDIO,URI="https://storage.googleapis.com/shaka-demo-assets/angel-one-hls/playlist_a-eng-0384k-aac-6c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="English",CHANNELS="6"
Safari's audioTracks:
(note there's only one en AudioTrack now)
"Languages" control menu:
All this is to say that Safari will treat multiple audio playlists as different tracks or as the same track depending on details in their attributes.
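The grouping seen in these experiments can be approximated with a small TypeScript sketch that parses EXT-X-MEDIA tags and collapses audio playlists sharing LANGUAGE and NAME into one track. The grouping key is an assumption inferred only from the observations in this thread; Safari's actual rule is not documented here and may involve other attributes. The function names are hypothetical, and the sample tags are a trimmed subset of the ones above.

```typescript
// Parse the attribute list of a single HLS tag line into a key/value map.
function parseAttributes(line: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const body = line.slice(line.indexOf(":") + 1);
  const re = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    attrs[m[1]] = m[2].replace(/^"|"$/g, ""); // strip surrounding quotes
  }
  return attrs;
}

// ASSUMPTION: playlists with the same LANGUAGE and NAME collapse into one track.
function groupAudioRenditions(lines: string[]): Map<string, string[]> {
  const tracks = new Map<string, string[]>();
  for (const line of lines) {
    if (!line.startsWith("#EXT-X-MEDIA:")) continue;
    const attrs = parseAttributes(line);
    if (attrs["TYPE"] !== "AUDIO") continue;
    const key = `${attrs["LANGUAGE"] ?? ""}|${attrs["NAME"] ?? ""}`;
    const uris = tracks.get(key) ?? [];
    uris.push(attrs["URI"] ?? "");
    tracks.set(key, uris);
  }
  return tracks;
}

const audioTags = [
  '#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-eng-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="English",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2"',
  '#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-deu-0128k-aac-2c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="de",NAME="German",AUTOSELECT=YES,CHANNELS="2"',
  '#EXT-X-MEDIA:TYPE=AUDIO,URI="playlist_a-eng-0384k-aac-6c.mp4.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="English",CHANNELS="6"',
];

const audioTracks = groupAudioRenditions(audioTags);
```

Here the two English playlists end up under a single key with two URIs, which is the shape a per-track rendition list would need to expose.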
I'd be careful here. Folks definitely use EXT-X-MEDIA:TYPE=AUDIO to provide multiple encodings/"renditions" of "the same" audio content, and Safari will represent them as a single AudioTrack. For example, Apple's official test stream https://devstreaming-cdn.apple.com/videos/streaming/examples/bipbop_adv_example_hevc/master.m3u8 includes:
However, this will still only match a specific audio track to a specific set of video renditions. The audio renditions won't be switching independently of the video renditions here, which is specifically what I was calling out; maybe it wasn't clear enough.
I tested locally and as far as I can tell, Safari is ignoring the second English track (I just named all the tracks English).
"ignoring" may be wrong here. iirc AVFoundation/AVPlayer (which Safari HLS playback is built on top of) will do some filtering based on support (6 channels being relevant here) but will also use ABR switching, similar to video playlists, for multiple audio playlists with "similar relevant features". It just isn't exposed in the browser.
Yeah, maybe it selects one from the available options and sticks with it. Either way, it seems simplified compared to what you can do in DASH.
If it does ABR the audio renditions, I couldn't get it to happen. But maybe my test wasn't great.
We don't have a good test stream. We'd want all the same container format & codec & channels with matching names & languages but notably different bitrates (including a stupidly large bitrate). We'd also likely want only one EXT-X-STREAM-INF to avoid the dance of video ABR switching vs. (potential) audio ABR switching.
Or maybe someone with more knowledge of how this works under the hood will chime in 🤞
Allow a user to select from a set of video quality levels/resolutions/renditions/bitrates/variants/representations.
Was hoping to start this with a PR, but some research and discussion will be helpful first.
Related conversation: https://github.com/whatwg/html/issues/562 from @dmlap
The proposed extension to VideoTrack seems promising.
Something to solve for is "auto".
Ping @gkatsev @littlespex