whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

MediaElement and SyntheticMediaElement #8129

Open ysulyma opened 2 years ago

ysulyma commented 2 years ago

Introduction

The <video> element was one of the most revolutionary new features of HTML5, and a key part of "Web 2.0". Today, the web platform has become powerful enough to create videos. Libraries such as GSAP, Liqvid, and Remotion allow developers to create seekable animations and even full-length videos using JavaScript, just as we did with Flash in the days of yore. (Disclaimer: I am the author of Liqvid.) Such "videos" are just DOM manipulation synced up to an audio track (which is still a normal audio file) and a scrubber bar; in particular, they can be interactive, which is impossible with .mp4 videos.

This proposal standardizes the behavior which is common between GSAP's Timeline, Liqvid's Playback, Remotion's PlayerRef, and other libraries. It defines one new interface, MediaElement, and one new class, SyntheticMediaElement. The desiderata are:

In other words, SyntheticMediaElement implements a subset of the functionality of <audio>/<video> elements. It has a current time, a playback rate and a duration, and can be played, paused, and seeked.
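A minimal sketch of that subset, under stated assumptions: times are in seconds, the class name `SketchSyntheticMedia` is hypothetical, and wall-clock time comes from an injected `now()` function returning milliseconds (in a browser one could pass `() => performance.now()`; a test can pass a fake clock). Events are left out to keep the clock logic visible; this is not the polyfill's actual code.

```typescript
// Hypothetical sketch: a seekable clock exposing the HTMLMediaElement-like
// subset described above (currentTime, playbackRate, duration, play/pause/seek).
// All media times are in seconds; `now` must return wall-clock milliseconds.
class SketchSyntheticMedia {
  readonly duration: number;
  private readonly now: () => number;
  private baseTime = 0; // media time (s) at the last anchor point
  private baseWall = 0; // wall-clock time (ms) at the last anchor point
  private _paused = true;
  private _playbackRate = 1;

  constructor(duration: number, now: () => number) {
    this.duration = duration;
    this.now = now;
  }

  get paused(): boolean {
    return this._paused;
  }

  get currentTime(): number {
    if (this._paused) return this.baseTime;
    const elapsed = ((this.now() - this.baseWall) / 1000) * this._playbackRate;
    // Clamp to [0, duration]. Unlike a real media element, this sketch does
    // not auto-pause or fire "ended" when it reaches the end.
    return Math.min(this.duration, Math.max(0, this.baseTime + elapsed));
  }

  // Seeking re-anchors media time to the current wall-clock time.
  set currentTime(t: number) {
    this.baseTime = Math.min(this.duration, Math.max(0, t));
    this.baseWall = this.now();
  }

  get playbackRate(): number {
    return this._playbackRate;
  }

  // Freeze the time reached so far before changing speed.
  set playbackRate(rate: number) {
    this.baseTime = this.currentTime;
    this.baseWall = this.now();
    this._playbackRate = rate;
  }

  play(): void {
    if (!this._paused) return;
    this.baseWall = this.now();
    this._paused = false;
  }

  pause(): void {
    if (this._paused) return;
    this.baseTime = this.currentTime; // capture the time while still "playing"
    this._paused = true;
  }
}
```

The design choice here is that `currentTime` is derived on demand from an anchor pair (media time, wall time) rather than incremented by a timer, so seeks and rate changes only re-anchor and no drift accumulates.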

The choice of which properties/events to include is dictated by experience. The Liqvid plugin suite is compatible with all three of GSAP/Liqvid/Remotion, and this proposal is a less-kludgy version of the @lqv/playback interface that those plugins are built around.

Details

The MediaElement interface includes the following properties of HTMLMediaElement:

It also supports addEventListener and removeEventListener with the following event types:

The SyntheticMediaElement class implements MediaElement.
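The exact lists of properties and event types are not reproduced in this copy of the issue. As an illustration only, the sketch below assumes event names borrowed from HTMLMediaElement (`play`, `pause`, `seeked`, `ratechange`, `timeupdate`, `durationchange`) and uses a small hand-rolled listener registry instead of `EventTarget`, so it runs outside the DOM. All class and type names are hypothetical.

```typescript
// Assumed event names, borrowed from HTMLMediaElement for illustration.
type SketchMediaEventType =
  | "play" | "pause" | "seeked" | "ratechange" | "timeupdate" | "durationchange";

type SketchListener = () => void;

// Minimal listener registry with DOM-like add/remove semantics.
class SketchMediaEvents {
  private listeners = new Map<SketchMediaEventType, Set<SketchListener>>();

  addEventListener(type: SketchMediaEventType, fn: SketchListener): void {
    let set = this.listeners.get(type);
    if (!set) {
      set = new Set<SketchListener>();
      this.listeners.set(type, set);
    }
    set.add(fn); // as in the DOM, adding the same listener twice is a no-op
  }

  removeEventListener(type: SketchMediaEventType, fn: SketchListener): void {
    this.listeners.get(type)?.delete(fn);
  }

  protected dispatch(type: SketchMediaEventType): void {
    // Iterate over a copy so a listener may remove itself while running.
    const current = Array.from(this.listeners.get(type) ?? []);
    for (const fn of current) fn();
  }
}

// Toy element showing when events fire; no clock, only state changes.
class SketchEventedMedia extends SketchMediaEvents {
  private _paused = true;
  private _time = 0;

  get paused(): boolean {
    return this._paused;
  }

  get currentTime(): number {
    return this._time;
  }

  set currentTime(t: number) {
    this._time = t;
    this.dispatch("seeked");
  }

  play(): void {
    if (!this._paused) return; // already playing: no event
    this._paused = false;
    this.dispatch("play");
  }

  pause(): void {
    if (this._paused) return; // already paused: no event
    this._paused = true;
    this.dispatch("pause");
  }
}
```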

Polyfill

Polyfill: mjs, types, source.

This polyfill is based on Liqvid's Playback class. However, due to design errors, that class does not currently implement MediaElement as defined above (it measures currentTime in milliseconds rather than seconds, and some of its event names are different).
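One way to bridge such a mismatch is a thin adapter. The sketch assumes a hypothetical milliseconds-based object (`HypotheticalMsPlayback`, with `seek`, `on` and `off`) and a hypothetical event-renaming table; neither reflects Liqvid's real `Playback` API. They only model the two differences named above: milliseconds vs. seconds, and different event names.

```typescript
// Hypothetical milliseconds-based playback object. This is NOT Liqvid's real
// Playback API; it only models the two mismatches described above.
interface HypotheticalMsPlayback {
  currentTime: number; // milliseconds
  duration: number; // milliseconds
  seek(ms: number): void;
  on(event: string, fn: () => void): void;
  off(event: string, fn: () => void): void;
}

// Hypothetical mapping from MediaElement-style event names to legacy names.
// Names not listed are passed through unchanged.
const LEGACY_EVENT_NAMES = new Map<string, string>([["seeked", "seek"]]);

// Exposes a seconds-based, addEventListener-style surface over the legacy object.
class SecondsAdapter {
  private readonly inner: HypotheticalMsPlayback;

  constructor(inner: HypotheticalMsPlayback) {
    this.inner = inner;
  }

  get currentTime(): number {
    return this.inner.currentTime / 1000;
  }

  set currentTime(seconds: number) {
    this.inner.seek(seconds * 1000);
  }

  get duration(): number {
    return this.inner.duration / 1000;
  }

  addEventListener(type: string, fn: () => void): void {
    this.inner.on(LEGACY_EVENT_NAMES.get(type) ?? type, fn);
  }

  // The same function is passed through (no wrapper), so removal still matches.
  removeEventListener(type: string, fn: () => void): void {
    this.inner.off(LEGACY_EVENT_NAMES.get(type) ?? type, fn);
  }
}
```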

Enhancements

All three reference libraries implement additional functionality beyond the MediaElement interface defined above. We have not included these in the proposal since they violate Desiderata 1 and/or 3. However, they are useful to keep in mind.

ysulyma commented 2 years ago

Tagging Remotion @JonnyBurger @Iamshankhadeep and GSAP @jackdoyle @PeterDaveHello

annevk commented 2 years ago

Thanks for raising this! I think this might benefit from going through https://wicg.io/ or equivalent to get some help with fleshing out the proposal a bit more.

cc @whatwg/media

tomByrer commented 2 years ago

I'm interested in this idea also; I'd like to sync CSS & Lottie animations with an audio/video file. Currently exploring using VTT as a unified timed RPC listing to trigger commands.
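A minimal sketch of the cue-list idea, assuming cues have already been parsed out of a VTT file into `{ start, end, command }` records (parsing is omitted; the command strings and function names are made up for illustration). Two queries cover most needs: which cues were crossed while playing forward, and which cues are active after a seek. With a real <audio>/<video> element, the built-in `TextTrack` `cuechange` event offers something similar.

```typescript
// A cue parsed from a VTT file (parsing omitted). Times are in seconds;
// `command` is an arbitrary, app-defined string.
interface SketchCue {
  start: number;
  end: number;
  command: string;
}

// Commands whose start time lies in the half-open interval (prev, now],
// i.e. cues crossed while playing forward. Backward jumps return nothing.
function cuesCrossed(cues: SketchCue[], prev: number, now: number): string[] {
  if (now <= prev) return [];
  return cues.filter((c) => c.start > prev && c.start <= now).map((c) => c.command);
}

// Commands whose cue is active at time t, using the half-open range [start, end).
// After a seek, re-applying these restores state without replaying every trigger.
function activeCues(cues: SketchCue[], t: number): string[] {
  return cues.filter((c) => c.start <= t && t < c.end).map((c) => c.command);
}
```

Starting `prev` at `-Infinity` makes a cue at time 0 fire on the first update.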

Folks at Mux, Inc. are taking a different approach to a similar problem: they are abstracting <video> with Custom Elements as an HTML UI wrapper, vs. a JS API wrapper like you have here. But maybe @heff or @luwes would like to give feedback anyhow?

heff commented 2 years ago

Hi, thanks for the ping @tomByrer. I don't totally understand what the result of the proposal would be, but it does sound related to what we're doing with media-chrome, which is a set of media UI elements that can work with any html element (native or custom) that exposes the same API as the native media elements (<video> and <audio>).

We have a growing list of custom media elements, including wrappers for the youtube player and HLS.js. Many of them simply extend a custom-video-element class we built, while others start from scratch.

Also, under video-dev/media-ui-extensions we have early proposals for extensions of the media element API for common needs like quality rendition switching and ad UIs. I gave a related talk at Demuxed. Happy to chat more if there's interest.

ysulyma commented 2 years ago

@heff Definitely related! One concrete difference is that MediaElement is not DOM-aware at all. As a perverse (but illustrative) example, one could use it in Node to create ASCII videos in the terminal.
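To illustrate the Node point: if each frame is a pure function of `currentTime`, nothing in the rendering path touches the DOM. The hypothetical `asciiFrame` below draws a ball moving along a track; a Node script could write its output to the terminal on every time update (for example with `process.stdout.write("\r" + frame)`).

```typescript
// Renders one ASCII frame as a pure function of media time: a ball "o"
// moving across a track of `width` cells over `duration` seconds.
// Because it depends only on t, seeking anywhere yields the correct frame.
function asciiFrame(t: number, duration: number, width: number): string {
  const progress = Math.min(1, Math.max(0, t / duration));
  const pos = Math.round(progress * (width - 1));
  return "[" + ".".repeat(pos) + "o" + ".".repeat(width - 1 - pos) + "]";
}
```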

The discussion at https://github.com/muxinc/media-chrome/pull/182 gets more to the heart of the difference. Particularly this comment:

I'm definitely in the camp of the media controller not extending the video element and mimicking its API.

If Media Chrome is a UI framework I'm also not convinced that the controller should be a source of truth for video state outside Media Chrome. The only reason the controller is there is to act like a middle man between MC controls and media.

In my use case, I am making "videos" out of DOM manipulation (example), and I need a SyntheticMediaElement to be the source of truth about "what time it is". Actual <audio> or <video> elements, if there are any, are controlled by the SyntheticMediaElement.

You could also use this to control multiple <video> elements (or YouTube/Vimeo videos wrapped with the MediaElement API) which need to be synced up in some complicated way. If your source of truth for the "current time" is a single, actual <audio>/<video> element, then you do not need SyntheticMediaElement. However, you could still make use of plugins (cf. the example) targeting the MediaElement API.
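A sketch of the "one clock drives several videos" case. It assumes a hypothetical minimal `SketchPlayable` shape, which a real `HTMLVideoElement` (or a YouTube/Vimeo player wrapped with the same API) would structurally satisfy, plus a per-child `offset` giving where the child starts on the master timeline. The default tolerance is an arbitrary illustration.

```typescript
// Minimal shape needed from any media object (hypothetical). A real
// HTMLMediaElement's play() returns a Promise, which is still assignable here.
interface SketchPlayable {
  currentTime: number; // seconds
  readonly paused: boolean;
  play(): void;
  pause(): void;
}

// Keeps `child` aligned with `master`, where the child's own timeline starts
// `offset` seconds into the master timeline. Drift below `tolerance` is ignored
// to avoid constant re-seeking, which can cause visible stutter in real <video>.
function syncChild(
  master: SketchPlayable,
  child: SketchPlayable,
  offset: number,
  tolerance = 0.1,
): void {
  const target = master.currentTime - offset;
  if (target < 0) {
    // The master has not reached this child yet: hold it at its start.
    if (!child.paused) child.pause();
    if (child.currentTime !== 0) child.currentTime = 0;
    return;
  }
  if (Math.abs(child.currentTime - target) > tolerance) child.currentTime = target;
  // Mirror the master's play/pause state.
  if (master.paused && !child.paused) child.pause();
  else if (!master.paused && child.paused) child.play();
}
```

Calling `syncChild` from the master's time-update and seek handlers (or from a `requestAnimationFrame` loop) keeps every child subordinate to the single synthetic clock.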

Basically, it's a pattern for general-purpose imperative animation. Like the Web Animations API, it isn't inherently tied to a scrubber bar interface (although in most applications it will be). Unlike the Web Animations API, which can only animate CSS properties, this can be used to e.g. sync up a THREE.js scene to a scrubber bar.
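A sketch of the "sync a THREE.js scene" idea without any THREE.js code: state is sampled from keyframes as a pure function of time, and a hypothetical `bindToTime` helper re-renders whenever time changes. The event names `timeupdate` and `seeked` are assumed; the `render` callback could, for instance, assign the sampled value to a mesh's `rotation.y`.

```typescript
// One keyframe: a numeric value at a media time (seconds).
interface SketchKeyframe {
  time: number;
  value: number;
}

// Piecewise-linear sampling over keyframes sorted by time. Values before the
// first and after the last keyframe are held constant. The result depends only
// on t, so scrubbing to any time yields the correct state.
function sampleKeyframes(frames: SketchKeyframe[], t: number): number {
  if (frames.length === 0) throw new Error("no keyframes");
  if (t <= frames[0].time) return frames[0].value;
  for (let i = 1; i < frames.length; i++) {
    const a = frames[i - 1];
    const b = frames[i];
    if (t <= b.time) {
      const u = (t - a.time) / (b.time - a.time); // t > a.time here, so b.time > a.time
      return a.value + (b.value - a.value) * u;
    }
  }
  return frames[frames.length - 1].value;
}

// Minimal shape this helper needs from a media object (hypothetical).
interface SketchTimeSource {
  readonly currentTime: number;
  addEventListener(type: string, fn: () => void): void;
}

// Re-render whenever time changes; "timeupdate" and "seeked" are assumed names.
function bindToTime(media: SketchTimeSource, render: (t: number) => void): void {
  const update = () => render(media.currentTime);
  media.addEventListener("timeupdate", update);
  media.addEventListener("seeked", update);
  update(); // draw the initial state immediately
}
```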

heff commented 2 years ago

I think it makes sense to focus on an interface for just the media state/control API. Video can get more complicated with element attributes and child nodes (track, source). I see how it can make sense to break that out of the requirement of being an element so it could work in other contexts like Node. On that note, is it not a Media "Element" interface then?

ysulyma commented 2 years ago

On that note, is it not a Media "Element" interface then?

Yeah, the logic was HTMLMediaElement − HTML, but I guess Element should be subtracted as well. On the other hand, Media is too generic. Open to other suggestions; perhaps MediaElement → Playable, SyntheticMediaElement → Playback?

heff commented 2 years ago

Yeah, maybe one of those. Probably not worth worrying about naming until this gets a little further.