whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Deprecate mediagroup/MediaController? #192

Closed annevk closed 8 years ago

annevk commented 9 years ago

Per @rocallahan Mozilla has no interest in implementing these. It seems Safari does support them. Not sure about other browsers or their plans.

foolip commented 9 years ago

I unshipped MediaController in Blink in April 2014: https://groups.google.com/a/chromium.org/d/msg/blink-dev/MVcoNSPs1UQ/LIF-fvu2lwoJ

I'm pretty sure it was only ever implemented in WebKit (by @jernoble) and WebKit is now the only engine with it enabled. I'd be happy to remove it from the spec and to drop the disabled code from Blink.

domenic commented 9 years ago

We should get some comment from WebKit though on their plans for it. Although @annevk's OP says Safari does not support them, so I am getting conflicting reports here... any web page demos we can all test? I have an iPad at least, if not a Mac.

foolip commented 9 years ago

Just go to http://jsconsole.com/?MediaController to see if the interface object is there.

domenic commented 9 years ago

Still exists in iOS Safari. (Does not exist in Edge.)

But what about mediagroup, or other aspects of the feature?

domenic commented 9 years ago

@travisleithead, does Edge have an opinion on MediaController and mediagroup? It currently appears to be a Safari-only feature.

annevk commented 9 years ago

@domenic OP says Safari does support them.

foolip commented 9 years ago

But what about mediagroup, or other aspects of the feature?

It's just a declarative shortcut to setting the same MediaController on multiple elements, the MediaController interface itself is pretty much the whole feature.
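As a concrete sketch of that equivalence (filenames illustrative; this assumes the spec's then-current settable `controller` attribute on media elements):

```html
<!-- Declarative form: both elements join the same implicit controller. -->
<video src="movie.webm" mediagroup="movie" controls></video>
<video src="signing.webm" mediagroup="movie"></video>

<!-- Roughly equivalent imperative form: -->
<script>
  const controller = new MediaController();
  for (const video of document.querySelectorAll("video")) {
    video.controller = controller; // same controller => synchronized playback
  }
  controller.play();
</script>
```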

foolip commented 9 years ago

Which email list should we send a message to, saying that we're going to remove this feature if no implementor speaks up within a week?

annevk commented 9 years ago

I emailed webkit-dev. We could email whatwg too to be sure.

foolip commented 9 years ago

WebKit is indeed the most interesting party to ask, but it wouldn't hurt to send the same message to whatwg.

annevk commented 9 years ago

Will do.

eric-carlson commented 9 years ago

MediaController is frequently mentioned as a technique for playing a synchronized video of a sign language interpreter, e.g. http://www.w3.org/TR/2012/NOTE-WCAG20-TECHS-20120103/G81.

How will people do this if you remove MediaController?

It would be technically possible if the spec were changed to allow more than one video track in a container to be enabled simultaneously, but 1) not all media engines support that, and 2) that would require the sign language track to be downloaded even when it is not used.

foolip commented 9 years ago

There's nothing wrong with the use cases for MediaController, but the implementor interest just hasn't materialized. Even as implemented in WebKit (last I looked) it doesn't actually synchronize at the media engine level, which is the hard part.

A path that seems somewhat more likely to succeed, I think, is a way to get multiple video tracks of a single media element rendered into separate containers. Combining that with MSE ought to prevent the bandwidth waste.

travisleithead commented 9 years ago

I chatted with a few folks about the feature. We find that one of the most encouraging use cases is accessibility--being able to synchronize closed captions, sign-language PiP for the hearing-impaired, etc., together with a video. Perhaps those use cases could be achieved in some other way. For MediaController specifically, we just have to consider it in the larger scope of features we would like to implement, and it just never shows up as a blip on the radar.

foolip commented 9 years ago

@travisleithead, for closed captions we have the text track APIs, but I agree about sign-language PiP, that's one of the obvious use cases for MediaController.

One way of solving the same problem might be to try to expose the level beneath HTMLMediaElement, what is internally a player object of some kind which itself doesn't know anything about HTML elements or CSS, it just plays audio and decodes video into a texture of some sort. If that "texture of some sort" concept were made explicit and could be connected to an element, it should be possible to define HTMLMediaElement in those terms, and also to direct different video streams to different elements.

travisleithead commented 9 years ago

That starts to sound like what the proposed Timed Media WG wants to put together. If a clear, superior feature becomes available, and all the use cases for the MediaController are subsumed by other platform features, then it makes sense to drop MediaController. Until then, it probably doesn't make sense for any browsers to start work on implementing it either.

foolip commented 9 years ago

@travisleithead, it's concerning that http://w3c.github.io/charter-html/timed-media-wg.html doesn't include breaking apart HTMLMediaElement itself into the underlying layers, but overall it sure does sound similar to what I'm saying. I hope that this work is done in close cooperation with the WHATWG, because part of the problem in this area is that the interaction between MSE and HTMLMediaElement is monkey-patched, and you cannot realistically solve it without changing HTML.

cconcolato commented 9 years ago

The feature of mediagroup/MediaController is interesting and useful for synchronizing several video streams (and/or several audio streams). There are several use cases for that. Two of them are:

I don't have a preference for the API design. It might be changed, or better integrated with MSE, but I'd like to keep the feature. I'd be curious to know why it hasn't been implemented: for lack of user interest? because it's too hard to implement on all platforms? Maybe it could be pushed to a level 2 spec.

foolip commented 9 years ago

I'd be curious to know why it hasn't been implemented: for lack of user interest? because it's too hard to implement on all platforms?

It's hard to say in general; when implementors don't show any interest they usually don't say anything at all. It's not that anybody has outright said it's a terrible idea, it's more a lack of enthusiasm.

The use cases are obvious, and yet when we unshipped the API in Blink (Chrome, Opera) there was hardly any reaction at all, the only sign of interest I remember or can find is this bug report: https://code.google.com/p/chromium/issues/detail?id=403320

I would say that it is actually quite hard to implement the synchronization correctly, but it all depends on what media engine you're using. For some it might be trivial, and for others it may not be possible at all.

hober commented 9 years ago

I'm trying to dig up some usage data, but in the interim I can say that we are unlikely to drop support for MediaController absent a compelling alternative that addresses its use cases.

What is the technical substance of roc's objection? All I know is that he won't implement; I'd love to know why.

cookiecrook commented 9 years ago

Here’s a demo of mediagroup in WebKit/Safari.

WWDC 2012 synchronized video demo. Explanation starts around 3:30 (example even uses the sign language use case); demo starts around 10:30 and the corny banter between Beth and Vicki is quite entertaining. Demo ends around 16:00 when they start talking about the fullscreen API. https://developer.apple.com/videos/wwdc/2012/?id=604

The video being used is this one of Bohemian Rhapsody, signed by Stephen Torrence. The video has its own caption track for ASL, which is not a 1:1 mapping to the English captions of the original Queen song, and demonstrates why these can and should be achieved using separate media elements synced using mediagroup. This ASL track is an alternative for another video, but it also has its own media alternatives (captions), which would conflict with main video captions if these were included as two video tracks in the same video element. http://www.youtube.com/watch?v=sjln9OMOw-0

The Accessibility discussions have already declined additional features based on the availability of mediagroup in the spec and in WebKit.

Thread on public-html-a11y: https://lists.w3.org/Archives/Public/public-html-a11y/2013Dec/0071.html

Referenced telecon minutes from public-html-media. https://lists.w3.org/Archives/Public/public-html-media/2013Dec/0034.html

domenic commented 9 years ago

I'd ask everyone to stop mentioning use cases for MediaController in this thread. As @foolip has stated repeatedly, the fact that a feature has use cases is not relevant to this discussion. The reason for discussing its removal is lack of implementation interest, not lack of use cases. (You should perhaps be taking those arguments to various browser vendors' mailing lists to attempt to convince them to implement the feature.) Adding your demos of MediaController or stories of how it is a useful thing to you are off-topic here, and further posts along those lines will be deleted.

(Note: the last three paragraphs of your post are fine, @cookiecrook.)

cookiecrook commented 9 years ago

The use cases are relevant. Because of the use cases, it may be that others working on Mozilla and Chromium do have interest in implementation.

cookiecrook commented 9 years ago

You should perhaps be taking those arguments to various browser vendors' mailing lists to attempt to convince them to implement the feature.

I've mentioned this thread to a few members of the Mozilla and Chromium accessibility teams.

foolip commented 9 years ago

I'm trying to dig up some usage data

@hober, at the time I unshipped it in Blink, usage was around 0.0004% of page views. Usage may be different in Safari, of course.

rocallahan commented 9 years ago

What is the technical substance of roc's objection? All I know is that he won't implement; I'd love to know why.

I have no technical objection; it's just been low priority for a long time and doesn't seem to ever rise in priority. Our media team constantly receives feature requests and this is never one of them, so we are currently uninterested in implementing this feature and do not foresee that changing.

Furthermore, implementing this feature properly would require invasive changes to our media stack, adding significant complexity, so it's worth considering whether that complexity will ever be justified. (But in response to Philip: this is definitely implementable in Gecko.)

FWIW if we do keep this feature in the spec, we need to describe the behavior of a video element which is part of a media group and whose source is a MediaStream. It seems infeasible to me to sync MediaStream output with the decoding of media resources, so I would very strongly recommend that this combination be ruled out somehow.

silviapfeiffer commented 9 years ago

It's a shame that we continue to argue from page views for features that are (mainly) accessibility related. We should take the number of people that have accessibility needs as our baseline, not the whole Internet population, when claiming low interest.

Unfortunately, the percentage of accessibility feature developers in browsers is even lower than the percentage of people with accessibility needs in the population. Minority needs like that unfortunately get ignored until they become aggressive about it (see the laws about captioning in the US).

I'm very sad to see these features go, but I only have use case arguments. There is definitely no "90% of the Internet population needs this feature" argument, so that won't swing any browser dev to implement them.

rocallahan commented 9 years ago

FWIW I expect the specific use-case of rendering a sign-language video track alongside a movie doesn't absolutely require this feature. I'm confident you could hack your own synchronization adequately in JavaScript (e.g. to within 100ms almost always) by monitoring currentTime and pausing the leading video if the other one falls behind --- because, as I understand it, a sign-language track doesn't have to be perfectly in sync with the other tracks.

IMHO the only use-cases that would justify implementing this feature are the ones that require frame-accurate synchronization.
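The pause-and-wait strategy described above (with the hysteresis roc mentions later in the thread) can be sketched as a small decision function. The helper name and thresholds here are illustrative, not from any real API:

```javascript
// Hypothetical helper: given the two currentTime values (in seconds), decide
// what the leading video should do. The dead band between resumeAt and
// pauseAt provides hysteresis, so we don't rapidly toggle pause/play.
function syncAction(leadTime, followTime, pauseAt = 0.1, resumeAt = 0.03) {
  const drift = leadTime - followTime;
  if (drift > pauseAt) return "pause"; // leader too far ahead: wait
  if (drift < resumeAt) return "play"; // follower caught up: resume
  return "hold";                       // inside the dead band: do nothing
}

// In a page this would run on a timer against two <video> elements, e.g.:
//   const action = syncAction(main.currentTime, signing.currentTime);
//   if (action === "pause" && !main.paused) main.pause();
//   else if (action === "play" && main.paused) main.play();
```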

chaals commented 9 years ago

@rocallahan shows why use cases are relevant. If we don't need this particular API to solve the use cases, a few people could spend a little time telling other people (who care deeply about the use cases) to stop wasting theirs - e.g. not to fill bug trackers based on misconceptions.

100ms is probably acceptable or at least "acceptably bad" for signing in most cases. I have also been told that audio description for video may rely on synchronisation. Again, 100ms accuracy is probably workable in almost all cases. People who rely on these features in real life have learned to deal with a janky world...

@rocallahan, how confident are you that you're better than order-of-magnitude right about the performance?

rocallahan commented 9 years ago

I'm pretty confident.

Here's a POC I hacked together: http://people.mozilla.org/~roc/videosync.html Try it for yourself, and check the source. It's terribly simple and could definitely be improved.

chaals commented 9 years ago

Looks pretty reasonable to me. Later tonight I'll get a chance to try it on a really flaky network, and on some lower-end hardware.

Thank you for putting the demo together. It's terribly simple and for most cases could probably just be copy-pasted without making the world worse…

(the exception being the art case where you want to have a large number of things synching. In which case synch limitations seem one of the limits of the medium that provides the technical boundaries against which, in part, artists define themselves...)

rocallahan commented 9 years ago

Couldn't resist tweaking the POC a bit :-). Applied some hysteresis to reduce the frequency of pause/play transitions.

If you try it on low-end hardware, be sure to toggle the graph display off, since that may be non-trivial. (It redraws the graph 100 times per second.)

silviapfeiffer commented 9 years ago

Might be better to use some actual sign language for the actual video. Here's a pair you can use:

There is more work to be done when you're trying to interact with the videos: pause, play, and navigate. Even if you restrict that to the main video, you have to replicate all the activities on the other video, too.

zcorpan commented 9 years ago

@rocallahan put it on github so discussions about improving it can be separate from this issue. :-)

rocallahan commented 9 years ago

I don't intend to do any more work on it. I think it proves the point.

Robbert commented 9 years ago

Check out http://thebookofsand.net for a use case for mediagroup. I did a custom implementation of MediaController, took me weeks and weeks to get right, with buffering and all.

Did some more fun stuff for that project, like implementing the HTMLMediaElement API for the YouTube and Vimeo JavaScript APIs (methods, properties, playback and buffer events) so I could keep using the same code for mediagroup. In the end the latency of the YouTube API for getting currentTime turned out to cause the lipsync to be too inaccurate, so we went with hosting the video ourselves.

One problem with implementing mediagroup functionality in JS is latency: starting one video might take, for example, 100ms before actual playback, but calling play() on three videos at the same time could cause the first video to play after 160ms, the second after 300ms and the third after 450ms. You would need to compensate for that. I try to gather latency statistics for different situations and rewind to a slightly wrong offset to end up with accurate sync, but that information isn't shared across websites of course.

By the way: IIRC, I aimed for <20ms latency, anything over 20ms started to get annoying.
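The compensation described above can be sketched as follows; the helper is hypothetical and the latency figures are the illustrative ones from the comment, not real measurements:

```javascript
// Hypothetical sketch: if each video starts playing some measurable time
// after play() is called, seek each one ahead by its own expected startup
// latency so that, once all are actually playing, they show the same media
// time. Times and latencies are in seconds.
function seekTargets(startTime, startupLatencies) {
  return startupLatencies.map((latency) => startTime + latency);
}

// e.g. three videos that historically start 160ms, 300ms and 450ms late:
//   seekTargets(0, [0.16, 0.3, 0.45]) -> seek them to 0.16s, 0.3s and 0.45s
```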

foolip commented 9 years ago

I have now removed the code for MediaController in Blink: https://codereview.chromium.org/1373423003

foolip commented 9 years ago

As for the spec, my proposal (https://github.com/whatwg/html/pull/277) is to remove it in 6 months if no implementor interest materializes.

foolip commented 9 years ago

The spec now has this in a red box:

This feature has been implemented only in one user agent since it was introduced in 2011. If a second implementation is not in progress by April 2016, it will be removed from this specification. Please see issue #192 for more details.

foolip commented 8 years ago

Note to future self: https://www.w3.org/Bugs/Public/show_bug.cgi?id=22471#c26

jernoble commented 8 years ago

This API is still the only way for authors to implement out-of-band audio tracks. @rocallahan's example does not show frame-accurate synchronization and @robbert's experience implies that a pure-javascript solution will not synchronize accurately enough for this use case.

I object to this feature being removed from the spec until an alternative has been proposed.

domenic commented 8 years ago

@jernoble you can object, but without implementer support we can't keep it in the spec regardless. You've got a month to get some progress started in another UA...

jernoble commented 8 years ago

@domenic There were multiple implementations, until Blink removed theirs (ostensibly because of usage metrics). So the argument is not a lack of UA interest (since at least two UAs implemented the feature at some point), but a lack of author interest. So the use cases are directly relevant.

domenic commented 8 years ago

@jernoble "implemented at some point" is not the bar. "Implemented" is.

sideshowbarker commented 8 years ago

This API is still the only way for authors to implement out-of-band audio tracks. @rocallahan's example does not show frame-accurate synchronization and @robbert's experience implies that a pure-javascript solution will not synchronize accurately enough for this use case.

All of those assertions seem to be true. (If anybody believes they’re not, they should chime in here to say so.) And if so, then the alternatives really aren’t alternatives if they don’t actually solve the problems the feature was originally designed for. So not implementing the feature amounts to saying that we do not need to solve those problems (despite having once had agreement—at the time the feature was originally specified—that those were problems we needed to solve).

All that said, I think we also have an obligation to figure out if this is even something that there would be enough real author need for even if we had it implemented across all UAs. I think data which would provide evidence of author need for this would be a list of known Web sites/apps that are trying to solve the problem as well as they can just using JavaScript and existing APIs, but falling short.

But if we don’t even have authors who are attempting it, then that would seem to suggest there may not actually be a strong need for it.

annevk commented 8 years ago

Well, @robbert is one such author^H^H^H^Hdeveloper. Haven't met any others, but then I don't get out much.

silviapfeiffer commented 8 years ago

Read the comments on a very old blog post of mine: http://gingertech.net/2011/05/01/html5-multi-track-audio-or-video/ .

There's about 10 people asking for the feature. You can imagine that only about 1% of all people that are looking for a solution find their way to my blog post, and only about 1% of those leave a comment. And those that are looking are probably just 1% of those that actually have a need. So, my prediction is that there are probably about 10M users of such a feature.

domenic commented 8 years ago

Yes, I anticipate this is probably a feature with nonzero usage. However, it is a feature with only one implementation, and no sign of willingness to get another. You may find lobbying the bug trackers of implementers more fruitful.

cookiecrook commented 8 years ago

I think this feature will see more adoption once more people realize what it makes possible.

For example, video providers like Netflix could use this to provide timing-critical out-of-band audio descriptions. This would reduce their CDN duplication, because they would not need to duplicate the 5.1 audio track in both AD and non-AD versions. The main audio could be 5.1, and the AD track could be low-bandwidth stereo, or even mono.

Most production studios distribute movie audio as separate audio tracks to the distribution houses: sound effects, music, various language tracks (e.g. English, Spanish, Japanese), and alternate tracks (audio descriptions, director's commentary, etc.). These tracks are mixed down to a single audio file prior to CDN distribution, but combining them on the fly would allow users to do delightful things like lowering the volume of the background audio (sound effects and music) to more clearly understand the foreground dialog. No video service has implemented this user feature yet, but deprecating the HTML feature now only ensures none of them will be able to.

Perhaps David Bolter, Alex Surkov, Eitan Isaacson, Marco Zehe, or others at Mozilla Accessibility have a different opinion than @rocallahan?

Perhaps the media distribution companies and production houses have an opinion, too. Netflix and Disney/Pixar are treating accessibility very seriously now. Disney just released the "Disney Movies Anywhere" app and seems to be retroactively recording audio descriptions for the entire Disney movie catalog. Netflix is one of the largest users of bandwidth on the Web (in the US anyway) and may be interested in one of these techniques to reduce their CDN duplication.

Given the single implementation (in WebKit/Safari) and its impact to accessibility (where timing is more important than the detractors realize), it would be a shame to deprecate the feature prematurely.

rocallahan commented 8 years ago

I'm no longer with Mozilla. The person you need to convince is probably Anthony Jones.

FWIW I believe what I wrote above is still the case: the media team is very busy and gets lots of feature requests from users, and this is much less requested than other features.

@rocallahan's example does not show frame-accurate synchronization

That's true. The question is, which use-cases require frame-accurate synchronization? Signing doesn't. AFAICT audio descriptions don't. Separate distribution of audio tracks doesn't: Netflix is using Media Source Extensions, which means they can put separate audio tracks on the CDN and bundle them into a single media resource at the client. Per-audio-track volume control doesn't: that could be provided with a 'volume' attribute on AudioTrack, which is better than requiring MediaController since it would work when the audio tracks have been bundled into a single resource, and would work with MediaStreams.
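The proposed per-track volume control could look something like the sketch below. Note that AudioTrack has no `volume` attribute in the spec; this is purely an illustration of the proposal, using the real AudioTrack `kind` values:

```javascript
// Hypothetical: a 'volume' attribute on AudioTrack (NOT a real spec feature).
// Duck the main mix so the audio-description track stands out:
for (const track of video.audioTracks) {
  if (track.kind === "main") track.volume = 0.4;
  if (track.kind === "descriptions") track.volume = 1.0;
}
```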