w3c / picture-in-picture

Picture-in-Picture (PiP)
https://w3c.github.io/picture-in-picture

Define API for arbitrary content #136

Closed beccahughes closed 1 year ago

beccahughes commented 5 years ago

Preview | Diff

marcoscaceres commented 5 years ago

Given Apple’s decision to not support this (as per blink-dev), should this be kept open? Although it would be a pretty neat feature, keeping this PR open as a “v2” thing is sending the wrong signal (like it’s still happening).

mounirlamouri commented 5 years ago

I'm not sure why Apple's lack of interest should be a blocker. As you said, it would be an interesting addition to the platform, and Apple's concerns didn't seem, to me, to be fundamental issues with the feature itself but rather an unlikeliness to implement it in Safari.

marcoscaceres commented 5 years ago

I'm not sure why Apple's lack of interest should be a blocker.

If Apple doesn’t implement it, and neither does Firefox, then it risks ending up being a Chrome-only thing. Further, if the API/spec that then ships ends up being only implementable in Chrome because of architectural differences, and no one spots that when the spec ships, then it means other browsers can’t implement it (which would be bad for teh webs).

Also, it means that lack of implementation of the feature would prevent the spec from progressing along the W3C Rec Track: we would be put into a situation where this would need to be marked “at risk”, and eventually removed before progressing out of CR.

As you said, it would be an interesting addition to the platform, and Apple's concerns didn't seem, to me, to be fundamental issues with the feature itself but rather an unlikeliness to implement it in Safari.

We can’t make that determination unless other browser engineers are given a chance to both review AND implement the spec (even as a prototype). Most spec issues are found during implementation. So, definitely neat feature, but lots of scary unknowns.

modest commented 5 years ago

I'd like to voice my support for this proposal (with or without the interactive flag) and note a couple of nuances.

While Safari only supports PiP on <video> elements, it does include shadow DOM elements in its PiP renderings. Most importantly, this includes UA-rendered subtitles (but not custom subtitle renderers). UA rendering of subtitles is arguably more common in HLS-based streams than MSE-based streams, so it's not surprising that custom subtitle renderers are a lower priority scenario for Apple.

This proposal would unblock our adoption of picture-in-picture video on the web. We're currently blocked by:

I would consider separating out the "interactive" portion of the proposal, as that raises far more questions around security/abuse, cross-platform UI design considerations, and scope. Without the "interactive" portion, this is a much safer proposal. Pages can already paint arbitrary content to the PiP using <canvas> and other workarounds today, so this is no different.

(A more constrained proposal: The target element must have a descendant <video> element in the tree.)
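A minimal sketch of what the constrained rule above could look like as a page-side check. `hasVideoDescendant` is a hypothetical helper name and not part of any spec or browser API; it only assumes the target exposes the standard `querySelector` method.

```javascript
// Hypothetical helper (not a spec or browser API): returns true when the
// target element contains at least one <video> somewhere below it.
function hasVideoDescendant(target) {
  // querySelector() searches descendants only, so a bare <video> passed as
  // the target does not match itself here.
  return target.querySelector('video') !== null;
}
```

Under this rule a player container that wraps a `<video>` plus its subtitle overlay would qualify, while an arbitrary `<div>` with no video inside would not.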

OrenMe commented 5 years ago

I want to second @modest's reply, as these are the exact same blockers for us. This is a clear usage pattern and set of requirements that I hear from my users and from peers, and I believe there’s an obvious need for the features that @modest mentioned.

chrisn commented 5 years ago

I agree with @modest and @OrenMe. Although the BBC has deployed PiP for video, the v2 proposal means we could deliver a much better UX. In addition to the points made by @modest, there's also the ability to provide custom media playback / transport controls, to be consistent with the visual design we use with in-page video.

marcoscaceres commented 5 years ago

As @modest writes:

A more constrained proposal: The target element must have a descendant <video> element in the tree.

I wonder if the above, instead of any Element, would make this proposal more palatable to other browser vendors (and Apple)? If Apple is already kinda doing this, and it addresses the use cases of folks who have commented, might be worth pursuing.

eric-carlson commented 5 years ago

While Safari only supports PiP on <video> elements, it does include shadow DOM elements in its PiP renderings.

WebKit does render captions and subtitles in the PiP window, but the fact that they are in the shadow DOM is an implementation detail. Cues are in a private part of the shadow DOM, not accessible to script in the page. No other part of the shadow DOM is rendered to PiP.

Most importantly, this includes UA-rendered subtitles (but not custom subtitle renderers). UA rendering of subtitles is arguably more common in HLS-based streams than MSE-based streams, so it's not surprising that custom subtitle renderers are a lower priority scenario for Apple.

It has nothing to do with priority; WebKit renders all subtitles and captions it can identify in PiP. This includes in-band captions/subtitles (from any media container supported, not just HLS), <track> elements with a WebVTT source, as well as WebVTT cues added with the track.addCue() method.
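For readers unfamiliar with the last option, here is a rough sketch of adding a cue through the standard text track API, so that the UA (rather than page script) renders it. `addNativeCaption` and its `CueCtor` parameter are hypothetical names introduced for illustration; `addTextTrack()`, `addCue()` and `VTTCue` are the standard calls.

```javascript
// Sketch: add a caption as a native WebVTT cue so the browser renders it.
// `video` is assumed to be an HTMLVideoElement; `CueCtor` defaults to the
// browser's VTTCue constructor and is only a parameter to ease testing.
function addNativeCaption(video, startTime, endTime, text, CueCtor = globalThis.VTTCue) {
  const track = video.addTextTrack('captions', 'English', 'en');
  // Tracks created with addTextTrack() start out hidden, so show them.
  track.mode = 'showing';
  track.addCue(new CueCtor(startTime, endTime, text));
  return track;
}
```

A page might call `addNativeCaption(video, 1, 4, 'Hello')` once metadata has loaded; whether such cues then appear in the PiP window depends on the browser, as described above for WebKit.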

Custom rendered cues are not included in the PiP rendering because there is currently no way for WebKit to identify them as subtitles or captions. Of course this is just one of the problems with custom rendered cues, for example they are not styled with a user’s caption rendering preferences, performance and power usage are worse, etc. Those who were at FOMS in April know that we are working on a proposal to fix this - stay tuned for more details.

Pages can already paint arbitrary content to the PiP using <canvas> and other workarounds today, so this is no different.

How can pages paint arbitrary content to PiP with <canvas>?

OrenMe commented 5 years ago

@eric-carlson thanks for the reply, I was in FOMS but unfortunately I don't recall anything about Custom rendered cues so am very curious to know more details related to this.

Just an FYI about this - in all browsers besides Safari we render our captions/subtitles ourselves due to the lack of proper support for styling - and this comes from compliance requirements with CVAA, so this is very critical. Safari is a bit of a strange thing here - in general you have the best support for cue styling, so we can use the native rendering and still be able to style the captions, and there's also another pain, which is that iOS fullscreen doesn't enable custom rendering - so this forces the implementations (player vendors) to maintain two rendering and styling pipelines because of these quirks. If you say that the other browser vendors need to implement proper cue styling so this is supported across the board, then everything is good, but the fact is that it isn't, and it forces the player creators to be very creative. Having this now in the PiP feature will just make it worse.

beaufortfrancois commented 5 years ago

Pages can already paint arbitrary content to the PiP using <canvas> and other workarounds today, so this is no different.

How can pages paint arbitrary content to PiP with <canvas>?

@eric-carlson Here's an example:

const canvas = document.createElement('canvas');
// Draw something to canvas.
canvas.getContext('2d').fillRect(0, 0, canvas.width, canvas.height);

const video = document.createElement('video');
video.muted = true;
video.srcObject = canvas.captureStream();
video.play();

// Later on, video.requestPictureInPicture();

Source: https://developers.google.com/web/updates/2018/10/watch-video-using-picture-in-picture#show_canvas_element_in_picture-in-picture_window

eric-carlson commented 5 years ago

Just an FYI about this - in all browsers besides Safari we render our captions/subtitles ourselves due to the lack of proper support for styling - and this comes from compliance requirements with CVAA, so this is very critical.

And what about the ability of a user to have their caption styling preferences honored?

Safari is a bit of a strange thing here - in general you have best support for cue styling, so we can use the native rendering and still be able to style the captions, and there's also another pain which is iOS fullscreen that doesn't enable custom rendering

I don't understand this. WebKit's native cue rendering works in fullscreen on iOS, so what is the problem if you use native cues in Safari?

If you will say that the other browser vendors need to implement proper cue styling so this is supported across the board then everything is good, but the fact is that it isn't and it forces the player creators to be very creative.

I assume you have filed bugs about the problems you are having?

marcoscaceres commented 5 years ago

I have to agree with @eric-carlson: It sounds like the custom caption renderers are attempting to work around browser bugs in inefficient ways, or ways that could be detrimental to accessibility and user control. It would be better for us to work together and have browser vendors fix those issues.

From Firefox’s perspective, the “Lack of the ability to overlay the video element with graphics and text” could easily be (ab)used to show ads on-top of videos. We know end-users really dislike those kinds of ads, so it’s not something we’d really want to enable (just drives users to install more ad blockers).

OrenMe commented 5 years ago

And what about the ability of a user to have their caption styling preferences honored?

@eric-carlson you are right, I forgot to mention that in the part about "Safari is a bit of a strange thing here" (actually WebKit). So yes, "And what about the ability of a user to have their caption styling preferences honored?" is another strange thing, because you connect OS-level config to browser-level config, which is unique to the Apple ecosystem (right? I don't think this happens on other OSes such as Windows and Android).

I don't understand this. WebKit's native cue rendering works in fullscreen on iOS, so what is the problem if you use native cues in Safari?

I meant this the other way around - custom rendering of captions cannot be achieved due to this limitation unless WebKit implements the Fullscreen API on iOS, because if I set playsinline to true, start in-browser playback, and the user then clicks the fullscreen button, the video goes to native iOS fullscreen and the custom captions are lost.

I assume you have filed bugs about the problems you are having?

We discuss this on every FOMS and the situation is that text track rendering is not getting finalized and there's a discrepancy in implementation. I'm not saying one browser vendor should take ownership of this, and as I mentioned, on webkit the situation is the best, but I think that if browser land can't solve it then it should be at least open to userland(player vendors) to give the interim solution. But of course this is probably not a valid claim to you or any specific browser vendor and especially if I'm claiming Webkit is leading in support on this. Maybe what I'm saying - this is a bit frustrating from a user (of the API) standpoint.

@marcoscaceres I'm not raising this to argue for or against ads (as a user I don't like ads either), but eventually this is the thing that pays for free content, and you can't really stop it as there are already solutions such as SSAI, so I would just say that ads are valid and there should be other measures to make sure this is not abused. If you want to get an even better answer - some of our clients are not willing to adopt the PiP cause you can't show ads in it, so this might drive adoption from this feature. I don't have real data to back my claim above, this is just my two cents from conversations I had with clients.

marcoscaceres commented 5 years ago

If you want to get an even better answer - some of our clients are not willing to adopt the PiP cause you can't show ads in it, so this might drive adoption from this feature. I don't have real data to my claim above, this is just my two cents from conversations I had with clients.

I appreciate that. In Firefox, we are currently experimenting with allowing users to access PiP without the API: https://hacks.mozilla.org/2019/07/testing-picture-in-picture-for-videos-in-firefox-69/

eric-carlson commented 5 years ago

@OrenMe

And what about the ability of a user to have their caption styling preferences honored?

@eric-carlson you are right, I forgot to mention that in the part about "Safari is a bit of a strange thing here" (actually WebKit). So yes, "And what about the ability of a user to have their caption styling preferences honored?" is another strange thing, because you connect OS-level config to browser-level config, which is unique to the Apple ecosystem (right? I don't think this happens on other OSes such as Windows and Android).

No, I believe it is supported on at least Android and Windows.

I assume you have filed bugs about the problems you are having?

We discuss this on every FOMS and the situation is that text track rendering is not getting finalized and there's a discrepancy in implementation. I'm not saying one browser vendor should take ownership of this, and as I mentioned, on webkit the situation is the best, but I think that if browser land can't solve it then it should be at least open to userland(player vendors) to give the interim solution. But of course this is probably not a valid claim to you or any specific browser vendor and especially if I'm claiming Webkit is leading in support on this. Maybe what I'm saying - this is a bit frustrating from a user (of the API) standpoint.

It is frustrating from an implementor’s perspective too!

I ask again, have you filed bugs about the problems you are having?

If you want to get an even better answer - some of our clients are not willing to adopt the PiP cause you can't show ads in it, so this might drive adoption from this feature.

You can’t show interactive ads, but you can definitely show ads with images, video, and audio.

beccahughes commented 5 years ago

The problem with the <canvas> approach is that it breaks support for encrypted media which most content played in Picture-in-Picture needs.

Custom controls are a big use case too because sites have their own requirements around what UI to show e.g. skip ad, next episode, etc. To fix this properly we need interactive Picture-in-Picture for arbitrary content.

eric-carlson commented 5 years ago

@eric-carlson thanks for the reply, I was in FOMS but unfortunately I don't recall anything about Custom rendered cues so am very curious to know more details related to this.

@OrenMe The slides for our TextTrackCue enhancements session are available.

modest commented 5 years ago

The problem with the <canvas> approach is that it breaks support for encrypted media which most content played in Picture-in-Picture needs.

Exactly, and this incompatibility with EME is why such solutions are not viable for us.

From Firefox’s perspective, the “Lack of the ability to overlay the video element with graphics and text” could easily be (ab)used to show ads on-top of videos. We know end-users really dislike those kinds of ads, so it’s not something we’d really want to enable (just drives users to install more ad blockers).

When the site owner has control over the video, putting ads on top of videos is already feasible through either (a) encoding them into the video or (b) doing some inefficient client-side compositing in <canvas>. There is a valid concern that site owners may abuse this to put ads on top of videos that they don't control, such as on top of an embedded YouTube player. This could be mitigated through a blanket policy prohibiting <iframe> descendants or through some form of feature policy enabled by the child frame.

Our requirements for overlaying items on top of video element are common:

These requirements can sometimes be mitigated by re-encoding these into the video, but this is a lower quality experience (motion behind a transparent overlay tends to attract compression artifacts), less responsive to screen sizes, and extremely costly to perform.

I appreciate that. In Firefox, we are currently experimenting with allowing users to access PiP without the API

Without the ability to programmatically trigger picture-in-picture or be aware of when picture-in-picture is occurring, we are unable to offer some great user experiences that involve browsing content during PiP playback - for example, letting users interact with the live TV guide in the page while playback continues in a picture-in-picture view. So this type of forced integration can enable a subset of scenarios for the limited cohort of users who discover an off-page browser command to invoke PiP, but it will not enable well-designed app integrations and will often result in semi-broken experiences.

eric-carlson commented 5 years ago

Without the ability to programmatically trigger picture-in-picture or be aware of when picture-in-picture is occurring, we are unable to offer some great user experiences that involve browsing content during PiP playback - for example, letting users interact with the live TV guide in the page while playback continues in a picture-in-picture view. So this type of forced integration can enable a subset of scenarios for the limited cohort of users who discover an off-page browser command to invoke PiP, but it will not enable well-designed app integrations and will often result in semi-broken experiences.

Programmatically triggering PiP and the ability to listen for state change events don't require PiP for arbitrary content - both are features of the "standard" picture-in-picture API.
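As a rough illustration of that point, the sketch below toggles PiP and tracks state changes using only the standard `<video>` API. `wirePictureInPicture` and `onChange` are hypothetical names; `requestPictureInPicture()`, `exitPictureInPicture()`, `pictureInPictureElement` and the `enterpictureinpicture` / `leavepictureinpicture` events come from the Picture-in-Picture spec.

```javascript
// Sketch: programmatic PiP toggle plus state tracking with the standard API.
// `video` is assumed to be an HTMLVideoElement and `doc` its document.
function wirePictureInPicture(video, doc, onChange) {
  // State change events fire on the video element itself.
  video.addEventListener('enterpictureinpicture', () => onChange(true));
  video.addEventListener('leavepictureinpicture', () => onChange(false));

  // Returns a toggle; entering PiP typically needs to happen inside a user
  // gesture handler such as a click.
  return function toggle() {
    if (doc.pictureInPictureElement === video) {
      return doc.exitPictureInPicture();
    }
    return video.requestPictureInPicture();
  };
}
```

A page could use the `onChange` callback to, say, rearrange the in-page layout around a TV guide while playback continues in the PiP window.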

chrisn commented 2 years ago

Should we close this PR? It seems to be superseded by the Document Picture in Picture proposal.

beaufortfrancois commented 1 year ago

Should we close this PR? It seems to be superseded by the Document Picture in Picture proposal.

I agree. This effort is now tracked at https://github.com/WICG/document-picture-in-picture/
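For context, a minimal sketch of how the use cases in this thread (custom controls, overlays, custom subtitles) might be approached with the Document Picture-in-Picture API in browsers that ship it. `openPlayerInDocumentPip` and the window size are illustrative assumptions; `documentPictureInPicture.requestWindow()` is the entry point defined by that proposal and is expected to need a user gesture.

```javascript
// Sketch: move a player container into a Document Picture-in-Picture window.
// `player` is assumed to be an element already attached to the page; `win`
// defaults to the global object and is a parameter only to ease testing.
async function openPlayerInDocumentPip(player, win = globalThis) {
  // Feature-detect: the API is not available in every browser.
  if (!('documentPictureInPicture' in win)) return null;

  const pipWindow = await win.documentPictureInPicture.requestWindow({
    width: 480,  // illustrative size, not a recommended value
    height: 270,
  });

  // Remember where the player lived so it can be restored later.
  const home = player.parentNode;
  pipWindow.document.body.append(player);

  // Move the player back into the page when the PiP window closes.
  pipWindow.addEventListener('pagehide', () => home.append(player), { once: true });
  return pipWindow;
}
```

Because the whole container moves, page-rendered subtitles and custom controls inside it travel with the video instead of being re-painted through a `<canvas>` workaround.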