Early design review: Document Picture-in-Picture

steimelchrome commented 1 year ago

Wotcher TAG!

I'm requesting a TAG review of Document Picture-in-Picture.

There currently exists a Web API for putting an HTMLVideoElement into a Picture-in-Picture window (HTMLVideoElement.requestPictureInPicture()). This limits a website's ability to provide a custom picture-in-picture experience (PiP). We want to expand upon that functionality by giving websites the ability to open a picture-in-picture (i.e., always-on-top) window with a blank document that can be populated with arbitrary HTMLElements instead of only a single HTMLVideoElement.
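For readers new to the proposal, here is a minimal TypeScript sketch of the call pattern described above. The entry point `documentPictureInPicture.requestWindow()` comes from the explainer. The helper name `openInPip`, the structural `*Like` interfaces, and the 320x180 size are assumptions made for illustration. The API object is passed in as a parameter so the sketch does not depend on browser typings.

```typescript
// Minimal structural types for the parts of the API used here. In a browser
// these would be satisfied by `window.documentPictureInPicture` and the
// Window object it resolves with.
interface PipDocumentLike {
  body: { append(...nodes: unknown[]): void };
}
interface PipWindowLike {
  document: PipDocumentLike;
}
interface DocumentPipLike {
  requestWindow(options?: { width?: number; height?: number }): Promise<PipWindowLike>;
}

// Opens an always-on-top window with a blank document and appends `content`
// (for example a custom player element) to its body.
async function openInPip(
  api: DocumentPipLike,
  content: unknown,
  size = { width: 320, height: 180 }, // illustrative size, not a recommendation
): Promise<PipWindowLike> {
  const pipWindow = await api.requestWindow(size);
  pipWindow.document.body.append(content);
  return pipWindow;
}
```

In a page this would typically run inside a click handler, with `window.documentPictureInPicture` passed as `api`, since Chrome requires a user gesture to open the window.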

Explainer¹: https://github.com/WICG/document-picture-in-picture
Security and Privacy self-review²: https://github.com/WICG/document-picture-in-picture/blob/main/security-privacy-questionnaire.md
GitHub repo (if you prefer feedback filed there): https://github.com/WICG/document-picture-in-picture
Primary contacts (and their relationship to the specification):
- Tommy Steimel (steimelchrome), Google
- Frank Liberato (liberato-at-chromium), Google
Organization/project driving the design: Google Chrome
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5755179560337408

Further details:

[x] I have reviewed the TAG's Web Platform Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
The group where standardization of this work is intended to be done ("unknown" if not known): unknown
Major unresolved issues with or opposition to this design:
- See the GitHub issues list for known problems. One notable issue is that we're still trying to figure out how best to design and specify how CSS copying works for this feature.
This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

slightlyoff commented 1 year ago

This is an exciting proposal!

Some questions:

1. The lifetime of the PIP window isn't exactly clear from the explainer. Consider, for instance, wanting to create an MPA-style media application that uses Viewport Transitions, but continually plays media in a PIP window. Will transitioning from the original host window to the next (same origin) page in the main frame kill PIP playback in this model?
2. If not, how does the destination window receive or re-create a handle to the PIP window? Is that what the global documentPictureInPicture is for?
3. How are multiple PIP windows handled in a desktop scenario where multiple windows request PIP?
4. Is documentPictureInPicture the actual entrypoint? It seems a strange location for the API. A more natural location might be on navigator or as an extension to clients.openWindow().

steimelchrome commented 1 year ago

1. Yes, in this model if the main frame is navigated (even to the same origin) the PiP window is killed (similar to the existing requestPictureInPicture() API for HTMLVideoElement).
2. N/A
3. That is left up to the user agent (similar to the existing requestPictureInPicture() API for HTMLVideoElement). In Chrome, we will only allow one PiP window to exist at a time. I know other browsers do something different for HTMLVideoElement and therefore may do something different for this as well.
4. Yes. I originally had it on navigator, but was told that it wasn't a good place for the API. I don't have strong feelings about API name or placement personally.
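A rough sketch of how a page might work with the single window slot these answers describe, assuming the `window` attribute on `documentPictureInPicture` returns the current PiP window or null, as in the spec draft. `togglePip` and the two interfaces are hypothetical names used only for illustration.

```typescript
interface ClosableWindow {
  close(): void;
}
interface PipEntryPoint {
  // The PiP window currently open for this page, or null if there is none.
  window: ClosableWindow | null;
  requestWindow(): Promise<ClosableWindow>;
}

// Toggle behaviour: close the existing PiP window if there is one,
// otherwise ask the user agent to open a new one.
async function togglePip(api: PipEntryPoint): Promise<"opened" | "closed"> {
  if (api.window) {
    api.window.close();
    return "closed";
  }
  await api.requestWindow();
  return "opened";
}
```

Because the PiP window is killed when the opener navigates, a page that wants PiP to survive an MPA-style navigation would have to request a new window from the destination page.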

arnaudbud commented 1 year ago

Dialpad would benefit if this feature were available in the browser. Is there a demo available? This is our use case, Dialpad Everywhere: https://help.dialpad.com/hc/en-us/articles/360000407666-Dialpad-Everywhere#access-call-controls I support this proposal.

tomayac commented 1 year ago

(I have PR'ed the Document Picture in Picture API into this Pomodoro Timer app.)

torgo commented 1 year ago

Previous related reviews:

torgo commented 1 year ago

Yes, in this model if the main frame is navigated (even to the same origin) the PiP window is killed (similar to the existing requestPictureInPicture() API for HTMLVideoElement)

N/A

That is left up to the user agent (similar to the existing requestPictureInPicture() API for HTMLVideoElement). In Chrome, we will only allow one PiP window to exist at a time. I know other browsers do something different for HTMLVideoElement and therefore may do something different for this as well.

Yes. I originally had it on navigator, but was told that it wasn't a good place for the API. I don't have strong feelings about API name or placement personally

@steimelchrome have you updated the explainer to clarify these issues? (And by the way, thanks @slightlyoff!)

From our review in today's TAG breakout, this looks like a generally useful feature.

A few other questions:

1. What is the planned route for standardization for this? Right now it just lists WICG.
2. We noted that while the explainer is well written, it doesn't start with user needs as we've been encouraging. Can you add some material documenting the use cases from a user's perspective?
3. Is there any relationship with Popover #743, considering these are both to do with layering of content?
4. We discussed a potential issue around accessibility: for example, if there are subtitles in the PiP "window", ensuring those can be picked up by assistive technology appropriately. What other accessibility considerations have you discussed?
5. We're slightly concerned with the proposed mitigation to the spoofing issue, although it's good that this consideration is called out. Can you strengthen this wording, maybe with an example?
6. We'd like to encourage you to use normative language in the security and privacy considerations sections, as you develop those further.
7. Has there been any feedback from other browsers? Have you opened up issues in Mozilla or WebKit standards positions?
8. How would this feature work with multiple screens? Would it be up to implementations to decide which screen the PIP window shows up in? It seems like it would be useful to factor in multiple screens, given that proposals like this have come forward.
9. Is the aspect ratio (width/height vs height/width) following a common pattern? If so, we might want to document this as a design principle.

Also just noting: we're going to bring more CSS expertise to bear on this review so expect some further questions.

Thanks!

steimelchrome commented 1 year ago

@torgo Yep, I've just updated the explainer to clarify those (and the spec should also be clear on them).

1. For standardization, I'm going to bring it to the media working group. I have presented it there before, but haven't discussed bringing it to the WG with Chris yet.
2. Sure, I can add user needs. I assume that's different from the "Use Cases" section in that it's from the user's perspective and not the web developer's? Do you have a link to an example user needs section?
3. I don't think this proposal really has any overlap with the Popover API.
4. The PiP window is a full HTML document that can be focused and picked up by assistive technology like any other browser window, so that shouldn't be an issue. We also ensure that the PiP window is in the tab order (so it can be focused via keyboard) and that the toolbar buttons are keyboard focusable as well. However, these things (especially the toolbar buttons) are Chrome-specific UI, so I'm not sure it makes sense to call them out in the spec/explainer for the Web API, but I don't know what's normally done for that.
5. Added an example to the spoofing section.
6. Okay, I'll keep that in mind.
7. Yes, we have opened issues for standards positions:
   - Mozilla: https://github.com/mozilla/standards-positions/issues/670
   - WebKit: https://github.com/WebKit/standards-positions/issues/41
8. Right now, we leave placement entirely up to the user agent (screen and location). On Chrome, we just use the screen that the opener window is on.
9. It might accidentally be following a common pattern, but I think I just made it that way because I had to pick one or the other.

CSS expertise sounds good. I believe @liberato-at-chromium was talking with some CSS people a while back about it

Thanks!

torgo commented 1 year ago

I don't think this proposal really has any overlap with the Popover API

I think we meant: in the Popover API, you have a window that sits on top that can have arbitrary content. However, I think one difference is that the Popover is only visible in the current browsing context whereas the PiP floating window is visible in other apps, etc. Is that correct?

torgo commented 1 year ago

@steimelchrome there was also a concern raised in the Mozilla Standards Position thread about this being misused by advertisers or other actors that want to interrupt the user experience - that this could become another popup. Can you elaborate on how this concern has been addressed? In the response to this question I'm reading from @liberato-at-chromium "However, I don't believe that Document PiP makes the situation any worse." We're trying to push spec developers to "leave the web better than you found it." See our design principle on this topic. So I think we'd like to understand how Document picture-in-picture makes things better for end users on this front.

On a related note, is a permission request to the user currently necessary in order to invoke document picture-in-picture?

ylafon commented 1 year ago

The spoofing section is giving hints and should use stronger wording to avoid, for example, payment website spoofing, or, as stated in the document, System UI used to gather user passwords. Having PiP restricted to video was enough to avoid this issue, but opening it up to be any document leads to a need to care about security/spoofing in a normative way.

tomayac commented 1 year ago

Since you can render HTML content to a video that you can then PiP with the traditional API, I don't think a proper API as proposed here causes new spoofing surface—arguably even less, since the UA per encouragement in the spoofing section renders UI such as a title bar, at least as implemented in Chrome. As an example, here's my custom-built solar system PiP dashboard that I like to keep an eye on:

[Image: Screenshot 2023-04-20 at 09 25 20]

Compared to a traditional PiP window with no UI:

[Image: Screenshot 2023-04-20 at 09 28 22]

Update: to be fair, you can't interact with a video PiP window much and with a document PiP window you can, but there's more to come for video.

liberato-at-chromium commented 1 year ago

Can you elaborate on how this concern has been addressed? In the response to this question I'm reading from @liberato-at-chromium "However, I don't believe that Document PiP makes the situation any worse." We're trying to push spec developers to "leave the web better than you found it." [...] So I think we'd like to understand how Document picture-in-picture makes things better for end users on this front.

Document PiP makes the web better because we've seen a lot of (legitimate, not abusive) demand for always-on-top arbitrary content.

My comments earlier were just trying to say that we aren't introducing a new vector for abuse in the process of providing those improvements, because it's not more abusable than what's already there. I might be able to make the case that it's actually less so. For example, the site can't move or resize a document PiP window via scripting. However, I don't think those differences are a reason we'd do any of this.

steimelchrome commented 1 year ago

The spoofing section is giving hints and should use stronger wording to avoid, for example, payment website spoofing, or, as stated in the document, System UI used to gather user passwords. Having PiP restricted to video was enough to avoid this issue, but opening it up to be any document leads to a need to care about security/spoofing in a normative way.

That makes sense. I've updated the spec to have normative language around spoofing prevention

LeaVerou commented 1 year ago

Hi @steimelchrome,

First of all we know we're late getting back to you on this. Thank you for bearing with us.

We appreciate the addition of normative language around spoofing. We remain concerned about the lack of multi-stakeholder support - particularly the lack of support from other browsers - unlike picture in picture itself which enjoys strong support across engines. We're also concerned that this feature could be used to enable surprising and disruptive advertising experiences. We also remain concerned about the browser chrome around this picture-in-picture window. E.g. the documentation presumes there will be a close button but this is highly dependent on platform.

In terms of API design, we did see a lot of commonalities between this functionality and an "always on top" option in window.open(). Integrating it in window.open() would also fix a lot of issues around it (guaranteed prominent window chrome, guaranteed close button, existing ways to interact with said windows, naming, namespacing etc.) and it also reduces the new API surface that authors need to learn. We do see that this was considered as an alternative, but rejected due to lack of feature detectability for window.open() options and some functional differences (never outliving the opener). There are discussions around creating a new method that fixes the various issues of window.open(); it may be a good idea to collaborate with the folks working on this effort. The functional differences between this and window.open() may be useful more broadly too; never outliving the opener certainly would be!
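To illustrate the feature-detection point: a dedicated global can be detected with a plain `in` check, whereas window.open() gives a caller no signal about whether a particular feature string was understood. A minimal sketch, in which `detectPipSupport` and the `PipSupport` labels are hypothetical names:

```typescript
type PipSupport = "document-pip" | "popup-fallback";

// `globalScope` would be `window` in a page. It is a parameter here so the
// check can be exercised outside a browser.
function detectPipSupport(globalScope: object): PipSupport {
  // The dedicated entry point is detectable by name. There is no comparable
  // check for an individual window.open() feature string.
  return "documentPictureInPicture" in globalScope ? "document-pip" : "popup-fallback";
}
```

A page could branch on the result and fall back to an ordinary popup where the API is missing, accepting that the popup will not be always-on-top.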

steimelchrome commented 1 year ago

Sorry for the delay! I had drafted most of this a day or two after you sent your response but somehow forgot to ever send it.

Re: not as much cross-browser support: As websites continue to implement this feature, we expect other browsers to implement the feature as well. For reference, Safari didn't release the original picture-in-picture feature until ~1 year after our launch.

Re: surprising and disruptive advertising experiences: I agree that that concern is reasonable (for both document picture-in-picture and the original video picture-in-picture). The fact that neither type of picture-in-picture outlives the document that opened it alleviates some of the issues (unlike popups/unders that can live on after you closed the original tab). Are there other restrictions/limitations that you think we should be considering to lessen the concern?

Re: close button being dependent on platform: Are you saying you're concerned that other browsers will open a document picture-in-picture window and not provide any mechanism for closing it? I don't think we need to over-dictate what other browsers' UI looks like, and I don't expect a browser to not provide a way to close the window, but if there's specific language you think I should add please let me know.

Re: modernized window.open(): Domenic gave some input on this in the intent to ship: https://groups.google.com/a/chromium.org/g/blink-dev/c/JTPl7fM64Lc

LeaVerou commented 10 months ago

Hi @steimelchrome thank you for bearing with us. We're picking this back up today and trying to get you some useful feedback.

We're concerned about spoofing. There's some language in the spec about this now - great - but there is no discussion in the explainer about possible abuse cases and mitigations against those abuse cases - that would be very valuable. Also the language in the spec just says UAs need to provide "enough UI" which is a bit vague. Is there any non-normative language that could be added here to elaborate on the kind of UI that should be provided? Also, could you confirm that the PiP window will only be opened as a direct result of user action?

We're concerned about accessibility. Specifically the explainer doesn't mention how focus management is expected to work. How will users move between the PiP window and the main document?

We are still concerned that from an author perspective, this introduces a feature that is very related to window.open() but solves subtly different problems. We saw the part in the explainer about this, but it may be useful to decompose the problem into the parts where window.open() behavior conflicts with what is desired here, and examine whether these primitives may be useful for window.open() as well. Based on the differences mentioned in the explainer, that does seem to be the case:

- A window that does not outlive its opener is definitely a useful concept for window.open(); in fact, if we were to design window.open() today I'd argue it should be the default!
- Feature testing window.open() features is also a more general problem.

Decomposing this into lower level functionality that can be integrated in window.open() would also address the spoofing concerns. An important question we need clarity on is: is there something that makes this functionality fundamentally incompatible with window.open(), or is it about managing design & implementation effort?

Thanks for sending @domenic’s comment (though it took some effort to track down in that long thread).

Re: modernized window.open(): Domenic gave some input on this in the intent to ship: groups.google.com/a/chromium.org/g/blink-dev/c/JTPl7fM64Lc

There is some TAG feedback that seems to wish this was part of window.open() or some other more-general API, but I think that advice is not correct, and the current API design is good, due to the singleton-per-top-level-traversable nature of a document PiP window. In my opinion this makes the window.documentPictureInPicture entrypoint, with its requestWindow(), window, and onenter properties, a good API for the use case.

We'd love if you or @domenic could elaborate on this, as we're a bit unclear on the argument being made (what is singleton-per-top-level-traversable-nature?).

Can you please let us know the current status and any response to these issues?

domenic commented 10 months ago

(what is singleton-per-top-level-traversable-nature?).

There is only one document PiP window allowed per top-level traversable. I.e., there is a distinct "document PiP window spawned by this top-level traversable" slot, which is suitably exposed through the documentPictureInPicture object, which can be used to fill that slot and monitor it. The document PiP API is stateful.

Whereas, window.open() can be called arbitrarily, as many times as you want, and doesn't fill a single "Window-opened by top-level traversable" slot. window.open() is stateless.
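The stateful versus stateless distinction can be modelled with a small toy, which is not how any browser implements it. All names here (`ToyWindow`, `toyOpen`, `ToyPipSlot`) are hypothetical, and the sketch assumes that a second request closes the window already in the slot.

```typescript
interface ToyWindow {
  id: number;
  closed: boolean;
}

let nextToyId = 1;
const makeToyWindow = (): ToyWindow => ({ id: nextToyId++, closed: false });

// Stateless, like window.open(): every call yields another window and
// nothing keeps track of the ones opened before.
function toyOpen(): ToyWindow {
  return makeToyWindow();
}

// Stateful, like documentPictureInPicture: one slot per top-level traversable.
class ToyPipSlot {
  private current: ToyWindow | null = null;

  // Models the `window` property: whatever is in the slot, or null.
  get window(): ToyWindow | null {
    return this.current && !this.current.closed ? this.current : null;
  }

  // Models requestWindow(): fills the slot, replacing what was there.
  // Assumption: the previous window is closed first.
  requestWindow(): ToyWindow {
    if (this.current) this.current.closed = true;
    this.current = makeToyWindow();
    return this.current;
  }
}
```

Calling `toyOpen()` twice leaves two live windows, while calling `requestWindow()` twice on one `ToyPipSlot` leaves exactly one.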

LeaVerou commented 9 months ago

Thanks for clarifying @domenic, I understand the reasoning more clearly now.

A clarifying question: Where does this restriction of one PiP window per top-level traversable come from? Is it to prevent abuse and improve the end-user experience? Or is it an underlying platform limitation?

LeaVerou commented 9 months ago

@domenic The reason I was asking is that there are three potential design directions here:

1. Design this as an entirely orthogonal feature (current direction)
2. Design this as additional configuration for window.open()
3. Design this as an abstraction using window.open() as lower-level functionality.

I just wanted to make sure all avenues relating this to an existing web platform primitive have been explored before introducing a new one.

E.g. pretty sure I've come across use cases where a top-level window should only be able to spawn one window of a certain type (for example, preview in new tab for a website creation tool, or some kind of monitoring script in a popup), so it sounds like that is also a primitive that would be more broadly useful.

beaufortfrancois commented 8 months ago

For info, as requested by developers, we proposed adding display-mode for picture-in-picture to CSS Media Queries Level 5.

@media all and (display-mode: picture-in-picture) {
  body {
    margin: 0;
  }
  h1 {
    font-size: 0.8em;
  }
}

See the PR here: https://github.com/w3c/csswg-drafts/pull/9920
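The same query should also be usable from script through the standard `matchMedia()` method. A minimal sketch, where `isInPictureInPicture` and the `MediaQueryHost` interface are hypothetical names and `host` stands in for the PiP window object:

```typescript
interface MediaQueryHost {
  matchMedia(query: string): { matches: boolean };
}

// Returns true when `host` (normally the PiP window itself) reports that it
// is displayed in picture-in-picture mode, using the same media feature as
// the CSS rule above.
function isInPictureInPicture(host: MediaQueryHost): boolean {
  return host.matchMedia("(display-mode: picture-in-picture)").matches;
}
```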

steimelchrome commented 8 months ago

Thanks for clarifying @domenic, I understand the reasoning more clearly now.

A clarifying question: Where does this restriction of one PiP window per top-level traversable come from? Is it to prevent abuse and improve the end-user experience? Or is it an underlying platform limitation?

It's not an underlying platform limitation. It's more to prevent abuse and improve the end-user experience. In fact, in Chrome we currently only allow one video OR document picture-in-picture window total across all tabs, though that is stricter than the spec dictates. There's concern that multiple always-on-top windows can quickly become unwieldy

steimelchrome commented 8 months ago

As a small addition, we're also proposing explicitly allowing Window's focus() API to focus the opener window from the picture-in-picture window, so that websites can programmatically return to the opener tab. This consumes a user gesture from the picture-in-picture window.

PR: https://github.com/WICG/document-picture-in-picture/pull/109
ChromeStatus: https://chromestatus.com/feature/6313015987404800
Intent to Ship: https://groups.google.com/a/chromium.org/g/blink-dev/c/eu2Vyh176wM
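As a sketch of how a site might use this, assuming a button that lives in the PiP document and an `opener` object standing in for the opener's Window. `wireReturnToOpener`, `ClickTarget` and `Focusable` are hypothetical names.

```typescript
interface ClickTarget {
  addEventListener(type: "click", listener: () => void): void;
}
interface Focusable {
  focus(): void;
}

// Wires a button in the PiP document so that clicking it brings the opener
// tab forward. The click supplies the user gesture that focus() consumes.
function wireReturnToOpener(button: ClickTarget, opener: Focusable): void {
  button.addEventListener("click", () => opener.focus());
}
```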

steimelchrome commented 8 months ago

Another addition we're proposing is a new boolean parameter disallowReturnToOpener, which defaults to false. When set to true, it hints to the user agent that showing a button in the document picture-in-picture UI that allows the user to return to the opener does not make sense for their use case, so the user agent can hide the button.

Initial request: https://github.com/WICG/document-picture-in-picture/issues/113
PR: https://github.com/WICG/document-picture-in-picture/pull/114, https://github.com/WICG/document-picture-in-picture/pull/116
ChromeStatus: https://chromestatus.com/feature/6223347936657408

edit 2024-02-28: changed allowReturnToTab to allowReturnToOpener
edit 2024-03-14: changed allowReturnToOpener to disallowReturnToOpener
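A minimal sketch of passing the option, assuming it sits alongside `width` and `height` in the options passed to `requestWindow()`. `requestStandalonePip` and the two interfaces are hypothetical names.

```typescript
interface PipRequestOptions {
  width?: number;
  height?: number;
  disallowReturnToOpener?: boolean; // treated as false when omitted
}
interface PipRequester<W> {
  requestWindow(options?: PipRequestOptions): Promise<W>;
}

// Requests a PiP window while hinting that a "back to tab" button is not
// useful, for example for a standalone timer. It is only a hint, so the
// user agent decides what UI to show.
function requestStandalonePip<W>(
  api: PipRequester<W>,
  width: number,
  height: number,
): Promise<W> {
  return api.requestWindow({ width, height, disallowReturnToOpener: true });
}
```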

matatk commented 7 months ago

Hi @steimelchrome, thank you for your recent updates. We are still unclear as to whether options 2 and 3 from Lea's comment have been considered - could you point us to the outcome of any discussions on those?

hober commented 7 months ago

I'm still concerned that this feature doesn't appear to be widely implementable across platforms, as discussed in the WebKit standards-positions issue on this.

torgo commented 6 months ago

Hi @steimelchrome can you feed back on any updates to this proposal? Matthew asked a question above regarding Lea's feedback that looks like it's still pending. Thanks!

steimelchrome commented 6 months ago

Sorry for the delay.

Hi @steimelchrome, thank you for your recent updates. We are still unclear as to whether options 2 and 3 from Lea's comment have been considered - could you point us to the outcome of any discussions on those?

I don't think we have any written-down outcome/resolution I can point to. I just sent an email to Domenic to discuss further and I'll post a resolution here.

I'm still concerned that this feature doesn't appear to be widely implementable across platforms, as discussed in the WebKit standards-positions issue on this.

For Android (and possibly iOS, but I'm less familiar with iOS), with current system APIs we could implement a non-interactive version of document picture-in-picture (allowing the website to populate it with arbitrary HTML elements, but not actually allow input). There are some potential issues (e.g. a pip window that has an active media session would show media controls, which may or may not be appropriate depending on the use case), but we haven't seen any demand for this so we haven't pursued it. But you're right that arbitrary interactive HTML would not be implementable without new Android/iOS APIs to support it.

Otherwise, I'm not sure what changes we could make to the API to support these use cases on desktop while remaining 100% implementable on mobile. Do you have any ideas?

steimelchrome commented 6 months ago

We're also proposing allowing user gestures in the document picture-in-picture window to be usable in the opener window and vice versa. This makes it more ergonomic to use user-activation-gated APIs, since often event handlers in the document picture-in-picture window are actually run in the opener's context, so the opener's context needs access to the user gesture. This essentially makes the document picture-in-picture window act the same as a same-origin iframe inside the opener as far as user gesture propagation is concerned.

PR: https://github.com/WICG/document-picture-in-picture/pull/117
ChromeStatus: https://chromestatus.com/feature/5185710702460928

beaufortfrancois commented 5 months ago

For info, Spotify folks are using the Document Picture-in-Picture API for their Miniplayer. You can learn more about their journey and use cases at https://developer.chrome.com/blog/spotify-picture-in-picture

LeaVerou commented 5 months ago

Hi folks,

We (@plinss @matatk and I) discussed this again during a breakout today.

Overall, we see why the current window.open() doesn’t work for what this API is trying to do, however it appears that all of these differences are things that would be useful for window.open() as well:

- An async API to allow gating behind a permissions prompt
- Feature detection for individual parameters
- Allowing up to one window per top-level traversable
- Ability to create "always on top" windows
- ...

We understand that improving window.open() is a substantial undertaking, however from an architectural point of view, we cannot justify creating a parallel, more narrowly scoped API for the sole reason of avoiding that work. Instead, we encourage people to work on the existing effort to modernize window.open() and ensure it covers these use cases as well.

The video-specific use cases appear to be covered already by video.requestPictureInPicture(), so designing this as a more general API seems appropriate. It is unfortunate that not every existing platform can implement this API, but it is clear that there are use cases that go beyond video, so we think that as long as feature detection is possible and has good ergonomics, this may be worth doing.

hober commented 4 months ago

The TAG revisited this issue today, and have decided to close this review as unsatisfied. We would prefer enhancing window.open(), as described in Lea's comment above, as a way to address your use cases more in line with Web platform architecture.

(Personally, I also remain concerned with adding a feature like this to the Web platform without a clear strategy for making it available on entire classes of very popular, Web-capable devices.)

w3ctag / design-reviews

Early design review: Document Picture-in-Picture #798