w3ctag / design-reviews

W3C specs and API reviews
Creative Commons Zero v1.0 Universal

Element Capture #954

Open eladalon1983 opened 4 months ago

eladalon1983 commented 4 months ago

Hej TAG!

I'm requesting a TAG review of Element Capture.

A combination of pre-existing mechanisms (getDisplayMedia, Region Capture) already allows Web applications to capture a portion of the current tab as a video MediaStreamTrack, robustly cropping away irrelevant pixels. Such videos can then be transmitted remotely; removing pixels not intended for sharing helps the sharing user's privacy and avoids distracting the receiving users. It also helps conserve compute and network resources.

Our new API, Element Capture, takes this a step further, allowing Web applications to remove unwanted occlusions. For example, if a private message notification appears over the shared region, it is possible to avoid capturing that message, which also avoids transmitting it remotely. This helps uphold privacy guarantees implicitly made to the user, who had only intended to share the target region, and not whatever happened to be drawn over it.

Further details:

You should also know that...

Strong positive Web developer feedback for this feature was expressed on https://github.com/screen-share/element-capture/issues/3 and during Screen Capture CG meetings.

torgo commented 1 month ago

Hi - some pieces of feedback from our TAG breakout this morning where we reviewed this:

It seems like the explainer is very lean. We think that there are a number of issues that need to be more fully explored before we can be more sure about this proposal.

In the use case where you're sharing a specific content area to an embedded iframe (the use case in the explainer), what is the permissions flow for this scenario? For example, in current screen-sharing scenarios, the user may be prompted to share a tab, a window, or the whole screen. What would the user be prompted for in this case? Would they be able to choose an alternative sharing target, such as another tab or the screen, or is it envisioned that in this case they would be constrained to only share content from the designated application?

Can this be treated like an extension to ViewPortCapture? We note that this sort of sharing carries similar security risks as that API, and the additional constraints on capture in that API might be better suited to this use case than the more general getDisplayMedia.

The proposed API starts by preparing to share the whole of the content, and then restricting it to a particular part - have you considered ways to start with the specific part to be shared instead? (How would this affect occlusion?)

You have a goal of avoiding occlusion, but what about elements that are partially-transparent? Would this capture what is rendered behind an element?

eladalon1983 commented 3 weeks ago

(Note: Questions reordered to make the answers clearer, as later answers build on top of earlier ones.)

It seems like the explainer is very lean.

I aimed to make the explainer brief, and this article goes into more details and is more "instructive" in its tone. HTH?

You have a goal of avoiding occlusion, but what about elements that are partially-transparent?

Occluded content is "magic erased" from the capture, as is occluding content. The article (link above) discusses this in detail, while the explainer, I acknowledge, made only a passing and implicit reference to this fact ("frames produced on the restricted video track only consist of information from the target-element and its descendants"). Hope that's clear now. :-)

what is the permissions flow for this scenario? [...] What would the user be prompted for in this case?

This API builds on top of existing screen-sharing APIs, meaning that the permission flow remains entirely unchanged. An application would first call getDisplayMedia(...) or getViewportMedia() or any other past/future screen-sharing API, and the user would go through the usual selection process associated with it. It's only after this completes, and only if the user shares the (entire) current tab, that the Element Capture API can be invoked.
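A minimal sketch of that two-step flow, for illustration only. The thread does not name the Element Capture methods; restrictTo and RestrictionTarget.fromElement are taken from my reading of the draft and should be treated as assumptions, as should the idea that restrictTo rejects when the track is not a capture of the current tab. The CaptureEnv wrapper, captureElement, and CaptureResult are hypothetical names introduced so the example runs outside a browser.

```typescript
// Structural stand-ins for the browser objects, so the sketch type-checks and
// runs outside a browser. All names in this block are illustrative.
interface RestrictableTrack {
  // Assumed to mirror the draft's track.restrictTo(target).
  restrictTo(target: unknown): Promise<void>;
}

interface CaptureEnv {
  // Step 1: any screen-sharing API that yields a video track,
  // e.g. a wrapper around navigator.mediaDevices.getDisplayMedia().
  startCapture(): Promise<RestrictableTrack>;
  // Step 2: produce a restriction target for the element to share,
  // e.g. a wrapper around the draft's RestrictionTarget.fromElement(el).
  makeTarget(): Promise<unknown>;
}

interface CaptureResult {
  track: RestrictableTrack;
  restricted: boolean; // false if the track could not be restricted
}

async function captureElement(env: CaptureEnv): Promise<CaptureResult> {
  // The user goes through the usual, unchanged selection prompt here.
  const track = await env.startCapture();
  try {
    const target = await env.makeTarget();
    // Assumption: this rejects unless the user chose the current tab.
    await track.restrictTo(target);
    return { track, restricted: true };
  } catch {
    // Leave the decision (stop the track, fall back, warn) to the caller.
    return { track, restricted: false };
  }
}
```

In a browser that implements the API, startCapture could wrap getDisplayMedia and return the stream's first video track, and makeTarget could wrap the target-creation call for the chosen element. Because captureElement only sees a track, it does not care which screen-sharing API produced it.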

Can this be treated like an extension to ViewPortCapture?

That's an alternative approach that we have considered. But as of the time of writing, getViewportMedia() remains theoretical, several years after it was initially proposed. To ensure impact, we have shaped Element Capture to be agnostic of whether gDM, gVM or any other API produced the track which is being "restricted" by our new API.

[...] have you considered ways to start with the specific part to be shared instead?

I actually think that starting with the entire current tab is a strength of the current API shape, because we lean on established methods to prompt the user to share something they know is compromising, and avoid giving them the false sense of security that they are sharing "less". Imagine a user, for instance, sharing "just the X iframe" and not realizing that it could, at any moment, be navigated, or load cross-origin resources... But sharing the entire current tab is a concept users already understand, and they know that it requires elevated trust.

matatk commented 2 weeks ago

Thanks for your detailed reply @eladalon1983. The article you linked to answers several questions; thanks for that too. It would be really helpful for review, and future reference, if you could bring that content from the article into the explainer; it's OK to give a bit of the 'how to' info, as long as the explainer starts with the user needs being solved. That info, and the code snippets, help to convey the intended API shape.

There are a couple of additional things that we'd really like to see in the explainer:

Thanks in advance; we are looking forward to learning more about the above.