API for display-capturing the current tab

eladalon1983 commented 3 years ago

Ya ya yawm TAG!

I'm requesting a TAG review of getCurrentBrowsingContextMedia.

Overview

Consider the existing navigator.mediaDevices.getDisplayMedia(). It allows a user unlimited choice of sources - any monitor, window or tab.

We’re in the process of standardizing a new API - getViewportMedia - that will allow web-applications to present a simple confirmation-only prompt to the user. The security requirements of this API are under active discussion, but consensus is forming that both cross-origin isolation and a new opt-in header will be required.

Not all applications can accept these requirements - at least not in the short term. However, by forcing such applications to use getDisplayMedia, the user is pushed towards the riskier option of sharing the entire monitor. Why is that the riskier option? Because at the moment capture starts, the entire current monitor includes the current tab. Note that the moment capture starts is sufficient for almost any attack, as all attacks we have thus far considered could be carried out using a single frame.

A hybrid API is deemed necessary in order to offer some of the benefits of getViewportMedia without its elevated security requirements. This hybrid API will allow the application to signal its preference for capturing the current tab by way of a new dictionary member passed to getDisplayMedia. Namely, we will extend DisplayMediaStreamConstraints by adding another dictionary member called preferCurrentTab with a default value of false. When getDisplayMedia is invoked with preferCurrentTab=true, the browser will offer the current tab as the first option to the user, but will still offer unlimited choice of capture sources (see image below).

The unlimited choice of sources makes this new API compliant with the requirements of getDisplayMedia. Since it complies with the requirements of getDisplayMedia, the security requirements placed on getDisplayMedia are sufficient for this new hybrid API.

Screen Shot 2021-06-03 at 23 40 58
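
A minimal sketch of how an application might pass the new member, assuming the dictionary shape described above. The names ending in `Sketch`, and the two helper functions, are hypothetical stand-ins introduced for illustration; they are not taken from the specification. The capturer is injected so the sketch does not depend on browser type definitions.

```typescript
// Local stand-ins for the browser types (hypothetical names, not from the spec).
interface DisplayMediaOptionsSketch {
  video?: boolean;
  audio?: boolean;
  // The proposed dictionary member; treated as false when omitted.
  preferCurrentTab?: boolean;
}

interface DisplayCapturerSketch<S> {
  getDisplayMedia(options?: DisplayMediaOptionsSketch): Promise<S>;
}

// Build options that hint at the current tab while keeping any other members.
function currentTabOptions(
  base: DisplayMediaOptionsSketch = { video: true }
): DisplayMediaOptionsSketch {
  return { ...base, preferCurrentTab: true };
}

// Request a capture. The user still chooses the source, so the resulting
// stream may come from any tab, window or monitor, not only the current tab.
function requestCapturePreferringCurrentTab<S>(
  devices: DisplayCapturerSketch<S>
): Promise<S> {
  return devices.getDisplayMedia(currentTabOptions());
}
```

In a browser that supports the member, the injected capturer would presumably be `navigator.mediaDevices`. Because the preference is only a hint, the application should be prepared to receive a stream of any surface.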

Links and Details

Explainer: bit.ly/3dJgLfS
Specification URL: https://eladalon1983.github.io/prefer-current-tab/
Security and Privacy self-review: TODO (I will edit this comment and add the link.)
Primary contacts (and their relationship to the specification):
- Elad Alon (@eladalon1983), Google
Organization(s)/project(s) driving the specification: Google
Key pieces of existing multi-stakeholder review or discussion of this specification: getViewportMedia and its security-requirements
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): Chrome Status entry

Further details:

[X] I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: We aim to ship in Chrome m92 or m93.
The group where the work on this specification is currently being done: WebRTC WG works on getViewportMedia, but is not interested in this hybrid API.
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): WICG (I will link once this is in the WICG.)
Major unresolved issues with or opposition to this specification:
- Mozilla and Apple have voiced the opinion that getViewportMedia should be sufficient, and were not interested in a "weakened" version.
- Our position, on the contrary, is that this hybrid is necessary and does not degrade security when compared to getDisplayMedia.
This work is being funded by: Google

You should also know that...

A word of caution over a source of potential confusion:

The name getViewportMedia is a later conclusion. Initially, that API was offered under the name getCurrentBrowsingContextMedia. Chrome has an active origin-trial for getCurrentBrowsingContextMedia which accomplishes the same thing as preferCurrentTab, but uses a new method instead of a new dictionary member. See the explainer.

We'd prefer the TAG provide feedback as (please delete all but the desired option): 💬 leave review feedback as a comment in this issue and @-notify @eladalon1983

annevk commented 3 years ago

Why does this use browsing context in its name? Does this survive navigations somehow?

cc @jan-ivar

eladalon1983 commented 3 years ago

The capture does not survive navigation - the capturing app is unloaded on navigation. I am open to renaming. Any thoughts on what could be a good name for this hybrid?

annevk commented 3 years ago

I'm not sure, but from the proposed UI this seems like an option (which would have a name related to viewport to stay consistent) you would pass to getDisplayMedia().

eladalon1983 commented 3 years ago

I did consider the option of an additional constraint to getDisplayMedia, but that becomes less convenient if getViewportMedia is ever extended to receive additional parameters that don't make sense for getDisplayMedia - something which I do plan. In that case, the hybrid gCBCM can be meaningfully extended to accept that parameter and apply it only if the user chooses the current tab. (I suggest we migrate this discussion to the WICG repo when that one is set up. I can @mention you when it's time, if you'd like - let me know.)

jan-ivar commented 3 years ago

Note: this is a Google-only request, not a successor to #609 which is the request from the WebRTC WG.

I've closed https://github.com/w3c/mediacapture-screen-share/pull/148#issuecomment-840636775 to avoid confusion, and requested #609 be reopened.

As the OP mentions, we are opposed to this "hybrid" API.

Mozilla and Apple have voiced the opinion that getViewportMedia should be sufficient, and were not interested in a "weakened" version.

jan-ivar commented 3 years ago

A hybrid API - getCurrentBrowsingContextMedia - is deemed necessary in order to offer some of the benefits of getViewportMedia without its elevated security requirements. This hybrid API will allow the application to signal its preference for capturing the current tab. The browser will then offer the current tab as the first option to the user, but will still offer unlimited choice of capture sources (see image below). The unlimited choice of sources makes this new API compliant with the requirements of getDisplayMedia.

An application signal does not alleviate the "elevated security requirements" if the application is malicious; it defeats them.

The getDisplayMedia API deters social engineering: "User Agents are encouraged to warn users against sharing browser display devices as well as monitor display devices where browser windows are visible, or otherwise try to discourage their selection on the basis that these represent a significantly higher risk when shared." ¹

Providing malicious applications with a method that does exactly what they need seems like a bad idea.

I also worry it would undermine adoption of getViewportMedia which requires sites to isolate to have this ability, specifically to mitigate this threat, which Chrome Security agrees is significant.

¹ See the questionaire.md and subsequent links for details of these unobvious threats to the same-origin policy from sharing web surfaces under attacker control.

dontcallmedom commented 3 years ago

Since I was confused and created confusion about the relationship with #609, I thought I would summarize what I understand about this particular design review (at the request of @LeaVerou and @kenchris, whom I was chatting with this morning):

- the proposal in this issue hasn't been discussed (let alone endorsed) by the WebRTC Working Group
- the proposal in this issue addresses similar needs as the ones identified for the getViewportMedia API (on which #609 focuses) but proposes a different solution
- the proposal in this issue is essentially equivalent to getDisplayMedia, the API defined by the WebRTC Working Group, but with a specific hint suggesting that the current browser tab should be captured. As @jan-ivar commented, this probably reduces the effectiveness of the mitigation built into getDisplayMedia() to avoid giving the calling page too much control over what is being captured

As I understand it, the motivation behind the proposal in this issue is that deploying the security model being developed for getViewportMedia (which requires any embedded resources to adopt & deploy new HTTP headers) is likely to be very challenging. I'm mentioning this in case the TAG would like to chime in more generally on other approaches that might make it easier to deploy getViewportMedia.

alvestrand commented 3 years ago

The Chrome decision on the "need for elevated permission" for getDisplayMedia (which presents all the capture surfaces without calling out special considerations about their risks) was based on the understanding that the most common use cases would be displaying a tab or displaying the screen, so it did not make much sense to increase the cognitive overload by calling out cases that had lower risk than the common ones.

It is logical based on this standpoint that presenting the present tab as a capture option doesn't need any more elevated warning; the warning is already elevated.

torgo commented 3 years ago

So just to clarify - is there now going to be one consolidated proposal merging #609 and #625? If so, can we agree to close one of these issues and update the other with the consolidated and agreed proposal?

eladalon1983 commented 3 years ago

There is not going to be a consolidated proposal. (Btw, the current proposal - #625 - is going to be amended today/tomorrow, so if it's possible to hold off on reviewing it for 2 days, that'd be better.)

torgo commented 3 years ago

Hi @eladalon1983 can you please clarify this. It's highly unlikely that the TAG is going to endorse a single proposal when there are multiple competing proposals from different vendors and lack of consensus. Happy to postpone until our next design review week - which will be the 14th of June. Hoping we can have better news by then.

eladalon1983 commented 3 years ago

Glad to clarify. There are no competing proposals.

#609 is a proposal for a capture-this-tab API. That API will be gated behind (a) Cross-Origin Isolation and (b) an opt-in header.
#625 is a proposal for a second API that achieves something similar, but requires neither Cross-Origin Isolation nor an opt-in header.

I think the two proposals can be judged independently.

eladalon1983 commented 3 years ago

I've updated the original comment to reflect our change from a method-based API to a new-dictionary-member-based API.

eladalon1983 commented 3 years ago

Spec added. Could the labels be adjusted, @cynthia and @LeaVerou?

eladalon1983 commented 3 years ago

The long-term path (getViewportMedia) has a standard consensus track, and that is what is tracked in TAG issue #609. But this solution has multiple complexities and non-trivial security aspects that we still need to iron out. Therefore -
preferCurrentTab is a short-term measure that solves some use cases to some degree, and doesn't have the security problems associated with getViewportMedia.
After months of discussion, there is no consensus on getViewportMedia with Mozilla, so Chrome gave up and shipped preferCurrentTab.
We are still committed to getViewportMedia.

jan-ivar commented 3 years ago

After months of discussion, there is no consensus on getViewportMedia with Mozilla, so Chrome gave up and shipped preferCurrentTab.

@eladalon1983 What is the disagreement on getViewportMedia?

We are still committed to getViewportMedia.

I'm glad to hear this. Mozilla is eager to engage on this.

torgo commented 3 years ago

Thanks for the update @eladalon1983. We are going to review this at our "f2f" coming up on the 13th. I hope we can resolve and close the review by then.

torgo commented 3 years ago

Just discussed in our virtual f2f breakout. Thank you for clarifying that getViewportMedia is the long-term proposal; we will focus our efforts on reviewing that. Can you provide a roadmap for how you see transitioning people from use of preferCurrentTab to getViewportMedia once the issues are resolved? The concern we have is that the web is full of technologies that were designed as short-term stopgaps until a longer-term thing could be worked out. We'd rather not see another one added to that list.

eladalon1983 commented 3 years ago

Once the security measures getViewportMedia requires are sufficiently rolled out, applications will naturally migrate from preferCurrentTab to getViewportMedia, because the latter offers a superior UX; namely, the user is presented with a clearer choice, and cannot choose anything but the current tab.

Chrome has UMA tracking calls to getDisplayMedia with/without preferCurrentTab (and the API invocation's result). getViewportMedia will be associated with similar UMA.

When we feel that adoption is sufficient, or that the challenges to it are no longer as significant, we can (a) communicate publicly that preferCurrentTab is about to be deprecated and (b) start printing deprecation warnings to the dev-console whenever it is used.
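
On the application side, the migration described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the type and function names are hypothetical, and the `optedInViaHeader` flag merely stands in for the opt-in header, whose exact form is still under discussion.

```typescript
// Which capture path an application would take (illustrative names only).
type CapturePathSketch = "getViewportMedia" | "preferCurrentTab";

interface CaptureEnvironmentSketch {
  // Whether the browser exposes getViewportMedia at all.
  supportsGetViewportMedia: boolean;
  // Whether the page is cross-origin isolated.
  crossOriginIsolated: boolean;
  // Hypothetical stand-in for the opt-in header requirement.
  optedInViaHeader: boolean;
}

// Prefer getViewportMedia once all of its requirements are met;
// otherwise fall back to getDisplayMedia with preferCurrentTab.
function chooseCapturePath(env: CaptureEnvironmentSketch): CapturePathSketch {
  const viewportUsable =
    env.supportsGetViewportMedia &&
    env.crossOriginIsolated &&
    env.optedInViaHeader;
  return viewportUsable ? "getViewportMedia" : "preferCurrentTab";
}
```

An application written this way would move to getViewportMedia on its own as soon as the environment satisfies the requirements, which is the natural migration the roadmap relies on.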

torgo commented 3 years ago

Ok this sounds good. We still have concerns about interoperability and strongly encourage convergence on one consensus-based solution as you have laid out above.
