Revisit: Let getDisplayMedia() influence the default type choice in the picker

alvestrand commented 3 years ago

We continue to have a strong demand from Web developers for functionality that lets them influence what kind of display surface the user will capture; this is one of the core differences between the pre-standard "Chrome extension API" and the WG-defined getDisplayMedia() function.

Such a functionality is easy to add (allow a constraint on capture surface type). If it does not block the user from picking other things, but merely changes the default capture surface (currently "screen" on both Chrome and Safari), it doesn't seem to be a huge increase in user risk exposure.
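As a rough illustration of what such a hint could look like from the application side, here is a minimal sketch. It assumes the existing displaySurface constrainable property is honored on input as a plain (ideal) value; the helper names `buildSurfaceHint` and `captureWithHint` are hypothetical and not part of any spec.

```javascript
// Hypothetical helper: builds getDisplayMedia() options carrying a surface-type hint.
// The hint is a plain (ideal) value, so it could only influence the picker default.
function buildSurfaceHint(preferredSurface) {
  const known = ["monitor", "window", "browser"];
  if (!known.includes(preferredSurface)) {
    // Unknown or missing hint: fall back to a request with no preference at all.
    return { video: true };
  }
  return { video: { displaySurface: preferredSurface } };
}

// Usage sketch (browser only): the user would still be free to pick any surface.
async function captureWithHint(preferredSurface) {
  return navigator.mediaDevices.getDisplayMedia(buildSurfaceHint(preferredSurface));
}
```

Whether a user agent acts on the hint at all would remain up to the user agent.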

Example comment: https://twitter.com/RickByers/status/1403349775387353089?s=19

jan-ivar commented 3 years ago

Sorry this is not ready for PR.

jan-ivar commented 3 years ago

This would revisit an existing WG decision. Has new information surfaced since https://github.com/w3c/mediacapture-screen-share/pull/32 to consider it? cc @martinthomson

Such a functionality is easy to add

As I recall, this was not among the concerns. The concerns, outlined in the Security and Privacy Questionnaire, were the security risks of sharing a web surface under attacker control: that it allows active attacks on the same-origin policy. The only hurdle is socially engineering users to select it.

it doesn't seem to be a huge increase in user risk exposure.

This stems partly from Chrome already violating the spec's recommendations by neither implementing elevated permissions for web sources specifically nor warning users about their elevated risk. See crbug 920752 for context.

jan-ivar commented 3 years ago

As I replied on twitter, we'd like to focus on https://github.com/w3c/mediacapture-screen-share-extensions/issues/9: a proposal to give web-pages that meet the security-criteria [agreed on with Chrome Security], preferential placement in the getDisplayMedia() picker. This seems like the safe and responsible way to proceed for the long-term.

eladalon1983 commented 3 years ago

As I replied on twitter, we'd like to focus on w3c/mediacapture-screen-share-extensions#9: a proposal to give web-pages that meet the security-criteria [agreed on with Chrome Security], preferential placement in the getDisplayMedia() picker. This seems like the safe and responsible way to proceed for the long-term.

When/where has Chrome Security given their blessing to w3c/mediacapture-screen-share-extensions#9? To preferential placement of certain documents in the media picker? AFAIK, Chrome Security has spoken for cross-origin isolation and an opt-in header for getViewportMedia. That's a different topic.

alvestrand commented 3 years ago

Yes, this is asking to revisit an existing WG decision. A PR is a perfectly adequate tool for showing what the resulting change would be.

alvestrand commented 3 years ago

The argument is that the WG decision was based on wrong information and an inadequate security evaluation, and that the WG decision has led to a lack of conformance to the WG specification in the market. We're asking to revisit it.

jan-ivar commented 3 years ago

@eladalon1983 They haven't. I said "Their view on a similar proposal [#155] for easy self-share is that it requires not only site-isolation but opt-in from targets in order to be safe", and then "We've put forth a proposal that would give web-pages that meet these security-criteria preferential placement in the getDisplayMedia() picker."

Chrome Security has spoken for cross-origin isolation and an opt-in header for getViewportMedia.

We're interpreting the scope of their advice differently. They said "this as a larger problem of APIs that might leak data from cross-origin resources at the page-level."

That's a different topic.

Different API, same topic: how to have webpages capture webpages safely.

jan-ivar commented 3 years ago

@alvestrand In order to not waste the WG's time, I believe it is customary to introduce new information with such requests, is it not? Simply asking for a re-vote doesn't seem productive, because what would make a different outcome likely?

The argument is that the WG decision was based on wrong information and an inadequate security evaluation, ...

A discovery that old information is wrong might qualify as new information, if it can be substantiated. Would you be able to point out prose in the spec or its security questionnaire that is wrong?

alvestrand commented 3 years ago

Part of the discussion showing developer interest is in https://crbug.com/904831 and bugs duplicated into it.

The current API is based on the presumption that in user story flows that involve capturing something, the user story flow is neutral as to what type of surface is to be handled, and that any input is going to be considered valid.

This presumption is clearly absurd; in nearly every user story that involves capturing something, the story involves capturing exactly one type of surface, and the idea that it should be impossible for the application to incorporate this information into its user flow is just not logical.

The idea that "an application shouldn't push the user towards sharing more dangerous surfaces" is only valid if the value of sharing a more dangerous surface and sharing a less dangerous surface has equal value to the user; this is wrong. The user wants to present what the user wants to present, and that's either a more dangerous surface or a less dangerous surface; putting obstacles in the way of the user for doing what the user needs to do can never be a good UI design.

Putting up dialog boxes and confirmation buttons has some value. Forcing the user to consider options that he is not going to choose anyway, because it is not what he wants to do, has none.

alvestrand commented 3 years ago

as to "what's in the spec is wrong": I was trying to find justification for why the lack of a "what you want to share can probably be found in this category" constraint was a security feature. I did not find it in either document.

This section:

"Not accepting constraints for source selection means that getDisplayMedia only provides fingerprinting surface that exposes whether audio, video or audio and video display sources are present. (This is a fingerprinting vector.)"

doesn't compute for me.

And this section (from the security questionnaire):

The decision of what to share (or whether to share anything at all) rests entirely with the end-user. Websites cannot influence this choice in any way

is listed as an answer to the question "Is this specification exposing the minimum amount of information necessary to power the feature?"

It is not at all clear that it is an answer to the question, and again, it does not reflect a reasoning behind it.

tsahilevi commented 3 years ago

Today the main complaint I see from vendors and users is the number of clicks needed to get screen sharing done. At the moment, the bare minimum is 3, assuming you're aiming for full screen, and 4 (or more) for anything else. This isn't user friendly, to say the least.

Having the ability for the application to hint at the desired default screen sharing choice would be a good start to remedy that. I'd also feel better if that hinted choice were already "selected", in order to save yet another mouse click.

arnaudbud commented 3 years ago

We at RingCentral think this would be useful to us.

ajf101 commented 3 years ago

At Pexip, we think this would be a useful feature too. Giving the opportunity to save historic preferences is one benefit, but the most significant benefit for us would be to guide towards the appropriate option which supports audio sharing (i.e. go straight to tab capture specifically on Mac).

emcho commented 3 years ago

This looks good and I find it could prove useful to the Jitsi Meet app suite. Thanks for doing the work!

bbaldino commented 3 years ago

We'd be interested in this for Webex as well.

alper-teke commented 2 years ago

As Atos we're very much interested in this too

jan-ivar commented 2 years ago

@alvestrand and @eladalon1983 suggested some UX mitigations this morning that might let us move forward here.

The spec could strongly recommend that user agents:

1. Remove the requesting tab from the list of available "browser" sources, or hide/warn/discourage picking it.
2. Remove the requesting tab's window from the list of available "window" sources, or hide/warn/discourage picking it.

This would by no means be a catch-all — same-origin documents may lurk in other tabs and tabs' BFCache — but should preserve the social engineering obstacle to basic click-through active attacks.

Self-capture use cases typically don't want a picker anyway, and will be best served by getViewportMedia https://github.com/w3c/mediacapture-screen-share/issues/155.

alvestrand commented 2 years ago

If we follow the advice in 1. - should this apply to just the requesting tab, or to all tabs with the same origin? Same-origin tabs have the ability to manipulate each other, so a trivial workaround for this restriction would be to open up another tab in which to do the dastardly deeds before calling getDisplayMedia.

eladalon1983 commented 2 years ago

If we follow the advice in 1. - should this apply to just the requesting tab, or to all tabs with the same origin? Same-origin tabs have the ability to manipulate each other, so a trivial workaround for this restriction would be to open up another tab in which to do the dastardly deeds before calling getDisplayMedia.

The same workaround could be applied with tabs that only appear to be cross-origin. Namely:

evil.com runs in tab1 and opens collaborator.com in a new tab - tab2.
collaborator.com embeds a "mailman" iframe with an evil.com document.
Technically speaking, these tabs are not same-origin.
Practically speaking, collaborator.com, in tab2, can postMessage() to the "mailman" evil.com iframe, which can use a BroadcastChannel to shuttle these messages to the evil.com document in tab1.
Any concerns we had about evil.com running in both tab1 and tab2 now apply, because collaborator.com does evil.com's bidding.
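The relay in the steps above can be sketched roughly as follows. This is only an illustration of the message flow: the `createMailman` helper and the channel name "relay" are hypothetical, and browsers that partition BroadcastChannel by top-level site would not deliver these messages across the two tabs.

```javascript
// Hypothetical "mailman" that would run inside the evil.com iframe embedded by collaborator.com.
// `channel` is anything with postMessage(), e.g. new BroadcastChannel("relay").
function createMailman(channel, embedderOrigin) {
  return function onMessage(event) {
    // Only forward messages that come from the embedding page.
    if (event.origin !== embedderOrigin) return false;
    channel.postMessage(event.data);
    return true;
  };
}

// In the iframe (browser only), the wiring would look like:
//   const relay = new BroadcastChannel("relay");
//   window.addEventListener("message", createMailman(relay, "https://collaborator.com"));
// and the evil.com document in tab1 would listen on a BroadcastChannel of the same name.
```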

Because of this, I think the recommendation need not apply to same-origin other tabs.

youennf commented 2 years ago

It seems valuable to me to provide as precise as possible guidelines. For instance, it is problematic to have the capturing document be the opener of the captured document. Same-origin tabs are also problematic, as noted by @alvestrand.

As for iframe communication, origin partitioning should help prevent an example.com/evil.com iframe from communicating (through BroadcastChannel, IDB...) with example2.com/evil.com or with evil.com.

eladalon1983 commented 2 years ago

As for iframe communication, origin partitioning should help prevent an example.com/evil.com iframe from communicating (through BroadcastChannel, IDB...) with example2.com/evil.com or with evil.com.

I've made a demo. Please launch these two tabs side by side and wait ~5s:

What you see here is that these two cross-origin tabs can talk to each other. So tabs that are not same-origin may nevertheless collude to produce behavior identical to capturing a same-origin other-tab.

youennf commented 2 years ago

What you see here is that these two cross-origin tabs can talk to each other

Right, and this is something that Safari prohibits. I believe this is also being worked on in other browsers, see https://privacycg.github.io/storage-partitioning/

youennf commented 2 years ago

See https://github.com/whatwg/html/issues/5803 for BroadcastChannel specifically.

eladalon1983 commented 2 years ago

Is Safari planning to prohibit cross-origin tabs talking to each other using a shared server exposing a RESTful API designed to facilitate this communication? Because evil.com and collaborator.com can try that, too.

youennf commented 2 years ago

I am not sure what you are referring to, but third-party cookies are also highly restricted in Safari. Going back to the actual issue: given that same-domain tab communication is not restricted, contrary to cross-domain tab communication, it makes sense to me to mention @alvestrand's point in the spec.

eladalon1983 commented 2 years ago

I am not sure what you are referring to

TL;DR: My point is that same-origin tabs aren't as special as they initially sound. Reasoning: If tab X opens and captures a tab Y, then there is a risk that they can communicate even if they are not technically same-origin. Currently you can use a "mailman" in the form of an iframe+BroadcastChannel; in the future you could use a PeerConnection, shared cloud infrastructure, etc. We will never sandbox these tabs enough to prevent the kind of attacks self-capture is concerned with. If they are both connected to the Internet, they can find each other, establish a connection, and collaborate to the point of being indistinguishable from a single app that's capturing itself.

jan-ivar commented 2 years ago

Right, and this is something that Safari prohibits.

@youennf Firefox Nightly appears to block it as well FWIW (just an observation).

If they are both connected to the Internet, they can find each other

@eladalon1983 True. Parties may also (co-)own both evil.com and collaborator.com outright, rendering communication between them unnecessary. This is why we should never pre-select a picker choice for instance.

There's no silver bullet here (that's https://github.com/w3c/mediacapture-screen-share-extensions/issues/9). All we can do is try making it more costly to exploit, and make the minimum necessary activity look slightly more suspicious.

Relatively speaking:

1. A site opening new tabs seems (marginally) more suspicious than one that doesn't
2. A site presenting the user with a different domain in the URL bar seems more suspicious than one that doesn't
3. A site that does both should raise more suspicion than 1 and 2.

This may not be a lot, but I also don't think that means we should allow self-capture of the requesting tab outright.

It seems valuable to me to provide as precise as possible guidelines.

I think looking at same-origin (minus port?) of both the requesting doc and its opener (chain?) are interesting ideas. But we're also discussing heuristics at this point, so I wouldn't prevent a UA from going further (e.g. using a deny list or other inputs besides these heuristics into some risk score that determines whether to trigger a warning (or block/hide) based on it).
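To make the "risk score" idea concrete, here is a minimal sketch of how such heuristics might be combined. Everything in it is hypothetical: the signal names, the weights, the threshold and the tab record shape (`tabId`, `url`, `openerTabId`) are illustrative assumptions, not anything a user agent is known to implement.

```javascript
// Hypothetical comparison: same scheme and host, ignoring the port.
function sameOriginIgnoringPort(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return ua.protocol === ub.protocol && ua.hostname === ub.hostname;
}

// Hypothetical risk score for offering `candidate` in the picker of `requester`.
// Weights are illustrative only.
function selfCaptureRisk(candidate, requester, denyList = []) {
  let score = 0;
  if (candidate.tabId === requester.tabId) score += 3; // the requesting tab itself
  if (sameOriginIgnoringPort(candidate.url, requester.url)) score += 2;
  if (candidate.openerTabId === requester.tabId) score += 2; // requester opened the candidate
  if (denyList.includes(new URL(candidate.url).hostname)) score += 3;
  return score;
}

// The user agent would warn (or hide/block) once the score reaches its threshold.
function shouldWarn(candidate, requester, denyList, threshold = 2) {
  return selfCaptureRisk(candidate, requester, denyList) >= threshold;
}
```

A user agent could of course weigh these signals very differently or add others.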

alvestrand commented 2 years ago

Note to "A site presenting the user with a different domain in the URL bar seems more suspicious than one that doesn't" - this heuristic will trigger for any two domains that are both hosted at Amazon AWS, I think.

eladalon1983 commented 2 years ago

I believe some good progress was made yesterday at the WebRTC Working Group's September interim meeting. IIUC, the following issues remain:

1. Constraints

@youennf voiced a preference to use something other than constraint. @youennf, could you present the alternative you would favor?

2. Using "ideal"

Unless we use a mechanism other than constraints, we will only allow ideal constraints for displaySurface. Agreed?

3. Warning for self-capture

@jan-ivar expressed a strongly held belief that we must specify that the user agent MUST warn the user about self-capture. I am fine with that. Shall we jump on that?

I suggest adding this line to the spec, in a general place, and not specific to display-surface-type-influencing:

The user agent MUST strongly warn the user about the dangers of self-capture.

4. Risky surface types

I agree that the user agent should be allowed to stop the application from influencing the user towards riskier surfaces. I disagree with the notion that we should draw the line for the user agent. I propose this formulation:

Define as [the user agent's default displaySurface] that surface type which
the user agent presents most prominently when no displaySurface constraint
is specified.

The user agent MUST ignore the displaySurface constraint,
if adhering to this constraint would make the user agent present
as most prominent a surface type which is riskier than its default.

With this in place:

The text is future-proof for the introduction of future display surface types (think @jan-ivar's origin-isolated application for instance).
The text works equally well for UAs where the current default is screen and for those which have a different default.
The text allows screen to be meaningfully employed by user agents whose pickers differentiate the current screen from secondary screens. (Modulo difficulties arising from moving browser surfaces between screens - can be addressed in the future.)
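The proposed formulation could be read as the following decision rule. This is only a sketch: the function name `resolveProminentSurface` is hypothetical, and the risk ranking is deliberately a parameter, since the proposal leaves it to the user agent to decide which surface types are riskier.

```javascript
// Returns the surface type the picker would present most prominently.
// `riskRank` maps surface type -> number (higher = riskier) and is chosen by the user agent.
function resolveProminentSurface(requested, uaDefault, riskRank) {
  const known = Object.prototype.hasOwnProperty.call(riskRank, requested);
  if (!known) return uaDefault; // no hint, or a type this user agent does not rank
  // Ignore the hint when it would promote a surface riskier than the default.
  return riskRank[requested] > riskRank[uaDefault] ? uaDefault : requested;
}

// Illustrative ranking only; a real user agent would pick its own.
const exampleRank = { window: 1, browser: 2, monitor: 3 };
resolveProminentSurface("monitor", "browser", exampleRank); // hint ignored, stays "browser"
resolveProminentSurface("window", "browser", exampleRank); // hint honored, becomes "window"
```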

jan-ivar commented 2 years ago

@youennf voiced a preference to use something other than constraint.

getDisplayMedia already uses a simplified version of constraints (it throws TypeError on advanced, min, or exact), so why not use it?

If the displaySurface constraint wasn't already defined then perhaps a different API might have value. But since it does, respecting it on input seems like the obvious choice here. Just throw TypeError on "monitor".

Unless we use a mechanism other than constraints, we will only allow ideal constraints for displaySurface. Agreed?

Yes, it's already the case today that exact throws TypeError. Though note the technical term is optional-basic-constraints, and the literal "ideal" keyword is superfluous in practice (plain values are already ideal).

3. Warning for self-capture

@jan-ivar expressed a strongly held belief that we must specify that the user agent MUST warn the user about self-capture. I am fine with that. Shall we jump on that?

I suggest adding this line to the spec, in a general place, and not specific to display-surface-type-influencing:
The user agent MUST strongly warn the user about the dangers of self-capture.

Personally I'd like that, but with my chair hat on I'd say we'd have trouble enforcing a normative MUST on UX. How would we test "strongly"?

This is why (with chair-hat off) I wanted to forbid self-capture outright, something we conceivably could test for, albeit manually (e.g. "MUST NOT return the document's viewport.")

jan-ivar commented 2 years ago

The user agent MUST ignore the displaySurface constraint,
if adhering to this constraint would make the user agent present
as most prominent a surface type which is riskier than its default.

This seems vague and hard to test. I prefer being specific here and throw on "monitor".

meaningfully employed by user agents whose pickers differentiate the current screen from secondary screens.

Users likely have means to introduce web surfaces on any monitor at their whim, so secondary screens are no safer.

Web surfaces aside, full screen capture should be discouraged for privacy reasons anyway, as it is the most oversharing.

youennf commented 2 years ago

If the displaySurface constraint wasn't already defined then perhaps a different API might have value

There are a few reasons:

The spec is talking about displaySurface as a setting and as a capability, not as a constraint, contrary to, say, suppressLocalAudioPlayback. Are browsers rejecting {displaySurface: { exact :...}}?
displaySurface is not currently supported in implementations as a constraining value. The way we will treat it will be specific: we might need to add code to check for 'exact'.
We might want to not support screen. Again specific code for just that property.
Putting more of the type checks using WebIDL is a good pattern.
Why should we allow two ways of writing the same thing (with or without ideal)?
We might want to extend picker preferences in the future. As an example, if all tabs of a given domain opted in to getViewportMedia protections, it might be OK to allow one of these tabs to ask to push the user towards selecting one of these tabs over other tabs.

The user agent MUST strongly warn the user about the dangers of self-capture.

We are in UX land, so I would go with guidelines and not mandatory statements that we will never be able to test/enforce.

jan-ivar commented 2 years ago

The spec is talking about displaySurface as a setting and as a capability, not as a constraint,

It is explicitly listed as a constrainable property and a constraint.

Are browsers rejecting {displaySurface: { exact :...}}?

They would if any of them were implemented to spec: "If CS contains a member whose name specifies a constrainable property applicable to display surfaces, and whose value in turn is a dictionary containing a member named either min or exact, return a promise rejected with a newly created TypeError."

Which means this MUST throw TypeError:

await navigator.mediaDevices.getDisplayMedia({video: {displaySurface: {exact: "window"}}});
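For illustration, the check described by the quoted spec text could be sketched like this. It is a simplification, not the spec algorithm: the function name is hypothetical, the list of properties "applicable to display surfaces" is an assumption made for this sketch, and the `advanced` check reflects the earlier remark in this thread that getDisplayMedia throws TypeError on advanced, min, or exact.

```javascript
// Assumption: properties treated as "applicable to display surfaces" in this sketch.
const DISPLAY_PROPS = ["displaySurface", "width", "height", "frameRate"];

// Throws TypeError for inputs that getDisplayMedia() would reject up front.
function checkDisplayMediaConstraints(cs = {}) {
  for (const kind of ["audio", "video"]) {
    const set = cs[kind];
    // Booleans and undefined carry no constraint set to inspect.
    if (set === null || typeof set !== "object") continue;
    if ("advanced" in set) {
      throw new TypeError(`${kind}.advanced is not allowed`);
    }
    for (const name of DISPLAY_PROPS) {
      const value = set[name];
      const isDictionary = value !== null && typeof value === "object";
      if (isDictionary && ("min" in value || "exact" in value)) {
        throw new TypeError(`${kind}.${name} must not use min or exact`);
      }
    }
  }
}
```

Plain values and `{ideal: ...}` dictionaries pass through this check untouched.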

The spec also says: "While min and exact constraints produce TypeError on getDisplayMedia(), this specification does not alter the track.applyConstraints() method. Therefore, they may instead produce OverconstrainedError or succeed depending on values, ..."

Which means this MUST throw OverconstrainedError in practice (no browser offers "application" today):

const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await track.applyConstraints({displaySurface: {exact: "application"}});

displaySurface is not currently supported in implementations as a constraining value. The way we will treat it will be specific: we might need to add code to check for 'exact'.

Browsers are already required to do so by the spec.

We might want to not support screen. Again specific code for just that property.

We can trivially add "monitor" to the list of TypeErrors.

Why should we allow two ways of writing the same thing (with or without ideal)?

You could say this about any constraint in mediacapture-main, so this seems like a larger issue, and out of scope for this issue.

If displaySurface weren't already a constraint, a novel API might have some merit, but the spec already mandates browsers implement it as a constraint, so why would we allow two ways of writing the same thing?

youennf commented 2 years ago

Here is a proposal:

Change getDisplayMedia parameter to take an optional DisplayMediaStreamConstraints dictionary
List explicitly the constraints supported by audio (restrictOwnAudio, suppressLocalAudioPlayback) and video (width, height, frameRate, maybe some others) in dedicated (boolean or dictionary)
Add a new property to the video constraints dictionary specifically for the surface, which would take a typed enum instead of the not-tightly-typed ConstrainDOMString.

This will make it clearer what makes sense in getDisplayMedia and what does not. I do not believe this will cause any compatibility issue as this would match browser existing behavior more closely than what the spec is doing.

Over time, we could further clean up things:

remove the min/exact fields from getDisplayMedia WebIDL parameters. This will make getDisplayMedia ignore them instead of rejecting
remove the ideal syntax if we see it is not used in the field by migrating values from IDLConstrainBoolean to boolean for instance.

jan-ivar commented 2 years ago

Here is a proposal:

Change getDisplayMedia parameter to take an optional DisplayMediaStreamConstraints dictionary

What change? This is literally already specified.

List explicitly the constraints supported by audio (restrictOwnAudio, suppressLocalAudioPlayback) and video (width, height, frameRate, maybe some others) in dedicated (boolean or dictionary)

It already does that, as required by mediacapture-main [2].

Add a new property to the video constraints dictionary specifically for the surface, which would take a typed enum instead of the not-tightly-typed ConstrainDOMString.

This will make it clearer what makes sense in getDisplayMedia and what does not.

I don't see how it does. This seems inconsistent with all other constraints for no reason. It also seems redundant, given we already have the displaySurface constraint. How would this new property interact with the existing constraint? What are the user benefits?

I do not believe this will cause any compatibility issue as this would match browser existing behavior more closely than what the spec is doing.

Like it or not, the Constrainable Pattern has been implemented quite faithfully in all browsers. There is some value in being consistent at this point, or at least in requiring good reasons not to be, which I don't see here.

Over time, we could further clean up things:

remove the min/exact fields from getDisplayMedia WebIDL parameters. This will make getDisplayMedia ignore them instead of rejecting

The working group chose to reject these inputs instead of silently ignoring them, in order to give early feedback to callers, so this is on purpose. I don't see new evidence here to reopen that decision.

remove the ideal syntax if we see it is not used in the field by migrating values from IDLConstrainBoolean to boolean for instance.

Across the board, or just for this constraint? This seems like a minor implementer inconvenience in order to stay consistent with the pattern web developers recognize from getUserMedia. It would also be a web compat issue at this point.

youennf commented 2 years ago

What change? This is literally already specified.

Sorry, I meant that we would not reuse MediaTrackConstraints but define specific DisplayAudioTrackConstraints and DisplayVideoTrackConstraints dictionaries, with clearly identified constraints, meaningful to screensharing.

That would prevent throwing, for instance, when an exact audio constraint is added in the video constraints part of a getDisplayMedia call. I just tested it and browsers seem to consistently allow this.

Then, instead of adding a ConstrainDOMString, we would add an enum on displaySurface. Or we could name it preferredSurface to make it clear this is a preference, not a choice. We would get better WebIDL typing. And browsers would not have to start throwing on navigator.mediaDevices.getDisplayMedia({ video : { displaySurface : { exact : 'window' }}}).

jan-ivar commented 2 years ago

I don't think reinventing the constrainable pattern in gDM buys users anything. 99% of users of this API already have to learn gUM. Keeping things consistent for them trumps implementer convenience.

jan-ivar commented 2 years ago

That would prevent throwing, for instance, when an exact audio constraint is added in the video constraints part of a getDisplayMedia call. I just tested it and browsers seem to consistently allow this.

That seems fine to me, as I see no support for throwing in that specific case. With gDM, "an audio constraint ... added in the video constraints" is not hitting "a constrainable property applicable to display surfaces".

Instead, we generally ignore "any constrainable property inside of CS that are not defined for MediaStreamTrack objects of type kind. This means that audio-only constraints inside of "video" and video-only constraints inside of "audio" are simply ignored rather than causing OverconstrainedError."

IOW no use in throwing over validity of constraints already ignored.
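A small sketch of that "ignore what doesn't apply to this kind" behavior follows. The helper name and the per-kind property lists are assumptions for illustration (the lists only use property names mentioned in this thread); the spec defines the real sets.

```javascript
// Assumed property lists per track kind, for this sketch only.
const PROPS_BY_KIND = {
  audio: ["restrictOwnAudio", "suppressLocalAudioPlayback"],
  video: ["width", "height", "frameRate", "displaySurface"],
};

// Returns a copy of the constraint set with members not defined for `kind` dropped,
// so they are never validated and can never cause an error.
function dropInapplicable(kind, constraintSet) {
  const kept = {};
  for (const [name, value] of Object.entries(constraintSet)) {
    if (PROPS_BY_KIND[kind].includes(name)) kept[name] = value;
  }
  return kept;
}

// An audio-only constraint placed under "video" simply disappears, even with `exact`.
dropInapplicable("video", { suppressLocalAudioPlayback: { exact: true }, width: 1280 });
```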

Then, instead of adding a ConstrainDOMString, we would add an enum on displaySurface. Or we could name it preferredSurface to make it clear this is a preference, not a choice. We would get better WebIDL typing. And browsers would not have to start throwing on navigator.mediaDevices.getDisplayMedia({ video : { displaySurface : { exact : 'window' }}}).

Throwing is good, and browsers would still throw with your change. The only difference would be having WebIDL taking care of it for the browser, which seems a big API surface change for a very small benefit solely to implementers.

alvestrand commented 2 years ago

I also agree that reusing constraints is less horrible than adding another way of doing this.

jan-ivar commented 2 years ago

My attempt to summarize from the September interim (which was the last mention of this issue I found):

General agreement that user agents looking at a surface type preference passed into getDisplayMedia to optimize UX toward this application preference is OK, provided it steers users away from monitor capture, and provided it doesn't remove any choices.
We mostly agree this surface type preference is the existing displaySurface constraint with a plain (ideal) value (one participant stated a preference for a new API instead)
General agreement that user agents SHOULD warn users that self-capture is not safe, if they choose self-capture.

I think this is ready for PR.
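As a sketch of the first point, a picker could reorder its sections around the hint without ever removing one. The function name and the default order are assumptions (screens-first is used here only because that is the current default described in this thread), and treating a "monitor" hint as "no hint" is just one of the options discussed.

```javascript
// Assumed default section order for this sketch.
const SURFACE_TYPES = ["monitor", "window", "browser"];

// Moves the preferred surface type to the front; every type stays available.
function orderPickerSections(preferred, defaultOrder = SURFACE_TYPES) {
  // A "monitor" hint, or an unknown/absent one, leaves the default order untouched.
  if (preferred === "monitor" || !defaultOrder.includes(preferred)) {
    return [...defaultOrder];
  }
  return [preferred, ...defaultOrder.filter((type) => type !== preferred)];
}
```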

eladalon1983 commented 2 years ago

I agree that it's ready for PR, if the PR says that the user agent MAY reject/ignore any type it deems risky. (I have no strong preference between reject and ignore.)

As mentioned, Chrome currently defaults to offering screens first, and this is something that I would very much like to change. However, changing established patterns has the effect of ruffling many feathers, and applications are not less upset when we tell them that the new behavior is "WAI, see spec." The reality is that Chrome rolls back changes that prove overly unpopular. To cite one recent example, colleagues of mine have run into such issues when aligning Chrome implementation with the spec with respect to making getUserMedia wait for focus, and this ended up being rolled back.

Specs are much more useful when they're in-line with actual implementations.

We don't want browsers to default to screens-first when no-ideal-screen-specified, but Chrome might be pushed to do so if the Working Group insists on a puritan stance that mandates "MUST reject" instead of "MAY reject" for "monitor". For the sake of the Web, let's allow each other room to maneuver.

youennf commented 2 years ago

The reality is that Chrome rolls back changes that prove overly unpopular.

This is fine. If browsers cannot ship something, specs should be aligned. If some browsers are able to ship it but not others, we need to discuss what to do.

colleagues of mine have run into such issues when aligning Chrome implementation with the spec with respect to making getUserMedia wait for focus, and this ended up being rolled back.

I think it is worth filing an issue. Maybe we should update the spec.

if the Working Group insists on a puritan stance

I don't think this comment helps move the discussion forward.

As mentioned, Chrome currently defaults to offering screens first, and this is something that I would very much like to change.

I like this. We already discussed in a past meeting the possibility for Chrome to do this change without waiting for either this PR or any additional spec change by introducing Chrome specific APIs. As an example, Chrome did great work moving the default RTCPeerConnection from PlanB to Unified Plan (congrats on that!) without requesting any change from WebRTC specs. Similarly, Chrome can extend values provided to getDisplayMedia to do this migration. One approach would be to extend DisplayMediaStreamConstraints with a Chrome-specific 'defaultingToScreen' member that could be true initially, then false, then removed.

As of the debate of reusing constraints or not, I would like to summarise why I do no think we should use constraints for this:

Using constraints enforces a model that is not great for getDisplayMedia: we do not like exact constraints, and we do not like some constraint values ('screen'). We would need to add specific rules that people will need to read and understand. This is harder than it should be: just looking at a self-explanatory type declaration should be all that is needed here.
Using constraints is not future-proof, as it restricts the type of hints we might use to a fixed set of values. In the future, we might want to extend the selection hints: prefer capture 'self' tab, prefer capture 'same-origin isolated tab'... A separate property allows greater flexibility.
There is consensus that constraints are overly complex (see comment from media capture depth) and that we should try not to extend the usage of constraints if there is a nice alternative.
There are no real downsides to introducing a new property in DisplayMediaStreamConstraints AFAIK, which makes it a nice alternative.
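
As a rough illustration of the first point, the TypeScript sketch below contrasts the two shapes. The type names, the `SurfaceHint` values and both helper functions are assumptions invented for this example; the rejection rules only mirror the dislikes listed above (exact constraints and the screen value, spelled 'monitor' in displaySurface) and are not quoted from any spec text.

```typescript
// Constraint style: the value may be a bare string or an object with ideal/exact.
type SurfaceConstraint = string | { ideal?: string; exact?: string };

// Hypothetical property style: the type itself lists every accepted value.
type SurfaceHint = "tab" | "window";

// Extra rules a reader would have to learn for the constraint style.
function constraintVerdict(c: SurfaceConstraint): "rejected" | "hint" {
  if (typeof c !== "string" && c.exact !== undefined) {
    return "rejected"; // exact constraints are not wanted here
  }
  const ideal = typeof c === "string" ? c : c.ideal;
  if (ideal === "monitor") {
    return "rejected"; // the screen value is not wanted here
  }
  return "hint";
}

// The property style needs no such rules: unsupported values fail type checking.
function propertyVerdict(_hint: SurfaceHint): "hint" {
  return "hint";
}
```

With the property style, `propertyVerdict("monitor")` does not compile, whereas the constraint style accepts the value syntactically and relies on prose rules to handle it.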

eladalon1983 commented 2 years ago

if the Working Group insists on a puritan stance

I don't think this comment helps moving the discussion forward.

I think my comment is polite and relevant. It discusses an important principle - the trade-off between what the Working Group wants (crisp-clear spec that mandates user agents MUST provide the best thing for the user, with no regard for the constraints of historical decisions), and what the Working Group can reasonably expect to have (an ever-shifting compromise between multiple entities with various constraints).

We already discussed in a past meeting the possibility for Chrome to do this change without waiting for either this PR or any additional spec change

Could you point to precedents where Apple or Mozilla have knowingly accepted spec-changes that would put them in violation of the spec, where a minor change would have eliminated this problem? (Clarification - this refers to MUST reject 'monitor' vs. MAY reject 'monitor'. When it comes to constraints, I am not a fan either, but I leave this discussion in your most capable hands.)

youennf commented 2 years ago

I think my comment is polite and relevant.

The comment dismisses the 'reject screen' position as if there is no other ground than purity. Let's focus on finding some common ground instead.

Could you point to precedents where Apple or Mozilla have knowingly accepted spec-changes that would put them in violation of the spec

Sure, this happens all the time, for instance when renaming properties. In those cases, backward compatibility is what we look at to understand whether we can make that change and how we can do it. It is fine if, for a given period of transition, the implementation is not fully aligned with the spec as long as there is a practical path to match the spec that people want to follow.

AIUI, that is what you are doing here: identifying backward compatibility issues and evaluating how to solve them. Given there seem to be solutions that Chrome used successfully in the past (needs validation from your side), and given you expressed interest in fixing those issues, I am thinking we have a path forward. Wdyt?

Also, I do not think the PR would say that a UA MUST NOT select 'screen' as default surface (which would outlaw Chrome). The PR would say that a web page can use a property to hint at selecting tab or window, not screen. The PR would not prevent implementing another property that would enable/disable a legacy 'pick screen first' behavior.

eladalon1983 commented 2 years ago

The comment dismisses the 'reject screen' position as if there is no other ground than purity.

The discussion has taken place over many threads, comments, editors' meetings and WG interim meetings. I remember a strong objection to "MAY reject" in favor of "MUST reject." I remember but one reason, and I have addressed it. If my recollection is faulty, I welcome correction.

Given there seems to be solutions that Chrome used successfully in the past (needs validation from your side), and given you expressed interest in fixing those issues, I am thinking we have a path forward. Wdyt?

I think events will unfold as follows, if we merge a PR that says "MUST reject monitor":

1. We merge the PR.
2. Chrome implements it verbatim, but retains the default behavior of screens-first when no ideal surface is specified.
3. Chrome introduces changes in a certain Chrome version that simultaneously (i) change the default behavior to tabs-first but also (ii) allow ideal: 'monitor' to trigger the old behavior. Part (ii) is intended to be a temporary off-ramp.

And here the path diverges.

If there is no loud opposition, the off-ramp is dismantled and 'monitor' is removed. Everybody is happy (me, WG, Chrome Security, Chrome Privacy).
If there is sufficiently loud opposition, the off-ramp becomes a permanent fixture, in direct violation of the spec.

The second possibility would be less painful if we just s/MUST/MAY.
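
The sequence above can be modeled with a small TypeScript sketch. The stage names, the `Outcome` type and the `pickerOutcome` function are illustrative assumptions, not Chrome code; the sketch also assumes that any stage other than the off-ramp follows the "MUST reject monitor" wording.

```typescript
type Surface = "monitor" | "window" | "browser";

// "verbatim": PR implemented, screens-first default retained.
// "off-ramp": tabs-first default, ideal: 'monitor' restores the old behavior.
// "off-ramp-removed": tabs-first default, 'monitor' no longer accepted.
type Stage = "verbatim" | "off-ramp" | "off-ramp-removed";

type Outcome = { rejected: true } | { rejected: false; offeredFirst: Surface };

function pickerOutcome(stage: Stage, ideal?: Surface): Outcome {
  if (ideal === "monitor") {
    // Only the off-ramp stage tolerates the value; the others reject it.
    return stage === "off-ramp"
      ? { rejected: false, offeredFirst: "monitor" }
      : { rejected: true };
  }
  if (ideal !== undefined) {
    // 'window' and 'browser' hints are honored in every stage.
    return { rejected: false, offeredFirst: ideal };
  }
  // No hint: screens-first until the default flips to tabs-first.
  return { rejected: false, offeredFirst: stage === "verbatim" ? "monitor" : "browser" };
}
```

In this model, the permanent off-ramp scenario is simply the "off-ramp" stage never ending, which is the case that conflicts with "MUST reject" but not with "MAY reject".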

The PR would not prevent implementing another property that would enable/disable a legacy 'pick screen first' behavior.

It is unlikely that Chrome would implement the off-ramp as anything other than ideal: 'monitor'. What would be the point?

youennf commented 2 years ago

3. but also (ii) allow ideal: 'monitor' to trigger the old behavior.

Question: why are you assuming the solution should be to use the same standard property but with a non standard 'monitor' value? If this is expected to be a temporary thing, using a separate property has benefits:

Using a non standard value might become a compatibility issue as other browsers may start breaking pages by rejecting instead of ignoring.
A separate property makes it clear this is non standard.
A separate property will not violate any spec.
A separate property allows more flexibility in terms of API shape and feature detection (navigator.mediaDevices.defaultToScreenCapture?).

If, in the long run, Chrome identifies it needs 'monitor', this might well have an impact on other browsers: they might need to implement some form of 'monitor' as well for the exact same reasons. At that point, the spec & API could be updated. But I hope we agree that our current evaluation is that 'monitor' will be a temp thing and that we should design API based on this assumption.
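
A hedged sketch of the feature-detection point: with a separate property, a page could check for it before opting in. `defaultToScreenCapture` is the hypothetical name floated above and `defaultingToScreen` is the hypothetical member mentioned earlier in the thread; neither exists in any browser, and both helper functions are assumptions for illustration.

```typescript
// Detects the hypothetical, non-standard property on a mediaDevices-like object.
function supportsLegacyScreenDefault(mediaDevices: object | undefined): boolean {
  return mediaDevices !== undefined && "defaultToScreenCapture" in mediaDevices;
}

// Builds getDisplayMedia-style options, adding the hypothetical opt-in
// member only where the user agent advertises support for it.
function buildOptions(mediaDevices: object | undefined): Record<string, unknown> {
  const options: Record<string, unknown> = { video: true };
  if (supportsLegacyScreenDefault(mediaDevices)) {
    options["defaultingToScreen"] = true;
  }
  return options;
}
```

In a page this would be called as `buildOptions(navigator.mediaDevices)`. User agents without the property receive only standard members, so nothing needs to be rejected or ignored.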

dontcallmedom commented 2 years ago

The second possibility would be less painful if we just s/MUST/MAY.

FWIW, if MUST is not realistic, and MAY too weak, this sounds like SHOULD would be a good representation of our intent while recognizing the reality of the world.

youennf commented 2 years ago

MAY/SHOULD if applied to promise rejection is not great due to potential browser compat. SHOULD ignore would leave that to UA territory, which is more flexible (though a strange API).

eladalon1983 commented 2 years ago

Question: why are you assuming the solution should be to use the same standard property but with a non standard 'monitor' value?

Because it minimizes the logic that has to be written. Why would Chrome want to code two parallel codepaths that accomplish virtually the same thing?

Using a non standard value might become a compatibility issue as other browsers may start breaking pages by rejecting instead of ignoring.

Is there a good reason for the spec to mandate that the user agent MUST reject? Can't we mandate MAY ignore? After all, the user can always select a surface type other than the ideal one, so ignoring the ideal surface should be reasonable from the POV of all entities (UA, application, user).

SHOULD ignore would leave that to UA territory, which is more flexible (though a strange API).

I think that MAY ignore is less strange than SHOULD ignore, but I'd accept either.

youennf commented 2 years ago

Because it minimizes the logic that has to be written. Why would Chrome want to code two parallel codepaths that accomplish virtually the same thing?

On Chrome's side maybe, though probably not by much. Other browsers might have to write code to actually accept but ignore this value.
