w3c / mediacapture-screen-share

Media Capture Screen Capture specification
https://w3c.github.io/mediacapture-screen-share/
Other
86 stars 28 forks source link

Revisit: Let getDisplayMedia() influence the default type choice in the picker #184

Closed alvestrand closed 2 years ago

alvestrand commented 3 years ago

We continue to have a strong demand from Web developers for functionality that lets them influence what kind of display surface the user will capture; this is one of the core differences between the pre-standard "Chrome extension API" and the WG-defined getDisplayMedia() function.

Such a functionality is easy to add (allow a constraint on capture surface type). If it does not block the user from picking other things, but merely changes the default capture surface (currently "screen" on both Chrome and Safari), it doesn't seem to be a huge increase in user risk exposure.

Example comment: https://twitter.com/RickByers/status/1403349775387353089?s=19

eladalon1983 commented 2 years ago

I have posed a question to you. ("Is there a good reason for the spec to mandate the the user agent MUST reject? Can't we mandate MAY ignore?") I'd like to remind you of this question.

I think understanding the objection to "MAY ignore" is the core issue we have left. (You and @jan-ivar might have the issue of constraints left, but I think that's orthogonal.)

youennf commented 2 years ago

Is there a good reason for the spec to mandate the the user agent MUST reject?

When you define a type in WebIDL as an enumeration, rejection is happening when the value passed to a method is not valid. To implement the ignore rule with enum + current WebIDL, implementors would be required to add 'screen' as a valid value but ignore it, thus putting burden to those User Agents that have no use of this value. And putting burden on developers to understand that 'screen' is a valid but ignored value...

What would be nice is if we had an open-ended enumeration: if value is understood, use it and if it is not, ignore it. Spec would define all values except 'screen', Chrome might understand 'screen' hopefully for a limited time. This probably requires WebIDL changes, https://github.com/whatwg/webidl/issues/893.

This would also allow to more easily extend the enumeration should we have the desire to do so.

  • You have insinuated that there are more reasons than purity.

Insinuated? I think specific reasons have been provided.

  • I don't think that the few lines of code to ignore 'monitor' are the reason.
  • Is compatibility the reason? I think I have addressed that too...?

This argument goes both side: some lines of code to ignore monitor by all UAs vs. some lines of code to add a separate property (that would do nothing than override the value to monitor in native code) by one UA.

eladalon1983 commented 2 years ago

I think specific reasons have been provided.

My apologies for missing those. Please link me to the relevant comment, so that I may re-read it.

When you define a type in WebIDL as an enumeration, rejection is happening when the value passed to a method is not valid. To implement the ignore rule with enum + current WebIDL, implementors would be required to add 'screen' as a valid value but ignore it, thus putting burden to those User Agents that have no use of this value.

And putting burden on developers to understand that 'screen' is a valid but ignored value...

I've made an argument as to why that is a non-issue here. It starts with "After all, the user can always select a surface type other than the ideal one..."

youennf commented 2 years ago

I've made an argument as to why that is a non-issue here. It starts with "After all, the user can always select a surface type other than the ideal one..."

What I was meaning is that, as a web developer, I set 'monitor', which is a valid value as per spec/WebIDL, but the picker is not defaulting to 'monitor', or it does but only in specific Chrome versions. Or, as a web developer, I set 'monitor' and getDisplayMedia throws. This would be surprising. As a web developer, I might have to put some specific UI to help user selects the monitor picker, which may depend on the support of the 'monitor' hint or not.

To try summarising the discussions, there are a few options available. Do you agree with the assessment? Any additional option I may have missed?

  1. Use a dedicated DOMString property Value restricted to application, browser or window in the spec (could be updated to an open enum when WebIDL supports it).

Pros: Easiest to understand. Can be easily extended to additional values (self-tab maybe) in the future. Chrome can extend it to other values, should there be a need. Cons: No way for web app to identify which hints are actually understood by browser (in particular if we add new values in the future or Chrome adds a proprietary one).

  1. Use a constraint based property, reject if value is 'monitor'

Pros: rejection makes it clear 'monitor' is not supported. Cons: potential compatibility issue if Chrome starts to accept 'monitor'. Adding additional values to the underlying enum might not make a lot of sense (though self-tab may?).

  1. Use a constraint based property, ignore if value is 'monitor'

Pros: No compatibility issue like for 2 (but web page cannot learn the hint is ignored similarly to 1). Cons: Surprising effect that 'monitor', even though valid, is ignored. Adding additional values to the underlying enum might not make a lot of sense (though self-tab may?).

eladalon1983 commented 2 years ago

So that I may answer succinctly and to the point, could you please clarify explicitly what your objections to "MAY ignore" are? Is it that developers might attempt something and it ends up being no-op? Is it something else? Is it one of multiple objections?

I think it's important to understand the answer to the above questions before deep-diving into alternatives. For example, if your chief concern is indeed that developers might incorrectly expect 'monitor' to have an effect, then I'd point out that even supported surface-type hints might end up having no effect, because (i) ideally, the user agent MAY regard the hint or disregard it, and (ii) the user is always free to choose a non-hinted surface type.

But this is really premature, as I am answering an issue which might be marginal in your eyes. I first ask again - what are your chief objections to my proposed solution, where user agents MAY regard or disregard any hint?

jan-ivar commented 2 years ago

User agents SHOULD steer the user away from monitor capture, regardless of displaySurface value provided as input.

eladalon1983 commented 2 years ago

We are not going to specify "the user agent MUST respect the hint." We are going to specify "the user agent MAY respect the hint." So we may as well just leave it at that, and add an explanation of how we encourage the UA to ignore dangerous hints, e.g. monitors, e.g. suspicious sites, etc.

jan-ivar commented 2 years ago

I see nothing wrong with stating UAs MAY respect the hint, but SHOULD steer users away from monitor capture in spite of that hint.

jan-ivar commented 2 years ago

That's a narrowing of the MAY, not an expansion of it.

youennf commented 2 years ago

My preference:

jan-ivar commented 2 years ago

I do not think there is a need to state 'monitor' selection is dangerous, the spec probably already says that.

I didn't say we should say it is "dangerous", which is a characterization. I said we should say what the UA SHOULD do when interpreting the hint, which is a prescription.

youennf commented 2 years ago

I said we should say what the UA SHOULD do when interpreting the hint, which is a prescription.

This assumes the spec would define hint values that would point user towards monitor capture. I think we should avoid defining such hint values.

jan-ivar commented 2 years ago

I'm assuming we use the existing displaysurface constraint. No need to define new surface here IMHO.

eladalon1983 commented 2 years ago

If we use constraints, monitor is already there. If we use something else, is it going to be reusable for anything? If so - it will likely gain monitor at some point.

Also, I don't think anyone else would currently accept an approach other than constraints...? Nobody other than Youenn has been positive on that possibility so far.

youennf commented 2 years ago

Reusing displaysurface has some drawbacks:

Hence why I would go with a new attribute extending DisplayMediaStreamConstraints as an open enum when supported by WebIDL and a USVString in the meantime. The PR should anyway be small, whatever the option taken. I think this approach might be smaller in fact.

On the other hand, I do not really see what displaysurface gains us. Can you express the benefits of reusing displaysurface? So far what I could think of is PR size or implementation size. I believe that both approaches will be roughly equally small.

it will likely gain monitor at some point.

This is only a possibility at this point, who knows what future will be. If that ever happens, we can very easily update the spec accordingly.

alvestrand commented 2 years ago

as far as I can tell, again the monitor issue is only a concern if you want to use IDL to enforce a prohibition against preferring the monitor. As Elad has pointed out, this approach is unlikely to gain consensus, given that we don't have consensus that prohibiting "monitor" as preferred display surface is never appropriate. (FWIW - personal opinion - I wouldn't want this on the open web, but it is sometimes appropriate in managed deployment contexts.)

Youenn, I think you are alone in your taste. getDisplayMedia({ audio: true, video: true, prefer: 'window'}) seems to indicate that the preference affects both audio and video, while getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }) indicates clearly that displaySurface affects the video track only.

eladalon1983 commented 2 years ago

The PR should anyway be small, whatever the option taken.

This looks quite minimal to me:

  The user agent MAY allow the application to use the {{displaySurface}} constraint to
  signal a preference for a specific [=display surface=] type. The user agent MUST still
  offer the user unlimited choice of any [=display surface=], but MAY order the sources
  offered to the user according to this constraint.

Or even shorter:

  While user agents MUST always offer the user an unlimited choice of any [=display surface=],
  user agents MAY change the order or prominence of offered choices in response to an
  application's preference, as indicated by the {{displaySurface}} constraint.

Or shorter still:

  User agents MAY change the order or prominence of offered choices in response to an
  application's preference, as indicated by the {{displaySurface}} constraint.
jan-ivar commented 2 years ago
  • The 'monitor' issue

We can throw on "monitor" in prose regardless of API (if we decide to), so this seems like a separable discussion.

  • A picker might like hints that reduce the sets of surfaces, or allows reordering them. Say: I prefer secured pages, or I prefer self tab, or I prefer the tab that is playing audio right now, or I prefer the non browser application that just opened me and happens to be code-signed by a related party...

These ideas go further wrt influencing user choice than what I'm comfortable with. They also seem orthogonal to looking at the existing displaySurface constraint passed into getDisplayMedia for a default category to show...

  • It creates some minor backward compatibility issues (either reusing directly the enum, or reusing the whole constraint structure) while I think a hint of this sort should never trigger reject.

I don't see how. displaySurface uses DOMString, not enum, so {video: {displaySurface: "foo"}} cannot reject.

Style speaking (maybe this is just my taste), getDisplayMedia({ audio: true, video: true, prefer: 'window'}) reads better than getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }). Clearer API is better for the web.

I disagree. I'm no fan of constraints, but consistency with getUserMedia and MST wins here in my book.

eladalon1983 commented 2 years ago

@hta:

I also agree that reusing constraints is less horrible than adding another way of doing this.

@jan-ivar:

Style speaking (maybe this is just my taste), getDisplayMedia({ audio: true, video: true, prefer: 'window'}) reads better than getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }). Clearer API is better for the web.

I disagree. I'm no fan of constraints, but consistency with getUserMedia and MST wins here in my book.

Constraints are currently used throughout. Deviating from the established course requires stronger consensus IMHO. With Harald and Jan-Ivar in favor of consistency, and me generally agreeing with them, and not a lot of other people participating in the discussion, I think we should proceed under the assumption that we've decided to use constraints to resolve the current issue. That said, if Youenn formulates an independent plan to move us generally away from constraints, I will likely be very interested.

youennf commented 2 years ago

Sorry in advance for this very long message but there are lots of different points.

As Elad has pointed out, this approach is unlikely to gain consensus

@alvestrand, to be clear, I am not proposing we reject based on WebIDL. I am proposing we define the property as a plain USVString with some special values.

We can throw on "monitor" in prose regardless of API (if we decide to), so this seems like a separable discussion.

I was referring to the issue that we are exposing a value ('monitor') we actually do not want to expose to the open web (discussion has been about potentially using this value for transition).

These ideas go further wrt influencing user choice than what I'm comfortable with. They also seem orthogonal to looking at the existing displaySurface constraint passed into getDisplayMedia for a default category to show

I think you liked the idea to influence the prompt based on the fact that tabs would be site isolated. As of orthogonal or not, how would you add a new 'hint' value if reusing displaySurface constraint? Would you add two properties? As of 'existing', current browsers do not support displaySurface in existing getDisplayMedia implementations.

I don't see how. displaySurface uses DOMString, not enum, so {video: {displaySurface: "foo"}} cannot reject.

{video: { displaySurface: { exact: "monitor" }} does not reject right now in Chrome/Firefox AFAIK. It probably also applies to other constraints given to getDisplayMedia like cursor. My understanding would be that, after the 'constraint' proposal, implementations would start to reject, which is a minor compatibility issue. I would rather change the spec to match the implementations and explicitly list the constraints that are supported today. And move away from this MUST-reject-on-exact model when adding new features.

but consistency with getUserMedia and MST wins here in my book.

Consistency with getUserMedia is not making it easy to understand getDisplayMedia. As an example, let's go to the definition of getDisplayMedia to find out how to call it. We click on DisplayMediaStreamConstraints, great we stay in the same document and we learn about this structure. We then want to understand what to put in the video object. We are now going to mediacapture-main spec which defines getUserMedia properties. We learn about 'advanced' (which ultimately cannot be used for getDisplayMedia). We learn about MediaTrackConstraintSet for which most properties are useless to getDisplayMedia. And we have to manually go back to mediacapture-screen-share to look at the getDisplayMedia properties and all the specific rules. This is not a great journey. I'd prefer if we could make life easier than the current state, and describe in the spec what implementations are supporting (video width and height constraints and that is probably it).

Getting back to displaySurface as a constraint, it also makes it possible for web apps to do something like: getDisplayMedia({video: { displaySurface: ["tab", "window"]}}). In getUserMedia, this is fine and unambiguous as we are using a distance and only one of the value can have a distance of 0. In getDisplayMedia case, it is not clear what UA should do there (prefer tab or prefer window). I would guess the spec would state that the first hint would be used? With a USVString, there is no need to deal with this. I think a plain USVString is sufficient and simpler.

if Youenn formulates an independent plan to move us generally away from constraints, I will likely be very interested.

There is no plan to move away from constraints, simply to not extend their use over what is implemented in browsers. Plus a plan to describe more what is supported using WebIDL so that the specification is made clearer. For instance, DisplayMediaStreamConstraints would be defined without referring to MediaTrackConstraints. Instead we would list all the properties that are useful in the context of getDisplayMedia calls for audio and video explicitly within two WebIDL definitions (one for audio, one for video). And we would add the 'picker' hint, either within DisplayMediaStreamConstraints or within the WebIDL dedicated to video.

jan-ivar commented 2 years ago

I was referring to the issue that we are exposing a value ('monitor') we actually do not want to expose to the open web

Sites shouldn't be able to query track.getSettings().displaySurface == "monitor"?

I think you liked the idea to influence the prompt based on the fact that tabs would be site isolated.

Yes, for user agents to highlight some tabs over others (no spec changes needed for that), whereas here we're discussing an app signal on a default category (it's up to user agents how to apply this signal to UX).

As of orthogonal or not, how would you add a new 'hint' value if reusing displaySurface constraint? Would you add two properties?

That's hypothetical, as I'd prefer no more hints from apps.

{video: { displaySurface: { exact: "monitor" }} does not reject right now in Chrome/Firefox AFAIK. It probably also applies to other constraints given to getDisplayMedia like cursor.

Seems irrelevant. exact is categorically a TypeError in getDisplayMedia (for all implemented constraints due to WebIDL), so those are implementation bugs. E.g. {video: { width: { exact: 640 }} fails with TypeError in all browsers = don't use.

My understanding would be that, after the 'constraint' proposal, implementations would start to reject, which is a minor compatibility issue.

Firefox would start rejecting as soon as we fix bug 1732122, so this is unrelated. It's also not really a compatibility issue since it doesn't do anything. By this standard, every new feature is a compatibility issue for people who guessed a future API name expecting it to not be there.

Consistency with getUserMedia is not making it easy to understand getDisplayMedia. As an example, let's go to the definition of getDisplayMedia to find out how to call it.

Consistency = similarity in API shape and pattern (which users recognize from examples, MDN, or the explainer.md) Specs = blueprints for implementers, not docs for users. Improving their readability is an editorial issue, which should not influence API design.

getDisplayMedia({video: { displaySurface: ["tab", "window"]}}). In getUserMedia, this is fine and unambiguous as we are using a distance and only one of the value can have a distance of 0. In getDisplayMedia case, it is not clear what UA should do there (prefer tab or prefer window). I would guess the spec would state that the first hint would be used?

I don't think we have to say anything. The app has declared a preference for tab or window. We don't specify UX.

There is no plan to move away from constraints, simply to not extend their use over what is implemented in browsers.

Specs represent consensus, and right now, displaySurface is in the spec. If you feel it should be taken out, please file a separate issue, with new information worthy of reconsideration by the WG.

Plus a plan to describe more what is supported using WebIDL so that the specification is made clearer.

That sounds editorial. I don't think it's reasonable to block API progress on editorial cleanup.

I'd like to get back to debating this on the merits of the proposed change.

eladalon1983 commented 2 years ago

Based on the 2022-03-15, I believe we converged on:

User agents MAY change the order or prominence of offered choices in response
to an application's preference, as indicated by the {{displaySurface}} constraint.
It is advised that allowing applications to nudge users towards sharing a monitor
poses risks to user privacy.

Wdys? @aboba, @jan-ivar, @youennf?