w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

Bug in spec: circular dependency for enumerateDevices() #709

Closed hills closed 3 years ago

hills commented 4 years ago

If the default device fails to open (even with permissions) then it has now become impossible to use any other device.

This is because of this new condition in enumerateDevices (summarised in commit c15a432b, March 2020):

if the browsing context did not capture (i.e. getUserMedia() was not called or never resolved successfully), the MediaDeviceInfo object will contain a valid value for kind but empty strings for deviceId,

Previously this read:

if no such access has been granted [...]

Here's how this plays out in practice:

Chromium attempted to follow the new spec but reverted the change.

In practice there are several reasons a device may fail to open: exclusive use by another application, failure to fulfil some criteria, or simply a fault. These may be platform or hardware dependent.
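To illustrate the circular dependency, here is a minimal sketch of the fallback a page might attempt when the default microphone cannot be opened. The helper name `openAnyMicrophone` and the `MediaDevicesLike` and `DeviceInfoLike` types are hypothetical; the types only mirror the shape of `navigator.mediaDevices` so the logic can be read and exercised outside a browser. It assumes the behaviour quoted above: without a prior successful capture, `enumerateDevices()` reports a valid `kind` but an empty `deviceId`.

```typescript
// Minimal structural types (hypothetical) mirroring navigator.mediaDevices.
interface DeviceInfoLike {
  kind: string;
  deviceId: string;
  label: string;
}

interface MediaDevicesLike {
  getUserMedia(constraints: { audio?: unknown; video?: unknown }): Promise<unknown>;
  enumerateDevices(): Promise<DeviceInfoLike[]>;
}

// Hypothetical helper: try the default microphone first, then fall back to
// any other microphone that enumerateDevices() exposes with a usable deviceId.
async function openAnyMicrophone(md: MediaDevicesLike): Promise<unknown> {
  try {
    return await md.getUserMedia({ audio: true });
  } catch (firstError) {
    const devices = await md.enumerateDevices();
    // Under the quoted condition, every deviceId is "" at this point,
    // because no capture has succeeded yet.
    const candidates = devices.filter(
      (d) => d.kind === "audioinput" && d.deviceId !== ""
    );
    if (candidates.length === 0) {
      throw new Error("no microphone with a usable deviceId was exposed");
    }
    for (const d of candidates) {
      try {
        return await md.getUserMedia({
          audio: { deviceId: { exact: d.deviceId } },
        });
      } catch {
        // This device also failed to open; try the next candidate.
      }
    }
    throw firstError;
  }
}
```

When the default device fails and only empty identifiers come back, the helper has nothing to select and ends in the error branch, which is the dead end described in this report.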

It looks like the summary in the commit is based on 9.2.2 "Device information exposure" which has been adjusted in commit e159c60, also in March.

I am afraid I am not a spec author, and I would need time to fully understand the detailed steps described in the spec. But if I may suggest: the spec seems to embody a lot of policy, such that existing special cases are causing new ones.


A proposal for what the user or developer experience should be that would make a lot of this simpler, whilst avoiding fingerprinting/probing issues:

Calls to getUserMedia that do not specify a device ID (or specify "default") would be governed by a "permission to use your camera/microphone" dialogue provided by the browser:

And then, independently a permissions flag (looks like [[canExposeDeviceInfo]]?):

What my goals are in the above proposal:

It is good to remember that not all apps are standard video conferencing apps; increasingly there are WebAudio productivity apps that use multiple devices concurrently.

hills commented 4 years ago

The tail is wagging the dog here?

Access to APIs is being restricted to when the capture indicator is on screen. Better to have a clear API design, and derive the capture indicator from it.

The side effects are being tested in this ticket: apprehension around failure cases in case they reveal information without indicating; and a 'guaranteed to succeed' codepath is a burden for both the spec and developers, but is needed to satisfy the problem which opened this ticket.

Your goals are increasingly clear, so why not just implement those goals?

To achieve the above:

With this, no complexity or change is pushed on the developer compared to their current experience in e.g. Chromium. When returning to a web page, the initial call can be to either enumerateDevices() or getUserMedia(deviceID: xxx) and get the intuitive result (no need to quash failure cases, ignore 'strict' requirements, or restrict access).

But, crucially, all of the goals of the capture indicator are achieved as well (and clearly defined)

And the other benefits are:

youennf commented 4 years ago

Let's get back to the initial request:

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

Are we good now in the fact that this has benefits but no identified drawbacks?

Your goals are increasingly clear, so why not just implement those goals?

I just illustrated some of the benefits; another major benefit is consistency between browsers. Browsers have different permission models and different permission persistence. Exposing those differences to the web page is bad for the web developer who wants to support all browsers with a single code path.

  • the capture indicator ("camera" icon in URL bar)

This is specific to Chrome and not specified in any spec. This is very much UI territory, so I doubt we will be able to specify that.

hills commented 3 years ago

Let's get back to the initial request:

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

Are we good now in the fact that this has benefits but no identified drawbacks?

I'm having trouble parsing this, and other parts of the message, sorry. It sounds as if you are asking whether I agree with my own (older) point which, of course, I do. But since then we discussed your privacy concerns and incorporated them, so I think it is helpful not to go backwards.

This is the current concern: (full context). Can this be implemented?

  • Access to both APIs, enumerateDevices() and getUserMedia() based on the permission check with no extra conditions
  • Activate the capture indicator on any reveal of information including:
    • device opened successfully
    • device failed to open
    • reveal of any deviceID (enumerateDevices)

This outlines an overall much better fix to this ticket than the one that was merged, and a better direction in general. It has benefits and no identifiable drawbacks, as you say.

You say the capture indicator ("camera" icon, typically) falls outside of the spec; that is even better. The capture indicator can be oriented to achieve the desired privacy goals, while the spec focuses on maintaining a clear API without quirks (which also happens to be in line with the historical API, so it does not break existing code).

jan-ivar commented 3 years ago

This is the current concern: (full context). Can this be implemented?

@hills You're describing a solution here not a concern (concerns cannot be implemented). We need to start with problem-statements, not solutions, but to save time, putting enumerateDevices behind any kind of permission prompt has been suggested in the past and soundly rejected because of the difficulty of wording such a prompt to users.

From what I can tell from a read-through, all concerns with the current model that have been backed up by examples have been addressed with https://github.com/w3c/mediacapture-main/pull/717 and https://github.com/w3c/mediacapture-main/issues/724 (Firefox bug here) and an explanation from @youennf that exact constraints allow for some device triage before prompt, which means this was a productive discussion. Thanks!

I'm going to close this thread as it has gotten too long. It'd be more productive to open new issues on specific unresolved items.

To summarize the broader issue for people who land here: The WG consensus is that the enumerate-first strategy wrt device discovery is no longer feasible in the current privacy climate. While the spec previously implicitly supported this, it no longer does. What remains is the device-first model that most sites already follow. e.g.:

  1. Open the same camera/mic from last session using deviceId (system defaults on initial visit, within app constraints)
  2. Add an ⚙️ options panel where users can change their camera/mic preference during live capture.
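As a rough illustration of step 1, the sketch below reopens the camera remembered from the previous session and falls back to the system default on a first visit. The names `openPreferredCamera`, `cameraConstraints`, `STORAGE_KEY`, `CameraSource` and `StorageLike` are hypothetical; the two interfaces only mirror the parts of `navigator.mediaDevices` and `localStorage` that the sketch touches. Using an `ideal` rather than an `exact` deviceId constraint is an assumption made here so that an unplugged camera does not cause a rejection; a site could reasonably choose otherwise.

```typescript
// Hypothetical minimal types mirroring the browser objects used below.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}
interface TrackLike {
  getSettings(): { deviceId?: string };
}
interface StreamLike {
  getVideoTracks(): TrackLike[];
}
type VideoConstraints = { video: true | { deviceId: { ideal: string } } };
interface CameraSource {
  getUserMedia(constraints: VideoConstraints): Promise<StreamLike>;
}

// Hypothetical storage key for the remembered camera.
const STORAGE_KEY = "preferredCameraId";

// First visit: no saved id, so ask for the system default camera.
// Later visits: prefer the camera that was used last time.
function cameraConstraints(savedId: string | null): VideoConstraints {
  return savedId ? { video: { deviceId: { ideal: savedId } } } : { video: true };
}

async function openPreferredCamera(
  source: CameraSource,
  store: StorageLike
): Promise<StreamLike> {
  const saved = store.getItem(STORAGE_KEY);
  const stream = await source.getUserMedia(cameraConstraints(saved));
  // Remember whichever device was actually opened, for the next session.
  const openedId = stream.getVideoTracks()[0]?.getSettings().deviceId;
  if (openedId) {
    store.setItem(STORAGE_KEY, openedId);
  }
  return stream;
}
```

In a browser, the intent is to pass `navigator.mediaDevices` and `localStorage` as the two arguments. The options panel from step 2 would then write the user's chosen deviceId under the same key while capture is live.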

Long term we hope to move away from enumerateDevices even further, by deprecating in-content device pickers in favor of in-browser ones as seen in Firefox & other APIs like getDisplayMedia.