Bug in spec: circular dependency for enumerateDevices()

If the default device fails to open (even with permissions) then it has now become impossible to use any other device.

This is because of this new condition in enumerateDevices (summarised in commit c15a432b, March 2020):

if the browsing context did not capture (i.e. getUserMedia() was not called or never resolved successfully), the MediaDeviceInfo object will contain a valid value for kind but empty strings for deviceId,

Previously this read:

if no such access has been granted [...]

Here's how this plays out in practice:

getUserMedia({audio: true}); // no device ID is known at this stage
user is prompted and grants permission to use the primary device
device open failure: an exception happens and getUserMedia fails
enumerateDevices(); // blocked from listing alternative devices
dead end for the user; they have no way to select an alternative device
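
The steps above can be sketched in code. This is only a minimal illustration, assuming an object shaped like navigator.mediaDevices is passed in; the helper name listAlternativesAfterFailure is hypothetical, and the empty deviceId behaviour is the one described in the quoted spec text.

```javascript
// Hypothetical helper: try the default microphone and, on failure, see what
// enumerateDevices() still reveals. `mediaDevices` is assumed to look like
// navigator.mediaDevices (injected so the sketch can run against a stub).
async function listAlternativesAfterFailure(mediaDevices) {
  try {
    // No device ID is known yet, so only the default can be requested.
    const stream = await mediaDevices.getUserMedia({ audio: true });
    return { stream, error: null, alternatives: [] };
  } catch (err) {
    // Permission may have been granted, but the device could not be opened.
    const devices = await mediaDevices.enumerateDevices();
    // Under the quoted condition deviceId is "" because no capture started,
    // so there is nothing usable left to offer the user.
    const alternatives = devices.filter(
      (d) => d.kind === "audioinput" && d.deviceId !== ""
    );
    return { stream: null, error: err.name, alternatives };
  }
}
```

In a page this would be called as listAlternativesAfterFailure(navigator.mediaDevices); an empty alternatives list alongside a non-null error is the dead end.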

Chromium attempted to follow the new spec but reverted the change.

In practice there are many reasons a device may fail to open: exclusive use by another application, inability to fulfil some criteria, or simply a fault. These may be platform or hardware dependent.

It looks like the summary in the commit is based on 9.2.2 "Device information exposure" which has been adjusted in commit e159c60, also in March.

I am not a spec author, I am afraid, and I would need time to fully understand the detailed steps described in the spec. But if I may suggest: it seems the spec embodies a lot of policy, with the result that existing special cases are causing new ones.

A proposal for what the user or developer experience should be that would make a lot of this simpler, whilst avoiding fingerprinting/probing issues:

Calls to getUserMedia that do not specify a device ID (or specify "default") would be governed by a "permission to use your camera/microphone" dialogue provided by the browser:

The browser should provide an option to choose a device here
The persistence of this permission does not need to be spec'd
- Browser or user policy can decide: eg. forever; this session
- Or every call to getUserMedia()
- Spec may demand some implicitly permitted operations; eg. if the device is already open by that page
This means that most web developers can just call getUserMedia, once, and not worry about enumerateDevices

And then, independently a permissions flag (looks like [[canExposeDeviceInfo]]?):

Browser prompts for "permission to use a range of media devices on your system"
Governs access to anything involving a deviceID: eg.
- enumerateDevices()
- getUserMedia use with a deviceID
- setSinkId
- "new device available" event
Pop up the permissions dialogue on the first of any of these API uses
- Even if the page remembers a deviceID in a cookie, it must still have this permission to make an API call with it
Browser policy or user sets how long this permission is granted for: eg. forever; this session; next 5 minutes
This dialogue can provide access control
- eg. "default devices only", "microphones only", or a specific allowlist
These deviceIDs now act as a pass on getUserMedia(), so not subject to the checks above.
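
To make the two proposed permissions concrete, here is a purely illustrative model of which prompt each kind of call would need. None of these names exist in the spec or in any browser; the API labels, the grants object and the function requiredPrompt are all hypothetical.

```javascript
// Illustrative only: calls that involve a deviceID, as listed in the proposal.
const DEVICE_ID_APIS = new Set([
  "enumerateDevices",
  "getUserMedia:deviceId",
  "setSinkId",
  "devicechange",
]);

// grants: { useDefaultDevice: boolean, exposeDeviceInfo: boolean }
// Returns the prompt the browser would need to show first, or null if the
// call is already covered by an existing grant.
function requiredPrompt(api, grants) {
  if (DEVICE_ID_APIS.has(api)) {
    // A granted deviceID permission acts as a pass, including for
    // getUserMedia() with a deviceID.
    return grants.exposeDeviceInfo ? null : "use a range of media devices";
  }
  if (api === "getUserMedia:default") {
    return grants.useDefaultDevice ? null : "use your camera/microphone";
  }
  throw new TypeError(`unknown api: ${api}`);
}
```

The point of the sketch is that the answer depends only on the grants, not on which call happened first.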

What my goals are in the above proposal:

Clarity, by not having permissions depend on the ordering of events:
- enumerateDevices() needing to happen after getUserMedia()
- avoid 'fake' data now, complete data later
Tackle fingerprinting issues; no deviceIDs granted without user permission
User is clear which page can access which devices, not implicit based on history (#703)
Retain compatibility with existing code
Prevent most developers having to build device selection UIs unnecessarily:
- Chromium presently omits device selection on getUserMedia, so to give a good user experience most developers currently need to call enumerateDevices, and must do so after an early getUserMedia() call made only to request permissions.
Leave policy to the browser
- Spec can be simpler, with a smaller bug footprint
- Policy varies across browsing environments; eg. mobile vs. desktop vs. pro user/productivity app
- Policy can evolve without breaking existing sites

It is good to remember that not all apps are standard video conferencing apps, and increasingly there are WebAudio productivity apps that use multiple devices concurrently.

The tail is wagging the dog here?

Access to APIs is being restricted to when the capture indicator is on screen. Better to have a clear API design, and derive the capture indicator from it.

The side effects are on display in this ticket: there is apprehension that failure cases may reveal information without the indicator being shown, and a 'guaranteed to succeed' codepath is a burden for both the spec and developers, yet it is needed to fix the problem which opened this ticket.

Your goals are increasingly clear, why not just implement those goals?

the capture indicator ("camera" icon in URL bar)
- present from the moment the page accesses any personalised information
- personalised information could be: noise on the soundcard, a video frame, or device IDs
- persists for the lifetime of the tab/page
the record indicator ("red blob" icon in the tab)
- directly related to when audio or video device is streaming
- removed when streaming completes
explicit user permission step
- if no permission is given, do not reveal any personalised information
- browser policy may remember this between sessions
no change needed to existing JavaScript code

To achieve the above:

Access to both APIs, enumerateDevices() and getUserMedia(), is based on the permission check with no extra conditions
Activate the capture indicator on any reveal of information including:
- device opened successfully
- device failed to open
- reveal of any deviceID (enumerateDevices)
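
As a rough sketch of the indicator rules above: the capture indicator turns on at any reveal of information and stays on, while the record indicator follows active streaming. This is a toy state model only; the class name and event names are hypothetical and not taken from any spec or browser.

```javascript
// Hypothetical event names for anything that reveals personalised information.
const REVEALING_EVENTS = new Set([
  "device-opened",
  "device-open-failed",
  "device-id-revealed",
]);

class IndicatorModel {
  constructor() {
    this.captureIndicator = false; // sticky for the lifetime of the page
    this.activeStreams = 0;        // drives the record indicator
  }

  // The record indicator is shown only while something is streaming.
  get recordIndicator() {
    return this.activeStreams > 0;
  }

  onEvent(type) {
    if (REVEALING_EVENTS.has(type)) this.captureIndicator = true;
    if (type === "device-opened") this.activeStreams += 1;
    if (type === "stream-ended" && this.activeStreams > 0) {
      this.activeStreams -= 1;
    }
  }
}
```

Note that a failed open switches the capture indicator on without ever switching the record indicator on, which is the case this ticket is about.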

With this, no complexity or change is pushed onto the developer compared to their current experience in eg. Chromium. When returning to a web page, the initial call can be to either enumerateDevices() or getUserMedia() with a known deviceID, and it gets the intuitive result (no need to quash failure cases, ignore 'strict' requirements, or restrict access)

But, crucially, all of the goals of the capture indicator are achieved as well (and clearly defined)

And the other benefits are:

Developer does not need to open an alternative device (eg. getUserMedia({audio:true})) to acquire access to enumerateDevices() or to test if a device is present, which is inefficient and quite arbitrary.
The permissions relationship is very straightforward for a developer to understand and work with
Reduction in edge cases for the spec maintainers; less complexity
Reduction in edge cases for developer, especially on failures
Intuitive API response when there is eg. no camera or no audio interface.

Let's get back to the initial request:

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

Are we good now in the fact that this has benefits but no identified drawbacks?

Your goals are increasingly clear, why not just implement those goals?

I just illustrated some of the benefits; another major benefit is consistency between browsers. Browsers have different permission models and permission persistence. Exposing those differences to the web page is bad for a web developer who wants to support all browsers with a single code path.

the capture indicator ("camera" icon in URL bar)

This is specific to Chrome and not specified in any spec. This is very much UI territory, so I doubt we will be able to specify that.

Let's get back to the initial request:

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

Are we good now in the fact that this has benefits but no identified drawbacks?

I'm having trouble parsing this, and other parts of the message, sorry. It sounds as if you are asking whether I agree with my own (older) point which, of course, I do. But since then we have discussed your privacy concerns and incorporated them, so I think it is helpful not to go backwards.

This is the current concern: (full context). Can this be implemented?

Access to both APIs, enumerateDevices() and getUserMedia(), is based on the permission check with no extra conditions

Activate the capture indicator on any reveal of information including:

device opened successfully

device failed to open

reveal of any deviceID (enumerateDevices)

Overall, this outlines a much better fix to this ticket than the one which was merged, and a better direction in general. It has benefits, and no identifiable drawbacks, as you say.

You say the capture indicator (typically a "camera" icon) falls outside of the spec; that is even better. The capture indicator can be oriented to achieve the desired privacy goals, while the spec focuses on maintaining a clear API without quirks (one that also happens to be in line with the historical API, so it does not break existing code)

This is the current concern: (full context). Can this be implemented?

@hills You're describing a solution here not a concern (concerns cannot be implemented). We need to start with problem-statements, not solutions, but to save time, putting enumerateDevices behind any kind of permission prompt has been suggested in the past and soundly rejected because of the difficulty of wording such a prompt to users.

From what I can tell from a read-through, all concerns with the current model that have been backed up by examples have been addressed with https://github.com/w3c/mediacapture-main/pull/717 and https://github.com/w3c/mediacapture-main/issues/724 (Firefox bug here) and an explanation from @youennf that exact constraints allow for some device triage before prompt, which means this was a productive discussion. Thanks!

I'm going to close this thread as it has gotten too long. It'd be more productive to open new issues on specific unresolved items.

To summarize the broader issue for people who land here: The WG consensus is that the enumerate-first strategy for device discovery is no longer feasible in the current privacy climate. While the spec previously implicitly supported this, it no longer does. What remains is the device-first model that most sites already follow, e.g.:

Open the same camera/mic from last session using deviceId (system defaults on initial visit, within app constraints)
Add an ⚙️ options panel where users can change their camera/mic preference during live capture.
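
A minimal sketch of the first step of that device-first flow might look like the following. It assumes `mediaDevices` is shaped like navigator.mediaDevices and `storage` like localStorage (both are injected so the sketch can run against stubs); the function name and storage key are hypothetical.

```javascript
// Hypothetical storage key for the camera used in the previous session.
const STORAGE_KEY = "preferredCameraId";

async function openPreferredCamera(mediaDevices, storage) {
  const savedId = storage.getItem(STORAGE_KEY);
  // A bare deviceId is an "ideal" (non-exact) constraint, so the browser can
  // still fall back to another camera if the saved one is unavailable.
  // On a first visit there is no saved ID, so the system default is used.
  const video = savedId ? { deviceId: savedId } : true;
  const stream = await mediaDevices.getUserMedia({ video });

  // Remember which device was actually opened, for the next session.
  const [track] = stream.getVideoTracks();
  const openedId = track ? track.getSettings().deviceId : undefined;
  if (openedId) storage.setItem(STORAGE_KEY, openedId);
  return stream;
}
```

In a page this would be called as openPreferredCamera(navigator.mediaDevices, localStorage); the options panel would then use enumerateDevices() during live capture, when device IDs and labels are available.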

Long term we hope to move away from enumerateDevices even further, by deprecating in-content device pickers in favor of in-browser ones as seen in Firefox & other APIs like getDisplayMedia.

w3c / mediacapture-main

Bug in spec: circular dependency for enumerateDevices() #709