w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

Bug in spec: circular dependency for enumerateDevices() #709

Closed hills closed 3 years ago

hills commented 4 years ago

If the default device fails to open (even with permissions) then it has now become impossible to use any other device.

This is because of this new condition in enumerateDevices (summarised in commit c15a432b, March 2020):

if the browsing context did not capture (i.e. getUserMedia() was not called or never resolved successfully), the MediaDeviceInfo object will contain a valid value for kind but empty strings for deviceId,

Previously this read:

if no such access has been granted [...]
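To make the practical consequence concrete, here is a minimal sketch of how a page might detect that it has been handed the redacted list. The helper name isRedactedDeviceList is hypothetical; it relies only on the behaviour quoted above, namely that deviceId is an empty string until the page has captured.

```javascript
// Hypothetical helper: returns true when every entry has an empty deviceId,
// which is how the quoted spec text describes the pre-capture result.
function isRedactedDeviceList(devices) {
  return devices.length > 0 && devices.every((d) => d.deviceId === '');
}

// Usage in a browser (not executed here):
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   if (isRedactedDeviceList(devices)) { /* no usable device IDs yet */ }
```

With a redacted list the page cannot build an exact deviceId constraint for any alternative device, which is the circular dependency this issue describes.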

Here's how this plays out in practice:

Chromium attempted to follow the new spec but reverted the change.

In practice there are several reasons a device may fail to open: exclusive use by another application, an inability to fulfil some criteria, or simply a fault. These may be platform or hardware dependent.

It looks like the summary in the commit is based on 9.2.2 "Device information exposure" which has been adjusted in commit e159c60, also in March.

I am not a spec author, I am afraid, and I would need time to fully understand the detailed steps described in the spec. But if I may suggest, it seems the spec embodies a lot of policy, which means existing special cases are causing new ones.


A proposal for what the user or developer experience should be that would make a lot of this simpler, whilst avoiding fingerprinting/probing issues:

Calls to getUserMedia that do not specify a device ID (or specify "default") would be governed by a "permission to use your camera/microphone" dialogue provided by the browser:

And then, independently a permissions flag (looks like [[canExposeDeviceInfo]]?):

What my goals are in the above proposal:

It is good to remember that not all apps are standard video conferencing apps; increasingly there are WebAudio apps for productivity that use multiple devices concurrently.

guest271314 commented 4 years ago

From perspective here the order of operation should be enumerateDevices() (with accurate devices being listed https://github.com/w3c/mediacapture-main/issues/693#issuecomment-623489944, which is currently not the case when workarounds are used to select monitor devices "What-U-Hear", Chromium refuses to start monitor devices deliberately, for which there is no specified solution, only workarounds) => getUserMedia(), et al. https://github.com/w3c/mediacapture-main/issues/703#issuecomment-653822239.

Something like getSystemDevices() (permission for all devices, not just what implementers might arbitrarily decide to list, or not) => [<all_devices_as_a_list_that_user_selects_one_or_more_from>] => getUserMedia() can access any of the listed devices selected, not un-selected devices, enumerateDevices() enumerates selected devices - exactly, not listing "audiooutput" though actually the device is just default microphone, not really audio output at all, and not listing devices not selected at getSystemDevices(), with implementers using the same language for descriptions or identifiers of devices.

For backward compatibility users should be able to get a devices directly with something like

getUserMedia({audio:{kind:{exact:'audiooutput'}}}) and for that to be reliable and consistent between implementations. First, clearly define and agree on what the kinds are; then, if that device does not exist, we can expect the same result in all implementations: we do not want the microphone, so throw an error or exception, because we did not ask for audio input.

So, there are means to solve these device selection issues, even within the scope of not breaking the web and existing users of getUserMedia(), as long as the changes are consistent. Else, do not overload methods with functionality not originally intended. Create new methods to meet the requirements.

youennf commented 4 years ago

This is a lot of input, thanks @hills! It seems there is one precise identified issue we should try to address quickly (NotReadableError case) and more longer term proposals (design thoughts and potentially new API). It might be easier to discuss the proposal/new API as a specific issue and keep discussing the NotReadableError case here.

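Diving specifically into the NotReadableError case, I can see two possibilities:

  • User Agent should be able in the middle of the getUserMedia algorithm to select another matching device if the one that is selected is NotReadableError. If all devices have issues, return "NotReadableError", no need to expose device info. It is unclear to me whether this requires a spec change. At least a note clarifying this is allowed would be good.

  • In case user granted permission and device is NotReadableError, set device information exposure to true. This requires a spec change. This also weakens a bit the model, in terms of privacy.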

I tend to prefer option 1. It does not seem great that a web page would have to iterate through all devices if getUserMedia({ video : true }) returns NotReadableError. It should really be the responsibility of the User Agent to try as much as possible to fulfill the page request.

fippo commented 4 years ago

@youennf problem here was that the web page could not iterate through all devices since enumerateDevices only returned a single pair of devices. Which probably was broken since Chrome changed to that model for the case where there was no permission initially but it was rare enough nobody complained...

Option (1) would cover this as well.

alvestrand commented 4 years ago

The spec says something like "once you have the set of devices that satisfy the criteria, the UA picks one". It could say something like "once you have the set of devices that satisfy the criteria, the UA tries to open each one in turn until one succeeds or all fail". That would preserve the model, and get a working device. (Mostly Youenn's suggestion. Made on call.)
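The suggested wording can be pictured with a short sketch. This is not spec text or browser code: openFirstWorkingDevice and the open callback are hypothetical names, and candidates stands for the set of devices that already satisfy the constraints.

```javascript
// Sketch of "try to open each one in turn until one succeeds or all fail".
// `candidates` is the list of devices that satisfy the constraints and
// `open` is a hypothetical async function that starts a single device.
async function openFirstWorkingDevice(candidates, open) {
  let lastError = new Error('no candidate devices');
  for (const device of candidates) {
    try {
      return await open(device); // the first device that starts wins
    } catch (err) {
      lastError = err; // remember the failure and try the next candidate
    }
  }
  throw lastError; // every candidate failed
}
```

Under this model the page only sees a failure when no matching device can be started at all.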

hills commented 4 years ago

This is a lot of input, thanks @hills! It seems there is one precise identified issue we should try to address quickly (NotReadableError case) and more longer term proposals (design thoughts and potentially new API). It might be easier to discuss the proposal/new API as a specific issue and keep discussing the NotReadableError case here.

I agree, there are short-term vs. long-term proposals here. I just wasn't sure that posting 'ideas' in ticket form is good etiquette, but I am very happy to do that if preferred.

Diving specifically into the NotReadableError case, I can see two possibilities:

  • User Agent should be able in the middle of the getUserMedia algorithm to select another matching device if the one that is selected is NotReadableError. If all devices have issues, return "NotReadableError", no need to expose device info. It is unclear to me whether this requires a spec change. At least a note clarifying this is allowed would be good.

  • In case user granted permission and device is NotReadableError, set device information exposure to true. This requires a spec change. This also weakens a bit the model, in terms of privacy.

I tend to prefer option 1. It does not seem great that a web page would have to iterate through all devices if getUserMedia({ video : true }) returns NotReadableError. It should really be the responsibility of the User Agent to try as much as possible to fulfill the page request.

I don't think the two are mutually exclusive?

Frankly, the moment the user grants permission, that should be it (and I believe that would 'solve' the present ticket). That action should relate to device exposure on its own. It's very strong as it is, because of explicit action by the user. I'm afraid I don't understand how waiting for the device to open successfully strengthens privacy. I think it's actually the opposite effect; the user's understanding is complicated and weakened if it's coupled to some other future action, rather than a simple 1:1 relationship with them clicking 'allow'.

Now for the other issue. Can I clarify, you would propose to ignore 'exact' deviceID constraint if the device can't be opened? Because if I say it like that it surely can't be justified!

Whether or not it's an 'exact' constraint, device selection logic is going to behave unexpectedly for users in some example cases:

I think it's a noble aim to feel the API is simpler if deviceId is just another constraint. But in reality some of the constraints tell us something about the source itself (i.e. which way the camera/mic is pointing) vs. how we wish to capture it.

And so my long-term proposal (part 2) attempts to embody these cases without surprises for users or developers.

guest271314 commented 4 years ago

Re long-term, at the front-end getUserMedia({audio:true}) at Chromium is useless relevant to initial device selection

Screenshot_2020-08-12_16-24-41

The user does not gain any knowledge about the device being captured other than "microphone".

The order should be enumerateDevices() => getUserMedia(<selected_device(s)>) for the user to be fully aware of the specific devices selected before the capture actually commences at getUserMedia().

Attempting to massage clarity from getUserMedia({audio: true}) => enumerateDevices() requires calling getUserMedia() at least twice when the initial selected device (Default) is not the device intended to be captured.

As you pointed out, attempting to select a device the implementation refuses to support capture of leads to a DOMException not even described in the specification, see https://bugs.chromium.org/p/chromium/issues/detail?id=931749

We use pulse audio to create an audio sink and configure the monitor of that sink as the default/fallback source of that device. E.g. a pa configuration like this:

load-module module-null-sink sink_name=main_mix
set-default-source main_mix.monitor

In Chrome we then try to access this source. The default audio input device is listed in the result of navigator.mediaDevices.enumerateDevices, but when we try to access it we get the following error: DOMException: could not start audio source

To avoid such restrictions, which implementers might decide to arbitrarily incorporate into their version of getUserMedia(), enumerateDevices() should precede getUserMedia() so that users can evaluate the list of available devices and either select a device the implementation decides to expose, or not use getUserMedia() at all for the use case.

Ideally, there should not be any restrictions at all on which devices can be selected for capture, whether the device identifier is 'speech-dispatcher', a monitor device, or virtual device.

The change would break existing web applications, though since this specification is active and users of getUserMedia() are well-suited to changing code in response to changes, the users at large should be able to adjust code accordingly rather swiftly.

guest271314 commented 4 years ago

Mozilla browsers do not have the same issue as Chromium, Chrome browsers, as Nightly and Firefox provide a drop-down list of devices available for capture at the UI prompt, including monitor devices.

However, that still requires calling getUserMedia({audio: true}) twice if we are using only code and not the prompt, because we do not have the deviceId at that point and we have not yet called enumerateDevices(), and the specification does not mandate any uniform UI for initial device selection.

Thus, if the specification is adjusted to enumerateDevices() being the entry point for device selection and capture, once permission is granted to select all device(s) initially we can then proceed with actually capturing media.

This can be accomplished by using a very basic <select multiple> HTML element or equivalent using GTK or Qt, etc.

    <select multiple>
      <option>a</option>
      <option>b</option>
    </select>

which is simple enough to be specified and uniformly implemented.

Flow-chart:

  1. enumerateDevices()
  2. Select one or more devices, audio or video
  3. Only the selected devices can be captured at subsequent calls to getUserMedia(<constraints>) during the session unless enumerateDevices() is executed again to expand or limit the selected exposed devices. Constraints including exact are limited to only the selected devices, else, if necessary throw exception then, when the device to capture is not within the list selected by the user.

The above algorithm should resolve any ambiguities as to which devices are selected by user and which devices are exposed for the session.
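The flow chart above can be modelled in a few lines. Everything in this sketch is hypothetical: createDeviceSession, select, enumerate and capture are illustrative names for the proposed behaviour, not existing browser APIs.

```javascript
// Hypothetical model of the proposed flow; none of this is a real browser API.
function createDeviceSession(allDevices) {
  let selected = new Set();
  return {
    // Steps 1-2: the user picks one or more devices from the full list.
    select(deviceIds) {
      const known = new Set(allDevices.map((d) => d.deviceId));
      selected = new Set(deviceIds.filter((id) => known.has(id)));
    },
    // After selection, enumeration lists exactly the selected devices.
    enumerate() {
      return allDevices.filter((d) => selected.has(d.deviceId));
    },
    // Step 3: capture is limited to selected devices; anything else throws.
    capture(deviceId) {
      if (!selected.has(deviceId)) {
        throw new Error(`device ${deviceId} was not selected by the user`);
      }
      return { deviceId };
    },
  };
}
```

Calling select again replaces the previous selection, which corresponds to re-running enumerateDevices() to expand or limit the exposed devices.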

guest271314 commented 4 years ago

The prompt for Mozilla getUserMedia() is an example of how enumerateDevices() would work to both grant permissions (or not) and select any and all devices at a multiple select element (HTML; GTK; Qt, etc.)

Screenshot_2020-08-12_16-48-06

where code then filters the list from enumerateDevices() and sets the filtered list, something like

navigator.mediaDevices.enumerateDevices()
.then(devices => {
  // <conditions> stands for application-specific filter and sort logic
  const filteredDevices = devices.filter(device => <conditions>);
  const orderedDevices = devices.sort((a, b) => <conditions>);
  // filteredDevices is essentially user permission for the device(s), or not when not selected
  return navigator.mediaDevices.getUserMedia(filteredDevices <converted_to_constraints>);
})
.then(stream => {
  // devices selected at filteredDevices in order set by orderedDevices, 0 of audio and/or video default
  // remainder of list not primary MediaStreamTrack's yet still exposed and capable of being adjusted in MediaStream
  // using enabled, removeTrack, addTrack
  console.log(stream);
})
// implementation refuses to capture device, etc.
.catch(console.error);

hills commented 3 years ago

Hi! I don't think this should have been closed. The patch doesn't address the core issue, especially as this text remains:

if the browsing context did not capture (i.e. getUserMedia() was not called or never resolved successfully), the MediaDeviceInfo object will contain a valid value for kind but empty strings for deviceId,

Which IMO should be changed to what it was previously:

if no such access has been granted, the MediaDeviceInfo object will ...

with whatever changes to the spec to create this summary.

The merged patch seems to have little or no implication for the issue raised, and offers no explanation of how it aims to fix the issue.

So please can the issue be re-opened.

youennf commented 3 years ago

As discussed before, your initial post touched on several points and we decided to focus this issue on the specific problem of device failing. For other points, it might be best to file new focused issues.

What this PR is doing is making sure that the browser will try to fulfil as much as possible the user request. If user granted access, even if the first device being selected failed, getUserMedia will succeed as long as there is a device that can be started.

If all devices fail, getUserMedia will fail and no information on the device setup will be given to the page. In that case, I do not see how exposing device setup info would actually help the web developer.

Can you clarify which scenario you think is not covered after this change?

hills commented 3 years ago

getUserMedia(deviceId: { exact: 'xyzxyzxyzxyz' })

Which of these happens:

youennf commented 3 years ago

With the above request, only one device can be used by getUserMedia. If the device is not there, there will be a constraint error. If the device is there and failed, a hardware error will happen.
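A page can tell these two outcomes apart by looking at the name of the rejection. The sketch below assumes the "constraint error" surfaces as OverconstrainedError and the "hardware error" as NotReadableError, which are the names the Media Capture and Streams API uses for those cases; the function name and the returned labels are hypothetical.

```javascript
// Maps a getUserMedia rejection to one of the two cases described above.
// The returned labels are illustrative, not part of any API.
function describeExactDeviceFailure(err) {
  switch (err && err.name) {
    case 'OverconstrainedError':
      return 'device-not-present'; // no device matches the exact deviceId
    case 'NotReadableError':
      return 'device-failed-to-start'; // device exists but could not be opened
    default:
      return 'other'; // e.g. permission denied
  }
}

// Usage in a browser (not executed here):
//   navigator.mediaDevices
//     .getUserMedia({ audio: { deviceId: { exact: savedId } } })
//     .catch((err) => console.log(describeExactDeviceFailure(err)));
```

Neither outcome counts as a successful capture, so in both cases enumerateDevices() keeps returning the redacted list.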

hills commented 3 years ago

And in these errors, no access to enumerateDevices will be available?

So how do I gain access to enumerateDevices?

youennf commented 3 years ago

And in these errors, no access to enumerateDevices will be available?

Yes

So how do I gain access to enumerateDevices?

Rewrite your request to: getUserMedia(deviceId: 'xyzxyzxyzxyz')

hills commented 3 years ago

So now I make that call, and I have no idea whether the device that was opened was the one the user requested.

So here's the minimum practical code to open a user's previous device and present a device selection:

  1. Device ID originates from cookie from previous session
  2. getUserMedia(deviceId: 'xyzxyzxyzxyz')
  3. wait for success
  4. close the device
  5. getUserMedia(deviceId: { exact: 'xyzxyzxyzxyz' })
  6. now call enumerateDevices

youennf commented 3 years ago

Not really, in step 4, you can check capture is using the device with the given id through the provided MediaStreamTrack.

If not, you can check enumerateDevices to know whether the device is there. If it is not there, you keep the track. If it is there, you make another getUserMedia call with exact constraints.
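The flow described above could be sketched as follows. The function name openPreferredCamera is hypothetical, and mediaDevices is passed in (it would be navigator.mediaDevices in a browser) so the logic can be exercised with a stub. Stopping the fallback track before the second call is an assumption of this sketch, not something stated in the comment.

```javascript
// Sketch: request the previous device as a non-exact preference, then verify.
async function openPreferredCamera(mediaDevices, previousId) {
  // Non-exact deviceId: the browser may fall back to another camera.
  const stream = await mediaDevices.getUserMedia({ video: { deviceId: previousId } });
  const track = stream.getVideoTracks()[0];
  if (track.getSettings().deviceId === previousId) {
    return stream; // capture is using the requested device
  }
  // A different device was opened; check whether the preferred one exists.
  const devices = await mediaDevices.enumerateDevices();
  const present = devices.some(
    (d) => d.kind === 'videoinput' && d.deviceId === previousId
  );
  if (!present) {
    return stream; // preferred device is not there: keep the fallback track
  }
  // Preferred device is there: release the fallback (assumption) and
  // make another call with an exact constraint.
  stream.getTracks().forEach((t) => t.stop());
  return mediaDevices.getUserMedia({ video: { deviceId: { exact: previousId } } });
}
```

Because the first call resolves successfully, enumerateDevices() is expected to return real device IDs by the time it is consulted.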

Although the spec does not require it, I think current browser implementations of getUserMedia({ video : { deviceId: 'xyzxyzxyzxyz' } }) will always pick the device with the corresponding id, if the device is there and functional. So you should be able to always stop at step 4.

In browsers that have pickers like Firefox, it might be actually better to stick with what the user selected (or ask the user if they would prefer to use the past device explicitly).

hills commented 3 years ago

Not really, in step 4, you can check capture is using the device with the given id through the provided MediaStreamTrack.

I don't think it's a good sign that a problem is introduced and now the solution is to use more API :)

Although the spec does not require it, I think current browser implementations of getUserMedia({ video : { deviceId: 'xyzxyzxyzxyz' } }) will always pick the device with the corresponding id, if the device is there and functional. So you should be able to always stop at step 4.

"Should" and "always" are not compatible here :)

With the above clarification, developers will only act defensively, and are reduced to:

  1. Device ID from a previous session
  2. A dummy call to getUserMedia(audio: true)
  3. Prompt the user, get it out of the way
  4. close the device
  5. now continue with the rest of the program as normal
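The defensive pattern in this list could look roughly like the sketch below. The name unlockDeviceList is hypothetical, and mediaDevices is passed in (navigator.mediaDevices in a browser) so the steps can be run against a stub.

```javascript
// Sketch of the defensive workaround: open any microphone, close it,
// then enumerate. Not a recommended pattern; it mirrors the steps above.
async function unlockDeviceList(mediaDevices) {
  // Dummy capture whose only purpose is to get the prompt out of the way.
  const stream = await mediaDevices.getUserMedia({ audio: true });
  // Close the device straight away; the stream itself is not needed.
  stream.getTracks().forEach((track) => track.stop());
  // After a successful capture the list is expected to carry real device IDs.
  return mediaDevices.enumerateDevices();
}
```

If the default microphone cannot be opened, the first call rejects and the list stays redacted.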

And this still provides a poor user experience requiring restarts if the primary device is unavailable. And it will be especially poor if any browser is prompting per-call to getUserMedia.

youennf commented 3 years ago

I was probably not clear. I do not think that you need to call enumerateDevices at all. My recommendation is to simply call getUserMedia({ video : { deviceId: 'previous ID'} }) and use the provided track. No need to do anything else.

If that is not working for you, this might be a browser bug or a scenario we have not thought about.

guest271314 commented 3 years ago

Subsequent calls to getUserMedia() is part of the problem, along with

Although the spec does not require it, I think current browser implementations of getUserMedia({ video : { deviceId: 'xyzxyzxyzxyz' } }) will always pick the device with the corresponding id, if the device is there and functional.

where the corresponding deviceId does not necessarily mean the device intended to be selected, e.g., Chromium label "audiooutput" where audio output capture is not supported at Linux.

As suggested, the user needs to be able to select devices before getUserMedia(), and have clear and unequivocal notification of the device being capable of being captured by getUserMedia() or getDisplayMedia().

For that reason the order needs to be enumerateDevices() first, then getUserMedia(), where if the device the user intends to capture is not available per implementation, then the user need not even proceed with any code related to getUserMedia() at all, where the result will be for naught in any event if the device intended to be captured is not accessible. Otherwise the same issue results: having to call getUserMedia(), and potentially MediaStream and MediaStreamTrack processing code as well, only to find that it is not the device intended to be captured. enumerateDevices() => getUserMedia() is a substantive change that will provide clarity as to which devices are available and whether to continue or not with further code that could ultimately be futile.

youennf commented 3 years ago

@hills, can we close that issue?

ShikChen commented 3 years ago

As @hills said, I think developers would likely just add a getUserMedia({audio: true}) to get the access for enumerateDevices(). (We actually did that before Chrome reverted the change).

An example scenario (simplified from a real case):

What's the expected flow for the above scenario?

youennf commented 3 years ago

I am unclear with some details in your scenario, like whether only one webcam is connected (720p) and web page wants user to connect the 1080p camera. Here is a potential flow:

const previousDeviceId = await getDeviceIdFromIDB();
let stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : previousDeviceId, width : 1920 } });
// Browser will try using the previous device, if not possible, it will try selecting any 1080p camera.
if (stream.getVideoTracks()[0].getSettings().width < 1920) {
    // Chances are high there is no 1080p camera otherwise it would have been selected in the first place. Let's still check just in case.
    const devices = await navigator.mediaDevices.enumerateDevices();
    const newDeviceId = select1080pCamera(devices);
    if (!newDeviceId) {
        // Ask user to connect a 1080p camera through some UI.
        ....
        navigator.mediaDevices.ondevicechange = trySelecting1080pCamera;
        return;
    }
    // Optional step: switch immediately to the 1080p camera. It might be bad if the user selected the other camera explicitly through a device picker (say Firefox picker).
    stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : newDeviceId, width : 1920 } });
}
// Proceed with using the stream
...

Another approach:

try {
    const stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : { exact : await getDeviceIdFromIDB() } } });
    stream.getVideoTracks()[0].applyConstraints({ width : 1920 });
    return stream;
} catch (e) {
    return navigator.mediaDevices.getUserMedia({ video : { width : 1920 } })
}

hills commented 3 years ago

@hills, can we close that issue?

For me, I'm afraid not, because I don't see the solution as workable.

To me the best solution is the obvious and previous one: once a user has granted permission, enumerateDevices() is allowed.

Perhaps I would be more amenable if you could explain why we are not just doing this?

We are only starting to explore the negatives of requiring a device to be successfully opened first; it sounds like you are hoping this adds some value in some way and working hard to retain this. Can you explain what value that is?

youennf commented 3 years ago

I don't see the solution as workable.

Can you point to a website or a jsfiddle that is broken with this change and that we would not be able to rewrite without some big refactoring and/or different user UI?

Can you explain what value that is?

enumerateDevices is widely abused by trackers on the web for several reasons:

As part of privacy enhancement, it was decided to limit leaking to the minimum by default. Currently, default leakage is limited to whether or not there are cameras/microphones.

For instance, it would be easy for a website to ask for camera access once to take a picture or as part of a game. It would be bad if that website could track the user's setup from then on without any limitation. Capture icons will most probably deter trackers from opening capture devices, even for a second, to get that information.

We are only starting to explore the negatives of requiring a device to be successfully opened first;

I understand this is a change of behavior and that websites might want to update to optimise their flow. I fail though to understand the hard limitations this change triggers.

I actually think this change is bringing improvements outside privacy improvements.

Before the change, a website would have to handle the case of new users, or users that did capture but revoked permissions, or users that did capture but cleared web site data including IDB. A website would also have to handle multiple browsers with various permission models and prompts, leading to different enumerateDevices results. This change and the proposed flow of using ideal constraints simplifies things by making the model more consistent across browsers and across user states.

The proposed change and proposed usage of getUserMedia is also future proof with the in-chrome device picker for getUserMedia.

hills commented 3 years ago

You've explained why device IDs may be used for tracking, and why specs help to unify behaviour; but that was not in question.

What is the value of the additional requirement to successfully open a device?

youennf commented 3 years ago

What is the value of the additional requirement to successfully open a device?

Capture indicators are usually tied to successfully opening the device. If the device cannot be opened, capture indicators will not kick in and a web page could potentially silently get all capture setup information without the user knowing anything about it.

The reverse question is also interesting: what is the practical value for not adding this requirement?

hills commented 3 years ago

What is the value of the additional requirement to successfully open a device?

Capture indicators are usually tied to successfully opening the device. If the device cannot be opened, capture indicators will not kick in and a web page could potentially silently get all capture setup information without the user knowing anything about it.

But all of this happens after the user has already "allowed" access to their media device(s) explicitly.

Are you proposing that enumerateDevices() can only be called whilst a device is streaming?

youennf commented 3 years ago

But all of this happens after the user has already "allowed" access to their media device(s) explicitly.

Explicitly or implicitly.

Are you proposing that enumerateDevices() can only be called whilst a device is streaming?

Spec allows a user agent to do so but this is not mandatory. I believe it would be too strong if implemented as is. I could see some heuristics like that, for instance in case page is not capturing for some time say an hour.

hills commented 3 years ago

Are you proposing that enumerateDevices() can only be called whilst a device is streaming?

Spec allows a user agent to do so but this is not mandatory. I believe it would be too strong if implemented as is.

I agree it would be too strong, with wide reaching breakage.

You state that a user agent could allow this, but that is not how it reads. In the "access control model" the most recent wording is:

"if [conditions] then [restrict access]. Otherwise the MediaDeviceInfo object will contain meaningful values".

and this meaning persists in the historical version too.

Therefore there can be no correlation between capture indicators and enumerateDevices().

hills commented 3 years ago

Forgive me for pressing the issue; if I may summarise:

The upsides of the change:

The downsides:

Thank you for being patient with my questions. But I feel this is quite a robust case for the previous behaviour.

youennf commented 3 years ago

The upsides of the change:

Which change are you referring to? As for the capture indicator benefit, I am specifically talking about https://github.com/w3c/mediacapture-main/pull/717. If you are referring to enumerateDevices sanitisation (the fact that enumerateDevices should not leak until the page starts capturing), the benefits have been described in https://github.com/w3c/mediacapture-main/issues/709#issuecomment-686409911.

Therefore there can be no correlation between capture indicators and enumerateDevices().

There is a strong correlation: a page will get full enumerateDevices access at a time where capture indicators will be visible. Past that point, the page will most likely continue getting full enumerateDevices access for its whole lifetime.

  • change in API behaviour affects existing code in the wild

Are you specifically referring to https://github.com/w3c/mediacapture-main/pull/717 or to enumerateDevices sanitization? If the former, this should really be an edge case (a device that fails to open should be an edge case). Or am I missing something?

If it is the latter, I agree this is an important change and I am more than happy to discuss how to best migrate. Please provide links to specific code bases or web sites. Note also that if this behavior breaks an application, it means it is broken in Safari for a few years now.

  • the new behaviour is not efficient (examples given require multiple device opens to workaround new behaviour)

Are you specifically referring to https://github.com/w3c/mediacapture-main/pull/717 or to enumerateDevices sanitization?

Would you be able to file an issue specifically for that? Based on that, we might want to tighten the rules of device selection, how to go from finalSet to the actual selected track.

Guessing it is about enumerateDevices sanitization, the example should have the following requirements:

  1. Before sanitization change, the page should do one successful getUserMedia call and get the expected device.
  2. After sanitization change, there is no way for the page to get the expected device with just one successful getUserMedia call.

hills commented 3 years ago

"The change" to which I refer is the subject of this bug report; the very first paragraphs. Pull request #717 is inconsequential to the summary as it merely complicates the state of affairs.

For the avoidance of doubt, here's the same summary with context:


This new condition in enumerateDevices (summarised in commit c15a432, March 2020):

if the browsing context did not capture (i.e. getUserMedia() was not called or never resolved successfully), the MediaDeviceInfo object will contain a valid value for kind but empty strings for deviceId,

Previously this read:

if no such access has been granted [...]

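The upsides of the change:

  • a perceived "privacy enhancement" around capture indicators

But our discussion shows this change cannot be realised in practice.

The downsides:

  • change in API behaviour affects existing code in the wild

  • the new behaviour is unusual and not straightforward (I am not the only person already asking for clarification)

  • the new behaviour is not efficient (examples given require multiple device opens to workaround new behaviour)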

This is a robust case for the previous behaviour.

youennf commented 3 years ago

The upsides of the change:

  • a perceived "privacy enhancement" around capture indicators

No. The upsides are noticeable privacy enhancements that have been positively welcomed by privacy experts. See https://github.com/w3c/mediacapture-main/issues/709#issuecomment-686409911 for some benefits.

A behavior change is always painful and it is much easier to adapt to the change if you know why it was done. I am happy to continue discussing and describing these benefits.

But our discussion shows this change cannot be realised in practice.

I do not know how the discussion shows this. This change is implemented and the benefits have been realised.

The downsides:

  • change in API behaviour affects existing code in the wild

True. As I said above, I am more than happy to help mitigating the pain of migrating to the new behavior. There is a proposed API usage pattern to ease migration. AFAIK, this pattern works with all major browsers, implementing the change or not.

  • the new behaviour is unusual and not straightforward (I am not the only person already asking for clarification)

This is a subjective statement.

  • the new behaviour is not efficient (examples given require multiple device opens to workaround new behaviour)

I haven't seen any proof the new behavior is less efficient. As asked previously, please provide precise examples to demonstrate this claim.

ShikChen commented 3 years ago

I am unclear with some details in your scenario, like whether only one webcam is connected (720p) and web page wants user to connect the 1080p camera. Here is a potential flow:

const previousDeviceId = await getDeviceIdFromIDB();
let stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : previousDeviceId, width : 1920 } });
// Browser will try using the previous device, if not possible, it will try selecting any 1080p camera.
if (stream.getVideoTracks()[0].getSettings().width < 1920) {
    // Chances are high there is no 1080p camera otherwise it would have been selected in the first place. Let's still check just in case.
    const devices = await navigator.mediaDevices.enumerateDevices();
    const newDeviceId = select1080pCamera(devices);
    if (!newDeviceId) {
        // Ask user to connect a 1080p camera through some UI.
        ....
        navigator.mediaDevices.ondevicechange = trySelecting1080pCamera;
        return;
    }
    // Optional step: switch immediately to the 1080p camera. It might be bad if the user selected the other camera explicitly through a device picker (say Firefox picker).
    stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : newDeviceId, width : 1920 } });
}
// Proceed with using the stream
...

Another approach:

try {
    const stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : { exact : await getDeviceIdFromIDB() } } });
    await stream.getVideoTracks()[0].applyConstraints({ width : 1920 });
    return stream;
} catch (e) {
    return navigator.mediaDevices.getUserMedia({ video : { width : 1920 } })
}

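The issue here is that if a wrong camera is opened, or a wrong setting is used, re-opening or updating it is very costly. It's strongly desired to open the correct camera with the correct settings in one shot, since:

  • opening a camera is a slow operation and might pop-up a dialog every time
  • blinking wrong camera LED is bad UX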

For example, suppose the app wants to open "Cam A with setting X" or "Cam B with setting Y", depending on which camera is currently available. There is an always-available "Cam C" which supports both settings X and Y, but the app or user doesn't want to use it for some reason (such as a built-in USB camera on a laptop with poor quality or the wrong facing).

Note that getUserMedia({audio: true}) (~10ms level) is much faster than getUserMedia({video: ...}) (~1s level) on many platforms. So simply running getUserMedia({audio: true}) to unlock enumerateDevices() is a very attractive, simple mitigation for this spec change, although it is not a recommended way to use this API, if I understand correctly.
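
For reference, the audio-first mitigation described above could look roughly like the sketch below. The helper name unlockDeviceEnumeration and the injected mediaDevices parameter (normally navigator.mediaDevices) are assumptions made for illustration; as noted, this is not a recommended way to use the API.

```javascript
// Hypothetical helper: briefly open the default microphone so that
// enumerateDevices() exposes device IDs, then release the microphone.
// `mediaDevices` is normally navigator.mediaDevices; it is a parameter here
// only so the sketch can be exercised with a stub.
async function unlockDeviceEnumeration(mediaDevices) {
  const stream = await mediaDevices.getUserMedia({ audio: true });
  try {
    // Enumerate while the capture is still active.
    return await mediaDevices.enumerateDevices();
  } finally {
    // Stop every track so the microphone is not held open.
    stream.getTracks().forEach((track) => track.stop());
  }
}
```

In a page this would be called as unlockDeviceEnumeration(navigator.mediaDevices). It still depends on the default microphone opening successfully, which is the case this issue is about.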

youennf commented 3 years ago

The issue here is that if a wrong camera is opened, or a wrong setting is used, re-opening or updating it is very costly.

Opening a wrong camera is slow and painful. Ideally, applyConstraints would fix all issues with wrong settings but there are indeed system limitations that may trigger LED blinking for instance.

  • opening a camera is a slow operation and might pop-up a dialog every time
  • blinking wrong camera LED is bad UX

Exact deviceId constraints should prevent that (both the prompt and the LED). If the deviceId does not match, the exact constraint will make the getUserMedia call fail quickly (assuming the user agent has already enumerated camera devices) without a camera LED blink. Do you know of configurations where this is not the case?

  • The app would need to try getUserMedia({video: A + X}) and getUserMedia({video: B + Y}), which is far from ideal imo.

Let's say page sets A and B as exact deviceId constraints:

Would that work for your use case? Do you know of configurations where this approach would be slower?
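
One possible reading of the exact-deviceId approach is sketched below, assuming the page has stored the IDs of cameras A and B. The function name openFirstAvailableCamera, the shape of the candidates array, and the injected mediaDevices parameter (normally navigator.mediaDevices) are illustrative assumptions, not something defined by the spec.

```javascript
// Try each preferred camera in order using an exact deviceId constraint.
// A request for a device that is not present is expected to reject with
// OverconstrainedError rather than fall back to another camera.
async function openFirstAvailableCamera(mediaDevices, candidates) {
  for (const { deviceId, settings } of candidates) {
    try {
      return await mediaDevices.getUserMedia({
        video: { ...settings, deviceId: { exact: deviceId } },
      });
    } catch (error) {
      // Only move on when the device could not be matched; rethrow anything
      // else (for example NotAllowedError when permission is denied).
      if (error.name !== "OverconstrainedError") throw error;
    }
  }
  return null; // None of the preferred cameras is currently available.
}
```

A call might look like openFirstAvailableCamera(navigator.mediaDevices, [{ deviceId: idA, settings: { width: 1920 } }, { deviceId: idB, settings: { width: 1280 } }]), where idA and idB are previously stored IDs.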

hills commented 3 years ago

The upsides of the change:

  • a perceived "privacy enhancement" around capture indicators

No. The upsides are noticeable privacy enhancements that have been positively welcomed by privacy experts. See #709 (comment) for some benefits.

It's a regression in the conversation to again refer to the benefits of not exposing device IDs for fingerprinting. That is not what is in question here; nobody has, or is, questioning that.

We must be more precise about the exact benefit if we are to further this discussion, because it's not possible to respond to "privacy enhancements" by "privacy experts".

I previously asked if you could clarify the additional privacy of requiring a web page to successfully open some device (an additional event which happens after permission is given by the user). You centred on this being that a capture indicator would be present in the browser.

But subsequently we both agreed a web page could do some capture and then use enumerateDevices() after. So the possibility of a web page doing enumerateDevices() without a capture indicator is always there. This is what I meant by "benefits cannot be realised in practice".

(And, elsewhere the 'workaround' is exactly that: a generic getUserMedia({audio:true}) and then close the stream)

I am trying to advance the conversation by demonstrating (for now) that, given these steps:

  1. getUserMedia
  2. permissions check: user may be prompted
  3. device gets successfully opened

no upgrade in privacy happens after the completion of step 2. Waiting for step 3, which the spec forces, has no privacy benefit.

But you are saying there is? Can you clarify exactly what the specific benefit is?
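
To make the distinction between steps 2 and 3 concrete, the sketch below maps a getUserMedia() rejection to the stage at which it most likely occurred. The error names are the ones getUserMedia() rejects with; the function name and the returned labels are assumptions for illustration, and the mapping is a simplification (NotAllowedError, for example, can also come from policy restrictions rather than a user decision).

```javascript
// Hypothetical helper: classify a getUserMedia() rejection by stage.
function classifyGetUserMediaFailure(error) {
  switch (error.name) {
    case "NotFoundError":
    case "OverconstrainedError":
      // No device matched the request.
      return "device-selection";
    case "NotAllowedError":
      // Step 2: the permission check did not pass.
      return "permission-check";
    case "NotReadableError":
    case "AbortError":
      // Step 3: permission passed, but the device could not be opened
      // (for example, it is locked by another application).
      return "device-open";
    default:
      return "other";
  }
}
```

Under the behaviour being discussed, a "device-open" failure is the case where permission has already been given but enumerateDevices() still returns empty deviceId values.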

dontcallmedom commented 3 years ago

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

(iow, your step #2 is really "permission granted", right?)

hills commented 3 years ago

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

Yes.

Not so much my final request, just trying to tangibly demonstrate the core issue.

(iow, your step #2 is really "permission granted", right?)

Perhaps, but I didn't put it like that as it implies user input, and step 2 may not involve any. But if you're clear on that then it's fine :)

guest271314 commented 3 years ago

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

This already occurs in practice in Chromium, deliberately, ostensibly on the grounds that PulseAudio selects a monitor device when no input device is found; see https://bugs.chromium.org/p/chromium/issues/detail?id=931749#c6, https://chromium.googlesource.com/chromium/src/+/4519c32f528e079f25cb2afc594ecf625f943782

PulseAudio: Filter out unavailable inputs and refuse to open monitor inputs

If there are no available inputs, PulseAudio will, for some reason, select the monitor of the current default sink as the default source. Chrome will only add a Default input to the device enumeration if there is at least one valid input found. Currently, we also include unavailable inputs in the device enumeration, which means Chrome will show a Default device, which in that case would likely be a monitor device, rather than a proper input.

This CL stops this happening by not enumerating inputs that don't have an available, active port. In case PulseAudio would still pick a monitor device as the default device, this CL also explicitly returns invalid AudioParameters for monitor devices, and explicitly fails to open them as inputs - to be on the safe side.

If there are no available inputs, I am not sure what the user would expect to be captured other than a monitor device, given that no microphone is connected to the machine.

Chromium simply refuses to recognize monitor devices, captures microphone. Of course, if the user has no input devices, or connection to the device fails, that commit does not change anything - the user still has no microphone connected to the device - so the rationale for the change is at best unclear.

Relevant to this topic that change means enumerateDevices() still has permission in that case, though the actual device captured is not what is expected to be captured.

Further, suppose a user, in kind, refuses to accept that monitor devices will not be captured and creates workarounds to actually capture the monitor device. Because the Chromium implementation refuses to list monitor devices, enumerateDevices() then reports the incorrect devices and makes it impossible to identify the device that is actually being captured (https://github.com/w3c/mediacapture-main/issues/693). This is perhaps an unintended consequence of refusing to capture certain devices at the implementation level while users in the field still need to capture such devices.

Capturing a monitor device throws an error when it is set as the default device at the OS level. enumerateDevices() still has permissions, even though device capture fails for a specific device.

Users should not be limited by specification authors' initial use-case conceptions. Video conferencing is no longer the sole use case for getUserMedia() or getDisplayMedia(). The technology advances at least every 18 months; users might have multiple USB, Bluetooth, or other devices connected to their machine - and are not expecting getUserMedia() to refuse to connect to a device simply because the initial idea for getUserMedia() is for developers to conference. Media creation and editing use cases are not limited to microphone input.

youennf commented 3 years ago

@hills to make sure I understand clearly your request: if permission to a device is granted, but opening that device fails for other reasons, then you argue that enumerateDevices() permission should be granted?

Yes.

Let's concentrate solely on this point in this issue. And let's look at the pros and cons you mentioned above.

The upsides of the change: a perceived "privacy enhancement" around capture indicators. But our discussion shows this change cannot be realised in practice.

Let's say a device is always broken and a page somehow knows it. A page could try to call getUserMedia with exact constraints on that device. Without the change, the getUserMedia call would fail and the page would be granted enumerateDevices permission. The user has no way to notice this information leakage.

With the change, the web page will not have access to that info and will have to open a functional device, which will trigger the capture indicator. This makes it highly unlikely that pages that want that information to learn about the user (and have no other reason to call getUserMedia) will actually take the risk of being discovered.

The benefits are realised in practice. If we widen the scope of the issue, we increase the privacy benefits.

The downsides:

  • change in API behaviour affects existing code in the wild
  • the new behaviour is unusual and not straightforward (I am not the only person already asking for clarification)
  • the new behaviour is not efficient (examples given require multiple device opens to workaround new behaviour)

I do not see how these downsides are related to the case you mention above (capture fails due to a hardware issue). Note that the spec is further reducing the edge case to the case where all selectable devices are failing, not just one. Can you clarify what benefits you see in granting enumerateDevices permission in case all selectable devices fail to open?
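
Whether a page currently has device information exposed can be checked from the enumeration result itself. The sketch below assumes, per the spec text quoted at the top of this issue, that unexposed entries carry a valid kind but an empty deviceId. The function name is hypothetical.

```javascript
// Hypothetical helper: returns true when at least one enumerated device
// carries a non-empty deviceId. An empty list also yields false, because it
// gives no evidence either way.
function isDeviceInfoExposed(devices) {
  return devices.some((device) => device.deviceId !== "");
}
```

It would be applied to the result of await navigator.mediaDevices.enumerateDevices().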

hills commented 3 years ago

Let's say a device is always broken and a page somehow knows it. A page could try to call getUserMedia with exact constraints on that device. Without the change, the getUserMedia call would fail and the page would be granted enumerateDevices permission. The user has no way to notice this information leakage.

In any sane system this case does not exist, because returning failure (or success) must take place after the permissions check.

Leaving the user well aware; because they either confirmed then and there, or specifically asked the browser to remember this choice.

So, before I continue, it seems you are clarifying that the failure case you are concerned about happens before permissions check?

youennf commented 3 years ago

it seems you are clarifying that the failure case you are concerned about happens before permissions check?

No, the failure happens after the permission check. To be as accurate as possible: User has no way to notice this information leakage in case the permission check does not trigger a prompt. This case (permission check without a prompt) tends to happen a lot in Chrome but can also happen in MacOS Safari.

hills commented 3 years ago

User has no way to notice this information leakage in case the permission check does not trigger a prompt.

But if the permissions check does not trigger a prompt, that is because the user has already declared that the action is allowed.

Are you trying to protect the user, even in cases where they have given their permission?

youennf commented 3 years ago

Are you trying to protect the user, even in cases where they have given their permission?

Yes, this is explained at https://github.com/w3c/mediacapture-main/issues/709#issuecomment-686409911. Here is a more thorough example:

The spec change forbids this scenario and makes enumerateDevices much less useful for trackers. The spec change keeps enumerateDevices useful for pages that want to call getUserMedia with the updates we talked about: use deviceId constraints in getUserMedia without first calling enumerateDevices to validate deviceId values.

hills commented 3 years ago

Ok. So we have asserted that the failure is after the permissions check. And the actual privacy in the above hinges on the assumption that trackers "would not risk a prompt" (quoted from #697) to try for permissions.

So with that in mind, it is the context for my previous question, which went unanswered but is especially relevant now:

given these steps:

  1. getUserMedia
  2. permissions check: user may be prompted
  3. device gets successfully opened

no upgrade in privacy happens after the completion of step 2. Waiting for step 3, which the spec forces, has no privacy benefit.

Step 2 is the "prompt" in the quote above.

It is passing of step 2 which is meaningful -- tracker (or genuine web page) took risk on a prompt; user accepted it.

What, if any, increased privacy happens by completing step 3?

youennf commented 3 years ago

And the actual privacy in the above hinges on the assumption that trackers "would not risk a prompt" (quoted from #697) to try for permissions.

Either prompt or capture indicator.

What, if any, increased privacy happens by completing step 3?

The example provided in https://github.com/w3c/mediacapture-main/issues/709#issuecomment-688472076 identifies some benefits (basically, the capture indicator will deter trackers from trying this approach). There is no identified benefit to do that before step 3 after https://github.com/w3c/mediacapture-main/pull/717.

hills commented 3 years ago

And the actual privacy in the above hinges on the assumption that trackers "would not risk a prompt" (quoted from #697) to try for permissions.

Either prompt or capture indicator. [...] basically, the capture indicator will deter trackers from trying this approach

No, not the capture indicator.

Any software (malicious or otherwise) can call getUserMedia() and immediately close the stream == no capture indicator. It does not provide a counterpoint here.

And let's not forget, the precondition to all of this is that the site took the risk on the privacy prompt (making it out of scope of the chosen definition of a "tracker"); and furthermore the user granted that permission.

The completion of the permissions check is the point at which enumerateDevices() should be allowed (with no adverse consequences), and if we can accept that then perhaps then...

There is no identified benefit to do that before step 3 after #717.

... the benefits of doing so will become relevant. But for now I agree to focus on one point at a time.

youennf commented 3 years ago

Any software (malicious or otherwise) can call getUserMedia() and immediately close the stream == no capture indicator. It does not provide a counterpoint here.

No. For instance, Chrome is adding a camera/microphone icon in its address bar if getUserMedia is called successfully, even after all tracks got stopped. In general, capture indicators should at all costs prevent an attacker from capturing even a very small number of images or short audio samples without the user being able to notice it. I agree the spec could add more guidance about how long indicators should stay live so that a reasonably-observant user will notice them. Note also that a getUserMedia call requires the page to have focus, so the capture indicators will be visible.

And let's not forget, the precondition to all of this is that the site took the risk on the privacy prompt

No. The web site can use the Permissions API to know whether a user will be prompted or not. The prompt mitigation may not work in all configurations.
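
A minimal sketch of such a check, assuming the browser recognises the "camera" permission name (support varies between browsers). The function name and the injected permissions parameter (normally navigator.permissions) are illustrative assumptions.

```javascript
// Hypothetical helper: reports whether a camera request would show a prompt.
// Resolves to true or false, or to undefined when the browser does not
// recognise the "camera" permission name.
async function wouldPromptForCamera(permissions) {
  try {
    const status = await permissions.query({ name: "camera" });
    return status.state === "prompt";
  } catch {
    return undefined;
  }
}
```

A page could call wouldPromptForCamera(navigator.permissions) before deciding whether to call getUserMedia(), which is why the prompt alone is not a reliable deterrent.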

hills commented 3 years ago

Any software (malicious or otherwise) can call getUserMedia() and immediately close the stream == no capture indicator. It does not provide a counterpoint here.

No. For instance, Chrome is adding a camera/microphone icon in its address bar if getUserMedia is called successfully, even after all tracks got stopped. [...]

Ok, my apologies, we have some crossed-wires on the words "capture indicator".

I was referring to, in Chromium 87, the red circle which appears on the tab during capture. Firefox has its analogue of a microphone icon to the left of the URL bar. Both disappear when capture completes.

You are referring to the grey camera icon inset in the URL bar, which appears at some point and, in Chromium 87 at least, persists.

I'll assume your definition is the agreed one (I guess I'll just say "red blob" for the other). But perhaps it's now clearer why I would be so assertive about this indicator. I'm out of time for today, there are further points I think should be made but that will have to be later.

youennf commented 3 years ago

I was referring to, in Chromium 87, the red circle which appears on the tab during capture. Firefox has its analogue of a microphone icon to the left of the URL bar. Both disappear when capture completes.

Thanks for raising that issue. The spec says: "Any false-to-true transition indicated MUST remain observable for a sufficient time that a reasonably-observant user could become aware of it." but implementations are somewhat lagging there. I filed https://github.com/w3c/mediacapture-main/issues/724; maybe we need to provide more guidance.

I'll assume your definition is the agreed one (I guess I'll just say "red blob" for the other).

I think both definitions are good.