No way to reliably choose correct camera & microphone upfront

w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group

https://w3c.github.io/mediacapture-extensions/

No way to reliably choose correct camera & microphone upfront #5

Open jan-ivar opened 4 years ago

jan-ivar commented 4 years ago

Visit a new web site in Chrome, Safari, or Edge, & do something requiring camera + microphone:

They'll say the site wants to "use your camera and microphone", without saying which ones (USB webcams, headsets, modern phones https://github.com/w3c/mediacapture-main/issues/655). If it's wrong, you'll need to correct it after the fact.

Browsers may not even choose the same camera and microphone, a web compat issue (e.g. headset detection).

Firefox is different, showing which camera and microphone will be used, even letting you change it (within the constraints of the app):

But not everyone with multiple devices uses Firefox.

It would be better if users got to choose based on how many devices they have, not what browser they use, and maybe regardless of permission if an app is this indecisive on subsequent visits.

It also feels like this should be an app decision, not a browser trait.

Proposal A: https://github.com/w3c/mediacapture-main/pull/644#issuecomment-566248295 would fix this, prompting on indecision or lack of permission. But it may not be web compatible at this point.

Proposal B: Add a new getUserMedia boolean that enables the https://github.com/w3c/mediacapture-main/pull/644#issuecomment-566248295 behavior:

await navigator.mediaDevices.getUserMedia({video: true, chosen: true});

"chosen" means both tracks must be chosen by the user (or app), not the user agent.

In the interest of web compat, Firefox would remove its picker unless chosen is true, giving web users the same experience across browsers.

Proposal C: Same as B, but with new method:

await navigator.mediaDevices.chooseUserMedia({video: true});

Incidentally, this would be the same API used to replace in-content device selection https://github.com/w3c/mediacapture-main/issues/652.
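
For illustration, a minimal sketch of how a page might feature-detect the Proposal C method and fall back to getUserMedia. chooseUserMedia exists only as a proposal in this issue, and the helper name requestUserMedia and the injected mediaDevices parameter are assumptions made so the sketch can run against a stub.

```javascript
// Hypothetical: chooseUserMedia() is only proposed in this issue (Proposal C)
// and is not part of any specification or browser.
// mediaDevices is passed in so the helper can be exercised with a stub.
async function requestUserMedia(mediaDevices, constraints) {
  if (typeof mediaDevices.chooseUserMedia === "function") {
    // Proposed behavior: the user (or app) chooses the devices.
    return mediaDevices.chooseUserMedia(constraints);
  }
  // Existing behavior: the user agent may choose the devices.
  return mediaDevices.getUserMedia(constraints);
}
```

In a page this would be called as requestUserMedia(navigator.mediaDevices, {video: true, audio: true}).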

q-alex-zhao commented 4 years ago

Proposal A makes the most sense to me, and it achieves the quickest / most intuitive setup flow from the user's perspective.

B and C seem less desirable -- I'm not sure how the application should choose which API to use, since if they already have device ID stored somehow, shouldn't they just specify it right in the GUM call?

guest271314 commented 4 years ago

Can this

In the interest of web compat, Firefox would remove its picker unless chosen is true, giving web users the same experience across browsers.

be described in detail?

Do you mean removing the picker from Firefox altogether?

Firefox provides a means to select "Monitor of <device>" at the picker, which is a plus for users and is not possible at Chromium. Would suggest to nudge other implementers to model their UI pickers based on the Firefox model, instead of removing Firefox picker, if that is what is meant.

await navigator.mediaDevices.getUserMedia({video: true, chosen: true});

and

await navigator.mediaDevices.chooseUserMedia({video: true});

if new features are being proposed to be added to the specification to enhance which specific devices can be selected, why were the issues closed (https://github.com/w3c/mediacapture-main/issues/629, https://github.com/w3c/mediacapture-main/issues/650) relevant to selecting "Monitor of <device>" which essentially ask for the same capability described in this issue?

If this issue will provide a means to select "Monitor of <device>" at Chromium am all for it. If this issue will reduce the devices that can be selected at a picker or programmatically, akin to Chromium behaviour, FWIW, am against such a limitation on user options.

jan-ivar commented 4 years ago

if they already have device ID stored somehow, shouldn't they just specify it right in the GUM call?

@q-alex-zhao Yes, and if they specify {deviceId: {exact: id}} then there's no difference.

The concern is whether it's common to use {deviceId: id} on revisit to mean "use device from last time if available, or the default device if not". https://github.com/w3c/mediacapture-main/pull/644 would change that to "if user has > 1 cams/mics always show picker with device from last time as the default choice, even if you have permission".

Those are incompatible semantics.

Same with {facingMode: "user"} instead of {facingMode: {exact: "user"}}.
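
To make the difference concrete, here is a simplified model of how a bare (ideal) deviceId and {exact: id} behave when the remembered device is gone. It is a sketch, not the spec's constraint algorithm: pickDevice is a hypothetical name, the device list is assumed non-empty, and "the default device" is modeled as the first entry.

```javascript
// Simplified model only; real selection happens inside the browser.
// devices: non-empty array of objects shaped like {deviceId: "..."}.
function pickDevice(devices, deviceId) {
  const isObject = typeof deviceId === "object" && deviceId !== null;
  const isExact = isObject && "exact" in deviceId;
  // A bare value and {ideal: value} are both treated as a preference.
  const wanted = isExact ? deviceId.exact : isObject ? deviceId.ideal : deviceId;

  const match = devices.find((d) => d.deviceId === wanted);
  if (match) return match;

  if (isExact) {
    // Browsers reject getUserMedia with OverconstrainedError in this case.
    const err = new Error("no device satisfies the exact deviceId");
    err.name = "OverconstrainedError";
    throw err;
  }
  // Preference missed: fall back to the default (modeled as first) device.
  return devices[0];
}
```

With {deviceId: id} the call silently lands on another device, which is the "last time if available, else default" reading; with {deviceId: {exact: id}} it fails instead.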

We also got anecdotal feedback yesterday of "sites calling getUserMedia all over the place", using the constraints object as a lazy handle, which today won't prompt while live tracks are in use (not actually guaranteed by the spec, but works in all browsers). If widespread, this may mean even more redundant prompts from this change.

I suppose opinions vary on whether this counts as "breaking the web" or merely nudging sites toward better patterns...

I'm not sure how the application should choose which API to use

Use {chosen: true} to replace your in-content device selection code.
Initial gUM prompt: {chosen: true} = picker like Firefox. false = user agent chooses.

jan-ivar commented 4 years ago

In the interest of web compat, Firefox would remove its picker unless chosen is true, giving web users the same experience across browsers.

be described in detail?

@guest271314 All proposals would mandate pickers in the other browsers akin to Firefox's. If we go with the {chosen} opt-in, then for consistency, one could argue Firefox should only show its picker if opted in, to align with the other browsers. We don't have to though.

This is less a "new feature" than a privacy fix to pass PING review.

guest271314 commented 4 years ago

All proposals would mandate pickers in the other browsers akin to Firefox's.

is far more concrete and meets the requirement.

Leaving room for compliance to implementers is room for non-compliance.

Do not gather the {chosen} option. That appears to be exactly what occurs now at Chromium: Select camera or microphone (singular). To change that requires going in to Settings and selecting a different device. Still further to select "Monitor of <device>" for Chromium input/output requires going outside of the browser altogether and using pavucontrol at *nix.

All of the devices (virtual, e.g., "Fake" device and non-virtual, e.g., actual device) should be available at the initial prompt, instead of one or the other.

{chosen} is not in the specification. .chooseUserMedia({video: true}); is not in the specification. That means new features. If the concern is actually privacy, then allowing users to select any device the OS has registered should be the goal, without arbitrary limitations imposed by specification authors or implementers, to avoid the necessity to create "hacks" to get around whatever barrier to functionality specification authors or implementers are imposing on that given day (e.g., transferControlToOffscreen() vanished from Nightly without so much as a courtesy post on a Mozilla blog - after an author had spent time fixing at least two bugs re OffscreenCanvas https://bugzilla.mozilla.org/show_bug.cgi?id=1609238#c2).

Expose the gamut of options technically possible and let the front-end do what it pleases.

youennf commented 4 years ago

If it's wrong, you'll need to correct it after the fact.

Note that this might not be as big a deal as one might think.

First, the spec states that default devices should be preferred if possible and users would probably expect those devices to be used. I guess in most cases, these default devices are actually selected.

Second, this only happens for the first visit of a user to that website. Afterwards, either the website or the browser could actually remember which devices are captured.

Proposal C: Same as B, but with new method:

It seems to me a picker based API is different from a permission request API as is somehow getUserMedia. A new API might be a better fit and would not need to be consistent with the permission request API. The role of the constraints mechanism becomes very important for the selection of the device. Do we want exact/ideal distinctions? Do we want to allow a web page to say something like "I really want audio and if possible, camera as well, from the same grouped device if possible"? How do we handle exposing newly plugged-in devices, and whether the site should call the API again or not?
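
As a rough illustration of the "audio required, camera if possible, same grouped device if possible" request, here is how a page could approximate it today in script over enumerateDevices()-shaped data, using the groupId that MediaDeviceInfo already exposes. The function name is hypothetical, and this says nothing about how a picker API would express the same preference.

```javascript
// Input objects mirror MediaDeviceInfo: {kind, deviceId, groupId}.
// Returns null when no microphone exists, since audio is the hard requirement.
function pickAudioWithOptionalVideo(devices) {
  const mics = devices.filter((d) => d.kind === "audioinput");
  const cams = devices.filter((d) => d.kind === "videoinput");
  if (mics.length === 0) return null;

  // Prefer a microphone that shares a groupId with some camera.
  for (const mic of mics) {
    const cam = cams.find((c) => c.groupId === mic.groupId);
    if (cam) return { audio: mic, video: cam };
  }
  // No grouped pair: keep the first mic and add any camera, if one exists.
  return { audio: mics[0], video: cams[0] ?? null };
}
```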

Also, a new API for roughly the same functionality of an existing API needs a good transition plan to deprecate the existing API. I guess we could weaken progressively the power of getUserMedia to induce transition to the new API, something like only capturing with default devices.

jan-ivar commented 4 years ago

It's not a big deal if we're fine with permission being granted to all devices upfront. But it's incompatible with a per-device permission model, because the user doesn't know which device + label they're revealing https://github.com/w3c/mediacapture-main/issues/640#issuecomment-549540203, which can be confusing even if it ends up right in the end. "Hey, I got vanilla, that's what I wanted!"

To flip what you're saying, consider this thought experiment: If browsers were to reveal this, e.g.:

Allow site X to access your front camera and microphone?

...some subset of users will say: "No, that's wrong. I want to use my back camera", and want to change it. If the prompt doesn't allow them, they'll probably hit Deny, not Allow, and ugh.

We could say that's not a big deal, because it only happens to a small subset of people. But I think it's a big deal to them, because it will likely happen over and over to them, and become a frustration as they learn to work around it. If we go out and buy a better camera, we join that group.

It seems to me a picker based API is different from a permission request API as is somehow getUserMedia.

I don't think so. Look at getDisplayMedia. Let's separate API from UX for a moment: I've found nothing in the API that needs to be different. It's a simple accessor API: JS requests track(s) given constraints, later returns track(s) or throws error if user said no.

The differences I've highlighted are all on the UX side, and AFAICT stem from context, not how the call was made. The overly aggressive "Deny" button? Already a bug in Firefox.

youennf commented 4 years ago

some subset of users will say: "No, that's wrong. I want to use my back camera", and want to change it. If the prompt doesn't allow them, they'll probably hit Deny, not Allow, and ugh.

I think you are arguing for a picker that the web app might influence for the default device to pick, but not constraining the device the user may pick. This makes sense to me.

I don't think so. Look at getDisplayMedia.

getDisplayMedia does not have all these constraints that let a page reduce the 'devices' the user can select. getUserMedia has these constraints.

getDisplayMedia allows capturing audio, but this is optional; getUserMedia does not have such an 'optional' choice.
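
For context, the usual way to approximate an 'optional' kind with getUserMedia today is a retry in script. A minimal sketch, assuming the first call rejects with NotFoundError or OverconstrainedError when no usable camera exists; the helper name and the injected mediaDevices parameter are illustrative, not part of any API.

```javascript
// mediaDevices is passed in so the helper can be exercised with a stub;
// in a page it would be navigator.mediaDevices.
async function getAudioWithOptionalVideo(mediaDevices) {
  try {
    return await mediaDevices.getUserMedia({ audio: true, video: true });
  } catch (err) {
    // No usable camera: retry with audio only.
    if (err.name === "NotFoundError" || err.name === "OverconstrainedError") {
      return mediaDevices.getUserMedia({ audio: true });
    }
    // Anything else (e.g. NotAllowedError) is not retried.
    throw err;
  }
}
```

Note that the retry can mean a second prompt, which is part of why an explicit 'optional' choice in the API would differ from this workaround.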

guidou commented 4 years ago

What if instead of changing the API, we just state with more details what the prompt may/should/must do?

We can say that the first step of getUserMedia() is to present a prompt that allows the user to select which devices the document is authorized to use. This selection remains valid for the rest of the session unless a device change is detected, so that no further prompts are needed. getUserMedia() and enumerateDevices() should behave as if the authorized devices were the only ones that exist.

Authorizing all devices may be allowed in addition to allowing only a specific device. It may be possible to persist the permissions for a domain so that the prompt can be skipped in future sessions.

If the set of devices changes, the prompt appears again in the next getUserMedia call (unless the user gave permission to use all devices).

We would need to discuss more about the details, but I think this approach can address most of the privacy issues that have been presented without breaking existing applications and with only minimal changes to the API. In terms of implementation, current browsers would need to update their existing prompts to comply with some extra privacy requirements and make the corresponding changes in the set of visible devices for a given document.

Some of the behavior changes introduced by this approach are that enumerateDevices() would return an empty list if no devices have been previously authorized, and NotAllowedError would be replaced by NotFoundError. I think these changes should be manageable by most existing applications.
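
The bookkeeping this implies could look roughly like the sketch below. It is a hypothetical model of the behavior described above, not a browser API: the class and method names are invented for illustration, and persistence across sessions is left out.

```javascript
// Hypothetical model of per-session device authorization.
class SessionAuthorization {
  constructor() {
    this.allowAll = false;
    this.authorizedIds = new Set();
    this.knownIds = null; // device set seen at the last prompt, if any
  }

  // Record the outcome of a prompt shown against the current device set.
  recordPrompt(currentIds, chosenIds, allowAll = false) {
    this.knownIds = new Set(currentIds);
    this.authorizedIds = new Set(chosenIds);
    this.allowAll = allowAll;
  }

  // Prompt on first use, or when the device set changed (unless all allowed).
  needsPrompt(currentIds) {
    if (this.knownIds === null) return true;
    if (this.allowAll) return false;
    const current = new Set(currentIds);
    if (current.size !== this.knownIds.size) return true;
    return [...current].some((id) => !this.knownIds.has(id));
  }

  // What enumerateDevices() would expose: only the authorized devices.
  visibleDevices(devices) {
    if (this.allowAll) return devices;
    return devices.filter((d) => this.authorizedIds.has(d.deviceId));
  }
}
```

Before any prompt, visibleDevices() returns an empty list, which matches the enumerateDevices() change described above.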