Closed pes10k closed 3 years ago
Privacy-by-default flow:
Initially site has access to no devices or device labels
1) site asks for category (or categories) of device 2) browser prompts user for one, many or all devices 3) site gains access to only the device, and device label, of the hardware the user selects.
I like this flow. Sadly, I do not think this is web-compatible anymore for camera/mic given the current widespread usage of camera/mic pickers in websites. For output speakers, plan is for https://github.com/w3c/mediacapture-output/issues/83 to follow the pattern you describe.
A few thoughts:
(Speaking to the subject, not the justification)
To be more private than Chrome, Firefox needs those labels for in-content device selection.
Currently, sites get access to either all devices and all labels (Chrome) or the "granted" device and all labels (Firefox) (not sure what others do).
To me, the headline here is: "Only grant device user has given permission to".
Unless the spec can rein in Chrome's model, I see no net privacy gain from making it harder for user agents to make more private choices, only harm.
Privacy-by-default flow:
Initially site has access to no devices or device labels
1) site asks for category (or categories) of device 2) browser prompts user for one, many or all devices 3) site gains access to only the device, and device label, of the hardware the user selects.
Are you saying all browsers need to implement this, or are these hoops only for those that do?
Mozilla argued for this 7+ years ago and lost.
Some use-cases that can benefit from getting the full list of devices:
Yes, knowing if there are more devices to ask for; some limited version of enumerateDevices() with number of devices/kind after permission seems reasonable.
Some storable deviceId is probably still needed to support multi-device setups in web apps.
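To make the "number of devices/kind" idea concrete, here is a minimal sketch, not a proposal for actual spec text. The `DeviceEntry` type and the `countDevicesByKind` function are hypothetical names introduced for illustration; only the three kind values come from the existing API.

```typescript
// The three kind values match MediaDeviceInfo.kind; everything else here is hypothetical.
type DeviceKind = "audioinput" | "audiooutput" | "videoinput";

// Hypothetical minimal record: only the kind is needed for counting.
interface DeviceEntry {
  kind: DeviceKind;
}

// Returns only the number of devices per kind, exposing no labels or IDs.
function countDevicesByKind(devices: DeviceEntry[]): Record<DeviceKind, number> {
  const counts: Record<DeviceKind, number> = {
    audioinput: 0,
    audiooutput: 0,
    videoinput: 0,
  };
  for (const device of devices) {
    counts[device.kind] += 1;
  }
  return counts;
}
```

With such a summary a site could tell that a second camera exists and ask for it, without learning what it is called.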
Privacy-by-default flow:
(With my co-chair hat off) I like this flow too. Mozilla would be supportive if the PING wants to pursue this and mandate it for all browsers.
The WebRTC WG long ago went with in-content camera/mic selection over in-chrome picker designs championed by Firefox. In hindsight, this may have been a mistake given the fingerprinting abuse it's received. The privacy climate has changed as well.
More recent picker-based APIs like getDisplayMedia() seem well-received, and Safari's plans for a picker for speakers seem reasonable to us. Given this, it seems sensible to consider pushing camera & mic selection in-chrome, if we can get buy-in from vendors.
If we are able to make progress there, we probably also need to think of how to handle grouped devices (groupID concept), for instance a camera or headset also having a microphone.
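As a rough illustration of the grouping concern, the sketch below buckets devices by their groupId so that a picker could treat a camera and its built-in microphone as one unit. The `GroupedDevice` type and `groupByGroupId` function are hypothetical names; how a picker should actually handle groups is an open question in this thread, not something this code settles.

```typescript
// Hypothetical record mirroring the deviceId, groupId and kind fields of MediaDeviceInfo.
interface GroupedDevice {
  deviceId: string;
  groupId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
}

// Buckets devices that share a groupId (i.e. belong to the same physical hardware).
function groupByGroupId(devices: GroupedDevice[]): Record<string, GroupedDevice[]> {
  // Prototype-free object, so arbitrary groupId strings are safe to use as keys.
  const groups: Record<string, GroupedDevice[]> = Object.create(null);
  for (const device of devices) {
    const members = groups[device.groupId] ?? [];
    members.push(device);
    groups[device.groupId] = members;
  }
  return groups;
}
```

A picker built on this could, for example, show one row per group rather than one row per device, though that is only one possible design.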
Are you saying all browsers need to implement this, or are these hoops only for those that do? Mozilla argued for this 7+ years ago and lost.
and
(With my co-chair hat off) I like this flow too. Mozilla would be supportive if the PING wants to pursue this and mandate it for all browsers.
Yes, the goal is to make this mandatory behavior in the spec, so that it would be consistent across implementors. The goal isn't to make life harder for folks already trying to protect privacy, but to make sure that correctly and thoroughly implementing the spec itself ensures a privacy-preserving environment (w/o having to resort to additional, non-standardized behaviors).
If we are able to make progress there, we probably also need to think of how to handle grouped devices (groupID concept), for instance a camera or headset also having a microphone.
I'm not familiar with this, but if I can be helpful in coming up with a privacy-friendly approach here, please let me know; happy to help however I can.
We should assign this issue to somebody. It seems this is pending in-chrome selection proposal, which would happen in an extension spec.
Most browsers use labels given by the OS. We could probably mention in the spec what Firefox (and Safari now) are doing for some labels, i.e. sanitise them so that they do not leak important user information.
Apologies for possible redundancy, but given that there are several similar issues going around, I wanted to clarify before further comment:
Is this issue now being used to track what labels are available:
For 1, the spec is clear: no label is leaked. For 3, even if the user granted access to a device, label information might not be great to expose anyway. While we do not have a way to suppress this information today, we could provide some guidelines in the spec to mention that labels could be sanitized by the user agent to reduce the leakage. For 2, once we are happy with in-chrome picker, the plan would be to either expose only labels for authorised devices or stop exposing meaningful labels. Note that we are talking about camera and microphones. Hopefully, we can do better for speakers more easily in mediacapture-output spec.
In addition to in-chrome picker, we are also looking at breaking useful information out of label, see https://github.com/w3c/mediacapture-main/issues/698.
Thanks much @youennf , this is all terrific!
For 1, the spec is clear: no label is leaked.
That's fantastic! Is that what's covered by "browsing context did not capture" in the second paragraph in section 9.2.1? If so, that's great, I just did not realize "did not capture" was ~= "no permission for any device".
For 3… we could provide some guidelines in the spec
I think that'd be great, especially if there was specific guidance that could be shared from vendor experience so far. FWIW, Brave will be adding some mild randomness to these labels, in a way that we think will flummox at least naive fingerprinting scripts but still be useful to people (we haven't implemented it yet since we're still waiting to see the final state of this spec).
Also, just to say again, I appreciate that you all are partially constrained by web compat concerns, and how willing the WG has been to work through privacy-preserving solutions given those difficulties / constraints.
For 2, once we are happy with in-chrome picker, the plan would be to either expose only labels for authorised devices
That's terrific. Is there a place where PING or other interested parties could support the in-chrome picker work, to help raise and address privacy concerns earlier in the process?
If I understand this issue correctly, the only way for this to be applicable is for the devices to be presented at the getUserMedia() permission prompt. The Chromium permission prompt is generic, listing only camera and microphone, yet the setSinkId() example illustrates that granting permission to the camera can result in audio output to speakers, see https://github.com/w3c/mediacapture-record/issues/196#issuecomment-653055850.
This specification needs to be honest and transparent about the fact that getUserMedia(), as of the current date, is capable of selecting devices other than the default microphone and camera. If the original charter only deals with microphone and camera, that charter needs to be amended or repealed and re-enacted to handle the state of the art today.
A combination of the Firefox getUserMedia() prompt (which lists Monitor of <device> for audio devices) and the Chromium getDisplayMedia() prompt (which offers "Entire Screen", "Application", and "Chromium/Chrome Tab"), presented as a multi-select list, would solve this issue. Devices not explicitly selected by the user at the initial prompt should either not be listed by enumerateDevices(), or have label and deviceId set to empty strings, for the duration of that permission-granted session.
The above solution requires Media Capture and Streams to officially acknowledge that capturing only microphone and camera, while being the original design pattern, is obsolete, or at least needs to be updated to conform to what is really occurring in the field, and to not force users to create workarounds https://github.com/w3c/mediacapture-main/issues/693#issuecomment-643283729 to achieve the expected result.
When getUserMedia({audio: true, video: true}) is executed, the user should be able to select multiple devices. Simple drag and drop or arrow selection can be used to move selections to the top of the list for initial capture of the primary video and audio device (singular, respectively) in the MediaStream resolved by getUserMedia(), while still simultaneously having access to the other selected devices (plural) via enumerateDevices(). Only those initially selected devices will be listed by enumerateDevices(), and thus be available for selection, during the granted permission session. Otherwise enumerateDevices() returns an empty list, or a list with empty strings for values, if accessing devices without permission is an issue.
A minimal example of a UI which provides a means to select _multiple_ devices, in this case applications to be added to a panel, and to move an item up or down the selection list using basic arrow icons as buttons.
The current prompt is based on selecting one video and/or one audio device, yet enumerateDevices() can be used to select devices other than the one video and one audio device selected at the prompt, which makes the current UI potentially misleading as to the scope of device access the user is actually granting.
I just did not realize "did not capture" was ~= "no permission for any device".
@pes10k It's not; it's stricter: You must be actively capturing to see labels now. Persistent permission is insufficient.
This avoids web compat issues we'd otherwise see on revisits from the dominant browser implicitly persisting permission.
I've filed crbug 1101860 on Chrome over this. The Firefox bug is here.
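The distinction drawn above, that labels depend on active capture rather than on stored permission, can be sketched as follows. This is a simplified model of the rule as described in this comment, not the spec algorithm; `ListedDevice`, `PageState` and `applyLabelPolicy` are hypothetical names.

```typescript
// Hypothetical record mirroring the deviceId, kind and label fields of MediaDeviceInfo.
interface ListedDevice {
  deviceId: string;
  kind: string;
  label: string;
}

// Hypothetical page state: both flags are tracked, but only one affects labels.
interface PageState {
  activelyCapturing: boolean;
  persistentPermission: boolean;
}

// Returns the list as a site would see it: labels are blanked unless the page
// is actively capturing. persistentPermission is deliberately ignored.
function applyLabelPolicy(devices: ListedDevice[], state: PageState): ListedDevice[] {
  return devices.map((device) => ({
    ...device,
    label: state.activelyCapturing ? device.label : "",
  }));
}
```

Under this model a returning visitor with stored permission sees the same empty labels as a first-time visitor until capture actually starts.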
I think that's great, @jan-ivar. I appreciate that this issue, and the surrounding ones, have been a lot of work, but I think it's gotten to a really good place. Thanks to you all for working all this stuff out :)
I'm curious of what kind of sanitization is or will be done, and whether there should be some guidance on what's reasonable.
Specifically:
1) Should sanitization allow multiple instances of the same model of device to be distinguishable? On some systems they may have the same label today. When a user sees "Generic Camera" and "Generic Camera", how do they know which one to pick, without trying them all?
a) If the user agent stores the last-used or preferred device, as in the in-chrome device selection proposal, should sanitization preserve the above distinction across restarts as well?
2) Should sanitization allow apps to apply some heuristics of their own in how to best use the device? For example, "XYZ Cam 3000" may advertise that it's capable of doing 1080p at 30fps, but one may find it only doing 10fps reliably well. The application can easily add this label (or prefix) to its banlist and constrain it to a lower resolution or lower fps. Ideally much of this problem should be solved by the Media Capabilities API, maybe? But what if the user agent is wrong? The application can mitigate this very quickly. The user agent would take a much longer time to change and deploy.
Thanks.
When a user sees "Generic Camera" and "Generic Camera", how do they know which one to pick, without trying them all?
@q-alex-zhao Most sites show previews on their ⚙️ page, which seems like the best way to be sure the right camera is chosen and points the right way (I don't memorize my cameras' 8-digit Chrome codes). Some modern devices (e.g. AirPods) let users rename them in the OS, something I think we'll see more of with IoT, so this seems solvable outside the web platform.
The user agent would take much longer time to change and deploy.
Specs necessarily take the long view, and we want to get away from overloading labels, because they're bad for web compat and privacy. If sites end up relying on their own banlists to do basic camera work, then we're doing something wrong.
I'm curious of what kind of sanitization is or will be done, and whether there should be some guidance on what's reasonable.
Specifically:
- Should sanitization allow multiple instances of the same model of device to be distinguishable? On some systems they may have the same label today. When a user sees "Generic Camera" and "Generic Camera", how do they know which one to pick, without trying them all? a) If the user agent stores the last-used or preferred device, as in the in-chrome device selection proposal, should sanitization preserve the above distinction across restarts as well?
- Should sanitization allow apps to apply some heuristics of their own in how to best use the device? For example, "XYZ Cam 3000" may advertise that it's capable of doing 1080p at 30fps, but one may find it only doing 10fps reliably well. The application can easily add this label (or prefix) to its banlist and constrain it to a lower resolution or lower fps. Ideally much of this problem should be solved by the Media Capabilities API, maybe? But what if the user agent is wrong? The application can mitigate this very quickly. The user agent would take a much longer time to change and deploy.
Thanks.
Evidently the only way to achieve consistency on the front-end is for the front-end to create a uniform UI.
Most sites show previews on their gear page, which seems like the best way to be sure the right camera is chosen and points the right way
Is that possible without executing getUserMedia() more than once?
@jan-ivar That's fine. Then I think maybe the spec should clarify that the label is for display purposes only and there is no guarantee that the label is tied to the "real" name of the device.
Thanks.
Can we close this issue?
It seems we could try providing some guidance on potential device label sanitization. If so, let's open a new issue specifically for providing non normative guidance.
We will present this issue at TPAC 2020 meeting with PING and explain why we feel it's done.
Presented and closed.
Long-term solution for removal of labels in enumerateDevices is tracked in https://github.com/w3c/mediacapture-extensions/issues/2.
@jan-ivar one thing that came up during that conversation was a request for more detail on what's meant by "sanitization", and more generally on what implementors should do in response to "clarify label is for display purposes; don’t rely on == model/manufacturer."
Pointing to some satisfactory or guiding examples or existing implementations of "sanitization" functions / tools would be a great way to clarify and finish the issue up, I think.
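To give the discussion something concrete to react to, here is one possible shape for a sanitization function. It is purely a hypothetical sketch and does not describe what Firefox, Safari, Brave or any other browser actually does. It assumes that some labels end in a hexadecimal code such as "(046d:0825)", that stripping the code is desirable, and that duplicate labels should stay distinguishable by a numeric suffix. The names `stripHardwareCode` and `sanitizeLabels` are invented for this example.

```typescript
// Hypothetical: removes a trailing "(vvvv:pppp)" hexadecimal code, if present.
function stripHardwareCode(label: string): string {
  return label.replace(/\s*\([0-9a-fA-F]{4}:[0-9a-fA-F]{4}\)\s*$/, "").trim();
}

// Hypothetical: strips codes, then numbers repeated labels ("X", "X 2", "X 3")
// so that two identical devices remain distinguishable in a picker.
function sanitizeLabels(labels: string[]): string[] {
  // Prototype-free object, so any label text is safe to use as a key.
  const seen: Record<string, number> = Object.create(null);
  return labels.map((raw) => {
    const base = stripHardwareCode(raw);
    const count = (seen[base] ?? 0) + 1;
    seen[base] = count;
    return count === 1 ? base : `${base} ${count}`;
  });
}
```

Note that the numeric suffix depends on enumeration order, so this sketch alone would not keep labels stable across restarts; that question is left open, as in the thread.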
(the text below assumes that by default sites get no labels until permission, as that issue is already being addressed here: https://github.com/w3cping/tracking-issues/issues/10)
Currently, sites get access to either all devices and all labels (Chrome) or the "granted" device and all labels (Firefox) (not sure what others do).
As device labels are, at the least, high value FP vectors, and may possibly be even more sensitive (prototype devices, "adult" cameras and devices, etc etc etc), sites should only have access to the labels of devices that the user has given permission to.
i.e. if the site requests access to a webcam, and the user has two plugged in but grants access to just one, the site should learn only one label, not all.
This is an important aspect of not letting the site pierce browser isolation except for the minimal set of information / access needed to achieve user goals
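The two-webcam example above can be sketched as a filter over the full device list. This is an illustration of the requested behavior only; `KnownDevice`, `devicesVisibleToSite` and the idea of passing granted IDs as an array are assumptions made for the example, not part of any spec or browser.

```typescript
// Hypothetical record mirroring the deviceId, kind and label fields of MediaDeviceInfo.
interface KnownDevice {
  deviceId: string;
  kind: string;
  label: string;
}

// Returns only the devices the user explicitly granted; everything else,
// including its label, stays invisible to the site.
function devicesVisibleToSite(all: KnownDevice[], grantedIds: string[]): KnownDevice[] {
  return all.filter((device) => grantedIds.indexOf(device.deviceId) !== -1);
}
```

With two webcams attached and one granted, the site learns one label; with nothing granted, it learns none.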