Provide guidance on potential device label sanitization

jan-ivar commented 4 years ago

(From https://github.com/w3c/mediacapture-main/issues/640) Add non-normative guidance on removing any privacy-sensitive information from exposed device labels.

While heuristics for this are not well-established, this would include personally identifying information beyond make and model: things like serial numbers, or customizable OS or Bluetooth device name labels that may include a user's login name or their machine's network name. If such information is suspected or detected, best practice is to remove it and replace it with generic text containing no more information than the type of device and its make and model.
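As a rough illustration of the "suspected or detected" step, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names, the idea that the browser already knows a list of private strings (login name, machine name), and the rule that a token of 8 or more letters/digits containing a digit looks like a serial number. Real heuristics would need to be more careful, especially around localization.

```typescript
// Hypothetical heuristic: a token of 8+ letters/digits that contains at least
// one digit is treated as a possible serial number. This is an assumption,
// not an established rule.
function looksLikeSerial(token: string): boolean {
  return /^[A-Za-z0-9]{8,}$/.test(token) && /[0-9]/.test(token);
}

// Returns genericLabel when the raw label appears to contain private
// information, otherwise returns the raw label unchanged.
// privateStrings: values the browser already knows to be personal, e.g. the
// login name or machine network name (how they are obtained is out of scope).
function sanitizeLabel(
  rawLabel: string,
  privateStrings: string[],
  genericLabel: string
): string {
  const lower = rawLabel.toLowerCase();
  // Case-insensitive substring match against each known private string.
  const mentionsPrivate = privateStrings.some(
    (s) => s.length > 0 && lower.includes(s.toLowerCase())
  );
  // Split on anything that is not a letter or digit, then test each token.
  const hasSerial = rawLabel.split(/[^A-Za-z0-9]+/).some(looksLikeSerial);
  return mentionsPrivate || hasSerial ? genericLabel : rawLabel;
}
```

With these assumptions, a label such as "HD Pro Webcam C920" passes through unchanged, while "Brad's AirPods" (given "brad" as a private string) or "USB Camera SN00A1B2C3" is replaced by the generic text supplied by the caller.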

bradisbell commented 3 years ago

Just to clarify, from the comments at https://github.com/w3c/mediacapture-extensions/issues/2#issuecomment-726288843, this label sanitization applies only to device info, and not to the track label, which can contain the full label as configured by the user, OS, or device.

jan-ivar commented 3 years ago

@bradisbell no, sanitization is different from removal. E.g. you should still be able to identify the make.

DeviceInfo and track labels are supposed to match. The spec could be clearer about this, but that's my reading of "MUST return the label of the object's corresponding source".

In order to be effective, sanitization would need to apply to all exposures. I don't think users would expect browsers to leak personally identifiable information like their full name in a camera/microphone API, even behind permission.

Thanks for checking. I'll try to consolidate the language around label to make this clearer.

bradisbell commented 3 years ago

Thanks @jan-ivar. While I haven't encountered a device labeled with its serial number or network name, I don't doubt that these cases exist, and I understand the concern.

However, it is clear that users expect web-based applications to identify their media devices in the same way that other applications do, which is in conflict with sanitizing device labels.

Here is a common example. My system has multiple sound devices, and I have configured some of them with custom labels. Main would have originally been named DVS Transmit 1-2, and Monitor would have been named DVS Transmit 3-4. (The "original" labels I'm referring to here come from the driver and its accompanying software.)

[Screenshot: Windows Sound Control Panel]

[Screenshot: Channel Name Control Panel]

It's important to note that in these cases, the only distinguishing factor is the device label. Renaming these devices is a standard built-in capability. All other audio applications I know of are capable of using these devices by their configured names. Users expect to see audio devices labeled the same across all applications, whether they are web-based or native. Sanitized/modified labels are damaging to usability.

> I don't think users would expect browsers to leak personally identifiable information like their full name in a camera/microphone API, even behind permission.

I disagree. If I have labeled an input "Brad Isbell's Microphone" and want to use it with a web application, I expect that web application to be able to display "Brad Isbell's Microphone". This is just an anecdote, though. I don't want to assume what other users may want, which is the key point:

This decision should be left to the user, and not predetermined by a specification.

jan-ivar commented 3 years ago

@bradisbell I appreciate discussing multi-device use cases. The points you raise (combined with localization) are why the heuristics of sanitization will likely be hard to standardize beyond general guidelines. E.g. "Main" and "Monitor" seem innocuous.

> If I have labeled an input "Brad Isbell's Microphone"

What if the OS did this by default? Would that change expectations of exposure? Not everyone visits the dialogs in your screenshots. Many may be unaware of the label entirely.

> However, it is clear that users expect web-based applications to identify their media devices in the same way that other applications do, which is in conflict with sanitizing device labels.

True. Competing concerns.

The level of sanitization being done today seems low, but this may change. Over time, moving to "user-chooses" and in-browser device pickers should help ameliorate this. E.g. Firefox does not sanitize in its built-in microphone picker, but does in the in-content one.

An adjacent use case is having two identical cameras labeled e.g. "Logitech c920", and you've positioned them to point at specific things for a specific purpose, e.g. streaming. Relying solely on users labeling these in the OS ahead of time seems inferior to also previewing in the app and even allowing users to change the labels within the app, e.g. "Me" and "My cat", or "Player 1" and "Player 2".

So long-term I see multi-device apps allowing users to pick from in-browser pickers, seeing default/custom labels as well as preview, and being able to update labels of their chosen devices in-app as needed.
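To make the in-app relabeling idea concrete, here is a minimal sketch of how an app might layer user-chosen labels over whatever the browser exposes. The `DeviceLike` shape only mirrors the `deviceId` and `label` fields of `MediaDeviceInfo`; the `LabelOverrides` class and its method names are hypothetical, and persistence and previews are left out.

```typescript
// Minimal stand-in for the two MediaDeviceInfo fields this sketch needs.
interface DeviceLike {
  deviceId: string;
  label: string;
}

// Hypothetical helper: keeps user-chosen labels keyed by deviceId and falls
// back to the browser-provided (possibly sanitized) label otherwise.
class LabelOverrides {
  private readonly custom = new Map<string, string>();

  // Set a custom label; an empty or whitespace-only label clears the override.
  rename(deviceId: string, label: string): void {
    const trimmed = label.trim();
    if (trimmed.length === 0) {
      this.custom.delete(deviceId);
    } else {
      this.custom.set(deviceId, trimmed);
    }
  }

  // Label to show in the app's own UI for this device.
  displayLabel(device: DeviceLike): string {
    return this.custom.get(device.deviceId) ?? device.label;
  }
}
```

For two cameras that both report "Logitech c920", calling `rename` with "Me" on one of them makes `displayLabel` return "Me" for that device while the other keeps its default label.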

Short-term, I see browsers navigating this as best they can, weighing the tradeoffs, maybe even on a case-by-case basis.

pes10k commented 2 years ago

I see that the text now mentions that device labels can be sanitized, but that doesn't quite address the request here, which is to give guidance on what a practical and useful sanitization strategy would be. That seems like something that would greatly benefit from the expertise of this group, and which would be very relevant to an implementor.

@dontcallmedom suggested I give a proposal, so here is a straw suggestion:

Remove serial numbers or other identifiers that are specific to the individual device (as opposed to the model of the device).
When only one device with a given "{manufacturer} - {device type}" is present, omit the model name (if any) and present only "{manufacturer} - {device type}". If multiple devices share the same "{manufacturer} - {device type}", extend the label to "{manufacturer} - {device type} - {model}".

Again, I mean the above only as a starting suggestion, not a concrete proposal. You could maybe generalize it by saying: for each device, prefer "{device type}", then "{device type} - {manufacturer}", then "{device type} - {manufacturer} - {model}", then "{device type} - {manufacturer} - {model} - {serial #}", in that order, escalating only as far as needed to prevent any device collisions.
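The generalized ordering could be sketched roughly as follows. This is only an illustration of the straw suggestion, not a proposed algorithm: the `DeviceParts` shape is hypothetical (`MediaDeviceInfo` exposes no such structured fields), and the sketch assumes the browser can already split a label into these parts. Each device gets the least revealing label that no other present device shares at the same level.

```typescript
// Hypothetical structured view of a device; not part of any existing API.
interface DeviceParts {
  deviceType: string;   // e.g. "Webcam"
  manufacturer: string; // e.g. "Logitech"
  model: string;        // e.g. "C920"
  serial: string;       // e.g. "A1"
}

// Level 0 = type only, 1 = + manufacturer, 2 = + model, 3 = + serial.
const MAX_LEVEL = 3;

// Builds the label for a device at a given level of detail.
function labelAt(d: DeviceParts, level: number): string {
  return [d.deviceType, d.manufacturer, d.model, d.serial]
    .slice(0, level + 1)
    .join(" - ");
}

// For each device, return the shortest label that does not collide with the
// same-level label of any other device; fall back to the full label.
function escalateLabels(devices: DeviceParts[]): string[] {
  return devices.map((device, index) => {
    for (let level = 0; level < MAX_LEVEL; level++) {
      const label = labelAt(device, level);
      const collides = devices.some(
        (other, j) => j !== index && labelAt(other, level) === label
      );
      if (!collides) return label;
    }
    // Only the serial number (if anything) can distinguish this device.
    return labelAt(device, MAX_LEVEL);
  });
}
```

Under these assumptions, a lone microphone is labeled just "Microphone", two Logitech webcams of different models become "Webcam - Logitech - C920" and "Webcam - Logitech - C270", and only two otherwise identical devices fall through to labels that include the serial number.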

w3c / mediacapture-main

Provide guidance on potential device label sanitization #747