In-content device selection a mistake. Too complicated, leaks info

jan-ivar commented 4 years ago

In hindsight, in-content device selection was a mistake. It is:

- Too permissive: it assumes all devices are granted up front in order to work effectively (without re-prompting).
- Too complicated: having every site write a decent, compatible picker has been a failure:
  - Exhibit A: changing the camera or mic in the WebRTC samples re-prompts for both in Firefox, and flickers.
  - Mobile devices typically can't open more than one device at a time ("stop-then-pick").
  - Stop-then-pick is an inferior user experience on desktops.
  - Dealing with different browser permission models severely limits design (no previews).
- Leaky: it leaks private info and fails PING review https://github.com/w3c/mediacapture-main/issues/640
- Too limiting: there is no path to privacy (redundant re-prompts after user selection can't be avoided).
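For readers unfamiliar with "stop-then-pick": on a platform that can only open one camera at a time, a site's own picker has to release the current camera before it can request the one the user chose. A minimal sketch, assuming `mediaDevices` is `navigator.mediaDevices` (passed in so the logic can run against a stub); `switchCamera` is a hypothetical helper name, not part of any spec:

```javascript
// Hypothetical helper illustrating "stop-then-pick".
// `mediaDevices` is navigator.mediaDevices, or a stub with the same shape.
async function switchCamera(mediaDevices, currentTrack, newDeviceId) {
  // Stop first: some mobile platforms cannot open two cameras at once.
  if (currentTrack) currentTrack.stop();
  // Then request exactly the device the user picked in the site's own UI.
  const stream = await mediaDevices.getUserMedia({
    video: {deviceId: {exact: newDeviceId}}
  });
  return stream.getVideoTracks()[0];
}
```

The cost shows in the ordering: between `stop()` and the resolved promise there is no video, and if the new request fails or re-prompts, the old camera is already gone. No preview of the alternative is possible.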

The PING (W3C Privacy Interest Group) outlines the way forward in https://github.com/w3c/mediacapture-main/issues/640#issuecomment-549540203:

Privacy-by-default flow:

1. Initially the site has access to no devices or device labels.
2. The site asks for a category (or categories) of device.
3. The browser prompts the user for one, many or all devices.
4. The site gains access to only the device, and device label, of the hardware the user selects.
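To make the four steps concrete, here is a toy model of what the site can observe at each point. Nothing here is a real browser API: `ToyBrowser` and the `userPicks` callback (standing in for the browser's own prompt) are hypothetical names for illustration only.

```javascript
// Toy model of the PING privacy-by-default flow (hypothetical, not a real API).
class ToyBrowser {
  constructor(allDevices) {
    this.allDevices = allDevices; // everything plugged in: known to the browser only
    this.granted = [];            // what the site is allowed to see
  }
  // Step 1: initially the site sees no devices and no labels.
  siteVisibleDevices() {
    return this.granted.map(d => ({...d}));
  }
  // Steps 2-4: the site names a category; the browser shows the user every
  // device of that kind; only what the user selects is exposed to the site.
  request(kind, userPicks) {
    const candidates = this.allDevices.filter(d => d.kind === kind);
    // The in-browser prompt: the user may pick one, many or all candidates,
    // or cancel (empty array). Anything not offered is ignored.
    const chosen = (userPicks(candidates) || []).filter(d => candidates.includes(d));
    for (const d of chosen) {
      if (!this.granted.includes(d)) this.granted.push(d);
    }
    return chosen.map(d => ({...d}));
  }
}
```

The point of the model is what it never does: unselected devices and their labels are never copied into anything the site can read, and a cancelled prompt exposes nothing.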

That's an in-chrome picker ("in-chrome" = implemented in the browser). In-chrome pickers:

- Are proven successful, in getDisplayMedia.
- Are the way forward for speakers: https://github.com/w3c/mediacapture-output/pull/86
- Remove the need to grant all devices ❤️
- Let UAs solve mobile platforms that can't open multiple devices (mute temporarily).
- Let UAs solve camera previews, maybe even in a (non-creepy) Brady Bunch grid!
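For speakers, an in-chrome picker exists in the Audio Output Devices API as `navigator.mediaDevices.selectAudioOutput()`, which resolves to the `MediaDeviceInfo` of the speaker the user chose; whether it corresponds exactly to the pull request linked above is an assumption here. A minimal sketch, with `mediaDevices` and `element` passed in so they can be stubbed, and `pickSpeaker` a hypothetical helper:

```javascript
// Sketch: let the browser's own picker choose a speaker, then route an
// element's audio to it. Assumes a browser implementing selectAudioOutput().
async function pickSpeaker(mediaDevices, element) {
  if (typeof mediaDevices.selectAudioOutput !== "function") {
    return null; // no in-chrome speaker picker in this browser
  }
  // Rejects if the user dismisses the picker.
  const info = await mediaDevices.selectAudioOutput();
  // setSinkId() takes the device id string of the chosen output.
  await element.setSinkId(info.deviceId);
  return info; // only the chosen speaker (and its label) is exposed
}
```

Note that the site never enumerates all outputs: it learns about exactly one speaker, the one the user picked.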

~https://github.com/w3c/mediacapture-main/pull/644 is my proposal for reshaping getUserMedia to serve this need, as well as solve https://github.com/w3c/mediacapture-main/issues/648.~

guest271314 commented 4 years ago

If I understand the issue correctly, couldn't a <select multiple> or equivalent UI be used to select multiple devices, with an implementation that is uniform across browsers?

  <form>
    <select multiple>
      <option value="camera_1">Camera 1</option>
      <option value="camera_2">Camera 2</option>
      <option value="audio_input">Audio input</option>
      <option value="audio_output">Audio output</option>
    </select>
    <input type="submit">
  </form>
  <pre></pre>
  <script>
    document.forms[0]
    .onsubmit = e => {
      e.preventDefault();
      e.target.nextElementSibling
      .textContent = JSON.stringify(
        [...e.target.elements[0].selectedOptions].map(({value}) => value)
      , null, 2);
    }
  </script>

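For a direct request for permission to multiple devices using getUserMedia() alone, one option would be to allow an array of constraints to be passed:

navigator.mediaDevices.getUserMedia({audio: true, video: true}) // default
  .then(permissionStream => navigator.mediaDevices.enumerateDevices()) // permission required, w3c/mediacapture-main#640
  .then(devices => {
    // Proposed (not supported today): an array of constraint sets, one per device.
    // <device_id> etc. are placeholders for ids taken from `devices`.
    return navigator.mediaDevices.getUserMedia([
      {audio: {deviceId: {exact: <device_id>}}},
      {audio: {deviceId: {exact: <other_device_id>}}},
      {video: {deviceId: {exact: <specific_device_id>}}}
    ]);
  });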
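Without an array form of getUserMedia(), the closest thing possible today is one call per device, with the resulting tracks gathered together. A minimal sketch, assuming the ids already came from enumerateDevices() after permission; `openDevices` is a hypothetical helper, and a browser may still prompt separately for each call:

```javascript
// Hypothetical helper: open several specific devices, one getUserMedia()
// call each, and collect all resulting tracks in request order.
// `mediaDevices` is navigator.mediaDevices or a stub with the same shape.
async function openDevices(mediaDevices, requests) {
  // requests: e.g. [{kind: "audio", deviceId: "..."}, {kind: "video", deviceId: "..."}]
  const streams = await Promise.all(requests.map(({kind, deviceId}) =>
    mediaDevices.getUserMedia({[kind]: {deviceId: {exact: deviceId}}})));
  return streams.flatMap(stream => stream.getTracks());
}
```

In a page, `new MediaStream(await openDevices(navigator.mediaDevices, requests))` would combine them. Using `Promise.all` means one failed device rejects the whole request; `Promise.allSettled` would be the alternative if partial success is acceptable.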

jan-ivar commented 4 years ago

No, the goal is not multiple device selection. I'd like to focus on replacing what apps do today. E.g.

...except when the user clicks Logitech BRIO it brings up a browser-specific selector where "Logitech BRIO" is selected by default, and the user can change it to something else, or cancel.

This solves all the problems I outline in the initial description.

jan-ivar commented 4 years ago

https://github.com/w3c/mediacapture-main/pull/644#issuecomment-566248295 accomplishes this by tweaking the existing getUserMedia request method a bit, and relying on browsers to figure out the rest from context. E.g.:

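// Show the current camera's name on a button (`camera`).
camera.innerText = cameraTrack.label;

// Proposed in #644: a non-exact deviceId brings up the browser's selector
// with the current camera pre-selected.
camera.onclick = async () => cameraTrack = (await navigator.mediaDevices.getUserMedia({
  video: {deviceId: cameraTrack.getSettings().deviceId}
})).getVideoTracks()[0];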

...brings up a selector with the existing device as default (like Firefox does on initial prompt), except instead of "Don't Allow"/"Allow", users might see "Cancel"/"Allow" so they can back out safely.

Details are up to browsers. The context here is the application is using a device already.
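Fleshed out slightly, the click handler above might keep the old camera until the new request resolves, and treat a cancelled selector as "keep what we have". This is a sketch of the proposed #644 semantics, not of current browser behavior (today a non-exact deviceId may simply return the same camera without any selector). The assumption that cancelling rejects with `NotAllowedError` mirrors how a denied getUserMedia() rejects today; `switchViaPicker` is a hypothetical helper.

```javascript
// Sketch of the camera button under the proposed #644 semantics.
// Returns the track the app should use from now on.
async function switchViaPicker(mediaDevices, cameraTrack) {
  const {deviceId} = cameraTrack.getSettings();
  let stream;
  try {
    // Non-exact deviceId: "suggest this one, let the user pick" (proposed).
    stream = await mediaDevices.getUserMedia({video: {deviceId}});
  } catch (err) {
    // Assumption: backing out of the selector rejects like a denial does today.
    if (err.name === "NotAllowedError") return cameraTrack; // keep the old camera
    throw err;
  }
  const [newTrack] = stream.getVideoTracks();
  cameraTrack.stop(); // release the previous camera once the new one is live
  return newTrack;
}
```

In a page this would be wired up as `camera.onclick = async () => { cameraTrack = await switchViaPicker(navigator.mediaDevices, cameraTrack); camera.innerText = cameraTrack.label; };`, which also keeps the button label in sync after a switch.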

juberti commented 4 years ago

Generally not persuaded by "too complicated" arguments - developers have consistently asked for more control rather than less.

guest271314 commented 4 years ago

@juberti Re "too complicated", I'm reminded of a post, if I recall correctly by you, saying (paraphrasing) that the WebRTC implementation in Chrome's source code exceeds the source code of a U.S. space shuttle mission. Given that, there does not appear to be any hindrance to achieving any use case or requirement, provided the will to do so exists.

(I recently read or listened to an article or interview, which I can't cite at the moment, where someone said that the people who build browsers are essentially the smartest people in the world.)

juberti commented 4 years ago

Yes, there is a lot of code in Chrome, but this has benefits and drawbacks. One drawback of pushing more work into Chrome is that application developers have less control.

jan-ivar commented 4 years ago

@juberti Applications have had this power for 7 years, and all that appears to have come from it are rudimentary pickers that mostly don't work smoothly across all browsers and platforms.

To support this model, users have had to give up permission to all their devices just to use one. This cost/benefit doesn't seem reasonable at face value, and would appear to violate our priority of constituencies.

But let's be specific: what use case would be hampered by an in-chrome picker? Note that https://github.com/w3c/mediacapture-main/pull/644 does not remove deviceIds or the ability to constrain on them. So I don't see the power-loss.

juberti commented 4 years ago

Headset detection and auto-selection?

jan-ivar commented 4 years ago

To clarify, https://github.com/w3c/mediacapture-main/pull/644 doesn't actually remove in-content selection; it removes 1 of 3 decision makers:

- application
- ~user agent~
- user

The application still has full control. Only when it is indecisive, instead of "browser chooses", we'd say "user picks". This removes implementation dependency, improving web compat.
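The rule "the application decides when its constraints pin down one device; otherwise the user, not the user agent, picks" can be modelled as a small pure function. This is a simplification: it only looks at `kind` and `exact` `deviceId`/`groupId`, not the full constraint algorithm, and `chooseDevice`/`askUser` are hypothetical names.

```javascript
// Toy model of "application decides, else user picks" (no UA guessing).
// Only {exact: ...} deviceId/groupId narrow the choice; a bare deviceId
// string (an "ideal") leaves it open, as in the #644 proposal.
function chooseDevice(devices, kind, constraints, askUser) {
  const exact = c => (c && typeof c === "object" && "exact" in c) ? c.exact : undefined;
  const wantId = exact(constraints.deviceId);
  const wantGroup = exact(constraints.groupId);
  const candidates = devices.filter(d =>
    d.kind === kind &&
    (wantId === undefined || d.deviceId === wantId) &&
    (wantGroup === undefined || d.groupId === wantGroup));
  if (candidates.length === 0) return null;          // real getUserMedia() would reject (OverconstrainedError)
  if (candidates.length === 1) return candidates[0]; // the application decided
  return askUser(candidates);                        // indecisive: the user picks
}
```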

In-content selection would continue to work on browsers that grant all devices. But browsers would also be required to implement a picker when constraints don't reduce to 1 choice (<5% of users?)

Why? Firefox might drop labels to satisfy https://github.com/w3c/mediacapture-main/issues/640 so its in-content selector would be inferior by default:

Camera 1
Logitech BRIO
Camera 3

This is where a spec-mandated picker would be valuable, giving sites A) a choice about whether they want to handle all this complexity or not, and B) a more compatible way to ask users the same thing that works with stricter-permission browsers (like Firefox). A win for privacy.
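For comparison, a site that keeps its own selector in a label-less browser ends up filling gaps like the list above: `deviceInfo.label` when present, the live track's `label` for the camera in use, and a generic "Camera N" otherwise. A sketch; `pickerLabels` is a hypothetical helper, and the "Camera N" naming simply mirrors that list.

```javascript
// Build display names for an in-content camera selector when
// enumerateDevices() may return empty labels. `liveTrack` is the camera
// track currently in use (or null); its label remains available.
function pickerLabels(devices, liveTrack) {
  const liveId = liveTrack ? liveTrack.getSettings().deviceId : undefined;
  return devices
    .filter(d => d.kind === "videoinput")
    .map((d, i) => ({
      deviceId: d.deviceId,
      label: d.label || (d.deviceId === liveId ? liveTrack.label : `Camera ${i + 1}`)
    }));
}
```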

I'm happy to have in-chrome compete with in-content. I think we can make it superior.

Headset detection

@juberti How is that done today? Do you have an example? By the browser or the app? By browser is not web compatible. By app we still have groupId.
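As a sketch of the app-side approach: per the spec, devices that share a `groupId` belong to the same physical device, so a site can pair microphones with speakers without browser help. `findHeadsets` is a hypothetical helper, and treating every mic/speaker pair with a common `groupId` as a "headset" is a heuristic, since other hardware with both a mic and a speaker may also share a group.

```javascript
// Pair audio inputs and outputs that report the same groupId, i.e. that the
// browser says belong to the same physical device (e.g. a USB headset).
function findHeadsets(devices) {
  const outputs = devices.filter(d => d.kind === "audiooutput");
  return devices
    .filter(d => d.kind === "audioinput")
    .map(mic => ({mic, speaker: outputs.find(s => s.groupId === mic.groupId)}))
    .filter(pair => pair.speaker !== undefined);
}
```

In a page: `findHeadsets(await navigator.mediaDevices.enumerateDevices())`.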

and auto-selection?

Not sure what that is, but sites could still select the system default by specifying the first deviceId from enumerateDevices() like today if not having a selector up front is truly important to them:

getUserMedia({video: {deviceId: {exact: deviceInfo.deviceId}}}); // No selector

See https://github.com/w3c/mediacapture-main/pull/644#issuecomment-566248295 for full table.
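A fuller version of that one-liner, with `deviceInfo` taken from enumerateDevices(), might look like the sketch below. It relies on browsers listing the system default first, as described above, and assumes the site already has permission (before that, browsers may report empty `deviceId`s). `openFirst` is a hypothetical helper.

```javascript
// Open the first device of a kind, as listed by enumerateDevices(), with an
// exact constraint so that (under the proposal) no selector is shown.
async function openFirst(mediaDevices, kind /* "videoinput" | "audioinput" */) {
  const devices = await mediaDevices.enumerateDevices();
  const deviceInfo = devices.find(d => d.kind === kind);
  if (!deviceInfo) throw new Error(`no ${kind} device`);
  const key = kind === "videoinput" ? "video" : "audio";
  return mediaDevices.getUserMedia({[key]: {deviceId: {exact: deviceInfo.deviceId}}});
}
```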

q-alex-zhao commented 4 years ago

Re. headset & auto-selection: maybe the application wants to do the following when the user plugs in some new devices:

- if the new audio input & output devices belong to the same headset, automatically switch to use them, because that's most likely what the user wants
- if we didn't have a camera before but now the user plugs in one, automatically start using it, because that's most likely what the user wants
- otherwise, do nothing

Maybe the application has more complicated logic... I don't think the user agent should be expected to implement things like these. As long as the application can still achieve the same result, I think delegating the baseline device choosing task to the user agent sounds good.

jan-ivar commented 4 years ago

maybe the application wants to do the following when the user plugs in some new devices:

Apps will still be able to do that like today using:

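// Assumes `video` is the <video> element showing the local stream and that
// some initial permission has been granted. Seed these from an initial
// enumerateDevices() call so existing devices aren't mistaken for new ones.
let oldMics = [], oldCams = [];

// Devices in `a` that are not in `b`, matched by deviceId.
const subtract = (a, b) => a.filter(d => !b.some(o => o.deviceId == d.deviceId));

navigator.mediaDevices.ondevicechange = async () => {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const [cams, mics, speakers] = ["videoinput", "audioinput", "audiooutput"]
    .map(type => devices.filter(({kind}) => kind == type));

  // "if the new audio input & output devices belong to the same headset,
  // automatically switch to use them, because that's most likely what the user wants"
  const [mic] = subtract(mics, oldMics);
  const speaker = mic && speakers.find(({groupId}) => mic.groupId == groupId);
  if (mic && speaker) {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {deviceId: {exact: mic.deviceId}}
    });
    video.srcObject = new MediaStream([
      ...(video.srcObject ? video.srcObject.getVideoTracks() : []),
      ...stream.getAudioTracks()
    ]);
    await video.setSinkId(speaker.deviceId); // setSinkId() takes the id string
  }

  // "if we didn't have a camera before but now the user plugs in one,
  // automatically start using it, because that's most likely what the user wants"
  const [cam] = cams;
  if (cam && !oldCams.length) {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: {deviceId: {exact: cam.deviceId}}
    });
    video.srcObject = new MediaStream([
      ...(video.srcObject ? video.srcObject.getAudioTracks() : []),
      ...stream.getVideoTracks()
    ]);
  }
  oldMics = mics;
  oldCams = cams;
};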

The only difference is we won't have labels (all this presumes some initial permission, btw).

I think delegating the baseline device choosing task to the user agent sounds good.

Great! 🙂

bradisbell commented 3 years ago

Removing device labels is damaging to usability for anyone using multiple sound devices. It is critical for the application to show to the user what inputs/outputs correlate to which physical devices.

Suppose I have a typical audio mixer application with 8 inputs. The user shares these devices with the application. What are we to display in the UI? "Microphone 1", "Microphone 2", "Microphone 3"... we don't even know that they're microphones. At least with device labels, I can say "Line In 1/2", "Logitech Camera L/R". Otherwise, the user has no way of really seeing what goes to what. If a user gives permission to a device, they should be able to fully utilize that device with the web application, which includes seeing what device it is.

I think a way to solve the privacy concern is to hide the label for a "default" device, while still providing useful labels when specific devices are chosen.

This is an example of some recent privacy-related decisions that have significantly damaged the usefulness of web-based media applications. I'd urge everyone making these decisions to consider the broader uses of these APIs, beyond video calling. I think that privacy concerns can be adequately addressed while still empowering users to use their own hardware on the web.

jan-ivar commented 3 years ago

Suppose I have a typical audio mixer application with 8 inputs. The user shares these devices with the application. What are we to display in the UI?

@bradisbell Thanks for bringing up this example, as multi-device use is a supported, if still emerging, use case. For this reason, this spec (so far) only deprecates deviceInfo.label, not track.label. Sites retain access to the labels of devices that are live.

So this issue didn't get rid of labels entirely, which has its problems, but we're tackling those in separate issues like https://github.com/w3c/mediacapture-main/issues/747 and https://github.com/w3c/mediacapture-extensions/issues/1. Hope that helps clarify.
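For the eight-input mixer case, a sketch of what remains possible under this: once each input is live, `track.label` names it, and `getSettings().deviceId` ties it back to the device. `channelStrips` is a hypothetical helper, and the `Input N` fallback is only there in case a browser returns an empty label.

```javascript
// Label mixer channels from live audio tracks. track.label stays available
// for devices that are live, even where deviceInfo.label is withheld.
function channelStrips(tracks) {
  return tracks
    .filter(t => t.kind === "audio" && t.readyState === "live")
    .map((t, i) => ({
      channel: i + 1,
      deviceId: t.getSettings().deviceId,
      name: t.label || `Input ${i + 1}` // fallback if a browser returns ""
    }));
}
```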

bradisbell commented 3 years ago

@jan-ivar Thank you for that clarification! I've done some testing, and I think that keeping the track label solves the issue for my use case.

w3c/mediacapture-extensions, issue #2