w3c / mediacapture-output

API to manage the rendering of audio on any audio output device
https://w3c.github.io/mediacapture-output/

The first "audiooutput" `MediaDeviceInfo` returned from `enumerateDevices()` is not the default device when the default device is not exposed #133

Open karlt opened 1 year ago

karlt commented 1 year ago

https://github.com/w3c/mediacapture-output/issues/113#issue-704418273 proposed output device ordering in enumerateDevices() results to identify the default audio output device. It also said "This also probably means it should always be exposed since '' allows to be used by setSinkId."

While the first position for the default device was specified in https://github.com/w3c/mediacapture-main/pull/757, there was no change to always expose the default device and no discussion around this. The result is that a client app does not know whether or not the first audio output device returned from enumerateDevices() is the default device.

karlt commented 1 year ago

This is causing trouble on some web conferencing sites that appear to assume that the first device is the default device. When the default device is not exposed, they offer no way to switch to the default device, and sometimes no way to switch to the first exposed device either. The biggest problem is that these sites present UI that (sometimes) allows switching audio output away from the default device, but then offers no way to switch back to it. This would be a regression introduced by adding support for setSinkId().
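Below is a minimal client-side sketch of a chooser that avoids this trap. It assumes only that setSinkId("") selects the user-agent default device. The chooser always offers an entry with sinkId "", so the user can switch back whatever enumerateDevices() exposes. `buildOutputChoices` and the "System default" label are hypothetical names used for illustration.

```javascript
// Hypothetical helper: build output choices that always include a way back
// to the user-agent default. `devices` is the array resolved by
// navigator.mediaDevices.enumerateDevices().
function buildOutputChoices(devices, defaultLabel = "System default") {
  // setSinkId("") selects the user-agent default output device.
  const choices = [{ sinkId: "", label: defaultLabel }];
  for (const d of devices) {
    // Skip non-output entries and any entry with an empty deviceId;
    // the "" choice above already covers the default.
    if (d.kind !== "audiooutput" || d.deviceId === "") continue;
    choices.push({ sinkId: d.deviceId, label: d.label || d.deviceId });
  }
  return choices;
}

// Usage in a page (not run here):
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   const choices = buildOutputChoices(devices);
//   await audioElement.setSinkId(choices[0].sinkId); // back to the default
```

In Chrome, the list would also contain Chrome's virtual entry with deviceId "default" alongside the "" choice.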

karlt commented 1 year ago

Proposal: Always include, in enumerateDevices() results, a (single) MediaDeviceInfo object for the default device if there are any audio output devices exposed. If the physical audio output device that is currently the default device would not be exposed if it were not the default, then remove fingerprinting surface from the attributes of this MediaDeviceInfo.

karlt commented 1 year ago

For a MediaDeviceInfo representing a default device that would not otherwise be exposed, perhaps the simplest solution in terms of consistency with the current spec might be to initialize only the kind attribute, as for dummy "videoinput" and "audioinput" devices when their respective "information can be exposed" flags are false. Some variations on this may also be considered:

A generic minimal-fingerprint label meaning something like "System default audio output device for this user-agent" may be useful to provide the user with some indication of what the option represents, should a client app present this device in UI constructed from the unfiltered set of MediaDeviceInfos. This would preferably be localized, so the precise wording would not be specified.

Possible variations might have deviceId that is empty or that of the physical device (defaultDeviceId). Exposing a deviceId not otherwise available might be seen as an advantage or disadvantage.

Either an empty deviceId or an empty label would be sufficient to indicate that the label does not describe a physical device.

Perhaps similar options exist for groupId.

I'm favoring empty deviceId and groupId, and allowing a non-empty label.
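Here is a rough sketch of how a user agent might compute the "audiooutput" part of enumerateDevices() under this proposal, using the favored variant (empty deviceId and groupId, generic label). It is a hypothetical model, not spec text. `listAudioOutputs` and its inputs (`outputs`, `exposedIds`, `systemDefaultId`) are illustrative names, and a real label would be localized.

```javascript
// Hypothetical model of the proposal. `outputs` lists physical output devices
// as {deviceId, groupId, label}; `exposedIds` is a Set of deviceIds the page
// may see; `systemDefaultId` is the deviceId of the current system default.
const PLACEHOLDER_LABEL = "System default audio output device"; // would be localized

function listAudioOutputs(outputs, exposedIds, systemDefaultId) {
  const exposed = outputs
    .filter((o) => exposedIds.has(o.deviceId))
    .map((o) => ({ kind: "audiooutput", ...o }));
  // With nothing exposed, no "audiooutput" entries are listed at all.
  if (exposed.length === 0) return [];
  const defaultIndex = exposed.findIndex((o) => o.deviceId === systemDefaultId);
  if (defaultIndex >= 0) {
    // The exposed physical default moves to the front; no placeholder is added.
    const [def] = exposed.splice(defaultIndex, 1);
    return [def, ...exposed];
  }
  // The default is not exposed: a placeholder with empty deviceId and
  // groupId takes its place at the front.
  const placeholder = {
    kind: "audiooutput",
    deviceId: "",
    groupId: "",
    label: PLACEHOLDER_LABEL,
  };
  return [placeholder, ...exposed];
}
```

The placeholder only exists while the physical default is hidden. Once the default is exposed, it simply takes the first position.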

dontcallmedom commented 1 year ago

A generic minimal-fingerprint label meaning something like "System default audio output device for this user-agent" may be useful to provide the user with some indication of what the option represents, should a client app present this device in UI constructed from the unfiltered set of MediaDeviceInfos. This would preferably be localized, so the precise wording would not be specified.

Should this be adopted, it would need support for localization in the label attribute, which would be new input to https://github.com/w3c/mediacapture-main/issues/665

jan-ivar commented 1 year ago

I'm favoring empty deviceId and groupId, and allowing a non-empty label.

This SGTM.

guidou commented 2 weeks ago

Chromium's solution is to use "default" as device ID. Given that no other browser currently exposes the system default audio output device and that Chromium has been exposing it for years using this device ID, this would be a much more Web-compatible solution. There is little difference between using "default", "" or any other constant string.

Wrt the label, I think we should let browsers decide what label to use. I don't think an empty label would be useful, since it would require applications to give special treatment to such a label.

Wrt groupId I see no problem with allowing groupId to be the group ID of the physical device that the system default is currently pointing to, as long as it gets updated when the system default device starts to point to a different device. I also have no problem with also allowing it to be empty or uniquely nonempty.

jan-ivar commented 1 week ago

The proposal in https://github.com/w3c/mediacapture-output/issues/133#issuecomment-1271122304 is a neutered entry instead of the physical "device that would not otherwise be exposed", and only when it is the OS default.

E.g. if the user selectAudioOutputs their AirPods AND they're the current OS default (which is common after they put them on in macOS), then no other entry is added to enumerateDevices:

 {label: "AirPods", deviceId: "234", ...}

If they then go to macOS's System Settings→Sounds and change the OS default to "MacBook Pro Speakers", you'll see:

 {label: "System default audio output device", deviceId: "", ...}
 {label: "AirPods", deviceId: "234", ...}

If they then selectAudioOutput their MacBook Pro Speakers, you'll see:

 {label: "MacBook Pro Speakers", deviceId: "123", ...}
 {label: "AirPods", deviceId: "234", ...}

@karlt is that right? This solves finding the default device in the spec:

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput");

Chromium's solution is to use "default" as device ID.

Unfortunately, this poses a competing model for finding the default device:

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput" && d.deviceId == "default");

There is little difference between using "default", "" or any other constant string.

setSinkId("") is special and unsets the sinkId. What would adding a second special value accomplish?

karlt commented 1 week ago

E.g. if the user selectAudioOutputs their AirPods AND they're the current OS default (which is common after they put them on in macOS), then no other entry is added to enumerateDevices:

 {label: "AirPods", deviceId: "234", ...}

If they then go to macOS's System Settings→Sounds and change the OS default to "MacBook Pro Speakers", you'll see:

 {label: "System default audio output device", deviceId: "", ...}
 {label: "AirPods", deviceId: "234", ...}

If they then selectAudioOutput their MacBook Pro Speakers, you'll see:

 {label: "MacBook Pro Speakers", deviceId: "123", ...}
 {label: "AirPods", deviceId: "234", ...}

@karlt is that right? This solves finding the default device in the spec:

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput");

That's correct, and this finds a deviceId that can be passed to setSinkId() or compared with a sinkId. To distinguish between a virtual and physical device and so whether the physical default device is exposed, the deviceId would be compared with "".
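The sketch below shows that comparison, assuming `devices` is the array resolved by enumerateDevices(). `describeDefaultOutput` is a hypothetical helper name. In Chrome, the first entry is its virtual device with deviceId "default", so `isPhysical` would report true there even though that entry is virtual.

```javascript
// Hypothetical helper: find the default output using the first-is-default
// rule and report whether it is a physical device or the proposed
// placeholder (empty deviceId).
function describeDefaultOutput(devices) {
  const def = devices.find((d) => d.kind === "audiooutput");
  if (!def) return null; // no audio output device is exposed
  return {
    sinkId: def.deviceId, // usable with setSinkId() or to compare with element.sinkId
    isPhysical: def.deviceId !== "", // "" marks the placeholder under the proposal
  };
}
```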

Chromium's solution is to use "default" as device ID.

Unfortunately, this poses a competing model for finding the default device:

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput" && d.deviceId == "default");

There is little difference between using "default", "" or any other constant string.

setSinkId("") is special and unsets the sinkId.

setSinkId("") and sinkId are the main reason for re-using "", to which the spec already gives meaning as the user-agent default device. This is also consistent with the empty deviceId on the single "audioinput" (or "videoinput") MediaDeviceInfo provided to indicate that at least one microphone exists before "audioinput" devices are exposed.
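Reusing "" has a practical consequence: a page can compare an element's sinkId directly with the first "audiooutput" deviceId. The sketch below is hypothetical, including `classifySink` and its result strings. It needs only an object with a `sinkId` property, such as an HTMLMediaElement. A sinkId of "" follows the user-agent default, while a non-empty deviceId selects that particular device.

```javascript
// Hypothetical helper: classify where an element's audio goes relative to
// the default output found by the first-is-default rule. `devices` is the
// enumerateDevices() result.
function classifySink(element, devices) {
  // sinkId "" means the element plays to the user-agent default device.
  if (element.sinkId === "") return "user-agent default";
  const def = devices.find((d) => d.kind === "audiooutput");
  if (def && element.sinkId === def.deviceId) {
    return "current default (selected by deviceId)";
  }
  return "other device";
}
```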

Chrome is using deviceId: "" and deviceId: "default" in different situations. When no "audiooutput" devices are exposed, Chrome provides

{label: "", deviceId: "", groupId: ""}

The spec (and the proposal here, which is Gecko's behavior now) would expect no "audiooutput" devices.

When the default "audiooutput" device is exposed via getUserMedia(), Chrome provides (on Linux)

{label: "Default", deviceId: "default", groupId: "default"}
{label: "Built-in Audio Analog Stereo", deviceId: "38cf59402979c7c92a7dfe139b46a07932288d562b1dce9bca711b1e7d2097bc", groupId: "fc5673e12055618bb6840857ef5ffaf81ae28949a50d518734c167530922a3bf"}

https://jan-ivar.github.io/dummy/enumerate.html is useful for testing browser behavior.

In Chrome currently find(d => d.kind == "audiooutput") returns the MediaDeviceInfo with deviceId: "default", so find(d => d.kind == "audiooutput" && d.deviceId == "default") would return the same.

AFAIK there is currently no way to create a situation where Chrome would expose a non-default "audiooutput" device without exposing the user-agent default "audiooutput" device. If there were, then Chrome's virtual deviceId: "default" device would be a solution to the problem of sites that assume that the first "audiooutput" device is the default device, because in Chrome setSinkId("default") switches output to the user-agent default if the default is exposed.

Chrome's deviceId: "default" on its virtual device is not a solution to the issue of providing a client app with a means to determine which exposed physical device, if any, is the user-agent default device.

guidou commented 1 week ago

That's correct, and this finds a deviceId that can be passed to setSinkId() or compared with a sinkId. To distinguish between a virtual and physical device and so whether the physical default device is exposed, the deviceId would be compared with "".

If "default" is the ID for the system default device as exposed in enumerateDevices() then you just need to compare with "default" instead of "".

Chromium's solution is to use "default" as device ID.

Unfortunately, this poses a competing model for finding the default device:

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput" && d.deviceId == "default");

What exactly is the problem with this? And the model is competing with what? There is nothing in the spec as currently written that specifies what the deviceId field should be for a system-default device.

There is little difference between using "default", "" or any other constant string.

setSinkId("") is special and unsets the sinkId.

setSinkId("") and sinkId are the main reason for re-using "", to which the spec already gives meaning as the user-agent default device. This is also consistent with the empty deviceId on the single "audioinput" (or "videoinput") MediaDeviceInfo provided to indicate that at least one microphone exists before "audioinput" devices are exposed.

The spec uses the empty string for several different things. And the deviceId field for the system-default device in the output of enumerateDevices is not one of those things.

Chrome is using deviceId: "" and deviceId: "default" in different situations. When no "audiooutput" devices are exposed, Chrome provides

{label: "", deviceId: "", groupId: ""}

In this case "" is not the ID of the default device. It is an entry that signals that the system has output devices, but that information about them cannot be exposed (because gUM hasn't been called or because there are no permissions).

The spec (and the proposal here, which is Gecko's behavior now) would expect no "audiooutput" devices.

When the default "audiooutput" device is exposed via getUserMedia(), Chrome provides (on Linux)

{label: "Default", deviceId: "default", groupId: "default"}
{label: "Built-in Audio Analog Stereo", deviceId: "38cf59402979c7c92a7dfe139b46a07932288d562b1dce9bca711b1e7d2097bc", groupId: "fc5673e12055618bb6840857ef5ffaf81ae28949a50d518734c167530922a3bf"}

https://jan-ivar.github.io/dummy/enumerate.html is useful for testing browser behavior.

In Chrome currently find(d => d.kind == "audiooutput") returns the MediaDeviceInfo with deviceId: "default", so find(d => d.kind == "audiooutput" && d.deviceId == "default") would return the same.

AFAIK there is currently no way to create a situation where Chrome would expose a non-default "audiooutput" device without exposing the user-agent default "audiooutput" device.

This is correct.

If there were, then Chrome's virtual deviceId: "default" device would be a solution to the problem of sites that assume that the first "audiooutput" device is the default device, because in Chrome setSinkId("default") switches output to the user-agent default if the default is exposed.

I haven't thought about the details of what Chrome would do if it had a per-device permission model, but if the system default device is exposed by enumerateDevices, its ID (whatever it is) should be accepted by setSinkId and should send the output to the system default device.

Chrome's deviceId: "default" on its virtual device is not a solution to the issue of providing a client app with a means to determine which exposed physical device, if any, is the user-agent default device.

Why would using the empty string (which has different meanings in different contexts) as ID for the default device in enumerateDevices solve this problem and not "default"?

The way Chrome solves that problem right now is by setting the groupId to the same groupId of the physical device currently considered the default.

jan-ivar commented 1 week ago

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput" && d.deviceId == "default");

What exactly is the problem with this?

It will only work in Chrome, not the spec. E.g. in the first "AirPods" example above.

And the model is competing with what?

The spec, which says the way to find the default is to assume it's the first one listed:

const defSpkr = (await navigator.mediaDevices.enumerateDevices()).find(d => d.kind == "audiooutput");

These are competing ways to learn the same thing, and we want web developers to adopt the interoperable way.

Chrome's virtual (large "D") Default device is listed first, so why do we need another way?

I haven't thought about the details of what Chrome would do if it had a per-device permission model, but if the system default device is exposed by enumerateDevices, its ID (whatever it is) should be accepted by setSinkId and should send the output to the system default device.

There's no all-speakers permission model in https://w3c.github.io/mediacapture-output

Chrome's deviceId: "default" on its virtual device is not a solution to the issue of providing a client app with a means to determine which exposed physical device, if any, is the user-agent default device. ... The way Chrome solves that problem right now is by setting the groupId to the same groupId of the physical device currently considered the default.

That's clever, but means web developers need to write additional code to work around Chrome's virtual Default device:

const speakers = (await navigator.mediaDevices.enumerateDevices()).filter(({kind}) => kind == "audiooutput");
const defSpkr = speakers.findLast(({groupId}) => groupId == speakers[0]?.groupId);

guidou commented 1 week ago

const defSpkr = (await navigator.mediaDevices.enumerateDevices())
                                   .find(d => d.kind == "audiooutput" && d.deviceId == "default");

What exactly is the problem with this?

It will only work in Chrome, not the spec. E.g. in the first "AirPods" example above.

What I'm saying is, what would be the problem with this if we say in the spec that the deviceId of a system default device is "default". So far, Chromium is the only browser that exposes a system default device. Safari doesn't expose output devices at all and Firefox exposes output devices, but not a system default device.

In my experience with Firefox 129, it shows the physical device currently marked as the system default device as the first element in the output of enumerateDevices, but if the system default device changes, the enumerateDevices result doesn't change, which is correct if we interpret this device to be a user-agent default (not a system default).

guidou commented 1 week ago

It will only work in Chrome, not the spec. E.g. in the first "AirPods" example above.

And the model is competing with what?

The spec, which says the way to find the default is to assume it's the first one listed:

Chromium lists the default as the first device, so no contradiction there. I think I misunderstood the original problem because I conflated system default device with UA default device. In Chromium they are the same and it's listed first as per the spec, but this doesn't need to be the case.

const defSpkr = (await navigator.mediaDevices.enumerateDevices()).find(d => d.kind == "audiooutput");

These are competing ways to learn the same thing, and we want web developers to adopt the interoperable way.

Chrome's virtual (large "D") Default device is listed first, so why do we need another way?

We don't need another way. So, my initial understanding was that it was not possible to know if the first exposed device is a default device, but I was conflating system default with UA default, which are not necessarily the same. The actual problem is that when you have a per-device permission model there is no way to know if the first exposed device is the UA default.

There's no all-speakers permission model in https://w3c.github.io/mediacapture-output

There is no mandated permission model. IIUC now, the problem here is specific to a per-device permission model. In that case, I wouldn't be opposed to a field indicating if an entry is UA default.

Chrome's deviceId: "default" on its virtual device is not a solution to the issue of providing a client app with a means to determine which exposed physical device, if any, is the user-agent default device. ... The way Chrome solves that problem right now is by setting the groupId to the same groupId of the physical device currently considered the default.

That's clever, but means web developers need to write additional code to work around Chrome's virtual Default device:

const speakers = (await navigator.mediaDevices.enumerateDevices()).filter(({kind}) => kind == "audiooutput");
const defSpkr = speakers.findLast(({groupId}) => groupId == speakers[0]?.groupId);

I agree. That would be a solution to the problem of knowing if an entry for a physical device corresponds to the current system default device if you expose a system default device like Chromium does, but it's not a solution for the case when the UA default is one of the physical entries.

jan-ivar commented 1 week ago

System default = OS default. Firefox has a bug where it doesn't update enumerateDevices when the OS default changes. We hope to fix this soon. Sorry for any confusion.

There is no mandated permission model.

setSinkId says: "If sinkId ... does not match any audio output device identified by the result that would be provided by enumerateDevices(), reject p ...".

enumerateDevices's exposure decision algorithm for devices other than camera and microphone says:

  1. If deviceInfo.deviceId is in [[explicitlyGrantedAudioOutputDevices]], return true.
  2. If deviceInfo.groupId is the same as the groupId of any microphone in microphoneList, return true.
  3. return false.
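The three steps translate almost directly into code. In this sketch, a Set stands in for [[explicitlyGrantedAudioOutputDevices]] and an array of MediaDeviceInfo-like objects stands in for microphoneList. `isExposed` is a hypothetical name, and the code is illustrative rather than spec text.

```javascript
// Hypothetical transcription of the exposure decision algorithm for a device
// that is neither a camera nor a microphone.
function isExposed(deviceInfo, explicitlyGranted, microphoneList) {
  // Step 1: deviceIds granted explicitly (e.g. through selectAudioOutput()).
  if (explicitlyGranted.has(deviceInfo.deviceId)) return true;
  // Step 2: speakers sharing a groupId with a microphone in microphoneList.
  if (microphoneList.some((mic) => mic.groupId === deviceInfo.groupId)) return true;
  // Step 3: everything else stays hidden.
  return false;
}
```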

IOW, only deviceIds exposed in enumerateDevices() are valid inputs, and the only way to expose them there is through selectAudioOutput() or getUserMedia({audio: true}) for speakers with a microphone groupId.
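Combined with the quoted setSinkId() rule, the check that a sinkId must pass can be sketched as follows. Here `devices` stands in for what enumerateDevices() would return. The empty string is always accepted because it selects the user-agent default. `isAcceptableSinkId` is a hypothetical helper, and the sketch returns a boolean instead of rejecting a promise.

```javascript
// Hypothetical check mirroring setSinkId(): a non-empty sinkId must match
// the deviceId of an "audiooutput" entry that enumerateDevices() would return.
function isAcceptableSinkId(sinkId, devices) {
  if (sinkId === "") return true; // "" selects the user-agent default device
  return devices.some((d) => d.kind === "audiooutput" && d.deviceId === sinkId);
}
```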

I don't see any allowance for different behavior here.

guidou commented 1 week ago

I don't see any allowance for different behavior here.

I don't think an explicit allowance is needed. The spec says that if you perform those actions you should be able to access the devices indicated there. I don't see anything in the spec saying the UA cannot choose which devices to expose based on its own policies once the user grants permissions via a prompt. Similarly, the spec doesn't say which previously exposed devices should no longer be exposed when a permission revocation algorithm runs. That is also up to the UA.

jan-ivar commented 1 week ago

I don't see anything in the spec saying the UA cannot choose which devices to expose based on its own policies once the user grants permissions via a prompt.

What permission? What prompt?

The exposure decision algorithm for devices other than camera and microphone contains no implementation-defined steps, which means the UA cannot choose which devices to expose based on its own policies.

Chrome's exposure of non-miked speakers through enumerateDevices() violates this.

guidou commented 1 week ago

The current language of the spec indeed doesn't allow exposure of non-miked speakers based on the gUM() permission, but Chromium's implementation predates the new spec language by several years and for a long time it was the only implementation of audio output devices.

The language before the addition of selectAudioOutput() to the spec, and which Chromium's implementation follows explicitly said:

The user agent may explicitly obtain user consent to play audio out of non-default output devices; the details of this process are left to the implementation.

I think we should put back some of this language in the spec for compatibility reasons.

karlt commented 1 week ago

The actual problem is that when you have a per-device permission model there is no way to know if the first exposed device is the UA default.

Yes, that is the problem tracked in this issue. Thank you for identifying the source of some of our differences here.

Media Capture and Streams uses the term "system default device for kind", while Audio Output Devices API more often uses "user agent default device". The OS has a default output device, which might be configurable per application (depending on the OS) including for each browser / user agent (depending on the OS), so I have assumed that both specs mean the system default device for the user agent. That is what I intended to describe here. I don't feel a need to distinguish unless some user-agents would like to distinguish between different kinds of OS default output devices.

Why would using the empty string (which has different meanings in different contexts) as ID for the default device in enumerateDevices solve this problem and not "default"?

Perhaps the answer might already be clearer now in light of the different problem, but I'll elaborate.

IIUC Chrome's MediaDeviceInfo with deviceId: "default" serves these purposes.

  1. The primary purpose is that client apps that present the list of devices from enumerateDevices() to the user for output device selection will provide the user with a means of selecting a virtual output device that will follow changes in the system default output device.
  2. A client app that uses the first "audiooutput" device from enumerateDevices() happens to get this virtual device and so is using the default device, even though Chrome does not reorder physical devices when the user-agent default device changes.

Chrome always adds the MediaDeviceInfo with deviceId: "default" in addition to the default physical device. User agents are free to construct whatever virtual devices they choose and make these available under any unique deviceId.

The primary purpose of the MediaDeviceInfo with deviceId: "" proposed here (and used in Firefox) is different. It serves these purposes.

  1. The primary purpose is that it is a placeholder, so that the first audiooutput device returned by enumerateDevices() corresponds to the default output device. It signals that information about the default audiooutput device cannot be exposed, in a similar way to the single audioinput device returned before information about audioinput devices can be exposed.
  2. It happens to be backward compatible with client apps that assume that the first audiooutput device is the default device, unaware of this bug in the spec. Its deviceId is accepted by setSinkId() and sends audio output to the system default device. selectAudioOutput() should be used instead of constructing a chooser from enumerateDevices() because enumerateDevices() does not necessarily list all available devices, but some clients are constructing their own choosers.

The MediaDeviceInfo with deviceId: "" would be in place of the physical default device. There would be no such MediaDeviceInfo with deviceId: "" if the system default device is exposed.

IIUC now, the problem here is specific to a per-device permission model. In that case, I wouldn't be opposed to a field indicating if an entry is UA default.

An additional attribute might be a possible solution, but it is quite a different mechanism to the current first-is-default mechanism. I proposed the placeholder device for similarity with the placeholder empty audioinput device and for its backward compatibility with client apps that assume the first-is-default mechanism. The placeholder would make an additional attribute unnecessary.

The way Chrome solves that problem right now is by setting the groupId to the same groupId of the physical device currently considered the default.

A user agent is somewhat free to choose the groupId of a virtual device and "Two devices have the same group identifier if they belong to the same physical device."

FWIW Chrome is inconsistent on this. I have heard the behavior you describe reported on macOS, but the Linux behavior that I have seen is that the virtual default device has a different groupId from its current physical device.

For a placeholder device, I feel an empty groupId would be more consistent with the signal that information about the default audiooutput device cannot be exposed.

jan-ivar commented 1 week ago

I think we should put back some of this language in the spec for compatibility reasons.

That seems like a separate issue. The spec was tightened to satisfy a PING privacy review 5 years ago.

dontcallmedom-bot commented 1 week ago

This issue was discussed in WebRTC August 27 2024 meeting – 27 August 2024 (Issue #133: The first "audiooutput" MediaDeviceInfo returned from enumerateDevices() is not the default device when the default device is not exposed)