w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Should Apple's "mic mode" be reflected in the API somehow? #47

Open alvestrand opened 2 years ago

alvestrand commented 2 years ago

A new Apple feature called "mic mode" apparently permits specifying some kinds of audio processing for microphone devices; this was called to Chrome's attention in https://bugs.chromium.org/p/chromium/issues/detail?id=1282442

Is this something that could be useful to expose in the WebRTC API? Are there adaptations that could be made to accommodate this without changing the API?

Assigning to @youennf for comment.

fippo commented 2 years ago

There is a noise suppression constraint, which defaults to true. IIRC the implementation is internal to libwebrtc, so probably not as good as the fancy new APIs.

Similar issue for video + background blur: https://bugs.chromium.org/p/chromium/issues/detail?id=1281408. Note that this isn't specific to Apple's new APIs but also applies to applications like Krisp or Maxine.

The API issue here is discoverability: the application might not want to enhance audio that is already enhanced, or blur a background that is already blurred (or it might recommend that the user turn off the built-in behavior).
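For the discoverability point, here is a minimal sketch of what an application can already probe via the standard constrainable properties; whether platform-level processing such as Apple's mic mode shows up in these values is exactly the unresolved part:

```js
// Sketch: check whether the captured audio already reports noise suppression,
// so the app can skip its own enhancement pass. Whether OS-level processing
// is reflected here is the open question in this issue.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();
if (track.getSettings().noiseSuppression) {
  // Already suppressed (by the browser, possibly by the platform):
  // avoid stacking a second suppression stage on top.
} else {
  // Safe to run the app's own enhancement.
}
```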

bradisbell commented 2 years ago

Other systems have these settings as well. Windows has an option for enabling/disabling "effects", which I believe is what is used to control the various drivers' microphone arrays in the same or a similar way to Apple's.

https://docs.microsoft.com/en-us/windows-hardware/drivers/audio/windows-11-apis-for-audio-processing-objects

For example, Lenovo laptops ship with software where you can choose whether to optimize the microphone array for one person in front, multiple, or the whole room. I think this software just configures the Realtek driver's audio "effects".

henrikand commented 2 years ago

Note that the new support only enables an application to surface the new settings in Control Center; it is then up to the user to change the settings manually. Hence, this new feature does not match the existing WebRTC APIs well, IMHO, since all the web application can do is let the user change a setting manually. Also, it would require different native implementations for devices released before 2018 and those released after; the difference is not trivial and can't be handled with a simple flag, since usage of a new audio unit is required.

Has it been shown that the new settings add any value?

youennf commented 2 years ago

Thanks for filing this issue @alvestrand.

In addition to mic mode, there is a corresponding camera mode, which might be useful to discuss jointly.

I think it is worth exposing this value (no need to do background blur if it is already done by the OS; ask the user to change their microphone setting if the goal is high-fidelity audio recording).

Depending on the OS, the value can change based on user interaction with the OS, but not through applications. Exposing such a value would probably require adding a way to notify the web application that the value has changed. I am not a fan of constraints, and that applies here as well, given this is a property the application can neither control nor set.
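A sketch of what such a notification could look like from the application side; the event name `configurationchange` is an assumption for illustration here, not something this issue settles:

```js
// Assumption: a 'configurationchange'-style event on MediaStreamTrack that
// fires when the user flips a non-settable property in the OS UI.
track.addEventListener('configurationchange', () => {
  const { noiseSuppression } = track.getSettings();
  // e.g. bypass the app's own noise suppression stage when the platform
  // takes over, and re-enable it when the user turns it off.
  console.log('noiseSuppression is now', noiseSuppression);
});
```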

alvestrand commented 2 years ago

I know that at least on Windows, quite a bit of engineering effort has been spent on turning off this type of functionality: the platform functions increased CPU usage and decreased sound quality, because the same kind of processing was already being done inside WebRTC and doing it twice made the audio worse, and the WebRTC algorithms, perhaps because they are more frequently updated, had superior performance characteristics.

So there are really multiple dimensions here:

It seems to me that WebRTC ought to be able to get a "clean path" in a reliable way, so that the effects applied are only the ones that WebRTC introduces; it's less clear to me that there's a good way to manipulate platform effects - it's hard to standardize when they vary so much by platform.

henrikand commented 2 years ago

To maintain as clean and stable a path as possible, I suggest that we don't expose the new mic modes but stick with the existing implementation. Allowing the user to make changes in this area may affect the existing AEC, NS, etc. in a negative way. In any case, if changes in this area are made, a signal processing team should study the implications carefully first.

jan-ivar commented 2 years ago

It seems to me users should be able to pick from both OS-provided features and application provided ones. This suggests we should focus on solving any interference or double processing problems without limiting choice (i.e. let apps detect pre-processing choices the user has made, but not let apps deny them).
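A sketch of the "detect but not deny" idea: the app reads the user's pre-processing choice and adapts its own pipeline instead of overriding it. The `backgroundBlur` setting and the `appBlurStage` object are assumptions for illustration:

```js
// Assumption: a read-only 'backgroundBlur' entry in getSettings(), plus a
// hypothetical app-side blur stage that the application controls itself.
const { backgroundBlur } = videoTrack.getSettings();
if (backgroundBlur) {
  appBlurStage.disable(); // the OS/user already blurs; don't blur twice
} else {
  appBlurStage.enable();  // no platform blur; apply the app's own effect
}
```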

huibk commented 2 years ago

If the browser wants to leave users with a choice, it should also provide the means to make that choice in a convenient way, for it not to be a false choice. Basically there are two approaches:

1. We let the app decide, because it knows the use case best. It can request a raw stream with no processing, or a default stream with OS/user-configured processing (see the sketch after this list).
2. The browser offers users the means to resolve interference: an app can detect pre-processing and request that it be disabled; the user can accept or reject on a session/site basis.
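Approach 1 maps onto constraints that already exist; a sketch of requesting a "raw" stream, with the caveat that today these only govern browser-side processing (whether they can or should also bypass OS/user-configured processing is the question at hand):

```js
// Sketch: turn off the standard browser-side processing constraints.
const raw = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: false,
    noiseSuppression: false,
    autoGainControl: false,
  },
});
```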

youennf commented 2 years ago

> If the browser wants to leave users with a choice, it should also provide the means to make that choice in a convenient way, for it not to be a false choice.

Capabilities offer a way to support both types of OSes. When the OS changes the value from false to true, there would be two cases: either the OS allows the UA to override the value, or it does not.
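A sketch of how capabilities could express those two situations, using `noiseSuppression` as the stand-in property:

```js
// If the OS lets the UA override the value, capabilities advertise both:
//   caps.noiseSuppression -> [true, false]
// If the OS forces the value on, capabilities advertise only:
//   caps.noiseSuppression -> [true]
const caps = track.getCapabilities();
if (caps.noiseSuppression && caps.noiseSuppression.includes(false)) {
  await track.applyConstraints({ noiseSuppression: false });
} else {
  // Not overridable: tell the user where to change it in the OS UI.
}
```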

huibk commented 2 years ago

In which cases would the OS not allow the UA to override the value? It doesn't seem like a good solution that web applications have to give users OS-specific instructions on how to manually turn processing on and off before and after usage. For instance, users may want voice isolation always on for their Jitsi meetings, but for their JamKaZam sessions it must be turned off.

youennf commented 2 years ago

iOS and macOS do not allow applications to override background blur, AFAIK. Users who turn on background blur will know how to disable it. The important thing is that web apps know about it, so they can either update their pipeline or provide adequate information should the setting not be optimal (ideally the user + OS should be able to get the setting right after some limited learning).

huibk commented 2 years ago

The challenge for applications will be to explain to end users why similar effects are incompatible, and that they may have to toggle them manually on a per-session basis. Ideally there is a more convenient way to preserve both privacy/control and utility. What is the situation for audio processing on macOS? Can that be altered by the application?