w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Voice Isolation - beyond Noise Cancellation #62

Closed: alvestrand closed this issue 1 year ago

alvestrand commented 2 years ago

We know that noise cancellation can be quite effective in many scenarios. However, by default, noise cancellation is fairly conservative about what it treats as "noise", to reduce the chance of suppressing sounds that the recipient wants to hear.

More powerful algorithms can remove noise more aggressively when we are more certain what the recipient wants to hear; for example, they can remove anything that is not part of a human voice.

This behavior is sometimes desirable (such as in a person-to-person conversation) and sometimes very undesirable (such as when playing music to each other).

Suggestion: Add a new constraint "voiceIsolation" (values true and false) that, when true, tries to isolate the human voice and remove all other parts of the audio signal. This may also enable features such as directionality (beamforming), which attempt to pick up signal only from the direction in which a human voice is detected.
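To make the proposal concrete, here is a minimal TypeScript sketch of how a page might pick audio constraints for the two scenarios above (conversation vs. music), assuming the browser exposes `voiceIsolation` as the boolean constraint proposed here and reports it through `getSupportedConstraints()`. The function `chooseAudioConstraints`, the local interfaces, and the `"conversation"`/`"music"` labels are hypothetical and only for illustration.

```typescript
// Hypothetical local types: `voiceIsolation` is the boolean constraint proposed
// in this issue and may be missing from the DOM typings of some TypeScript versions.
interface SupportedAudioFeatures {
  noiseSuppression?: boolean;
  voiceIsolation?: boolean;
}

interface VoiceAwareAudioConstraints {
  noiseSuppression?: boolean;
  voiceIsolation?: boolean;
}

// Illustrative labels for the two scenarios in the issue; not spec values.
type CaptureScenario = "conversation" | "music";

// Build an audio constraints object for a scenario, naming only the
// constraints that the browser reports as supported.
function chooseAudioConstraints(
  scenario: CaptureScenario,
  supported: SupportedAudioFeatures,
): VoiceAwareAudioConstraints {
  const constraints: VoiceAwareAudioConstraints = {};
  if (scenario === "conversation") {
    // Prefer voice-only processing; fall back to ordinary noise
    // suppression when voiceIsolation is not supported.
    if (supported.voiceIsolation) {
      constraints.voiceIsolation = true;
    } else if (supported.noiseSuppression) {
      constraints.noiseSuppression = true;
    }
  } else {
    // For music, ask for both kinds of processing to be switched off so
    // that non-voice sound is not removed as "noise".
    if (supported.voiceIsolation) constraints.voiceIsolation = false;
    if (supported.noiseSuppression) constraints.noiseSuppression = false;
  }
  return constraints;
}
```

In a browser, this might be used as `navigator.mediaDevices.getUserMedia({ audio: chooseAudioConstraints("conversation", navigator.mediaDevices.getSupportedConstraints()) })`. Bare boolean values are treated as preferences (ideal values), not requirements, so a page that cares whether isolation was actually applied would still need to inspect `track.getSettings()`.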

alvestrand commented 2 years ago

Discussed at the WG meeting on April 26, 2022. Issue raised: should it be a string (for extensibility) rather than a boolean?

alvestrand commented 2 years ago

Slides from the April 26 presentation: https://docs.google.com/presentation/d/15iAIhzpaA6reKJBL-ecgYtic6ZKHEpKL5OK_sExTllc/edit#slide=id.g1233c72d2fa_0_18

alvestrand commented 1 year ago

So far, the discussion about extensibility has not produced arguments that justify the added complexity, so it stays a boolean.