w3c / webrtc-stats

WebRTC Statistics
https://w3c.github.io/webrtc-stats/

Measuring background noise (energy) #465

[Open] chemmagate opened this issue 5 years ago

chemmagate commented 5 years ago

As more and more offices adopt open workplaces, it is important to measure background noise in open-office environments and its effect on real-time communication. Is it possible for browsers to measure this metric?

vr000m commented 5 years ago

@henbos Is the browser (audio stack) able to tell the difference between speaking and non-speaking audio energy?

aboba commented 5 years ago

@vr000m Lots of ML work on separating "noise" from speakers. For example, see: https://devblogs.nvidia.com/nvidia-real-time-noise-suppression-deep-learning/

Related: Issue https://github.com/w3c/webrtc-stats/issues/383

aboba commented 5 years ago

@vr000m @henbos Is this worth discussing at TPAC?

henbos commented 5 years ago

Implementations may be doing several things to improve quality, like echo cancellation, synthesizing samples to conceal packet loss, and noise suppression. Strategies are, however, implementation-specific. This can make standardizing metrics around it difficult.

We could add a metric to say that this is the implementation's estimate of the current background noise levels. But different implementations may do different things to attempt to calculate this, which could potentially yield different numbers in different scenarios.

This reminds me of the attempt to standardize the likelihood of echo, which got ice-boxed.

But hey, an experimental metric might be better than no metric, even if we can't yet guarantee interoperable estimates?

henbos commented 5 years ago

Action Item: Talk to an audio engineer to see if this is something that we could measure :)

henbos commented 5 years ago

@ivocreusen Do we have any estimates of background noise that could be exposed as stats? E.g. totalBackgroundAudioEnergy that would be <= totalAudioEnergy (https://w3c.github.io/webrtc-stats/#dom-rtcinboundrtpstreamstats-totalaudioenergy)?

This is another issue I would like to talk to you about :)
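To make the proposal concrete, here is a minimal sketch of how such a counter could be accumulated, assuming the per-sample definition the spec uses for `totalAudioEnergy` (each sample's value divided by the highest encodable value, squared, and multiplied by the sample's duration in seconds). `totalBackgroundAudioEnergy`, `EnergyTotals`, `accumulateFrame`, and the `isBackground` classifier are all hypothetical; the classifier stands in for whatever implementation-defined logic decides what counts as background. Because the background total only ever adds a subset of the same terms, it cannot exceed `totalAudioEnergy`.

```typescript
// Hypothetical stats accumulator. totalAudioEnergy and totalSamplesDuration
// mirror existing webrtc-stats fields; totalBackgroundAudioEnergy is the
// metric proposed in this thread and is NOT part of the spec.
interface EnergyTotals {
  totalAudioEnergy: number;
  totalSamplesDuration: number;
  totalBackgroundAudioEnergy: number; // hypothetical
}

// Implementation-defined classifier (hypothetical): returns true when a frame
// is judged to be background. A spec would define what is measured, not this.
type BackgroundClassifier = (frame: Float32Array) => boolean;

function accumulateFrame(
  totals: EnergyTotals,
  frame: Float32Array, // samples already normalized to [-1, 1]
  sampleRate: number,
  isBackground: BackgroundClassifier,
): void {
  const sampleDuration = 1 / sampleRate;
  let frameEnergy = 0;
  for (let i = 0; i < frame.length; i++) {
    // (sample / maxValue)^2 * duration; maxValue is 1 for normalized samples.
    frameEnergy += frame[i] * frame[i] * sampleDuration;
  }
  totals.totalAudioEnergy += frameEnergy;
  totals.totalSamplesDuration += frame.length * sampleDuration;
  if (isBackground(frame)) {
    // Same terms as above, only for a subset of frames, so the
    // invariant totalBackgroundAudioEnergy <= totalAudioEnergy always holds.
    totals.totalBackgroundAudioEnergy += frameEnergy;
  }
}
```

Classification is done per frame rather than per sample here only because a classifier needs more than one sample of context; that granularity is also an assumption.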

ivocreusen commented 5 years ago

I think that the hard part for a standardized metric for background noise is defining what part of the signal is background and what part is not. We do have a background noise estimate in the gain controller, but if we were to standardize how it's computed, we wouldn't be able to easily make changes or improvements (without adding unnecessary computation).

Would it be an option to have a background noise metric without specifying how to decide what part of the signal is background?

(Sent by email in reply to henbos's comment of Tue, Sep 10, 2019, 11:19 AM.)

henbos commented 5 years ago

I would propose that the metric be well defined in terms of what it measures (audio energy), without specifying how the measurement is obtained. Inaccurate estimates would then be declared implementation bugs rather than spec bugs. Fingers crossed?
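If the metric is defined only by what it measures, applications could consume it the way they already consume `totalAudioEnergy`: by differencing two `getStats()` snapshots. A sketch under that assumption; `totalAudioEnergy` and `totalSamplesDuration` are existing stats fields, while `totalBackgroundAudioEnergy`, `AudioEnergySnapshot`, and `summarizeInterval` are hypothetical. The interval RMS level is `sqrt(delta totalAudioEnergy / delta totalSamplesDuration)`, which follows directly from the squared-amplitude-times-duration definition.

```typescript
// Subset of an audio stats snapshot. totalAudioEnergy and totalSamplesDuration
// exist in webrtc-stats; totalBackgroundAudioEnergy is the hypothetical
// metric discussed in this issue and is absent in real browsers.
interface AudioEnergySnapshot {
  totalAudioEnergy: number;
  totalSamplesDuration: number;
  totalBackgroundAudioEnergy?: number; // hypothetical
}

interface IntervalSummary {
  rmsLevel: number; // RMS over the interval, on a 0..1 scale
  backgroundEnergyShare: number | undefined; // fraction of energy judged background
}

function summarizeInterval(prev: AudioEnergySnapshot, curr: AudioEnergySnapshot): IntervalSummary {
  const dEnergy = curr.totalAudioEnergy - prev.totalAudioEnergy;
  const dDuration = curr.totalSamplesDuration - prev.totalSamplesDuration;
  // Energy is amplitude^2 * seconds, so energy / seconds is mean square.
  const rmsLevel = dDuration > 0 ? Math.sqrt(dEnergy / dDuration) : 0;

  let backgroundEnergyShare: number | undefined = undefined;
  if (
    prev.totalBackgroundAudioEnergy !== undefined &&
    curr.totalBackgroundAudioEnergy !== undefined &&
    dEnergy > 0
  ) {
    backgroundEnergyShare =
      (curr.totalBackgroundAudioEnergy - prev.totalBackgroundAudioEnergy) / dEnergy;
  }
  return { rmsLevel, backgroundEnergyShare };
}
```

In a browser, `prev` and `curr` would be the audio `inbound-rtp` entries from two successive `RTCPeerConnection.getStats()` reports; since no browser exposes `totalBackgroundAudioEnergy`, `backgroundEnergyShare` would stay `undefined` there.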

vr000m commented 5 years ago

This may work... assuming there are three classes of background noise detectors:

  1. very aggressive: these would have low tolerance and end up marking even mildly noisy environments (and anything above) as noisy.
  2. very pessimistic: these would have high tolerance and would only mark very noisy environments; if they are really bad estimators, they may never mark anything, because noise and speech would be indistinguishable.
  3. something in between: I think this is what most implementations will end up doing, and we probably already live with these kinds of estimators elsewhere in the stack (for example, quality/CPU limitation, etc.).

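The three classes can be read as one detector with different tolerance thresholds. A minimal sketch, assuming the detector already has an estimated background RMS level on a 0..1 scale; the class names and threshold values are made up for illustration and do not come from any implementation. The same mildly noisy room is flagged by the aggressive detector and missed by the other two, which is exactly the interoperability concern with leaving the estimator unspecified.

```typescript
// Illustrative tolerance thresholds on an estimated background RMS level.
// These numbers are hypothetical, not taken from any browser.
const DETECTOR_THRESHOLDS = {
  aggressive: 0.005, // low tolerance: flags even mildly noisy rooms
  moderate: 0.02, // "something in between"
  pessimistic: 0.1, // high tolerance: flags only very noisy rooms
} as const;

type DetectorClass = keyof typeof DETECTOR_THRESHOLDS;

function isNoisy(backgroundRms: number, detector: DetectorClass): boolean {
  return backgroundRms >= DETECTOR_THRESHOLDS[detector];
}

// Label one measured background level with every detector class.
function labelAll(backgroundRms: number): Record<DetectorClass, boolean> {
  return {
    aggressive: isNoisy(backgroundRms, "aggressive"),
    moderate: isNoisy(backgroundRms, "moderate"),
    pessimistic: isNoisy(backgroundRms, "pessimistic"),
  };
}
```

For example, `labelAll(0.01)` marks the room noisy only for the aggressive class, while a level of `0.2` is flagged by all three.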
henbos commented 5 years ago

I talked to an audio engineer. His take, roughly:

There may be something we could surface, but he says it might be better for people to analyze their audio themselves with WebAudio.
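Following that suggestion, an application could estimate a noise floor from time-domain frames itself. A minimal sketch, assuming frames come from a WebAudio `AnalyserNode` via its real `getFloatTimeDomainData()` method; `TimeDomainSource`, `NoiseFloorEstimator`, `pullFrame`, the window size, and the percentile are illustrative choices, not anything a browser's audio stack does. The estimator is a crude low-percentile heuristic: speech raises the loud frames, while pauses keep the quietest frames near the background level.

```typescript
// Structural type for the part of WebAudio's AnalyserNode used here, so the
// sketch compiles without DOM typings. A real AnalyserNode satisfies it.
interface TimeDomainSource {
  readonly fftSize: number;
  getFloatTimeDomainData(array: Float32Array): void;
}

function frameRms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  return Math.sqrt(sumSquares / frame.length);
}

// Hypothetical noise-floor tracker: the given percentile of recent frame RMS
// values over a sliding window. An illustrative heuristic only.
class NoiseFloorEstimator {
  private readonly history: number[] = [];
  private readonly windowSize: number;
  private readonly percentile: number;

  constructor(windowSize = 200, percentile = 0.1) {
    this.windowSize = windowSize;
    this.percentile = percentile;
  }

  push(frame: Float32Array): void {
    this.history.push(frameRms(frame));
    if (this.history.length > this.windowSize) this.history.shift();
  }

  estimate(): number {
    if (this.history.length === 0) return 0;
    const sorted = this.history.slice().sort((a, b) => a - b);
    const index = Math.min(sorted.length - 1, Math.floor(this.percentile * sorted.length));
    return sorted[index];
  }
}

// Pull one frame from an analyser-like source into the estimator.
function pullFrame(source: TimeDomainSource, estimator: NoiseFloorEstimator): void {
  const frame = new Float32Array(source.fftSize);
  source.getFloatTimeDomainData(frame);
  estimator.push(frame);
}
```

In a page, `pullFrame(analyser, estimator)` could run from a timer or `requestAnimationFrame` callback, with `analyser` created by `audioContext.createAnalyser()` and fed from a `MediaStreamAudioSourceNode`. A windowed percentile is used because it needs no speech detector; the trade-off is that it overestimates the floor if someone talks continuously for the whole window.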

henbos commented 5 years ago

I'm not sure what to do with this one, so I'm removing the TPAC label.

henbos commented 5 years ago

@vr000m If you still want this (or #383) to be discussed at TPAC feel free to prepare a slide for it.