xiongyihui / python-webrtc-audio-processing

Python bindings of WebRTC Audio Processing

10ms framesize limit and noise in denoised audio stream #6

Closed: demeng closed this issue 4 years ago

demeng commented 4 years ago

We have a question about the 10 ms frame size limit. We need to process the audio stream in smaller frames. Does the limit come from WebRTC audio processing itself or from the Python wrapper? Is there a way to work around it?

In addition, when we run noise suppression on short buffers (e.g. 50 ms) cut from a long recording, the denoised audio has beat-like noise at the buffer boundaries. What causes this noise?

xiongyihui commented 4 years ago

It's a requirement of WebRTC audio processing. I'm not sure about the second question. Does the audio become a discontinuous signal?
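For reference, here is a minimal sketch of what a 10 ms frame works out to in samples and bytes. The helper name `frame_size` is hypothetical, and 16-bit PCM is an assumption rather than something stated in this thread.

```python
def frame_size(sample_rate_hz, channels=1, sample_width_bytes=2, frame_ms=10):
    """Return (samples_per_channel, bytes_per_frame) for one frame.

    Assumes interleaved PCM; sample_width_bytes=2 means 16-bit samples
    (an assumption for this sketch).
    """
    samples, remainder = divmod(sample_rate_hz * frame_ms, 1000)
    if remainder:
        # e.g. 11025 Hz has no whole number of samples in 10 ms
        raise ValueError("sample rate does not give a whole number of samples per frame")
    return samples, samples * channels * sample_width_bytes


if __name__ == "__main__":
    for rate in (8000, 16000, 32000, 48000):
        print(rate, frame_size(rate))
```

Under these assumptions, a 50 ms buffer at 16 kHz mono holds exactly five such frames.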

demeng commented 4 years ago

> Not sure about the second question. Does the audio become a discontinuous signal?

Xiongyi, thanks for the response. Here are two examples of audio processed with different buffer sizes (we chopped the original audio into small buffers, processed each buffer in order, then merged the results).

This is the audio processed with a 50 ms buffer.

For comparison, this is the audio processed with a 2 s buffer.

The audio processed with the 50 ms buffer becomes very discontinuous, and artifacts seem to be introduced at the boundaries between buffers. In fact, in the audio processed with the 2 s buffer, a (mild) discontinuity can be heard every 2 seconds. It is much more noticeable with the smaller buffer because it occurs much more often.

Is there any way to solve this so that the noise-suppressed audio sounds the same regardless of buffer size? Any insight is appreciated!

xiongyihui commented 4 years ago

The audio processed with the 50 ms buffer is shorter than the one processed with the 2 s buffer. It seems some zeros were added every 50 ms.
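One way to check for inserted zeros without looking at a waveform is to scan the output for runs of zero-valued samples. This is only a sketch: `find_zero_runs` is a hypothetical helper, and it assumes raw 16-bit mono PCM in the machine's native byte order.

```python
from array import array


def find_zero_runs(pcm, min_samples):
    """Return (start_index, length) for each run of at least min_samples zeros.

    Assumes 16-bit mono PCM in native byte order (an assumption for this sketch).
    """
    samples = array("h")
    samples.frombytes(pcm)

    runs = []
    start = None
    for i, value in enumerate(samples):
        if value == 0:
            if start is None:
                start = i  # a new run of zeros begins here
        else:
            if start is not None and i - start >= min_samples:
                runs.append((start, i - start))
            start = None
    # Handle a run that reaches the end of the buffer.
    if start is not None and len(samples) - start >= min_samples:
        runs.append((start, len(samples) - start))
    return runs
```

If the reported runs start at regular intervals matching the buffer length, that would point to the buffering step rather than the original recording.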


You should divide the audio into 10 ms frames and run noise suppression (NS) on each frame.
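A minimal sketch of that approach, assuming 16-bit PCM and incoming buffers of arbitrary length: the class below (the hypothetical `TenMsFramer`) cuts the stream into exact 10 ms frames, passes each one to a processing callable, and carries any leftover bytes into the next buffer instead of padding them. The `process_frame` argument stands in for the wrapper's per-frame call and is not tied to any particular API here.

```python
class TenMsFramer:
    """Re-frame arbitrary-size PCM buffers into exact 10 ms frames."""

    def __init__(self, process_frame, sample_rate_hz=16000, channels=1,
                 sample_width_bytes=2):
        # process_frame: callable taking one 10 ms frame (bytes) and
        # returning the processed frame (bytes). Hypothetical hook.
        self._process_frame = process_frame
        self._frame_bytes = sample_rate_hz // 100 * channels * sample_width_bytes
        self._pending = bytearray()  # bytes not yet forming a full frame

    @property
    def pending_bytes(self):
        return len(self._pending)

    def feed(self, chunk):
        """Add a buffer; return processed audio for every complete frame."""
        self._pending.extend(chunk)
        out = bytearray()
        n = self._frame_bytes
        while len(self._pending) >= n:
            frame = bytes(self._pending[:n])
            del self._pending[:n]
            out.extend(self._process_frame(frame))
        return bytes(out)


# Possible wiring, based on my reading of the project's README (unverified here):
#   ap = AudioProcessingModule(enable_ns=True)
#   ap.set_stream_format(16000, 1)
#   framer = TenMsFramer(ap.process_stream, sample_rate_hz=16000, channels=1)
#   denoised = b"".join(framer.feed(buf) for buf in buffers)
```

Reusing a single processor object for the whole stream, rather than creating a new one per buffer, is a design choice of this sketch; whether it removes the boundary artifacts described above has not been tested here.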