Source code for the paper "Speech Denoising without Clean Training Data: a Noise2Noise Approach", accepted at the INTERSPEECH 2021 conference. The paper tackles the heavy dependence of deep-learning-based audio denoising methods on clean speech data by showing that deep speech denoising networks can be trained using only noisy speech samples.
I'm trying to manually implement this in another language. Can you confirm this codebase is the one that produced the results in "Speech Denoising without Clean Training Data: a Noise2Noise Approach"? There are a few discrepancies I've noticed so far:
A model complexity of `45 // 1.414` would seem to result in 31 encoder channels (and 62 for the deeper layers), rather than the 32 described in the paper.
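To illustrate the arithmetic I mean (a toy sketch; `base_complexity` is my own name, not an identifier from this repository):

```python
# Channel count from the model-complexity parameter. 45 and 1.414
# (roughly sqrt(2)) are the values used in the repository.
base_complexity = 45

floor_channels = int(base_complexity // 1.414)    # floor division, as in the code
rounded_channels = round(base_complexity / 1.414) # rounding instead

print(floor_channels)    # 31
print(rounded_channels)  # 32
```

So floor division gives 31 channels, while rounding would give the 32 stated in the paper.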
The complex batchnorm module seems to apply batch normalization separately to the real and imaginary components of the complex number, rather than using the whitening approach described in "Deep Complex Networks".
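For concreteness, here is a minimal NumPy sketch of the difference I mean, on toy data (my own variable names; this is not code from the repository):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy batch of complex activations with correlated real/imag parts.
re_raw = rng.normal(size=200)
im_raw = 0.5 * re_raw + rng.normal(size=200)

# Per-component batchnorm (what the codebase appears to do):
# each part is standardized independently; correlation is untouched.
re_bn = (re_raw - re_raw.mean()) / re_raw.std()
im_bn = (im_raw - im_raw.mean()) / im_raw.std()

# Whitening as in "Deep Complex Networks": treat (real, imag) as a
# 2-vector and multiply by the inverse square root of its 2x2
# covariance matrix, which also removes real/imag correlation.
v = np.stack([re_raw - re_raw.mean(), im_raw - im_raw.mean()])
cov = v @ v.T / v.shape[1]
w, e = np.linalg.eigh(cov)            # eigendecomposition of 2x2 covariance
inv_sqrt = e @ np.diag(w ** -0.5) @ e.T
whitened = inv_sqrt @ v               # covariance of `whitened` is identity
```

After whitening, the empirical covariance of the stacked components is the identity; after per-component batchnorm, the off-diagonal correlation remains.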
Similarly, the masking process seems to multiply the real and imaginary components of the spectrogram by the mask elementwise and separately, rather than performing a full complex multiplication.
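A single-bin example of the two schemes (arbitrary toy values; the names are my own, not identifiers from this repository):

```python
# One spectrogram bin a + bi and one mask bin c + di.
spec_re, spec_im = 3.0, 4.0
mask_re, mask_im = 0.5, -0.25

# Separate elementwise multiplication, as the code appears to do: (a*c, b*d)
sep = (spec_re * mask_re, spec_im * mask_im)

# Full complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
cmul = (spec_re * mask_re - spec_im * mask_im,
        spec_re * mask_im + spec_im * mask_re)

print(sep)   # (1.5, -1.0)
print(cmul)  # (2.5, 1.25)
```

The two results differ whenever the mask has a nonzero imaginary part, so the choice affects which masks the network can express.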
I appreciate any insight you may have about these points.