madhavmk / Noise2Noise-audio_denoising_without_clean_training_data

Source code for the paper titled "Speech Denoising without Clean Training Data: a Noise2Noise Approach", accepted at the INTERSPEECH 2021 conference. The paper tackles the heavy dependence of deep learning based audio denoising methods on clean speech data by showing that deep speech denoising networks can be trained using only noisy speech samples.
MIT License

Model Size and Computation? #2

Closed yugeshav closed 1 year ago

yugeshav commented 3 years ago

Hello

Please let me know the model size and computation.

If any other benchmarking is available, please post it here.

Regards Yugesh

madhavmk commented 3 years ago

Hi.

Regarding model size: it's a 20-layer autoencoder model. The kernel, stride and channel sizes are detailed in Figure 1 of the paper https://arxiv.org/abs/2104.03838 (the figure is also present in the README).

Regarding computational benchmarking: training took us about 48 hours on an Nvidia K80 per noise type for each training method (this is the same GPU available on Colab and on Azure data science VMs). You will need 12 GB of GPU memory for our 20-layer model. If you're looking to train faster, you could use the smaller DCUnet10 model (check out https://github.com/pheepa/DCUnet).
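As a rough way to estimate model size from the architecture figure, you can sum the parameters of each conv layer from its kernel and channel sizes. A minimal sketch below; the layer specs used here are placeholders, not the actual values from Figure 1 of the paper:

```python
def conv2d_params(kh, kw, c_in, c_out, bias=True):
    """Parameter count of a single 2-D conv layer:
    one (kh x kw x c_in) filter per output channel, plus optional biases."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)

# Hypothetical (kernel_h, kernel_w, in_channels, out_channels) specs;
# substitute the real values from Figure 1 of the paper.
layers = [
    (7, 5, 1, 32),
    (7, 5, 32, 64),
    (5, 3, 64, 64),
]

total = sum(conv2d_params(kh, kw, ci, co) for kh, kw, ci, co in layers)
print(total)                              # total parameter count
print(total * 4 / 1e6, "MB at float32")   # 4 bytes per float32 weight
```

Note that peak GPU memory during training is dominated by activations and optimizer state rather than weights, which is why a ~0.5 MB-of-weights toy network above says little about the 12 GB needed for the full 20-layer model.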