w-transposed-x / hifi-gan-denoising

An unofficial PyTorch implementation of "HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks" by Su et al. (2020).
GNU General Public License v3.0

Comparing WaveNet model to Speech Denoising WaveNet paper #3

Closed: francislata closed this issue 3 years ago

francislata commented 3 years ago

Currently, I'm assessing my WaveNet's performance using the same clean and noise prediction branches of the model. When I augment the audio by adding reverberation and background noise, I notice that the WaveNet learns to denoise but does not dereverberate.
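For readers following along, the kind of augmentation described above (reverberation first, then background noise) can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the repo's or the original poster's pipeline. The function names `reverberate`, `add_noise` and `synthetic_rir` are hypothetical. The toy exponentially decaying room impulse response stands in for a measured RIR, and the noise is scaled to hit a chosen SNR relative to the *reverberant* signal.

```python
import numpy as np


def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with a room impulse response, trimmed to the input length."""
    return np.convolve(clean, rir)[: len(clean)]


def add_noise(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that signal power / noise power equals `snr_db`, then add it."""
    noise = noise[: len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def synthetic_rir(sr: int, rt60: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical toy RIR: white noise whose amplitude decays by 60 dB after rt60 seconds."""
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    decay = np.exp(-6.908 * t / rt60)  # ln(1000) ~ 6.908, so amplitude is 1e-3 (-60 dB) at t = rt60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # crude direct-path impulse
    return rir / np.max(np.abs(rir))  # peak-normalise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s placeholder "speech"
    reverberant = reverberate(clean, synthetic_rir(sr, rt60=0.5, rng=rng))
    noisy_reverberant = add_noise(reverberant, rng.standard_normal(sr), snr_db=10.0)
    print(noisy_reverberant.shape)
```

Setting the SNR against the reverberant signal rather than the dry one is a design choice here. Datasets differ on this, so it is worth checking against the setup in the paper.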

Have you experienced something like this, where denoising is done much better than dereverberation? I noticed it a little in the samples you provided, where there's still some reverb.

742617000027 commented 3 years ago

@francislata That's been our experience as well, I think, save for the occasional exception where it does apply noticeable dereverberation. We previously used a modified Tacotron for dereverberation and had a lot more success with it. I should note as a disclaimer, though, that we have yet to complete a full training run according to the specifications from the paper. So far, something has always come up that prevented us from finishing training.

francislata commented 3 years ago

@742617000027 - I see. Interestingly, if I train on simulated data with reverberation augmentation only, the WaveNet does dereverberate, with quality comparable to theirs, using an L1 loss between the clean speech and the dereverberated speech predicted by the model.
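The L1 loss mentioned above is just the mean absolute error between the clean target waveform and the model's output. Below is a minimal sketch, with hypothetical function names. The second function is an assumption about how the "clean and noise prediction branches" from the first comment might be trained together: an L1 term on the clean estimate plus an L1 term on the residual `mixture - clean_estimate` treated as the noise estimate. It is not confirmed to be the poster's exact setup. Note that when the input is reverberant, that residual also contains the reverb tail, so the "noise" target then includes reverberation.

```python
import numpy as np


def l1_loss(target: np.ndarray, estimate: np.ndarray) -> float:
    """Mean absolute error between two waveforms of equal length."""
    return float(np.mean(np.abs(target - estimate)))


def two_branch_l1(
    clean: np.ndarray,
    noise: np.ndarray,
    mixture: np.ndarray,
    clean_estimate: np.ndarray,
) -> float:
    """Hypothetical two-term loss: L1 on the clean branch plus L1 on the implied
    noise estimate, taken as the mixture minus the clean estimate."""
    noise_estimate = mixture - clean_estimate
    return l1_loss(clean, clean_estimate) + l1_loss(noise, noise_estimate)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(1000)
    noise = 0.1 * rng.standard_normal(1000)
    mixture = clean + noise
    # A perfect clean estimate makes both terms vanish.
    print(two_branch_l1(clean, noise, mixture, clean))
```

In a PyTorch training loop the same idea would typically use `torch.nn.functional.l1_loss` on tensors instead of NumPy arrays.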

When background noise is added on top of the reverberated speech, WaveNet seems to focus more on denoising than on dereverberating.

If you go to SIM-1 on this page, https://daps.cs.princeton.edu/projects/enhancement/ (their work prior to HiFi-GAN), and listen to the WN column, you'll hear that the reverberation is gone.

n-Guard commented 3 years ago

@francislata Just to be clear: the WaveNet model you are talking about is based on the paper by Rethage et al. (2018)?

Although our model does some dereverberation/denoising, most of the time we notice heavy artifacts in the denoised speech. Did you experience anything similar with your WaveNet model?

On the Interspeech website you said you have some answers from the authors of HiFi-GAN regarding our open questions. We would of course be excited to hear them!

francislata commented 3 years ago

@n-Guard - Yes, I'm talking about the "A Speech Denoising WaveNet" version of it.

Pretty much the same experience here. I'm hearing some artifacts, though the reverberation is almost gone.

Oh yes, I'll answer them over there now.