jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

Discriminator Loss Too Small #100

Open · MMingabc opened this issue 3 years ago

MMingabc commented 3 years ago

Hi,

I am training the V1 version of HiFi-GAN. I downsampled the wavs to 16 kHz and computed mel spectrograms with a hop size of 200 samples. Accordingly, I set the upsampling rates to [5, 5, 4, 2], whose product is 200.
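The "accordingly" rests on one constraint: the upsampling rates must multiply to the hop size (5 × 5 × 4 × 2 = 200), so each mel frame expands to exactly one hop of audio. Below is a minimal sketch that checks this and also tracks the per-layer output length. It assumes each transposed-convolution layer uses `padding = (k - u) // 2` for kernel size `k` and rate `u`, which appears to be how the HiFi-GAN Generator is built (verify against `models.py`). The post gives no kernel sizes, so both kernel lists and the helper names are hypothetical.

```python
import math


def conv_transpose1d_out_len(length, kernel_size, stride, padding):
    """Output length of a 1-D transposed convolution with dilation 1 and no
    output_padding (the formula documented for torch.nn.ConvTranspose1d)."""
    return (length - 1) * stride - 2 * padding + kernel_size


def upsampled_length(upsample_rates, upsample_kernel_sizes, n_frames):
    """Return (product of rates, audio length produced from n_frames mel frames).

    ASSUMPTION: each layer uses padding=(k - u) // 2 for kernel k and rate u,
    mirroring how the HiFi-GAN Generator appears to build its layers.
    """
    length = n_frames
    for u, k in zip(upsample_rates, upsample_kernel_sizes):
        length = conv_transpose1d_out_len(length, k, u, (k - u) // 2)
    return math.prod(upsample_rates), length


hop_size = 200          # from the post: 16 kHz audio, hop of 200 samples
rates = [5, 5, 4, 2]    # from the post
n_frames = 100
# Kernel sizes are not given in the post; both choices below are hypothetical.
for kernels in ([10, 10, 8, 4], [11, 11, 8, 4]):
    product, length = upsampled_length(rates, kernels, n_frames)
    print(kernels, product == hop_size, length, n_frames * hop_size)
# [10, 10, 8, 4] True 20048 20000
# [11, 11, 8, 4] True 20000 20000
```

With an odd rate such as 5, a kernel of 2u = 10 makes `k - u` odd, so the floor division under-pads and that layer emits 5L + 1 samples instead of 5L; later layers amplify the drift. Under this padding rule, choosing kernel sizes with the same parity as the rate keeps the length exact.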

After 600k training steps, the sound quality is still poor. The discriminator loss is very small (below 0.1 for both the generated and the ground-truth audio), while the generator's adversarial loss is large (above 1.0). I suspect the discriminator is too powerful.
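HiFi-GAN trains with least-squares (LSGAN) objectives: the discriminator pushes its scores toward 1 on real audio and 0 on generated audio, and the generator pushes scores on generated audio toward 1, with the losses summed over sub-discriminators. The plain-Python sketch below shows what the reported numbers correspond to. The helper names are hypothetical, and the count of 8 sub-discriminators is an assumption based on the paper's 5 MPD periods plus 3 MSD scales.

```python
from statistics import fmean


def discriminator_loss(real_scores, fake_scores):
    """LSGAN discriminator loss summed over sub-discriminators.

    real_scores / fake_scores hold one list of output scores per
    sub-discriminator. Returns (total, real_part, fake_part) so the two
    halves can be inspected separately.
    """
    real_part = sum(fmean([(1.0 - x) ** 2 for x in dr]) for dr in real_scores)
    fake_part = sum(fmean([x ** 2 for x in dg]) for dg in fake_scores)
    return real_part + fake_part, real_part, fake_part


def generator_adv_loss(fake_scores):
    """LSGAN generator adversarial loss: push scores on fakes toward 1."""
    return sum(fmean([(1.0 - x) ** 2 for x in dg]) for dg in fake_scores)


N_SUB = 8  # ASSUMPTION: 5 MPD periods + 3 MSD scales

# Confident discriminator: ~0.95 on real audio, ~0.05 on generated audio.
confident_real = [[0.95] * 4 for _ in range(N_SUB)]
confident_fake = [[0.05] * 4 for _ in range(N_SUB)]
# LSGAN equilibrium: the discriminator outputs 0.5 everywhere.
balanced = [[0.5] * 4 for _ in range(N_SUB)]

print(discriminator_loss(confident_real, confident_fake))  # about (0.04, 0.02, 0.02)
print(generator_adv_loss(confident_fake))                  # about 7.22
print(discriminator_loss(balanced, balanced))              # (4.0, 2.0, 2.0)
print(generator_adv_loss(balanced))                        # 2.0
```

A near-zero discriminator loss together with a generator loss close to the number of sub-discriminators means the discriminator separates real and generated audio almost perfectly. If the logged adversarial loss is such a sum, a value above 1.0 is expected even at equilibrium (2.0 here), so the near-zero discriminator loss is the clearer sign of imbalance.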

What should I do? Should I keep training and wait for the generator to catch up?

Thank you!

Kristopher-Chen commented 2 years ago

Hi, I ran into a similar problem: the discriminator losses are small, and my test output shows obvious harmonic artifacts. Have you solved it?

[Attached image: spectrogram of the test output]

thecooltechguy commented 1 year ago

Would love to hear if you solved this!

Ziyi6 commented 5 months ago

@Kristopher-Chen Hello mate, did you solve the problem of obvious harmonics in your test output? I have run into the same problem.

Ziyi6 commented 5 months ago

@Kristopher-Chen I think your spectrogram shows many harmonics because the discriminators are too weak to penalize upsampling artifacts. These are the audio counterpart of checkerboard artifacts in computer vision, and they appear in GAN- and autoencoder-based TTS systems that stack many upsampling layers (ConvTranspose1d in HiFi-GAN).
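The checkerboard mechanism can be reproduced in a few lines. This is a toy sketch with hypothetical helpers, not HiFi-GAN code: a naive 1-D transposed convolution with an all-ones kernel is applied to a constant signal. When the kernel size is not a multiple of the stride, neighbouring kernel copies overlap unevenly and the output ripples with a period equal to the stride.

```python
def conv_transpose1d(signal, kernel, stride):
    """Naive 1-D transposed convolution (no padding): each input sample
    stamps a scaled copy of the kernel into the output, `stride` samples apart."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w
    return out


def ripple_fundamentals(sample_rate, hop_size, upsample_rates):
    """Input rate (Hz) of each upsampling layer. A ripple that repeats every
    `stride` output samples of a layer has spectral lines at multiples of
    that layer's input rate."""
    rate = sample_rate / hop_size  # mel frame rate
    rates = []
    for u in upsample_rates:
        rates.append(rate)
        rate *= u
    return rates


flat = [1.0] * 4  # a constant input should ideally upsample to a constant output
print(conv_transpose1d(flat, [1.0] * 3, stride=2))
# [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]  (period-2 ripple: kernel 3, stride 2)
print(conv_transpose1d(flat, [1.0] * 4, stride=2))
# [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]  (flat interior: kernel 4, stride 2)
print(ripple_fundamentals(16000, 200, [5, 5, 4, 2]))
# [80.0, 400.0, 2000.0, 8000.0]
```

Under this toy model, with the original poster's setup (16 kHz, hop 200, rates [5, 5, 4, 2]), periodic ripple from the four layers would fall on multiples of 80, 400, 2000 and 8000 Hz, which would show up as evenly spaced harmonic lines in a spectrogram. Even when the kernel size is a multiple of the stride, learned kernel weights can still weight the overlapping copies unevenly, so the ripple can reappear during training; the discriminators are what must penalize it.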