rishikksh20 / VocGAN

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
MIT License

Metallic / Robotic sound #9

Datasciensyash closed 4 years ago

Datasciensyash commented 4 years ago

I'm trying to train the VocGAN model in two stages: STFT pretraining, followed by adversarial training (with the STFT loss kept on), but I'm running into a metallic/robotic speech sound. When I train MelGAN, I get nearly normal speech after about 100 epochs, but VocGAN gives significantly worse results even with more epochs (100, 200, 300, ...).

Is this normal, or do I just need to train longer?

If it is not normal, what should I check in my model and pipeline? (I forked your repo, but I had to adapt the model to work with a hop_length of 200 and a sampling_rate of 16,000.)
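One thing worth checking when changing hop_length in a MelGAN-style generator is that the product of the upsampling ratios still equals hop_length, so that each mel frame maps to exactly hop_length audio samples. A minimal sketch of that check follows; the helper names and both ratio lists are illustrative assumptions, not values taken from the repo or from my fork.

```python
from math import prod


def check_upsampling(ratios, hop_length):
    """Return True if the total upsampling factor equals hop_length."""
    return prod(ratios) == hop_length


def expected_audio_length(n_mel_frames, hop_length):
    """Number of audio samples the generator should emit for n_mel_frames."""
    return n_mel_frames * hop_length


# Assumed ratios for a hop_length of 256 (illustrative only).
ratios_256 = [4, 4, 2, 2, 2, 2]
# Hypothetical ratios for a hop_length of 200 (illustrative only).
ratios_200 = [5, 5, 2, 2, 2]

print(check_upsampling(ratios_256, 200))  # False: 256 != 200
print(check_upsampling(ratios_200, 200))  # True: 5*5*2*2*2 == 200
print(expected_audio_length(80, 200))     # 16000 samples, i.e. 1 s at 16 kHz
```

Note that 200 = 2^3 * 5^2 has only five prime factors, so it cannot be split into six integer ratios that are all greater than 1; a six-stage generator would need a different layout or a ratio of 1 somewhere.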

Datasciensyash commented 4 years ago

The problem was solved surprisingly fast! As it turned out, I was feeding the generator output to the discriminator and the STFT loss incorrectly.
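For anyone hitting the same symptom, here is a rough sketch of the kind of sanity check that can catch this class of mistake before the loss is computed. It only compares tensor shapes, passed in as plain tuples; the function name, its arguments, and the example shapes are hypothetical and do not reproduce the exact bug in my fork.

```python
def check_loss_inputs(fake_shape, real_shape, n_mel_frames, hop_length):
    """Return a list of problems; an empty list means the shapes look consistent."""
    problems = []
    # Generated and real audio should have identical shapes before any loss.
    if tuple(fake_shape) != tuple(real_shape):
        problems.append(
            f"shape mismatch: fake {tuple(fake_shape)} vs real {tuple(real_shape)}"
        )
    # The generator should emit hop_length samples per mel frame.
    expected = n_mel_frames * hop_length
    if fake_shape[-1] != expected:
        problems.append(
            f"length mismatch: fake has {fake_shape[-1]} samples, expected {expected}"
        )
    return problems


# Hypothetical batch: 8 clips, 80 mel frames each, hop_length of 200.
print(check_loss_inputs((8, 1, 16000), (8, 16000), 80, 200))     # one shape problem
print(check_loss_inputs((8, 1, 16000), (8, 1, 16000), 80, 200))  # []
```

A mismatch such as (8, 1, 16000) against (8, 16000) is easy to miss because an elementwise loss will silently broadcast both tensors to (8, 8, 16000) instead of raising an error.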