kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

The generated output contains a lot of noise. #410

Closed. Lanzik closed this issue 1 year ago.

Lanzik commented 1 year ago

I am currently training Multi-band MelGAN, and my Text2Mel model is FastSpeech2 trained with ESPnet. During synthesis, the model outputs have a lot of noise, even though I am using a checkpoint-50000 and the hyperparameters seem to be fine upon inspection. Everything appears normal in the predictions directory, but I cannot get the expected output during synthesis. I am attaching the Text2Mel and Mel2Wav config files and would appreciate it if you could take a look and let me know if you can spot the issue.

Attachments: configs.zip, log with 3 generated samples.zip

kan-bayashi commented 1 year ago

the model outputs have a lot of noise, even though I am using a checkpoint-50000

Step 50k is still an early stage of training; the discriminator is only introduced from step 200k. GAN-based training requires a very large number of steps (e.g., 1-2M steps).
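To make the step counts above concrete, here is a minimal sketch that classifies a checkpoint by where it falls in the training schedule. The function name `training_stage` is hypothetical, and the key names `discriminator_train_start_steps` and `train_max_steps` are assumed to match the ones in your Mel2Wav YAML config; check your own file, since the values below are only the figures mentioned in this thread (200k and a 1M lower bound).

```python
def training_stage(step: int, disc_start_step: int, max_steps: int) -> str:
    """Describe where a checkpoint sits in a GAN vocoder training schedule."""
    if step < disc_start_step:
        # The discriminator has not been switched on yet.
        return "generator-only (no adversarial loss yet)"
    if step < max_steps:
        return "adversarial training in progress"
    return "training complete"


# Assumed values: 200k discriminator start and 1M total steps, taken from
# the numbers quoted in this thread, not read from an actual config file.
config = {
    "discriminator_train_start_steps": 200_000,
    "train_max_steps": 1_000_000,
}

stage = training_stage(
    50_000,
    config["discriminator_train_start_steps"],
    config["train_max_steps"],
)
print(f"checkpoint-50000: {stage}")
```

Under these assumptions, checkpoint-50000 falls in the generator-only phase, so noisy output at that point is not necessarily a sign of a misconfiguration.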

The discussion in https://github.com/kan-bayashi/ParallelWaveGAN/issues/143 may also help you.