TTS & Vocoder Fine Tuning

kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

MIT License

1.56k stars, 341 forks

Is this expected behaviour, or should we get clean audio without strange artifacts from HiFi-GAN or Parallel WaveGAN even without fine-tuning?

GAN-based vocoders without noise inputs (e.g., MelGAN, HiFi-GAN) tend to produce metallic-sounding noise when they are fed TTS outputs. On the other hand, vocoders with noise inputs (e.g., PWG, StyleMelGAN) can reduce such noise without fine-tuning.
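To make the "with noise input" vs. "without noise input" distinction concrete, here is a toy sketch of the two generator interfaces. It is not code from this repository: the function names, `HOP_SIZE`, the scalar-per-frame "mel" values, and the arithmetic standing in for the network are all hypothetical and only illustrate what each kind of generator receives as input.

```python
import random

# Hypothetical value: waveform samples generated per mel frame.
HOP_SIZE = 4


def upsample(mel_frames, hop_size=HOP_SIZE):
    """Repeat each frame value hop_size times (stand-in for learned upsampling)."""
    return [value for value in mel_frames for _ in range(hop_size)]


def vocoder_without_noise(mel_frames):
    """MelGAN / HiFi-GAN style interface: the mel input is the only input,
    so the waveform is a deterministic function of it."""
    cond = upsample(mel_frames)
    return [0.5 * c for c in cond]  # toy arithmetic, not a real network


def vocoder_with_noise(mel_frames, rng):
    """PWG / StyleMelGAN style interface: Gaussian noise is an extra input,
    and the mel input acts as conditioning on it."""
    cond = upsample(mel_frames)
    noise = [rng.gauss(0.0, 1.0) for _ in cond]
    return [0.5 * c + 0.1 * z for c, z in zip(cond, noise)]


if __name__ == "__main__":
    mel = [1.0, 2.0, 3.0]  # toy stand-in for a mel-spectrogram (one value per frame)
    print(len(vocoder_without_noise(mel)))
    print(len(vocoder_with_noise(mel, random.Random(0))))
```

The sketch only shows the difference in inputs; it does not reproduce the artifact behaviour described above, which depends on the trained models and on the mismatch between natural and TTS-predicted features.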

The results in our paper may help you: https://espnet.github.io/icassp2022-tts/ https://arxiv.org/abs/2110.07840

TTS & Vocoder Fine Tuning #318