keonlee9420 / PortaSpeech

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
MIT License

Is PortaSpeech a better choice than FastSpeech2 or DiffSpeech? #26

Open hertz-pj opened 2 years ago

hertz-pj commented 2 years ago

From your experience, how would you rank the output quality of these models?

keonlee9420 commented 2 years ago

Hi @hertz-pj, good point. I would say it depends on the purpose. For example, you'd choose FastSpeech2 if you need fast and safe performance. Go with DiffSpeech if you want randomness and non-metallic speech in the output. If you are interested in both speed and randomness, PortaSpeech may suit you.

iamanigeeit commented 9 months ago

@hertz-pj This is old, but I'm putting it here in case someone is searching for a comparison.

If you want to compare inference only, you can simply download pretrained models and run inference (even better if they are hosted on HuggingFace -- you can try them directly).
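For a rough speed comparison, here is a minimal sketch that measures the real-time factor (synthesis time divided by audio duration). It assumes you have wrapped each model in a hypothetical `synthesize(text)` callable that returns the waveform as a sequence of samples. That wrapper, the `dummy_synthesize` stand-in, and the 22050 Hz default sample rate are all assumptions for illustration, not part of any of these repos.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    texts: Sequence[str],
    sample_rate: int = 22050,  # assumed; use your model's actual output rate
) -> float:
    """Return total synthesis time divided by total audio duration.

    Values below 1.0 mean the model generates audio faster than real time.
    """
    total_compute = 0.0
    total_audio = 0.0
    for text in texts:
        start = time.perf_counter()
        waveform = synthesize(text)  # hypothetical wrapper around a TTS model
        total_compute += time.perf_counter() - start
        total_audio += len(waveform) / sample_rate
    if total_audio == 0:
        raise ValueError("no audio was produced")
    return total_compute / total_audio


def dummy_synthesize(text: str) -> list:
    """Stand-in for a real model: 1000 silent samples per input character."""
    return [0.0] * (len(text) * 1000)


if __name__ == "__main__":
    rtf = real_time_factor(dummy_synthesize, ["hello", "world"])
    print(f"real-time factor: {rtf:.4f}")
```

Run the same sentences through each model on the same hardware, and discard the first call or two as warm-up if you are on GPU, since the first call usually includes one-off setup cost.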

For training, I haven't trained DiffSpeech, but FastSpeech2 trains 5-10x faster than PortaSpeech for comparable audio quality. FS2 takes under 2 hours on a single RTX 3090 to produce totally intelligible speech. However, PortaSpeech has more prosody variation.