jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License
6.91k stars 1.27k forks

FP32 training and alternative models #120

Open NikitaKononov opened 1 year ago

NikitaKononov commented 1 year ago

Hello! I see that FP16 training is the default for this repo. Has anyone found out how FP32 or mixed-precision training behaves compared to FP16? Does a full-precision model sound better?

Another question: can you suggest some SOTA TTS models to compare with VITS? I found Grad-TTS and DiffGan very interesting and have started training them.

Thanks.