jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License

VITS2? #169

Open OnceJune opened 11 months ago

OnceJune commented 11 months ago

Hi, VITS2 has been released at https://arxiv.org/pdf/2307.16430.pdf. Do you have a plan to release the code?

hildazzz commented 11 months ago

Same question here.

yiliu-mt commented 11 months ago

We are also very excited to see the new version of VITS!

JohnHerry commented 11 months ago

VITS2 claims it enables fully end-to-end TTS training and inference, without a TTS frontend that converts text into a phoneme sequence. That means, for Mandarin, we could input Chinese characters directly instead of Pinyin. I doubt how many samples we would need then, because there are so many characters, far more than the number of Pinyin syllables.
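The vocabulary-size concern above can be illustrated with a minimal sketch. The character-to-Pinyin table here is a small hypothetical sample, not real training data: many distinct characters (homophones) collapse to the same Pinyin syllable, so a character-level input vocabulary is far larger than a Pinyin one for the same speech.

```python
# Hypothetical sample: several Mandarin characters that share one Pinyin
# syllable (tone marked with a digit). A real table would cover thousands
# of characters but only a few hundred toned syllables.
char_to_pinyin = {
    "是": "shi4", "事": "shi4", "市": "shi4", "式": "shi4",
    "他": "ta1",  "她": "ta1",  "它": "ta1",
}

# Count distinct input tokens under each frontend choice.
n_char_tokens = len(set(char_to_pinyin))           # character-level vocab
n_pinyin_tokens = len(set(char_to_pinyin.values()))  # Pinyin-level vocab

print(n_char_tokens, n_pinyin_tokens)  # → 7 2
```

Even in this tiny sample, 7 character tokens collapse to 2 Pinyin tokens, which is why a character-input model would likely need more data per token to learn pronunciations.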

p0p4k commented 11 months ago

I am trying to implement it here. If any of you can guide me or give me feedback, that would be helpful. Thanks.