jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License

Can I use character input? #18

Open bino282 opened 3 years ago

bino282 commented 3 years ago

I want to train this model on Japanese. Can I use characters as input? Thanks.

nikich340 commented 1 year ago

Yes, you can, but it may be harder for the model to learn pronunciation rules when the same symbol corresponds to different signals (whereas a phoneme usually corresponds to the same signal).
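
To make the ambiguity concrete, here is a minimal sketch of character-level input, assuming a plain lookup table from characters to integer IDs. The names `build_symbol_table` and `text_to_ids` are hypothetical helpers for illustration, not functions from this repository. In Japanese, は is usually read "ha" but is read "wa" when used as a particle (as at the end of こんにちは), yet with character input both occurrences get the same ID, so the model has to infer the right sound from context.

```python
# Hypothetical helpers for illustration only; not part of the VITS codebase.
PAD = "_"  # assumed padding symbol, placed at index 0


def build_symbol_table(corpus):
    """Collect every distinct character in the corpus and give it an integer ID."""
    chars = sorted({ch for line in corpus for ch in line} - {PAD})
    symbols = [PAD] + chars
    return {symbol: index for index, symbol in enumerate(symbols)}


def text_to_ids(text, symbol_to_id):
    """Map each character to its ID, skipping characters missing from the table."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


corpus = ["こんにちは", "はな"]
symbol_to_id = build_symbol_table(corpus)

greeting_ids = text_to_ids("こんにちは", symbol_to_id)  # final は is read "wa"
flower_ids = text_to_ids("はな", symbol_to_id)          # initial は is read "ha"

# Both readings of は share one ID, so the input alone cannot tell them apart.
same_id = greeting_ids[-1] == flower_ids[0]
print(same_id)
```

With phoneme input, a front end would resolve the two readings into different symbols before the text reaches the model, which is why phonemes tend to be the easier target.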

himekifee commented 1 year ago

I'm quite interested in this topic, though I don't have much knowledge of the field. Could you briefly explain what the limitations are when using characters directly as input? Does the network architecture need to change for this task? I tried asking GPT, and GPT-4 seems pretty good at producing the symbolic rules, but it still needs tuning. Working with an LLM seems like a possible way to get better results for this task.