jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License
6.48k stars, 1.21k forks

Is punctuation an essential part of the input when training a TTS model? #150

Open · JohnHerry opened this issue 1 year ago

JohnHerry commented 1 year ago

I haven't seen any paper discuss whether punctuation symbols are a required part of the phoneme sequence fed to a TTS model. Punctuation marks are unvoiced, so they seem irrelevant, yet they may carry some information about pauses and emotion. If the input already includes good prosody labels, those labels can also help the model learn pauses and emotion. So can we simply remove punctuation from the input phoneme sequence?
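One way to answer this empirically is to train two models that differ only in whether punctuation tokens are kept, and compare pauses and prosody by ear. Below is a minimal sketch of such a preprocessing switch. It assumes the input is already a list of phoneme tokens and that prosody labels are separate tokens such as `#1`..`#4`; the punctuation set and the example tokens are hypothetical, not something VITS prescribes.

```python
from typing import List

# Assumed punctuation inventory; replace it with whatever punctuation your
# own symbol table actually contains.
PUNCTUATION = frozenset(';:,.!?¡¿…"«»“”')


def prepare_tokens(tokens: List[str], keep_punctuation: bool = True) -> List[str]:
    """Return the phoneme token sequence, optionally without punctuation.

    Phonemes and prosody labels (assumed here to be tokens such as '#1'..'#4')
    pass through unchanged; only tokens found in PUNCTUATION are dropped.
    """
    if keep_punctuation:
        return list(tokens)
    return [t for t in tokens if t not in PUNCTUATION]


if __name__ == "__main__":
    # Hypothetical pinyin-style phonemes with prosody-break labels.
    seq = ["n", "i3", "#1", "h", "ao3", ",", "#3", "sh", "i4", "."]
    print(prepare_tokens(seq, keep_punctuation=False))
    # ['n', 'i3', '#1', 'h', 'ao3', '#3', 'sh', 'i4']
```

Because the prosody labels survive the filter, the "no punctuation" model still sees explicit break information; the comparison then isolates whatever extra cue the punctuation itself provides.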

nikich340 commented 1 year ago

The VITS authors don't answer questions here. You should probably test it yourself, but I think that if you use your own letter set and some of its letters can replace commas, periods, etc., why not?
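A rough sketch of that idea: map each punctuation mark to a custom pause symbol and make sure the new symbols exist in the symbol-to-ID table. The symbol names `<sp>` and `<lp>`, the mapping itself, and the helper functions are all hypothetical and are not part of the VITS codebase.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping: short pause for clause-level marks, long pause for
# sentence-final marks. The symbol names "<sp>" and "<lp>" are made up.
PAUSE_MAP: Dict[str, str] = {
    ",": "<sp>", ";": "<sp>", ":": "<sp>",
    ".": "<lp>", "!": "<lp>", "?": "<lp>",
}


def replace_punctuation(tokens: Iterable[str],
                        pause_map: Dict[str, str] = PAUSE_MAP) -> List[str]:
    """Swap each punctuation token for its pause symbol; keep other tokens."""
    return [pause_map.get(t, t) for t in tokens]


def build_symbol_table(pad: str, phonemes: Iterable[str],
                       extra: Iterable[str]) -> Dict[str, int]:
    """Assign integer IDs: padding first, then phonemes, then new symbols."""
    symbols = [pad] + sorted(set(phonemes))
    for s in extra:
        if s not in symbols:
            symbols.append(s)
    return {s: i for i, s in enumerate(symbols)}


def to_ids(tokens: Iterable[str], table: Dict[str, int]) -> List[int]:
    """Convert tokens to IDs; an unknown token raises KeyError early."""
    return [table[t] for t in tokens]


if __name__ == "__main__":
    table = build_symbol_table("_", ["a", "b"], ["<sp>", "<lp>"])
    print(to_ids(replace_punctuation(["a", ",", "b", "."]), table))
    # [1, 3, 2, 4]
```

Mapping `?` and `!` to the same long pause as `.` discards whatever intonation cue they might carry; giving each its own symbol is an equally valid choice. Either way, the text embedding size depends on the number of symbols, so the model would need to be trained with the new table.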