I have not seen any paper that discusses this: are punctuation symbols a required part of the TTS phoneme sequence?
Punctuation marks are unvoiced, so they seem irrelevant to the phonemes themselves. But they may carry some information about pauses and emotion.
If we already have good prosody labels in the input, those labels can help the model learn pauses and emotion too. So can we just remove punctuation from the input phoneme sequence?
The VITS authors don't answer here. You should probably test it yourself, but I think that if you use your own symbol set and some of its symbols can stand in for the comma, period, etc., then why not.
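If you want to compare the two options, here is a minimal sketch of the preprocessing step. It is not taken from the VITS code: the function names and the `<sp>` / `<lp>` pause tokens are hypothetical, and it assumes your phoneme sequence is a list of string symbols with ASCII punctuation mixed in.

```python
import string

# ASCII punctuation characters, as a set for exact-match lookups.
PUNCT = set(string.punctuation)

# Hypothetical mapping from punctuation to custom pause symbols.
# The token names are illustrative; they would have to be added
# to your own symbol set before training.
PAUSE_TOKENS = {
    ",": "<sp>", ";": "<sp>", ":": "<sp>",   # short pause
    ".": "<lp>", "?": "<lp>", "!": "<lp>",   # long pause
}

def strip_punctuation(symbols):
    """Option A: drop every punctuation symbol from the sequence."""
    return [s for s in symbols if s not in PUNCT]

def replace_punctuation(symbols):
    """Option B: map known punctuation to pause tokens, drop the rest."""
    out = []
    for s in symbols:
        if s in PAUSE_TOKENS:
            out.append(PAUSE_TOKENS[s])
        elif s not in PUNCT:
            out.append(s)
    return out

seq = ["HH", "AH", ",", "W", "ER", "."]
print(strip_punctuation(seq))
print(replace_punctuation(seq))
```

Training once with each variant on the same data and listening to the pauses is probably the only way to tell whether your prosody labels already cover what the punctuation was providing.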