Do we really need to tokenize (word-segment) the text before feeding it into your library? As far as I can see, the phoneme string of every 2-, 3-, ...-syllable word in Vietnamese is just the concatenation of the phonemes of its constituent single-syllable words.
Example:
- cái: kaj˨˦
- gì: ɣi˧˨
- cái gì: kaj˨˦ɣi˧˨
I looped over the viet-n.tsv file and found no exceptions.
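For reference, here is a minimal sketch of the check I mean. It assumes viet-n.tsv has two tab-separated columns (word, phoneme) and that multi-syllable entries separate their syllables with spaces, e.g. `cái gì`; adjust if the actual layout differs:

```python
# Verify that every multi-syllable entry's phoneme string equals the
# concatenation of its syllables' phoneme strings.
import csv

lexicon = {}
with open("viet-n.tsv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        if len(row) == 2:  # assumed layout: word <TAB> phoneme
            lexicon[row[0]] = row[1]

exceptions = []
for word, phoneme in lexicon.items():
    syllables = word.split()
    if len(syllables) < 2:
        continue  # single-syllable entries are the base case
    if not all(s in lexicon for s in syllables):
        continue  # skip words whose syllables lack their own entries
    if "".join(lexicon[s] for s in syllables) != phoneme:
        exceptions.append(word)

print(f"{len(exceptions)} exceptions out of {len(lexicon)} entries")
```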
Logically, I think we shouldn't need a tokenizer here, since phonemes and syllables play the same role in the sentence.
Please let me know what you think about this.
There are two main reasons why we built text2phonemesequence at the word level:
1. We used the CharsiuG2P toolkit, which was trained at the word level to convert graphemes into phonemes. Therefore, to maintain the performance of the G2P toolkit across languages, we built text2phonemesequence at the word level as well (see the sketch after this list).
2. We believe that fine-tuning the TTS model with phonemes from a word-segmented sentence may improve the TTS system's prosody and naturalness.
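For context, here is a minimal sketch of how a CharsiuG2P checkpoint is queried one word at a time through Hugging Face transformers. The checkpoint name and the `<vie-n>: word` prompt format follow my reading of the CharsiuG2P repo, so treat them as assumptions rather than the exact setup inside text2phonemesequence:

```python
# Word-level G2P with a CharsiuG2P ByT5 checkpoint (assumed checkpoint
# name and "<lang>: word" prompt format; check the CharsiuG2P repo).
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained(
    "charsiu/g2p_multilingual_byT5_small_100"
)

# Each word is a separate prompt: this is the word-level design that
# text2phonemesequence builds on.
words = ["cái", "gì", "cái gì"]
prompts = [f"<vie-n>: {w}" for w in words]
inputs = tokenizer(prompts, padding=True, add_special_tokens=False,
                   return_tensors="pt")

preds = model.generate(**inputs, num_beams=1, max_length=50)
phonemes = tokenizer.batch_decode(preds, skip_special_tokens=True)
for word, phoneme in zip(words, phonemes):
    print(word, "->", phoneme)
```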
However, I also think that your idea makes sense. Perhaps we can compare the performance of both approaches when we have the time.
Thank you for your interest!