thelinhbkhn2014 / Text2PhonemeSequence

Apache License 2.0

question about tokenizer #3

Closed: thanhlong1997 closed this issue 1 year ago

thanhlong1997 commented 1 year ago

Do we really need to tokenize (word-segment) text before feeding it into your library? As far as I can see, the phoneme sequence of every multi-syllable (2-, 3-, ...) word in Vietnamese is simply the concatenation of the phonemes of its individual syllables. For example:

- cái : kaj˨˦
- gì : ɣi˧˨
- cái gì : kaj˨˦ɣi˧˨

I looped over the viet-n.tsv file and found no exceptions. Logically, I don't think we should use a tokenizer here, since phonemes and syllables play the same role in a sentence. Please let me know what you think.
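The check described above can be reproduced with a short script along these lines. This is a sketch, not code from the repository: it assumes viet-n.tsv holds one `word<TAB>phonemes` entry per line, that the syllables of a multi-syllable word are separated by spaces, and that phoneme strings carry no syllable separators (as in the `kaj˨˦ɣi˧˨` example). The helper names `load_dictionary` and `find_exceptions` are hypothetical.

```python
import unicodedata
from typing import Dict, List, Tuple


def load_dictionary(path: str) -> Dict[str, str]:
    """Read a word -> phoneme-string map from a two-column TSV file.

    Assumed format (not confirmed against the repository): one
    `word<TAB>phonemes` entry per line, syllables inside a word separated
    by single spaces. If a word appears twice, the first entry is kept.
    """
    entries: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # NFC-normalize so precomposed and decomposed diacritics match.
            word = unicodedata.normalize("NFC", parts[0].strip())
            entries.setdefault(word, parts[1].strip())
    return entries


def find_exceptions(entries: Dict[str, str]) -> Tuple[int, List[str]]:
    """Check whether each multi-syllable word's phonemes equal the
    concatenation of its syllables' phonemes.

    Returns (number of words checked, list of words that break the rule).
    Words with a syllable missing from the dictionary are skipped.
    """
    checked = 0
    exceptions: List[str] = []
    for word, phonemes in entries.items():
        syllables = word.split()
        if len(syllables) < 2:
            continue  # single syllables are the building blocks, not tested
        if not all(s in entries for s in syllables):
            continue  # no standalone entry to compare against
        checked += 1
        expected = "".join(entries[s] for s in syllables)
        if phonemes != expected:
            exceptions.append(word)
    return checked, exceptions


if __name__ == "__main__":
    # Toy dictionary built from the example in this issue.
    toy = {"cái": "kaj˨˦", "gì": "ɣi˧˨", "cái gì": "kaj˨˦ɣi˧˨"}
    print(find_exceptions(toy))  # (1, [])
    # On the real file (path assumed):
    # print(find_exceptions(load_dictionary("viet-n.tsv")))
```

Words whose syllables have no standalone entry are skipped rather than counted, so the first number in the result shows how much of the dictionary the check actually covered.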

thelinhbkhn2014 commented 1 year ago

There are two main reasons why we built Text2PhonemeSequence at the word level:

  1. We used the CharsiuG2P toolkit, which was trained at the word level to convert graphemes to phonemes. Therefore, to preserve the G2P toolkit's performance across languages, we also built our tool at the word level.
  2. We believe that fine-tuning the TTS model on phonemes from word-segmented sentences may improve the prosody and naturalness of the TTS system.

However, I also think your idea makes sense. Perhaps we can compare the performance of both approaches when we have time. Thank you for your interest!