Closed NielsVandenEynde closed 8 months ago
Sorry for the late reply; I have been very busy recently. For pitch in Japanese, you may refer to https://github.com/yl4579/StyleTTS/issues/10#issuecomment-1407789937. The pitch can easily be extracted from the OpenJTalk output: https://github.com/yl4579/PL-BERT/issues/6#issuecomment-1797869275
First of all, thanks for this awesome research. Voice cloning desperately needs to be open sourced.
I'm interested in training a Japanese model, and I have over a thousand hours of speech data.
However, I'm a bit concerned about having to convert my transcriptions to IPA. Japanese has pitch accent, and the pitch can change within a word. For example, 橋 and 箸 are both pronounced "hashi", but their pitch patterns differ. When text is converted to IPA, such as in this topic, this information is lost. Is there a way to train a model on just the "raw" text? Besides that, I just need to train or find a Japanese BERT model, right? Is there anything else I should be aware of?
Thanks in advance
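To make the 橋/箸 point concrete, here is a minimal sketch of how a phoneme string plus an accent-type number determines a high/low pitch pattern. It assumes the usual textbook description of Tokyo-dialect pitch accent (type 0 has no downstep, type k falls after mora k). The `pitch_pattern` and `annotate` helpers and the three-word `LEXICON` are hypothetical and written only for illustration; they are not part of StyleTTS, PL-BERT, or OpenJTalk.

```python
def pitch_pattern(n_moras: int, accent_type: int) -> str:
    """Return an H/L string for a word under Tokyo-style pitch accent.

    accent_type 0 means no downstep (heiban);
    accent_type k >= 1 means the pitch falls after mora k.
    """
    if n_moras < 1 or not 0 <= accent_type <= n_moras:
        raise ValueError("accent_type must be between 0 and n_moras")
    pattern = []
    for i in range(1, n_moras + 1):
        if accent_type == 1:
            high = i == 1                 # high first mora, low afterwards
        elif accent_type == 0:
            high = i > 1                  # low first mora, high afterwards
        else:
            high = 1 < i <= accent_type   # rise after mora 1, fall after mora k
        pattern.append("H" if high else "L")
    return "".join(pattern)


# Hypothetical mini-lexicon: word -> (moras, accent type).
LEXICON = {
    "箸": (["ha", "shi"], 1),  # chopsticks
    "橋": (["ha", "shi"], 2),  # bridge
    "端": (["ha", "shi"], 0),  # edge
}


def annotate(word: str) -> tuple[str, str, int]:
    """Return (phoneme string, H/L pattern, accent type) for a lexicon word."""
    moras, accent = LEXICON[word]
    return "".join(moras), pitch_pattern(len(moras), accent), accent


if __name__ == "__main__":
    for w in LEXICON:
        print(w, *annotate(w))
```

All three words share the phoneme string "hashi", so a plain IPA or romaji transcription cannot tell them apart. 橋 and 端 even share the same in-word pattern and differ only in the pitch of a following particle, which is why keeping the accent type itself, rather than only the H/L string, is the safer annotation.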