myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
MIT License

Japanese sounds unnatural #214

Open michaellin99999 opened 3 days ago

michaellin99999 commented 3 days ago

I have combined the phoneme sets for three languages (English, Chinese, and Japanese) and started fine-tuning on a dataset containing speech in all three. The base model I use is the Chinese and English base. After 500 epochs, Chinese is good and English is good, but Japanese sounds unnatural. My understanding is that the phonemes are correct, but the tone is just not how Japanese is spoken. What can I do to improve this?

Here is a sample of the Japanese output: https://soundcloud.com/michael-lin-674069136/japanese-test
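
For reference, a minimal sketch of what a combined phoneme set with separate tone ID ranges per language could look like. Everything here is an assumption made for illustration: the symbol lists, the tone counts, and the helper names (`build_symbol_table`, `tone_offsets`, `encode`) are hypothetical and are not taken from the MeloTTS code or from my actual training setup.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language phoneme inventories (illustrative only).
LANG_SYMBOLS: Dict[str, List[str]] = {
    "EN": ["a", "k", "s", "th"],
    "ZH": ["a", "k", "zh", "x"],
    "JP": ["a", "k", "s", "ts", "N"],
}

# Hypothetical number of tone / accent classes per language.
NUM_TONES: Dict[str, int] = {"ZH": 6, "EN": 4, "JP": 2}

PAD = "_"  # padding symbol, always ID 0


def build_symbol_table(lang_symbols: Dict[str, List[str]]) -> Dict[str, int]:
    """Merge all inventories into one de-duplicated, sorted symbol table."""
    merged = sorted({s for syms in lang_symbols.values() for s in syms})
    return {s: i for i, s in enumerate([PAD] + merged)}


def tone_offsets(num_tones: Dict[str, int]) -> Dict[str, int]:
    """Give each language its own contiguous range of tone IDs."""
    offsets, start = {}, 0
    for lang, n in num_tones.items():
        offsets[lang] = start
        start += n
    return offsets


def encode(phones: List[str], tones: List[int], lang: str,
           table: Dict[str, int],
           offsets: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Map phonemes to shared IDs and shift tones into the language's range."""
    if len(phones) != len(tones):
        raise ValueError("phones and tones must have the same length")
    phone_ids = [table[p] for p in phones]
    tone_ids = [offsets[lang] + t for t in tones]
    return phone_ids, tone_ids


table = build_symbol_table(LANG_SYMBOLS)
offsets = tone_offsets(NUM_TONES)
# Japanese "ka" with a low-high accent pattern (0 = low, 1 = high).
phone_ids, tone_ids = encode(["k", "a"], [0, 1], "JP", table, offsets)
```

In a layout like this, phonemes shared across languages reuse one ID, while the Japanese accent values occupy their own ID range rather than overlapping with the Chinese or English tone IDs.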

eliteexod commented 2 days ago

Are you using it on Docker?

michaellin99999 commented 2 days ago

I have tried both Docker and ONNX Runtime; both sound like this.