Japanese sounds unnatural

michaellin99999 commented 3 days ago

I have combined the phoneme sets for all three languages (English, Chinese, and Japanese) and started fine-tuning on a dataset that contains speech in all three languages. The base model I use is the Chinese and English base model. After 500 epochs, Chinese sounds good and English sounds good, but Japanese sounds unnatural. My understanding is that the phonemes are correct, but the tone is just not how Japanese is spoken. What can I do to improve this?

Here is a sample of the Japanese output: https://soundcloud.com/michael-lin-674069136/japanese-test
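For readers unfamiliar with the setup, combining the phoneme sets of several languages usually means building one shared, deduplicated symbol table. Below is a minimal sketch of that idea. The phoneme lists and the helper names `build_symbol_table` and `encode` are hypothetical toy examples, not MeloTTS's actual symbol lists or API.

```python
# Hypothetical toy phoneme inventories; the real per-language lists used by
# MeloTTS are much larger and are NOT reproduced here.
ZH_PHONEMES = ["a", "ai", "b", "zh", "ch"]
EN_PHONEMES = ["a", "b", "th", "ae"]
JA_PHONEMES = ["a", "b", "ts", "N"]

PAD = "_"  # padding symbol, reserved at index 0


def build_symbol_table(*inventories):
    """Merge per-language phoneme lists into one deduplicated table.

    A phoneme that appears in several languages (e.g. "a") gets a single ID,
    so a model that embeds symbols by ID shares that embedding row.
    """
    merged = sorted(set().union(*inventories))
    symbols = [PAD] + merged
    return symbols, {s: i for i, s in enumerate(symbols)}


def encode(phonemes, symbol_to_id):
    """Map a phoneme sequence to integer IDs; unknown symbols raise KeyError."""
    return [symbol_to_id[p] for p in phonemes]


symbols, symbol_to_id = build_symbol_table(ZH_PHONEMES, EN_PHONEMES, JA_PHONEMES)
print(len(symbols))                               # 10
print(encode(["ts", "a", "N"], symbol_to_id))     # [8, 2, 1]
```

Note that a table like this only encodes phoneme identity; it carries no tone or pitch information on its own.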

eliteexod commented 2 days ago

Are you using it on Docker?

michaellin99999 commented 2 days ago

I have tried it on Docker and also with ONNX Runtime; both sound like this.

myshell-ai / MeloTTS, issue #214