I have combined the phoneme sets for all three languages
(English, Chinese, and Japanese) and started fine-tuning on a dataset comprising speech in all three languages.
The base model I use is the Chinese and English base.
After 500 epochs, Chinese and English both sound good, but Japanese sounds unnatural.
My understanding is that the phonemes are correct, but the tone is just not how Japanese is spoken.
What can I do to improve this?
Here is a sample of the Japanese output: https://soundcloud.com/michael-lin-674069136/japanese-test