Closed sh-lee-prml closed 1 year ago
Thanks for sharing UTMOS results!
As the UTMOS paper argues, the ranking produced by your UTMOS scores (as opposed to the absolute MOS values) is very similar to the nMOS ranking.
These are valuable results!
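To make the ranking-versus-absolute-value distinction concrete, here is a minimal sketch that computes a Spearman rank correlation between two sets of per-system scores. The numbers in `nmos` and `utmos` are made up for illustration and are not the scores from this thread; the helper names `ranks`, `pearson`, and `spearman` are also just illustrative.

```python
def ranks(values):
    """Return 1-based ranks, using the average rank for tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-system scores (NOT the real results from this thread).
nmos = [4.2, 3.9, 3.5, 3.1]
utmos = [3.6, 3.4, 3.0, 2.4]

rho = spearman(nmos, utmos)  # agreement of the two rankings
mean_gap = sum(a - b for a, b in zip(nmos, utmos)) / len(nmos)  # absolute offset
print(f"Spearman rho = {rho:.3f}, mean nMOS - UTMOS gap = {mean_gap:.3f}")
```

In this toy data the two metrics order the systems identically even though every UTMOS value is lower than the corresponding nMOS value, which is the situation described above.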
Korean real dataset, the average score of UTMOS is 2.50. Did you have a similar case?
Yes, I have a similar problem in Japanese.
When I score a clear, high-pitched, slightly whispery but otherwise normal female voice (つくよみちゃん/Tsukuyomi-chan), the score is around 2.5.
Another Japanese speaker's utterances score 3.8, so this is not just a language effect but a combination of language and speaker.
Another researcher reports a similar tendency in Japanese (a narrow MOS range, shifted lower).
Here is the link to his tweet.
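One way to check whether a low average comes from the language or from particular speakers is to break the scores down per speaker. The sketch below assumes the utterance-level scores have already been computed and stored as `(speaker, score)` pairs; the speaker labels and values are placeholders, not measured data.

```python
from collections import defaultdict
from statistics import mean


def mean_by_speaker(rows):
    """Average utterance-level scores separately for each speaker."""
    grouped = defaultdict(list)
    for speaker, score in rows:
        grouped[speaker].append(score)
    return {speaker: mean(scores) for speaker, scores in grouped.items()}


# Placeholder scores (hypothetical, for illustration only).
rows = [
    ("speaker_a", 2.4),
    ("speaker_a", 2.6),
    ("speaker_b", 3.7),
    ("speaker_b", 3.9),
]

per_speaker = mean_by_speaker(rows)
for speaker, avg in sorted(per_speaker.items()):
    print(f"{speaker}: {avg:.2f}")
```

If the per-speaker means differ widely within one language, as in this toy example, the speaker is contributing to the shift in addition to the language.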
Hi
Thanks for the nice work 👍
I have added UTMOS results for voice conversion. The UTMOS results are very similar to the naturalness MOS (nMOS) results on the English speech dataset.
I used the same 400 samples as in the HierVST paper. HierVST: [Paper] [Demo]
I used the official implementation of each model and trained the models on LibriTTS-Clean-100 and 360 (1,151 speakers); for YourTTS, I used the official checkpoint.
Many-to-Many Voice Style Transfer
Zero-shot Voice Style Transfer
But I have a question. When I ran UTMOS on a high-quality Korean dataset of real speech, the average UTMOS score was 2.50. Have you seen a similar case?