myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
MIT License
4.84k stars 631 forks

Questions Regarding Training Data Volume and Future TTS Technology Directions #177

Open xddun opened 3 months ago

xddun commented 3 months ago

Hello, and thank you for your open-source contribution.

I have a question regarding the dataset used for training the model. How much mixed Chinese and English speech data was used? Specifically, how many hours of Chinese speech data and how many hours of English speech data were included? I'm asking because I'd like to assess the model's performance based on this information.

I feel that the generated Chinese speech is not very natural or fluent. I wonder whether this is due to insufficient training data, and whether adding more data to the training set would help.

Additionally, I'd like to know whether you plan to develop a diffusion-based TTS model, or whether you might consider adopting the approach used in Matcha-TTS (https://github.com/shivammehta25/Matcha-TTS) for training.

Looking forward to your response.