shivammehta25 / Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
https://shivammehta25.github.io/Matcha-TTS/
MIT License
710 stars 86 forks

Is there any experiment on Chinese data set. #91

Open zhaojingxin123 opened 2 months ago

zhaojingxin123 commented 2 months ago

May I ask whether there have been any experiments on Chinese datasets? I trained on a Mandarin Chinese dataset using pinyin as the phonemes, but everything I synthesize is noise. Why might that be?

zhaojingxin123 commented 2 months ago

Does anyone know why that is? Or is there a Chinese dataset that has been trained on successfully? What methods do you use to convert Chinese text to phonemes?

shivammehta25 commented 2 months ago

I am sorry, I haven't trained on a Chinese dataset, but I can assure you that the model training is language independent. There are forks in Kyrgyz https://github.com/UlutSoftLLC/MamtilTTS and Catalan https://huggingface.co/projecte-aina/matxa-tts-cat-multiaccent . So perhaps someone who has trained on a Chinese dataset can chip in to the conversation.

Just to confirm, did you see this page? https://github.com/shivammehta25/Matcha-TTS/wiki/Training-%F0%9F%8D%B5-Matcha%E2%80%90TTS-with-different-dataset-&-languages

zhaojingxin123 commented 2 months ago

Hello author, thank you for your answer! I am very sorry that I have been ill recently and did not see your message. There should be no big problems with your code and model; the problem was that I was converting Chinese to phonemes the wrong way. Your project can indeed be applied to Chinese, but the wavs generated by the model I trained are very noisy.

I trained the model on a Chinese dataset, AISHELL3, for 119 epochs, and the results are poor.

My config is: [image] [image]

What do you think is the reason?

1. Is the number of training epochs not enough?
2. Is the number of speakers (174) too large?
3. Is there not enough data per speaker?
4. Does the symbol count, n_vocab: 50, have any influence?

How can I improve the synthesis?

shivammehta25 commented 2 months ago

I think the dataset size and training should be enough.

"4. the n_vocab: 50 of the symbols, is there any influence?"

Do you really have only 50 symbols? I feel something might be wrong here. What phonemizer are you using?
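One way to sanity-check a symbol count like this is to compare the symbol table against the tokens that actually occur in the phonemized training text. The following is only a minimal sketch under stated assumptions: the function name `check_symbol_coverage`, the whitespace-separated token format, and the toy data are all hypothetical and are not taken from the Matcha-TTS code base.

```python
from collections import Counter


def check_symbol_coverage(symbols, phonemized_lines):
    """Compare a symbol table with the tokens found in phonemized text.

    Assumes each line is a string of whitespace-separated phoneme tokens
    (a hypothetical format chosen for this sketch).
    Returns (n_vocab, used_symbols, unknown_token_counts).
    """
    vocab = set(symbols)
    if len(vocab) != len(symbols):
        raise ValueError("symbol list contains duplicates")

    # Count every token that appears in the training text.
    counts = Counter(tok for line in phonemized_lines for tok in line.split())

    # Tokens the model has no embedding for.
    unknown = {tok: n for tok, n in counts.items() if tok not in vocab}
    # Symbols that are declared and actually seen in the data.
    used = vocab & set(counts)
    return len(symbols), used, unknown


# Toy data, invented for illustration only.
symbols = ["_", "n", "i", "h", "ao", "3"]
lines = ["n i 3 h ao 3", "zh ong 1"]

n_vocab, used, unknown = check_symbol_coverage(symbols, lines)
print("n_vocab:", n_vocab)
print("unknown tokens:", sorted(unknown))
```

If the unknown set is non-empty, or if the configured n_vocab differs from the length of the symbol list, the text front end and the model disagree about the symbol inventory, which is worth ruling out before looking at epochs or speaker counts.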

zhaojingxin123 commented 2 months ago

Thank you for your answer, shivammehta25. It's not the International Phonetic Alphabet (IPA), but rather Taiwanese Pinyin, a Chinese phonetic notation with 50 symbols. The model (which I trained on AISHELL3) produces something like a human voice but also a lot of noise. Previously, I used the Mainland Chinese version of Pinyin, another Chinese phonetic notation, with over 200 symbols. I suspect that the issue might have been due to incorrect processing, specifically with the Mainland Chinese version of Pinyin.
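To illustrate how the size of a pinyin symbol inventory depends on how syllables are tokenized, here is a minimal sketch that splits a tone-numbered Mainland pinyin syllable into initial, final, and tone. It is not the processing used in this thread. The function name `split_syllable`, the choice to treat `y` and `w` as initials, and the use of `5` for the neutral tone are assumptions made for this example.

```python
# Two-letter initials come first so "zh" is matched before "z".
# Treating "y" and "w" as initials is a simplifying assumption.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split e.g. 'zhong1' into ['zh', 'ong', '1'].

    A syllable without a trailing tone digit gets tone '5'
    (a convention assumed here for the neutral tone).
    """
    tone = "5"
    if syllable and syllable[-1].isdigit():
        syllable, tone = syllable[:-1], syllable[-1]

    for initial in INITIALS:
        # Require a non-empty final after the initial.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):], tone]

    # Zero-initial syllable such as 'an4' or 'er2'.
    return [syllable, tone]


# Toy sentence, invented for illustration only.
tokens = [tok for syl in "zhong1 guo2 ren2".split() for tok in split_syllable(syl)]
print(tokens)
```

With a split like this, the symbol table holds initials, finals, and tone marks rather than whole syllables, so its size differs a lot from a syllable-level inventory. Whichever scheme is used, the symbol list given to the model has to match the tokens the text front end emits.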