tuanh123789 / Train_Hifigan_XTTS

This is an implementation for training the HiFiGAN part of the XTTSv2 model using Coqui/TTS.

Roadmap for training XTTS for custom language #5

Open VafaKnm opened 1 month ago

VafaKnm commented 1 month ago

Hi! First, I want to thank you for making this repo and sharing your experience. I am trying to train the XTTS model for the Persian language. I read the discussion at https://github.com/coqui-ai/TTS/issues/3704 and found that there are 3 steps to get an XTTS model: DVAE -> GPT-2 -> HifiGAN. According to that discussion, DVAE does not need fine-tuning because it is independent of the language. Assuming this claim is true, the next step is fine-tuning GPT-2. Does the repo you provide (this repo) include fine-tuning GPT-2 too?

tuanh123789 commented 1 month ago

Right, DVAE does not need to be fine-tuned. You just need to fine-tune the GPT part and HifiGAN. This repo is only for fine-tuning HifiGAN.

tuanh123789 commented 1 month ago

If you want to fine-tune the GPT part for a language that is not in the original XTTS model, you have to make some changes to the training code.
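
The division of work described in the comments so far can be summarized in a small sketch. The stage order follows the DVAE -> GPT-2 -> HifiGAN sequence from the question; the `Stage` class, its fields, and `stages_to_finetune` are hypothetical names used only for illustration and are not part of Coqui/TTS or this repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One component of the XTTS pipeline (hypothetical helper, not a Coqui/TTS class)."""

    name: str
    finetune: bool  # whether the thread says this part needs fine-tuning
    note: str


# Order and fine-tuning decisions as stated in the comments above.
PIPELINE = [
    Stage("dvae", False, "reported to be independent of the language"),
    Stage("gpt", True, "needs training-code changes for a language not in the original model"),
    Stage("hifigan", True, "the only part this repo covers"),
]


def stages_to_finetune(pipeline):
    """Return the names of the stages that need fine-tuning, in pipeline order."""
    return [stage.name for stage in pipeline if stage.finetune]


if __name__ == "__main__":
    for stage in PIPELINE:
        action = "fine-tune" if stage.finetune else "reuse as-is"
        print(f"{stage.name}: {action} ({stage.note})")
```

Calling `stages_to_finetune(PIPELINE)` lists only the GPT and HifiGAN stages, which matches the answer above: DVAE is reused, and this repo handles the last stage only.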

RifatMamayusupov commented 1 month ago

Hello, @tuanh123789, @VafaKnm. I am going to train XTTS for the Uzbek language, but I can't find any example of training XTTS for a new language. Please help me: if you have full code for training XTTS for another language, can you share it?

mpquochung commented 2 weeks ago

You can check out this repo, where the author fine-tunes on a new language, Vietnamese: https://github.com/thinhlpg/TTS/tree/add-vietnamese-xtts. From the commit history you can see that the author only needed to add the vi language to the tokenizer part. Then you can fine-tune the GPT part by following the XTTS documentation. After that, you can fine-tune HifiGAN for better sound quality in your language (I bet so).
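
As a rough illustration of what "adding the vi language to the tokenizer part" could involve, here is a toy sketch. It assumes the tokenizer keeps a table of supported language codes and prefixes each text with a language tag such as `[vi]`. The `ToyTokenizer` class, its `char_limits` table, and the limit values are hypothetical stand-ins, not the Coqui/TTS tokenizer; the commit history of the linked repo is the reference for the real change.

```python
class ToyTokenizer:
    """Toy stand-in for a multilingual TTS tokenizer (hypothetical, for illustration)."""

    def __init__(self):
        # Supported language codes with a per-language character limit.
        # The values are illustrative, not taken from any real model.
        self.char_limits = {"en": 250, "fr": 250}

    def add_language(self, lang, char_limit):
        """Register a new language code so that texts in it are accepted."""
        if lang in self.char_limits:
            raise ValueError(f"Language already registered: {lang}")
        self.char_limits[lang] = char_limit

    def preprocess(self, text, lang):
        """Normalize the text and prepend the language tag, e.g. '[vi]'."""
        if lang not in self.char_limits:
            raise ValueError(f"Unsupported language: {lang}")
        text = text.strip().lower()
        if len(text) > self.char_limits[lang]:
            raise ValueError(f"Text exceeds the {lang} character limit")
        return f"[{lang}]{text}"


if __name__ == "__main__":
    tokenizer = ToyTokenizer()
    tokenizer.add_language("vi", 250)  # the new language is registered first
    print(tokenizer.preprocess("Xin chào", "vi"))
```

In this sketch an unregistered language code is rejected before any text is processed, which is why the new code has to be added first. A real tokenizer may also need language-specific text cleaning and vocabulary entries, which this sketch leaves out.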

VafaKnm commented 2 weeks ago

@mpquochung Thanks for helping. So after applying these changes, did you follow this training script for the training process? https://github.com/coqui-ai/TTS/blob/dev/recipes/ljspeech/xtts_v2/train_gpt_xtts.py