huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0

Training on a NEW language #25

Open rudransh2004 opened 1 month ago

rudransh2004 commented 1 month ago

Suppose we want to train this TTS model on a language whose tokens are not in the Flan-T5 tokenizer's vocabulary. Can I simply change the name of the tokenizer in config.json, or do I also have to make code changes? Note: the new tokenizer will not be a Flan-T5 tokenizer.

ylacombe commented 4 weeks ago

Hey @rudransh2004, you can do this but you'd have to retrain the model from scratch!

rudransh2004 commented 4 weeks ago

Hey @ylacombe, thank you so much for your reply. Could you share a recipe for doing this with another language, without the annotations if we wish?

ylacombe commented 3 weeks ago

Hey @rudransh2004, if you want to avoid using the annotations, you could simply use a description column in which every sample has the empty string "". Note that the model currently doesn't support passing samples without annotations, but the trick above should work.
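
For anyone trying the empty-description trick, here is a minimal sketch of how the metadata could be prepared. It is not taken from the Parler-TTS code: the column names `audio`, `text` and `description`, the file paths, the sample sentences and the helper `add_empty_descriptions` are all hypothetical, so adapt them to whatever your dataset and training arguments actually use. It uses only the Python standard library and writes JSON Lines, so the empty string is stored explicitly as "".

```python
import json


def add_empty_descriptions(rows, description_column="description"):
    """Return copies of `rows` with an empty-string description column.

    `description_column` is a hypothetical name; use whichever column
    your training setup reads descriptions from.
    """
    return [{**row, description_column: ""} for row in rows]


# Hypothetical metadata for a new-language dataset (paths and text are made up).
rows = [
    {"audio": "clips/0001.wav", "text": "namaste duniya"},
    {"audio": "clips/0002.wav", "text": "aap kaise hain"},
]

rows_with_descriptions = add_empty_descriptions(rows)

# One JSON object per line; ensure_ascii=False keeps non-Latin scripts readable.
jsonl_text = "\n".join(
    json.dumps(row, ensure_ascii=False) for row in rows_with_descriptions
)
print(jsonl_text)
```

JSON Lines is used here rather than CSV because some CSV readers turn empty fields into missing values, which would defeat the purpose of passing an explicit "" for every sample.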