huggingface / dataspeech

MIT License

work on other languages #16

Open taalua opened 2 months ago

taalua commented 2 months ago

Hi,

For fine-tuning the current model on other languages, is it better to start from the existing trained model and prompt tokenizer "parler-tts/parler_tts_mini_v0.1", or is it better to train from scratch with a custom tokenizer? Any suggestions for a multilingual tokenizer if using espeak-ng? Thank you for your insights.

ylacombe commented 1 month ago

Hey @taalua, it depends on the languages you want to fine-tune on! If the Flan-T5 tokenizer covers your language (say, Spanish or French), you can fine-tune the existing model; otherwise you probably need a custom tokenizer, or one suited for multilinguality (say, mT5 or similar), and to train your model from scratch!
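One rough way to act on this advice is to measure how much of your target-language text a tokenizer maps to the unknown token: a high `<unk>` fraction suggests the vocabulary doesn't cover the language and training from scratch with a different tokenizer is warranted. The sketch below is illustrative, not from the repo; with Hugging Face `transformers` you would pass the ids produced by e.g. `AutoTokenizer.from_pretrained("google/flan-t5-base")`, but here a tiny stand-in character tokenizer is used so the example runs without downloads.

```python
# Sketch (assumption, not dataspeech code): estimate tokenizer coverage of a
# language by the fraction of token ids equal to the unknown-token id.

def unk_fraction(token_ids, unk_id):
    """Fraction of ids in token_ids that equal the unknown-token id."""
    if not token_ids:
        return 0.0
    return sum(1 for i in token_ids if i == unk_id) / len(token_ids)

# Stand-in vocabulary covering only lowercase Latin letters and space;
# anything outside it maps to id 0, playing the role of <unk>.
VOCAB = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
UNK_ID = 0

def toy_encode(text):
    return [VOCAB.get(ch, UNK_ID) for ch in text.lower()]

# Latin-script text is fully covered; a non-Latin script is not.
print(unk_fraction(toy_encode("hola mundo"), UNK_ID))  # → 0.0 (good coverage)
print(unk_fraction(toy_encode("こんにちは"), UNK_ID))  # → 1.0 (no coverage)
```

With a real tokenizer, you would run the same check over a representative sample of your corpus and compare, say, the Flan-T5 tokenizer against mT5 before deciding whether fine-tuning the existing checkpoint is viable.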