yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License
4.97k stars, 419 forks

Can StyleTTS2 use phonemization from different languages to finetune or train? #271

Open tanishbajaj101 opened 4 months ago

tanishbajaj101 commented 4 months ago

I am trying to build a model that can accommodate words from both my native language and English. My native language is Hindi, written in Devanagari script.

Here is an example of the kind of sentence I want to support: "मेरा car service 21 जुलाई को scheduled है, क्या आप मुझे इसके सभी details दे सकते हैं?"

This is the phonemized output, which contains two different languages in the same string:

(hi)meːɾaː(enus) kɑːɹ sɜːvɪs twɛnti wʌn (hi)ɟʊlaːi koː(enus) skɛdʒuːld (hi)hɛː(enus) (hi)kːjaː aːp mʊɟʰeː ɪskeː sʌbʰi(enus) diːteɪlz (hi)deː sʌkteː hɛ̃(enus)

Here it detects the two different scripts and marks them as (hi) for Hindi and (enus) for American English.
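To make the structure of that output concrete, here is a minimal Python sketch that splits such a string into per-language segments and, alternatively, strips the flags. It is not part of StyleTTS2 or of any phonemizer library. The helper names `split_by_language` and `strip_flags` are hypothetical, the flag pattern is inferred only from the sample above, and the default language `enus` is an assumption based on the digits being read in English.

```python
import re

# A language flag is assumed to be a short lowercase tag in parentheses,
# e.g. "(hi)" or "(enus)". This pattern is inferred from the sample only.
FLAG = re.compile(r"\(([a-z-]{2,8})\)")


def split_by_language(phonemes, default_lang="enus"):
    """Return (language, phoneme_text) pairs; each flag applies to the text after it."""
    segments = []
    lang = default_lang  # assumed language of any text before the first flag
    pos = 0
    for match in FLAG.finditer(phonemes):
        chunk = phonemes[pos:match.start()].strip()
        if chunk:
            segments.append((lang, chunk))
        lang = match.group(1)
        pos = match.end()
    tail = phonemes[pos:].strip()
    if tail:
        segments.append((lang, tail))
    return segments


def strip_flags(phonemes):
    """Remove the flags and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", FLAG.sub(" ", phonemes)).strip()


# First part of the phonemized sentence from the post.
sample = "(hi)meːɾaː(enus) kɑːɹ sɜːvɪs twɛnti wʌn (hi)ɟʊlaːi koː(enus) skɛdʒuːld (hi)hɛː(enus)"

for lang, text in split_by_language(sample):
    print(lang, "|", text)
print(strip_flags(sample))
```

Whether to keep or strip the flags before training is a design choice. If they are kept, every character in them, including the parentheses, becomes part of the model input.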

If I record appropriate audio for sentences like this, would I be able to train or fine-tune the model on them?