idiap / coqui-ai-TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
https://coqui-tts.readthedocs.io
Mozilla Public License 2.0

[Bug] Mix language inference text #115

Closed cod3r0k closed 2 weeks ago

cod3r0k commented 2 weeks ago

Describe the bug

How should we handle text that contains multiple languages? Inference sends the entire text to eSpeak with a single fixed language setting, so eSpeak does not phonemize the other languages well.

To Reproduce

Run inference on text that contains more than one language. The entire text is sent to eSpeak with a single fixed language setting, so eSpeak does not handle it well.

Expected behavior

The phonemizer should work correctly on mixed-language text.

Logs

No response

Environment

What should we do for text that contains multiple languages? Since inference sends everything to eSpeak with a fixed language setting, eSpeak does not handle it well!

Additional context

No response

eginhard commented 2 weeks ago

Yes, there is currently no way to do this directly in Coqui. However, you can do mixed-language TTS with VITS/YourTTS models if you add some custom code (including calling eSpeak separately for each language, unless you use grapheme-based models); see #104 for details.

Integrating this would first need SSML support (see previous discussions in https://github.com/coqui-ai/TTS/issues/752), which is very complex, so closing as not planned for now, but I'm open to contributions regarding SSML.
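The "call eSpeak separately per language" workaround mentioned above could look roughly like the sketch below. It is not Coqui's API: `script_of`, `split_by_language`, and `phonemize_mixed` are hypothetical helper names, and the script-to-language mapping (Arabic script means `fa`, Latin script means `en-us`) is an assumption that only fits a Persian/English mix. The actual phonemizer is passed in as a callback, so any backend that accepts a text and a language code can be plugged in.

```python
import unicodedata
from typing import Callable, List, Optional, Tuple


def script_of(ch: str) -> str:
    """Return a coarse language code for one character, or '' if neutral."""
    if not ch.isalpha():
        return ""  # digits, spaces, punctuation stay with the current segment
    name = unicodedata.name(ch, "")
    if name.startswith("ARABIC"):
        return "fa"  # assumption: Arabic-script text is Persian
    if name.startswith("LATIN"):
        return "en-us"  # assumption: Latin-script text is English
    return ""


def split_by_language(text: str, default: str = "fa") -> List[Tuple[str, str]]:
    """Split text into (language, segment) pairs at script changes."""
    segments: List[Tuple[str, str]] = []
    buf: List[str] = []
    current: Optional[str] = None
    for ch in text:
        lang = script_of(ch)
        if lang and current and lang != current:
            # Script changed: close the running segment and start a new one.
            segments.append((current, "".join(buf)))
            buf = []
        if lang:
            current = lang
        buf.append(ch)
    if buf:
        # Text with no letters at all falls back to the default language.
        segments.append((current or default, "".join(buf)))
    return segments


def phonemize_mixed(text: str, phonemize_fn: Callable[[str, str], str]) -> str:
    """Phonemize each segment with its own language and join the results."""
    parts = []
    for lang, segment in split_by_language(text):
        segment = segment.strip()
        if segment:
            parts.append(phonemize_fn(segment, lang))
    return " ".join(parts)
```

To try it with eSpeak, `phonemize_fn` could wrap a real phonemizer call, for example `lambda t, lang: phonemize(t, language=lang, backend="espeak")` from the `phonemizer` package, assuming that package and eSpeak are installed. Script-based splitting cannot tell apart languages that share a script (Persian vs. Arabic, English vs. German), so those cases would need a language identifier or explicit markup instead.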

cod3r0k commented 1 week ago

Hi, I checked eSpeak with the Persian (fa) language and found that it also handles English. Isn't that enough? @eginhard

eginhard commented 1 week ago

If you just need to mix Persian and English, maybe? If you want to also mix other languages, then maybe not? I don't know your use case.

cod3r0k commented 1 week ago

For the first version, yes, just Persian and English. Maybe Arabic in the next version, and German in the last one (it is different from all of them).

But step by step. For the first step (Persian and English), do you have any concerns?