Closed r4nc0r closed 2 years ago
@domcross and I are working on new/better models using the HiFi-GAN vocoder. Samples are available on the Thorsten-Voice project website. These models might be faster than the one currently available. But maybe you should check out the work by @synesthesiam on Larynx. My voice is available there too, and it's really fast.
Did you test with the "WaveGrad" or "Fullband-MelGAN" vocoder? (Fullband-MelGAN is way faster.)
Thanks for your quick reply and for pointing me in the right direction!
I just used your model with the parameters specified in the readme: tts-server --model_name tts_models/de/thorsten/tacotron2-DCA
I tried the following:
pip3 install tts==0.5.0
tts-server --model_name tts_models/de/thorsten/tacotron2-DCA
I got an RTF of around 0.6 to 1 on my notebook CPU, which I think isn't too bad. What RTF do you have?
Just in case you're interested: https://www.thorsten-voice.de/2022/03/20/vergleich-thorsten-aktuell-mit-dem-neuen-modell/
I just did that with the addition of --show_details SHOW_DETAILS, and my RTF is about 0.6:
> Processing time: 3.101564407348633
> Real-time factor: 0.5756691513639508
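For readers unfamiliar with the metric: the real-time factor is usually defined as processing time divided by the duration of the generated audio, so values below 1 mean synthesis is faster than playback. A minimal sketch of that relationship, assuming this definition applies to the log output above (the function names are illustrative, not part of any library):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the resulting audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def audio_seconds_from_rtf(processing_seconds: float, rtf: float) -> float:
    """Invert the definition to recover the audio duration from a logged RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return processing_seconds / rtf


# Values copied from the log lines above.
processing = 3.101564407348633
rtf = 0.5756691513639508

# Duration of the generated audio implied by the two logged numbers.
audio = audio_seconds_from_rtf(processing, rtf)
print(f"Implied audio length: {audio:.2f} s")
```

Under that assumption, the 3.1 s of processing corresponds to roughly 5.4 s of speech, so shorter responses should take proportionally less time.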
I use a 12-core Ryzen 3000 processor. But a processing time of 3 s is extremely high given my use case of generating just-in-time responses for my voice assistant. I built a workaround which caches most WAV files, but this doesn't work if I generate responses with variables in the text.
Also, I would love to use your new model. Is there a way to use it?
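The caching workaround described above could look roughly like the following. This is only a sketch under assumptions: `synthesize` is a hypothetical callable standing in for whatever TTS backend is used, and `cached_tts` and the cache layout are illustrative names, not the poster's actual code.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_tts(text: str, synthesize: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return WAV bytes for `text`, calling `synthesize` only on a cache miss.

    `synthesize` is a hypothetical stand-in for the real TTS call.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the exact text, so any change in wording is a miss.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    wav_path = cache_dir / f"{key}.wav"
    if wav_path.exists():
        return wav_path.read_bytes()
    audio = synthesize(text)
    wav_path.write_bytes(audio)
    return audio
```

Because the key is derived from the full sentence, a response containing a variable (a time, a name, a temperature) produces a new key on almost every request, which is why the cache only helps for fixed phrases.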
The new model is not released yet. I'll keep the community updated on the release date on Twitter or my YouTube channel. I'd recommend taking a look at Larynx, as it's designed for low compute power (like a Raspberry Pi) and my German voice is available there too.
@r4nc0r Keep watch for the release of Mimic 3 (samples), which should be this month. You should get an 8-10x speedup with it; I typically get an RTF of 0.03, but I'm also on a Ryzen 5950X.
Also, I would love to use your new model. Is there a way to use it?
Hi @r4nc0r,
you can download the model and config from the @coqui-ai prerelease 0.7.0 here: https://github.com/coqui-ai/TTS/releases
An easy pip-based installation will follow once the final 0.7.0 is released.
Keep watch for the release of Mimic 3
You can play around with the beta of Mimic 3 with my German voice (and some more German voices), as mentioned by @synesthesiam: https://mycroft.ai/blog/mimic-3-preview/
Since Mimic 3 has now been released, you can simply use it. You can watch this video on how to set it up and use it, and/or check the official documentation.
If you want to use Coqui TTS (a little bit slower, but better quality), you can do so with:
pip install tts==0.7.1
tts-server --model_name tts_models/de/thorsten/vits
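Once the server is running, a voice assistant could request audio over HTTP. The sketch below assumes the server listens on localhost port 5002 and serves WAV data from an `/api/tts` endpoint that takes a `text` query parameter; these are assumptions about a default setup and should be checked against the installed version. The helper names are illustrative.

```python
import urllib.parse
import urllib.request


def build_tts_url(text: str, host: str = "localhost", port: int = 5002) -> str:
    """Build the request URL; host, port, and path are assumed defaults."""
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def fetch_wav(text: str, host: str = "localhost", port: int = 5002,
              timeout: float = 30.0) -> bytes:
    """Request synthesized speech and return the raw response body."""
    url = build_tts_url(text, host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

With a server running, `fetch_wav("Hallo Welt")` would return the audio bytes, which can be written to a `.wav` file or passed to an audio player.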
I'm closing this issue for now, but feel free to reopen it if you have further questions.
Hi, first of all thank you very much for your contribution!
I'm trying to build a real-time voice assistant, for which I use different tools for STT, NLP and TTS. I would love to use your voice for this, but the on-the-fly audio generation is a bit slow with your tacotron2 model.
I found this comparison: https://github.com/coqui-ai/TTS/discussions/522. Is there any way to speed up the audio generation to values similar to those of the English models?