What TTS models can be run on raspi4 in real time?

secretsauceai commented 3 years ago

We know Mimic can be run on a raspi4 in 'real time'. We also know that Tacotron(2) probably will never run in real time on a raspi4 (or perhaps it will?). So what does that leave us with?

Has anyone tried the Silero TTS models?

sheosi commented 3 years ago

By Mimic, I understand you mean Mimic2, right? For TTS I'm using Pico. I know that it is not AI-based, but it is fast and sounds good (at least in Spanish).

secretsauceai commented 3 years ago

I haven't tried anything locally beyond the original Mimic. However, I would like a much better solution in the future. The voice is pretty 'robotic'. Mimic2 is based on Tacotron, so until it is possible to run that in TFLite or something similar, it won't even get close to running in 'real time' on a raspi4.

How do you feel about the performance of Pico?

Is this the same git repo you use, or do you use another one?
I couldn't find an architectural description for Pico. Does anyone have one? I am curious how it actually works.
This is the same TTS that stock Android still uses by default, right?
It must not be very resource intensive. Does anyone know the level of resources it uses (memory, CPU% during TTS synthesis on a raspi4)?
How 'robotic' do you feel this is?
Is this the TTS you want to stick with?

Lots of further research is needed. I myself need to look into this in a lot more depth.

List of links related to tacotron (and similar) that can run tflite (or pytorch mobile)

sheosi commented 3 years ago

The one I use is this, which is embedded in the Rust library that I use. It contains modifications for Pico to work on 64-bit systems.
Not me, and it's unlikely you'll find one. The company behind Pico (SVOX) released Pico only because Google wanted it, so they open sourced just the TTS itself and some voices.
From toying around, it seems that they still use Pico if you have no internet connection. I can't back this up, but they are definitely switching engines depending on whether there is an internet connection.
My impression is that it is lightweight too. I don't have any precise metrics, but Lily runs with less than 100 MB of RAM and contains Pico and lots of other things (including PocketSphinx, MQTT...), so I'd say a maximum of 50 MB of RAM.
The Spanish voice is really good; the English one seems more robotic but is still pretty nice.
Not necessarily. I still think that an AI-driven TTS would not only sound much better but also be a blessing for small languages. As of now, generating a Pico voice model would be an enormous task, as nothing has been open sourced (schemas, model generation tools...). Our only alternative right now is eSpeak (which is available for virtually any language), but it sounds extremely robotic and is GPLv3 licensed (really difficult to use on closed embedded devices, and you need to be careful with your own licensing).
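
The RAM figures above are estimates, so a quick measurement could firm them up. Below is a minimal Python sketch, assuming the engine can be called in-process through a hypothetical `synthesize(text)` callable (this is not an API of Pico or Lily). It reads the peak resident set size from the standard `resource` module, which is only available on Unix-like systems such as Raspberry Pi OS.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_memory(synthesize, text):
    """Run one synthesis call and report (peak MB, growth in MB).

    `synthesize` is a hypothetical stand-in for whatever in-process
    TTS binding is being tested.
    """
    before = peak_rss_mb()
    synthesize(text)
    after = peak_rss_mb()
    return after, after - before
```

Since the peak only ever grows, the second value is a rough lower bound on what the synthesis call added. It says nothing about an engine that runs as a separate process.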

For me the solution for the time being is being able to use online TTSs and have Pico and optionally eSpeak as fallback.
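
A minimal Python sketch of that fallback order, assuming each engine is wrapped in a hypothetical callable that takes text and returns audio bytes, and raises an exception when it is unavailable (for example, when there is no internet connection). None of these names come from Lily or from any real TTS library.

```python
def speak_with_fallback(text, engines):
    """Try each (name, synthesize) pair in order; return the first success.

    `engines` is an ordered list such as
    [("online", online_tts), ("pico", pico_tts), ("espeak", espeak_tts)],
    where every callable is a hypothetical wrapper around one engine.
    """
    errors = []
    for name, synthesize in engines:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any failure moves on to the next engine
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS engines failed: " + "; ".join(errors))
```

Returning the engine name along with the audio makes it easy to log how often the offline fallback is actually used.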

Regarding Mimic/Tacotron: Mozilla improved DeepSpeech a ton and made it usable on a raspi4. I hope Tacotron gets the same treatment.

Also, there are several variants of the TTS components. Mozilla TTS (which implements Tacotron too) lets you play with them.

secretsauceai commented 3 years ago

It would be interesting to compile a data set of responses and measure how long it takes to generate the TTS as a benchmark. A subjective 'how robotic is the voice' rating wouldn't be bad either.

I am quite curious about performance benchmarks in TTS.

I saw in the article linked above about Mozilla TTS on a raspi that it runs 6 times slower than 'real time' with that configuration.
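
One way to put a number on 'real time' is the real-time factor: synthesis time divided by the duration of the audio produced. Below is a minimal Python sketch of such a benchmark, assuming a hypothetical `synthesize(text)` callable that returns a sequence of mono audio samples at a known sample rate. The function name and the `clock` parameter are illustrative, not taken from any TTS library.

```python
import time


def real_time_factor(synthesize, sentences, sample_rate, clock=time.perf_counter):
    """Total synthesis time divided by total audio duration.

    `synthesize` is a hypothetical callable returning a sequence of mono
    samples; `clock` can be swapped out to make the function testable.
    """
    synth_seconds = 0.0
    audio_seconds = 0.0
    for sentence in sentences:
        start = clock()
        samples = synthesize(sentence)
        synth_seconds += clock() - start
        audio_seconds += len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("no audio was produced")
    return synth_seconds / audio_seconds
```

A value below 1.0 means the engine produces audio faster than it plays back. '6 times slower than real time' corresponds to a factor of about 6.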

hobbycommandline commented 1 year ago

I have used TTS on a raspi 3B+. festvox/flite works well and runs close enough to real time to be useful for a screen reader, so it will work as an assistant voice. festvox/festival is the interpreted version and does not run in real time on the 3B+, but you may have better luck on the 4. The default voices for festival and flite are not as good as some of the other voices you can download. My blind friend suggested RHVoice, which I haven't tested personally on a raspi, but it works on Android, so I would hope it works on a raspi too.

secretsauceai / secret_sauce_ai

What TTS models can be run on raspi4 in real time? #20
