secretsauceai / secret_sauce_ai

Secret Sauce AI: a coordinated community of tech minded AI enthusiasts
Apache License 2.0

What TTS models can be run on raspi4 in real time? #20

Open secretsauceai opened 3 years ago

secretsauceai commented 3 years ago

We know Mimic can run on a raspi4 in 'real time', and we know that Tacotron(2) probably never will run in real time on a raspi4 (or perhaps it will?). So what does that leave us with?

Has anyone tried the Silero TTS models?

sheosi commented 3 years ago

By Mimic, I understand you mean Mimic2, right? For TTS I'm using Pico. I know it is not AI-based, but it is fast and sounds good (at least in Spanish).

secretsauceai commented 3 years ago

I haven't tried anything locally beyond the original Mimic. However, I would like a much better solution in the future, since the voice is pretty 'robotic'. Mimic2 is based on Tacotron, so until it is possible to run that in tflite or something similar, it won't even get close to running in 'real time' on a raspi4.

How do you feel about the performance of Pico?

Lots of further research is needed; I need to look into this in a lot more depth myself.

List of links related to tacotron (and similar) that can run tflite (or pytorch mobile)

sheosi commented 3 years ago

For me, the solution for the time being is to use online TTS services, with Pico and optionally eSpeak as fallbacks.
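A fallback chain like this can be sketched in a few lines. This is only a minimal illustration, not how any particular assistant does it: `speak_with_fallback` and the three engine functions are hypothetical stand-ins, and a real setup would replace their bodies with calls to the online service, Pico, and eSpeak.

```python
from typing import Callable, List, Sequence, Tuple

# An engine is a (name, function) pair; the function turns text into audio bytes.
Engine = Tuple[str, Callable[[str], bytes]]


def speak_with_fallback(text: str, engines: Sequence[Engine]) -> Tuple[str, bytes]:
    """Try each engine in order and return (engine_name, audio) from the first that works."""
    errors: List[str] = []
    for name, synthesize in engines:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any failure means "try the next engine"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS engines failed: " + "; ".join(errors))


# Hypothetical stand-ins so the sketch runs without any TTS engine installed.
def online_tts(text: str) -> bytes:
    raise ConnectionError("no network")  # simulate the online service being unreachable


def pico_tts(text: str) -> bytes:
    return b"PICO:" + text.encode("utf-8")  # placeholder for real Pico audio


def espeak_tts(text: str) -> bytes:
    return b"ESPEAK:" + text.encode("utf-8")  # placeholder for real eSpeak audio


ENGINES: List[Engine] = [
    ("online", online_tts),
    ("pico", pico_tts),
    ("espeak", espeak_tts),
]

if __name__ == "__main__":
    used, audio = speak_with_fallback("hola", ENGINES)
    print(f"spoken with {used}, {len(audio)} bytes")
```

Ordering the list from best-sounding to most reliable means the offline engines are only used when the online one fails.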

Regarding Mimic/Tacotron: Mozilla improved DeepSpeech a ton and made it usable on a Raspi4. I hope Tacotron gets the same treatment.

Also, there are several variants of the TTS components. Mozilla-tts (which implements Tacotron too) lets you play with them.

secretsauceai commented 3 years ago

It would be interesting to compile a data set of responses and measure how long it takes to generate the TTS for each one, as a benchmark. A subjective 'how robotic is the voice' rating wouldn't be bad either.

I am quite curious about performance benchmarks in TTS.

I saw from the article linked above about Mozilla TTS on a raspi that it runs 6 times slower than 'real time' with that configuration.
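One common way to put a number on this is the real-time factor (RTF): synthesis time divided by the duration of the generated audio. Below 1 is faster than real time; "6 times slower than real time" would correspond to an RTF of roughly 6. The following is a rough sketch of such a benchmark, assuming the engine returns WAV bytes. `fake_synthesize` is a made-up placeholder that just writes silence, so the numbers it produces say nothing about any real engine.

```python
import io
import time
import wave
from typing import Callable, Dict, Iterable


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Length of the audio in a WAV byte string, in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesize: Callable[[str], bytes], text: str) -> float:
    """Synthesis time divided by audio duration (below 1.0 is faster than real time)."""
    start = time.perf_counter()
    wav_bytes = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(wav_bytes)


def benchmark(synthesize: Callable[[str], bytes], sentences: Iterable[str]) -> Dict[str, float]:
    """Measure the RTF of each sentence and return it keyed by sentence."""
    return {s: real_time_factor(synthesize, s) for s in sentences}


def fake_synthesize(text: str, sample_rate: int = 16000) -> bytes:
    """Placeholder engine (an assumption): 0.05 s of 16-bit mono silence per character."""
    n_frames = len(text) * (sample_rate // 20)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


if __name__ == "__main__":
    responses = ["It is half past three.", "Turning on the kitchen lights."]
    for sentence, rtf in benchmark(fake_synthesize, responses).items():
        print(f"RTF {rtf:.4f}  {sentence}")
```

To test a real engine, swap `fake_synthesize` for a function that calls that engine and returns its WAV output, and run the same list of responses on the raspi4 itself.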

hobbycommandline commented 1 year ago

I have used TTS on a raspi 3b+. festvox/flite works well and runs close enough to real time to be useful for a screen reader, so it will work as an assistant voice. festvox/festival is the interpreted version and does not run in real time on the 3b+, but you may have better luck on the 4. The default voices for both festival and flite are not as good as some of the other voices you can download. My blind friend suggested RHVoice, which I haven't tested personally on a raspi, but it works on Android, so I would hope it works on a raspi too.