Open yacineMTB opened 1 year ago
What do you think of https://github.com/rhasspy/piper? It was pretty straightforward to set up. I haven't been able to train it, but the voices were OK.
Thank you for sharing!! I think that this is a facade that uses mimic3 under the hood. It's C++, so I should be able to churn out a binding pretty quickly for this.
Yeah, I looked at the code and some samples, and asked a friend of mine.
it is perfect for this
thanks @synesthesiam!!
I've heard that the Mycroft model is not really good. Maybe it's better to use Microsoft's TTS.
> coqui-ai

From a quick glance it seems too bloated

> Microsoft's TTS

Is it locally runnable?

> coqui-ai

From a quick glance this seems too bloated
I think this is why Mimic 3 wins. This is actually ridiculous. Plus, I think the project is kinda based. Also, the models are highly variable based on data quality. Picking Mimic, but I'll abstract the TTS portion so it's swappable.
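One way the swappable TTS portion could look, as a rough sketch only. The `TextToSpeech` interface, the `ToneTTS` stand-in backend, and `speak_to_file` are hypothetical names made up for illustration; none of them come from Mimic 3 or Piper. A real backend would wrap whichever engine gets picked and return WAV bytes from the same method.

```python
import io
import math
import struct
import wave
from typing import Protocol


class TextToSpeech(Protocol):
    """Hypothetical interface: any backend that turns text into WAV bytes."""

    def synthesize(self, text: str) -> bytes: ...


class ToneTTS:
    """Stand-in backend for testing: emits a sine tone instead of speech."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate

    def synthesize(self, text: str) -> bytes:
        # 50 ms of a 440 Hz tone per character, so longer text gives longer audio.
        n_frames = int(self.sample_rate * 0.05) * max(len(text), 1)
        frames = b"".join(
            struct.pack(
                "<h",
                int(8000 * math.sin(2 * math.pi * 440 * i / self.sample_rate)),
            )
            for i in range(n_frames)
        )
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)  # mono
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(frames)
        return buf.getvalue()


def speak_to_file(tts: TextToSpeech, text: str, path: str) -> None:
    """Caller code depends only on the interface, not on a concrete engine."""
    with open(path, "wb") as f:
        f.write(tts.synthesize(text))
```

Swapping engines then means writing another class with the same `synthesize` method and passing it to `speak_to_file`; nothing else in the caller changes.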
https://github.com/iacore/nix-tts is pretty good
Microsoft TTS is usable on Windows machines. On Linux there is espeak, although the quality is not good.
@yacineMTB you're welcome! I wrote both Piper and Mimic 3: Piper is the better choice as it's newer and faster :+1:
Help, this is not working on Windows.
This thing needs to respond back to us on some event. Right now, the strategy to reduce latency is to generate precanned responses constantly. Maybe we can also follow the same strategy with some TTS system?
Ideally this would
For now we can just save it as a wav file. The scope of this task is figuring out what reasonable candidates we have for TTS, with one of the goals being low latency.
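A minimal sketch of the "precanned responses saved as wav files" idea, assuming responses can be keyed by their text. `PrecannedCache` and `silent_wav` are hypothetical names for illustration; `silent_wav` is only a placeholder that writes silence, and a real TTS call would go in its place.

```python
import hashlib
import io
import wave
from pathlib import Path
from typing import Callable


def silent_wav(text: str, sample_rate: int = 16000) -> bytes:
    """Placeholder synthesizer: returns 0.1 s of silence as WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate // 10))
    return buf.getvalue()


class PrecannedCache:
    """Stores one wav file per response text so repeats skip synthesis."""

    def __init__(self, cache_dir: str, synthesize: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize

    def path_for(self, text: str) -> Path:
        # Hash the text so any response maps to a safe, stable file name.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        return self.cache_dir / f"{digest}.wav"

    def get(self, text: str) -> Path:
        """Return the wav path for text, synthesizing only on a cache miss."""
        path = self.path_for(text)
        if not path.exists():
            path.write_bytes(self.synthesize(text))
        return path
```

Calling `get` ahead of time for likely responses would move the synthesis cost off the latency-critical path; at response time only a file lookup is left.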