ubopod / ubo_app

This repo contains code for the Ubo system app, which controls Raspberry Pi utilities and Ubo-based functionality

Local Text to Speech #62

Closed sassanh closed 5 months ago

sassanh commented 6 months ago
mehrdadfeller commented 5 months ago

Suno Bark may not be suitable for this application because of its large overhead and because it is not optimized for this task. I suggest using Picovoice Orca for the PoC while we explore other options.

https://picovoice.ai/platform/orca/

The downside of Picovoice is that it is not open source / free software. It requires getting an access token (API token) through their website, and it is limited to only 3 free devices.

One option is to ask developers to sign up on their website to get their own access keys (similar to other services such as ngrok or OpenAI) and have them pass the access key via QR code.

For static content (hard-coded sentences) such as instructions that don't change, we can save .wav files locally and simply play them back. Getting an access token would be necessary only for dynamic text-to-speech generation.
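
A rough sketch of how the static/dynamic split could look, assuming pre-rendered sentences are stored as .wav files named after a hash of their text. The names `wav_path_for`, `resolve_speech`, and the `synthesize` callback are hypothetical and not taken from ubo_app; `synthesize` stands in for whatever engine call (e.g. Picovoice Orca) would need the access key.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def wav_path_for(text: str, cache_dir: Path) -> Path:
    """Map a sentence to a stable .wav file name inside cache_dir."""
    digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}.wav"


def resolve_speech(
    text: str,
    cache_dir: Path,
    synthesize: Optional[Callable[[str], bytes]] = None,
) -> Path:
    """Return the path of a .wav file for text.

    Static sentences that were rendered ahead of time are served from
    cache_dir without any engine or access key. Only when the sentence is
    not cached do we fall back to the (hypothetical) synthesize callback,
    which is expected to return the complete contents of a .wav file.
    """
    path = wav_path_for(text, cache_dir)
    if path.exists():
        return path  # static content: just play this file back
    if synthesize is None:
        raise RuntimeError("No cached audio and no TTS engine configured")
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(synthesize(text))  # cache so the next call is offline
    return path
```

With this layout, the fixed instruction sentences could be rendered once on a development machine and shipped with the app, so a device without an access key would still be able to speak them.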

Picovoice also offers local Speech to Text that we can use for building a conversational assistant with Ollama / OpenAI backend. They have a bunch of demos on their website that you can try:

https://picovoice.ai/platform/cheetah/

sassanh commented 5 months ago

Implemented via Picovoice. To activate the offline service, one needs to enter a paid access key using the QR code scanner. With an unpaid access key it still works, but it needs an internet connection.