toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
https://heywillow.io/
Apache License 2.0
2.48k stars, 94 forks

OpenAI APIs for TTS/STT? #348

Open skorokithakis opened 6 months ago

skorokithakis commented 6 months ago

Is there (a plan for) a way to use the OpenAI servers for STT/TTS? They are fairly slow, unfortunately, but they might be a good option for some people.

kristiankielhofner commented 6 months ago

It's not exactly impossible, but it hasn't been a focus because, as you say, it's quite slow, to the point of going against our mission of an Alexa-competitive voice interface.

Willow uses a fairly unique method to stream audio to WIS. I'm not completely familiar with the OpenAI speech API, but at best you'd almost certainly need a proxy of some sort, and if you were doing advanced things like audio compression (AMR) you'd need to do more.
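As a rough illustration of what the buffering half of such a proxy might look like, here is a minimal sketch, not Willow or WIS code. It assumes the device streams uncompressed 16 kHz, 16-bit mono PCM (an assumption; the real stream format may differ, and AMR would need decoding first). The name `pcm_chunks_to_wav` is hypothetical. The proxy would collect the chunks for one utterance, wrap them in a WAV container, and only then upload the file, for example as a multipart POST to OpenAI's `/v1/audio/transcriptions` endpoint.

```python
import io
import wave

# Assumed stream parameters; these are illustrative, not confirmed Willow settings.
SAMPLE_RATE_HZ = 16_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1            # mono


def pcm_chunks_to_wav(chunks):
    """Join streamed raw PCM chunks and wrap them in a WAV container.

    `chunks` is any iterable of bytes objects, e.g. the payloads a
    hypothetical proxy received from the device for one utterance.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        for chunk in chunks:
            wav.writeframes(chunk)
    # The wave module patches the header sizes on close, so the bytes
    # returned here form a complete, uploadable file.
    return buf.getvalue()
```

Because the whole utterance has to be buffered before anything is uploaded, a proxy like this gives up the latency benefit of streaming, which is the trade-off described above.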

skorokithakis commented 6 months ago

Makes sense, thank you.

skorokithakis commented 3 weeks ago

I'd like to revisit this now that GPT-4o is out. The multimodal functionality of sending audio directly to the model and getting audio back might be interesting. Are there any plans for WIS to send the audio to the REST endpoint directly and receive audio back?