toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
https://heywillow.io/
Apache License 2.0

Enable TTS audio response from generic REST endpoints #225

Closed. hamishcunningham closed this 1 year ago

hamishcunningham commented 1 year ago

Enable TTS audio response from REST endpoints by checking for a message field in POST responses and passing it to the audio response fn_ok.

Add a configuration option to specify the maximum length of the text to send to TTS in this way.

Background: with Home Assistant, the response can be read out on the ESP BOX. To let external REST APIs do the same, we need to add a little code to rest.c (following the way it is done in hass.c).
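The logic described above could look roughly like the following. This is only an illustrative Python sketch of the idea, not the code from the PR, which would live in rest.c and be written in C. The function name `tts_text_from_response` and the default `MAX_TTS_CHARS` value are assumptions made up for the example; only the `message` field and the idea of a configurable maximum length come from the description.

```python
import json
from typing import Optional

# Hypothetical default: the proposal only says the maximum length is configurable.
MAX_TTS_CHARS = 200


def tts_text_from_response(body: str, max_chars: int = MAX_TTS_CHARS) -> Optional[str]:
    """Return the text to hand to TTS for a REST response body, or None."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # body is not JSON, so there is nothing to speak
    if not isinstance(payload, dict):
        return None  # JSON, but not an object with fields
    message = payload.get("message")
    if not isinstance(message, str) or not message:
        return None  # no usable "message" field
    # Cap the text at the configured maximum before sending it to TTS.
    return message[:max_chars]
```

A response without a `message` field simply yields `None`, so existing endpoints that return no text would keep behaving as before.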

kristiankielhofner commented 1 year ago

Nice! @stintel will be back soon and we'll take a deeper look!

stintel commented 1 year ago

We've decided to not accept this PR until after merging the feature/was branch.

zmarty commented 1 year ago

> We've decided to not accept this PR until after merging the feature/was branch.

I vaguely think this branch is now merged, so would it please be possible to get this functionality in? I would really like to be able to use Willow and WIS but have the "brains" be code that I write. So basically, if a user says "Hi ESP", the audio should get streamed to Whisper in WIS, which forwards the text to my REST endpoint. Then I would like to be able to reply with some text that magically gets converted to audio via TTS and played on the ESP32 Box. Is this possible? It would be awesome. Thank you very much.

stintel commented 1 year ago

> I vaguely think this branch is now merged, so would it please be possible to get this functionality in?

We indeed just merged the feature/was branch and tagged 0.1.0-rc.1. For now, we will focus solely on handling issues found in this release candidate, so new features will have to wait just a bit longer.

skorokithakis commented 1 year ago

I think an improvement here would be the ability to have separate strings for text and speech, as sometimes I want a long response spoken but a short one shown. If the text response is missing, showing the speech string on the screen would probably be a reasonable default.
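The fallback proposed above could be sketched as follows. This is a hypothetical Python illustration of the suggestion, not existing Willow behavior; the helper name `split_reply` is made up, and the `text` and `speech` keys are the ones floated in this thread.

```python
from typing import Optional, Tuple


def split_reply(reply: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (display_text, speech_text) for a reply with optional keys."""
    speech = reply.get("speech")
    text = reply.get("text")
    if text is None:
        # No separate display string: show the spoken string on screen.
        text = speech
    return text, speech
```

With this shape, a reply that carries only `speech` is both shown and spoken, while a reply that carries both can show a short string and speak a longer one.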

stintel commented 1 year ago

We created a release/v0.1 branch, so we are accepting new features in main again. Can you fix the conflicts in the PR?

skorokithakis commented 1 year ago

@stintel is it just formatting changes that are required here?

skorokithakis commented 1 year ago

I could try to clone this PR to add the "text" and "speech" keys, as proposed, though my C++ isn't great.

stintel commented 1 year ago

> @stintel is it just formatting changes that are required here?

No. There are conflicts as shown on the bottom of this PR page.

hamishcunningham commented 1 year ago

Sorry to be slow, I'm pretty busy at present; I'll try to get back to this soon!

skorokithakis commented 1 year ago

Ah that's no problem, @hamishcunningham, I was just wondering whether it existed and I'd missed it.

kristiankielhofner commented 1 year ago

We have merged our own variant of this functionality and it is included in the current Willow release candidate.

skorokithakis commented 7 months ago

@kristiankielhofner is there any documentation on this? I've been trying to find it in the docs but no luck so far.

nikito commented 7 months ago

Looking in the code, it looks like you just need a speech element in the JSON reply with the text to speak.
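For anyone wanting to try this, a minimal endpoint might look something like the sketch below, using only the Python standard library. The `speech` element in the reply is the part taken from the comment above. Everything else is an assumption for illustration: the names `build_reply`, `CommandHandler` and `make_server`, the port, and in particular the guess that the recognized command arrives in a `text` field of the POST body, which should be checked against what Willow actually sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_reply(request_body: bytes) -> bytes:
    """Build a JSON reply whose "speech" element holds the text to speak."""
    try:
        request = json.loads(request_body or b"{}")
    except ValueError:
        request = {}  # not valid JSON: treat as an empty request
    # Assumption: the recognized command is in a "text" field of the request.
    heard = request.get("text", "") if isinstance(request, dict) else ""
    speech = f"You said: {heard}" if heard else "Sorry, I did not catch that."
    return json.dumps({"speech": speech}).encode("utf-8")


class CommandHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client declared.
        length = int(self.headers.get("Content-Length", 0))
        reply = build_reply(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) the server; the port is an arbitrary choice."""
    return HTTPServer(("0.0.0.0", port), CommandHandler)
```

Calling `make_server().serve_forever()` would start it; the device's REST command endpoint would then need to point at this host and port.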

skorokithakis commented 7 months ago

@nikito that works, thanks, but it would be good to have docs on how REST works in general. Also, I thought there were different elements for speaking and for displaying; is that not the case?