Add possibility to continue a spoken conversation when assist needs more info.

rhasspy / wyoming

Peer-to-peer protocol for voice assistants

MIT License


Add possibility to continue a spoken conversation when assist needs more info. #1

Open sanderkooger opened 11 months ago

sanderkooger commented 11 months ago

Hey, big fan of the new integrations in Home Assistant.

TLDR: When a user chooses to use an AI agent (OpenAI, Mixtral, etc.) to do things in the house, the agent often has questions. With voice commands, it's currently not possible to continue the conversation. It would be a great improvement to add a function that an AI could trigger if it needs more information.

However, from what I have been reading, this would require a change in the workflow of the protocol, am I correct?

What would it entail to allow the assistant itself to continue the conversation without having to shout out the wake word again and restate all that has been said before?

@jekalmin

synesthesiam commented 10 months ago

I've got a start on this, but there is more work to do. I've extended the intent/handle related events with a context dictionary that will be used to hold conversational context.

Another piece that's missing is something in the response events (e.g., Intent, Handled) indicating that a follow-up response from the other end is required or possible. This could be as simple as a boolean, but I'd like to consider more options before committing.
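For illustration, here is a minimal sketch of what such an indicator might look like if it were an enum rather than a plain boolean. Everything in it is an assumption made for discussion: `FollowUp`, `HandledResponse`, the `follow_up` field, and the dictionary layout are hypothetical and are not part of the Wyoming protocol or its Python package.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class FollowUp(Enum):
    """Hypothetical follow-up indicator (not defined by Wyoming)."""

    NONE = "none"          # conversation is finished
    POSSIBLE = "possible"  # the user may reply, but does not have to
    REQUIRED = "required"  # the assistant asked a question and expects an answer


@dataclass
class HandledResponse:
    """Illustrative stand-in for a response event such as Handled."""

    text: str
    follow_up: FollowUp = FollowUp.NONE
    context: Dict[str, Any] = field(default_factory=dict)

    @property
    def expects_reply(self) -> bool:
        # A plain boolean view for clients that only care about yes/no.
        return self.follow_up is not FollowUp.NONE

    def to_event(self) -> Dict[str, Any]:
        # Assumed layout: an event type plus a data dictionary.
        return {
            "type": "handled",
            "data": {
                "text": self.text,
                "follow_up": self.follow_up.value,
                "context": self.context,
            },
        }
```

An enum leaves room for a "possible but not required" case, while `expects_reply` still gives the simple boolean answer.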

Shulyaka commented 10 months ago

The response events would need context as well, because they will need to pass the conversation_id somehow.
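A rough sketch of that round trip, assuming the context travels as a plain dictionary inside each event's data. The helper names `build_request` and `build_response`, the event type strings, and the `context` key are hypothetical and only illustrate the idea; they are not the actual Wyoming API.

```python
import uuid
from typing import Any, Dict, Optional


def build_request(
    text: str, previous_response: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a request event, copying context from the last response if any."""
    previous_data = (previous_response or {}).get("data", {})
    context = dict(previous_data.get("context", {}))
    return {"type": "recognize", "data": {"text": text, "context": context}}


def build_response(request: Dict[str, Any], text: str) -> Dict[str, Any]:
    """Build a response event that echoes the request context back."""
    context = dict(request["data"].get("context", {}))
    # Start a new conversation only if the request did not carry an id.
    context.setdefault("conversation_id", uuid.uuid4().hex)
    return {"type": "handled", "data": {"text": text, "context": context}}


# Two turns of one conversation share the same conversation_id.
first = build_request("turn on the light")
question = build_response(first, "Which light?")
second = build_request("the kitchen one", question)
answer = build_response(second, "Turned on the kitchen light.")
```

Because the response carries the context, the client never needs to know what is inside it; it only has to send it back unchanged on the next turn.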

sdetweil commented 10 months ago

The new response event also triggers text-to-speech to inform the user of the new input request. This changes the earlier flow, where TTS was the end of the pipeline, so the TTS event needs to carry enough info for the state manager to return to ASR and have the mic turn audio forwarding on again.

Thanks for bringing this up. My intent was to use Wyoming under smart mirror to replace the on-platform Snowboy with the Docker container (and that dragged in hotword detection, VAD, and ASR...).

I had built in conversational support for Alexa and Google Assistant plugins (or anything), but now that Amazon has killed software-only Alexas, I haven't used it much anymore.
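The flow change can be sketched as a small state machine. This is only an illustration under assumptions: the `State` names, the `SatelliteFlow` class, and the `follow_up` flag are hypothetical and do not describe how any Wyoming satellite is actually implemented.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_WAKE = auto()  # waiting for the wake word
    LISTENING = auto()  # mic audio is forwarded to ASR
    HANDLING = auto()   # transcript is being handled
    SPEAKING = auto()   # TTS audio is playing


class SatelliteFlow:
    """Hypothetical state manager for a voice pipeline with follow-ups."""

    def __init__(self) -> None:
        self.state = State.WAIT_WAKE
        self.follow_up = False

    def on_wake(self) -> None:
        if self.state is State.WAIT_WAKE:
            self.state = State.LISTENING

    def on_transcript(self) -> None:
        if self.state is State.LISTENING:
            self.state = State.HANDLING

    def on_response(self, follow_up: bool) -> None:
        # Remember whether the response asked for more input.
        if self.state is State.HANDLING:
            self.follow_up = follow_up
            self.state = State.SPEAKING

    def on_tts_finished(self) -> None:
        # Previously TTS was the end; with a follow-up we listen again
        # without requiring the wake word.
        if self.state is State.SPEAKING:
            self.state = State.LISTENING if self.follow_up else State.WAIT_WAKE
            self.follow_up = False
```

The only new transition is the one in `on_tts_finished`: when the response asked for more input, the flow goes back to listening instead of waiting for the wake word.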