rhasspy / wyoming-satellite

Remote voice satellite using Wyoming protocol
MIT License

Bypass wake word for an external event such as face recognition #81

Open RobertLukan opened 7 months ago

RobertLukan commented 7 months ago

Is there a way to bypass the wake word and start STT on request? I have a face recognition system built with Frigate/Double-Take/Compreface/Home Assistant. I have connected it to Node-red with event listening, and I send TTS to the remote satellite using Rhasspy. But I noticed that Rhasspy is not developed anymore, so I am trying to migrate to Wyoming satellite.

I have Wyoming satellite running on an RPi and still need to do more testing. However, I would like to start the conversation with Home Assistant on a successful face recognition, not only on the wake word.

I was wondering whether there is a way to send the Wyoming satellite an HTTP command to start streaming audio for STT and bypass the wake word. As I already have a Node-red flow ready, I would just need to issue another HTTP call to the satellite.

llluis commented 7 months ago

How would you handle the conversation?

There’s an easy way to implement a remote trigger to the satellite but there’s no context, i.e., it’s exactly the same as you would trigger via wake word and you would have to give complete instructions, not only a reply.

Are you able to do it differently today with Rhasspy or Node-red?

RobertLukan commented 7 months ago

Well, it would be perfect if the context stayed the same. My initial idea was to just replace the wake word with an external trigger, simple as that. But I guess the proper way would be to inject data into Home Assistant (Assist) and let HA or the chat bot answer.

An example:

  1. An HA event happens (face recognition) and triggers Node-red or an automation
  2. Then the automation/Node-red would "say" Mark came home to the chat bot
  3. The chat bot would respond "Welcome home Mark". I guess this would require a trained chat bot (I guess that is doable)
  4. The person replies with the wake word and a response, or there is no wake word and the satellite just waits for input.

Right now, I am sending voice over Rhasspy as TTS(Node red HTTP API) and it works fine. But this is causing problems with wyoming-satellite. So I need to drop this option.

llluis commented 7 months ago

OK, I have a working demo. Willing to try it?

You must be on HA version 2024.1.5. You need to copy the contents of this repo (https://github.com/llluis/homeassistant-wyoming) into your Home Assistant /config/custom_components/wyoming directory and replace satellite.py on the RPi with this one: https://github.com/llluis/wyoming-satellite/blob/remote-detection/wyoming_satellite/satellite.py

Reboot everything and you will have a new button in the satellite device in Home Assistant:

image

You can use this button in automations to trigger the speech to text of the satellite without the wake word. Let me know if it works.

llluis commented 7 months ago

For the TTS, you can install MPD on the RPi and have this functionality. See #28 and #40.

RobertLukan commented 7 months ago

Fantastic, I will give it a try during the weekend. I will study that part of the code and play with it a bit. Now I see which part of the code is involved in this operation.

I actually like how Rhasspy handles TTS. I wish we could do TTS in a similar way, directly by invoking some HA API as part of Assist (with the context). For now I will follow the MPD way.

RobertLukan commented 7 months ago

I just tested it and it works from the HA GUI. I just needed to figure out how to do it over the HTTP API (figured that one out). But there is one important detail: "--wake-word" has to be defined when running wyoming-satellite, otherwise an exception is thrown.

If I define "--wake-word" when running wyoming-satellite, the wake word simply does not work anymore (bug?).

Do you happen to know whether HA Assist has an API? If so, I could start a contextual conversation from Node-red.

I am also facing another issue: false positives triggering the wake word. I will play with this setup a bit more, but it looks buggy to me. I will open another issue once I have more data.

I have switched to GPU accelerated STT and it works way better.
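
For anyone wondering what the HTTP call mentioned above might look like: the following is only a sketch, assuming the standard Home Assistant REST endpoint for calling a service (POST /api/services/button/press) and a long-lived access token. The server address and the button entity ID are hypothetical placeholders; check the real entity ID of the button on your satellite device. The request is built but not sent, so nothing happens until you uncomment the urlopen call.

```python
import json
import urllib.request


def build_button_press_request(
    base_url: str, token: str, entity_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST that calls button.press in Home Assistant."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/button/press",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_button_press_request(
        "http://homeassistant.local:8123",        # assumed server address
        "YOURTOKEN",                              # long-lived access token
        "button.my_satellite_trigger_detection",  # hypothetical entity ID
    )
    print(req.get_method(), req.full_url)
    # Uncomment to actually send the request:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.status)
```

In Node-red the same thing is usually done with a call-service node, so this is mostly useful for scripts running outside of HA.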

RobertLukan commented 7 months ago

Now I know how to get and track conversation_id that is part of the HA API. I am attaching an example from Node-red.

So the next issue is how to associate the conversation id as part of the "wyoming conversation" so that TTS and STT are part of the same conversation. I am not sure what is the correct way forward to be honest.
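
A rough sketch of the conversation_id tracking described above, assuming the Home Assistant conversation endpoint (POST /api/conversation/process) accepts text, language and an optional conversation_id, and returns a conversation_id in its JSON response. The class name and the sample values are made up for illustration, and it only builds payloads; sending them (and tying the ID into the Wyoming pipeline) is not covered here.

```python
from typing import Any, Dict, Optional


class ConversationTracker:
    """Remember the conversation_id between calls to the conversation API."""

    def __init__(self, language: str = "en") -> None:
        self.language = language
        self.conversation_id: Optional[str] = None

    def next_payload(self, text: str) -> Dict[str, Any]:
        """Build the JSON body for the next request in the same conversation."""
        payload: Dict[str, Any] = {"text": text, "language": self.language}
        if self.conversation_id is not None:
            payload["conversation_id"] = self.conversation_id
        return payload

    def record_response(self, response: Dict[str, Any]) -> None:
        """Store the conversation_id returned by the server, if there is one."""
        new_id = response.get("conversation_id")
        if new_id:
            self.conversation_id = new_id


if __name__ == "__main__":
    tracker = ConversationTracker()
    print(tracker.next_payload("Mark came home"))
    # Pretend the server answered with this ID (sample value, not real output).
    tracker.record_response({"conversation_id": "example-id"})
    print(tracker.next_payload("What is the temperature?"))
```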

Screenshot 2024-01-27 at 20 50 59

RobertLukan commented 7 months ago

Actually I found this feature within Rhasspy3(fork): https://github.com/rhasspy/rhasspy3/compare/master...Shulyaka:rhasspy3:home_assistant_conversation_id

It is a pity that Rhasspy3 is not being developed anymore. I tried wyoming-satellite and Rhasspy3 and Rhasspy3 works better. Wake word works way better.

I think I will try more Rhasspy3 for now. Regardless thank you for your help.

geman220 commented 7 months ago

> OK, I have a working demo. Willing to try it?
>
> You must be on HA version 2024.1.5. You need to copy the contents of this repo (https://github.com/llluis/homeassistant-wyoming) into your Home Assistant /config/custom_components/wyoming directory and replace satellite.py on the RPi with this one: https://github.com/llluis/wyoming-satellite/blob/remote-detection/wyoming_satellite/satellite.py
>
> Reboot everything and you will have a new button in the satellite device in Home Assistant:
>
> image
>
> You can use this button in automations to trigger the speech to text of the satellite without the wake word. Let me know if it works.

I don't mean to necro a completed item, so feel free to tell me to open a new item, but I am working to test this. I need a way to trigger voice via automation, so TTS gets sent to the satellite over MPD, and then I need the satellite to listen for the voice input for a response (yes or no) without the wake word being spoken to determine what to do next. If you are still looking for someone to test, I can provide feedback as I work through this.

RobertLukan commented 7 months ago

Ok, I am willing to test a new version. I got stuck with Rhasspy3 as I don't have time to debug right now.

geman220 commented 7 months ago

So if I understand correctly, I should be able to “press” on trigger detection in the UI, and that should make the satellite activate (as if a wake word was spoken) and perform the speech to text translation? Should the assist in progress sensor update to “on”?

llluis commented 7 months ago

> So if I understand correctly, I should be able to “press” on trigger detection in the UI, and that should make the satellite activate (as if a wake word was spoken) and perform the speech to text translation? Should the assist in progress sensor update to “on”?

Exactly. However, you will have to give an action to the satellite. It's not possible (yet) to just capture the reply (the transcription).

Actually, I have a working version with this too, but still a proof of concept.

geman220 commented 7 months ago

> > So if I understand correctly, I should be able to “press” on trigger detection in the UI, and that should make the satellite activate (as if a wake word was spoken) and perform the speech to text translation? Should the assist in progress sensor update to “on”?
>
> Exactly. However, you will have to give an action to the satellite. It's not possible (yet) to just capture the reply (the transcription).
>
> Actually, I have a working version with this too, but still a proof of concept.

Well I’ve made the changes on my end, but hitting “press” doesn’t seem to do anything. The UI doesn’t update to reflect assist in progress, and I don’t notice any logs on the satellite that immediately indicate any change. It does log that the button was pressed.

So what I’m understanding is that I need to write an automation to send an action to the satellite when the button is pressed? Do you mind sharing your proof of concept so that I can implement it and take a look?

llluis commented 7 months ago

No, the button shall trigger the satellite without any other automation. Did you use my version of the satellite too?

geman220 commented 7 months ago

> No, the button shall trigger the satellite without any other automation. Did you use my version of the satellite too?

I did:

~/wyoming-satellite/wyoming_satellite $ mv satellite.py satelliteBACKUP.py
~/wyoming-satellite/wyoming_satellite $ wget https://raw.githubusercontent.com/llluis/wyoming-satellite/remote-detection/wyoming_satellite/satellite.py
~/wyoming-satellite $ script/run --name 'Screen Satellite' --uri 'tcp://0.0.0.0:10700' --mic-command 'arecord -D plughw:CARD=BYLM40,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw'

image

llluis commented 7 months ago

> Do you mind sharing your proof of concept so that I can implement it and take a look?

You must use my custom version of the Wyoming integration in Home Assistant available here: https://github.com/llluis/homeassistant-wyoming/tree/dev

And my custom version of the satellite available here: https://github.com/llluis/wyoming-satellite/tree/dev

This will enable:

image

So, you would, in an automation:

llluis commented 7 months ago

Just to let you know I pushed a few changes to make it easier in the wait_for_trigger automation. (just download again from the repo if already downloaded before)

image

The key pipeline_event doesn't exist anymore.

llluis commented 7 months ago

To wait for the reply:

wait_for_trigger:
  - platform: event
    event_type: wyoming-satellite-pipeline
    event_data:
      satellite_name: Nabu1
      type: stt-end

The reply will be in wait.trigger.event.data.data.stt_output.text
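
If you handle the event in code instead of a template, the same lookup could be sketched like this. It assumes the event data has the shape implied by the trigger and path above (satellite_name and type at the top level, the transcription under data.stt_output.text). The function name and the sample event are made up for illustration.

```python
from typing import Any, Mapping, Optional


def extract_stt_text(
    event_data: Mapping[str, Any], satellite_name: str
) -> Optional[str]:
    """Return the transcription from an stt-end event of the given satellite.

    Returns None when the event belongs to another satellite, is not an
    stt-end event, or carries no transcription.
    """
    if event_data.get("satellite_name") != satellite_name:
        return None
    if event_data.get("type") != "stt-end":
        return None
    stt_output = (event_data.get("data") or {}).get("stt_output") or {}
    return stt_output.get("text")


if __name__ == "__main__":
    # Sample event data, shaped after the path given in the thread.
    sample = {
        "satellite_name": "Nabu1",
        "type": "stt-end",
        "data": {"stt_output": {"text": "yes."}},
    }
    print(extract_stt_text(sample, "Nabu1"))
```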

geman220 commented 7 months ago

So developer-tools/event

I listen to wyoming-satellite-pipeline

If I press Trigger ask question I see nothing in the event log.

Restarting the satellite I do see run-start and wake_word-start, if I use the wake word I see normal stt-start and stt-end. But pressing the buttons in the UI doesn't log anything when listening to wyoming-satellite-pipeline

llluis commented 7 months ago

I must be forgetting something in the setup, as it works here. You are seeing the button, so the deployment of the Home Assistant part went OK. You did copy all the files into the wyoming directory, right?

image

Can you post the debug log of the satellite?

Here's the expected output when clicking the ask button:

Feb 06 15:23:38 nabu1 run[17414]: DEBUG    Detection called. Name: ask
Feb 06 15:23:38 nabu1 run[17414]: DEBUG    Detection triggered from event
Feb 06 15:23:38 nabu1 run[17414]: DEBUG    Streaming audio
Feb 06 15:23:38 nabu1 run[17414]: DEBUG    RunPipeline from PipelineStage.ASR to PipelineStage.ASR

geman220 commented 7 months ago

I downloaded the .zip from here https://github.com/llluis/homeassistant-wyoming/tree/dev. I deleted my old wyoming folder in \config\custom_components, created a new wyoming folder \config\custom_components\wyoming, and then placed all of the files from the .zip. I restarted HA after. I saw the new buttons so all went well there.

I did git pull your dev branch https://github.com/llluis/wyoming-satellite/tree/dev and just to be sure I checked for the recent changes in satellite.py and they were there.

I just went to get you a debug log and noticed it's working now. I haven't changed anything as far as your wyoming addon for HA or satellite dev. I'm adding the rest of this for context in case someone else may find it useful.

That being said, I did change my Whisper to a docker container on a different server so I could leverage the GPU to increase response speed. I'm not sure how or why that would make any difference in this case, but the whisper processing was around 15-20 seconds before and now it's down to 1-2s. I also moved OpenWakeWord to the Pi to do local detection of wakeword. So potentially some change in that stack resolved the issue?

I'm going to work on creating an automation now and I will report back.

Thanks!

llluis commented 7 months ago

> I also moved OpenWakeWord to the Pi to do local detection of wakeword

This is it. The "hack" I'm using depends on the satellite running in local wake word mode, as I inject the detection the same way openwakeword does.

I forgot about this detail.

geman220 commented 7 months ago

> > I also moved OpenWakeWord to the Pi to do local detection of wakeword
>
> This is it. The "hack" I'm using depends on the satellite running in local wake word mode, as I inject the detection the same way openwakeword does.

Oh! Sorry, I didn't realize that was required or I would have set that up sooner!

geman220 commented 7 months ago

I have it working in node-red, I need to tie some things together to make it fully functioning, but essentially the following works:

Inject a timestamp, which fires a TTS "Do you want to turn the lights on?", delay 2 seconds, press the "ask question" button, and get the STT. I listen to the satellite-pipeline and detect yes. or no. and process the automation with a switch. In my screenshot, Debug 5 has a payload of "yes.". So I can just change the trigger (timestamp injection) into something useful, along with the automation that runs after the yes/no switch.
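
Since the transcription comes back as "yes." or "no." with punctuation, a small normalizer can make the switch less brittle. This is just a sketch: the function name and the extra accepted words (yeah, yep, sure, nope) are assumptions, not something the satellite or HA provides.

```python
import string
from typing import Optional

# Accepted answers are an assumption; extend them to match what your STT returns.
YES_WORDS = {"yes", "yeah", "yep", "sure"}
NO_WORDS = {"no", "nope"}


def parse_yes_no(transcript: str) -> Optional[bool]:
    """Map a short transcription to True (yes), False (no) or None (unclear)."""
    # Lowercase, then drop surrounding whitespace and punctuation such as "." or "!".
    cleaned = transcript.strip().lower().strip(string.punctuation + " ")
    if cleaned in YES_WORDS:
        return True
    if cleaned in NO_WORDS:
        return False
    return None


if __name__ == "__main__":
    for text in ("yes.", " No. ", "maybe later"):
        print(repr(text), "->", parse_yes_no(text))
```

Returning None for anything unclear lets the flow ask again instead of guessing.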

image

geman220 commented 7 months ago

As an update, I added a more useful trigger and execution. Works great.

image

llluis commented 7 months ago

Are you using the media_player from Wyoming or MPD?

geman220 commented 7 months ago

> Are you using the media_player from Wyoming or MPD?

I am using your implementation of media_player, I figure that would be more useful to use so I could validate everything works as implemented.

llluis commented 7 months ago

@geman220, if you update both components, I now send a new event from satellite to home assistant after the TTS ended playing:

image

This way you could replace the time delay with a wait for this event, so it would work automatically for any TTS text length.
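
The idea of waiting for the event instead of a fixed delay can be sketched generically as follows. This is not the integration's code: the asyncio.Event stands in for "the TTS-finished event arrived", trigger_listen stands in for whatever starts the listening, and the timeout fallback (start listening anyway) is a design choice made for this example.

```python
import asyncio
from typing import Awaitable, Callable


async def ask_after_tts(
    tts_finished: asyncio.Event,
    trigger_listen: Callable[[], Awaitable[None]],
    timeout: float = 30.0,
) -> bool:
    """Wait until TTS playback has ended, then start listening.

    Returns True if the finished event arrived, False if the timeout hit.
    In both cases listening is triggered afterwards.
    """
    try:
        await asyncio.wait_for(tts_finished.wait(), timeout)
        finished = True
    except asyncio.TimeoutError:
        finished = False
    await trigger_listen()
    return finished


async def _demo() -> None:
    tts_finished = asyncio.Event()

    async def trigger_listen() -> None:
        print("listening...")

    # Simulate the satellite reporting the end of playback after 0.1 s.
    asyncio.get_running_loop().call_later(0.1, tts_finished.set)
    print("event received:", await ask_after_tts(tts_finished, trigger_listen, 5.0))


if __name__ == "__main__":
    asyncio.run(_demo())
```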

geman220 commented 7 months ago

> @geman220, if you update both components, I now send a new event from satellite to home assistant after the TTS ended playing:
>
> image
>
> This way you could replace the time delay with a wait for this event, so it would work automatically for any TTS text length.

You’re a legend. I was actually trying to do a hacky work around for this so I don’t have to guess the wait length. I will get it updated tomorrow and report back.

geman220 commented 7 months ago

Just to verify, I did a git pull of https://github.com/llluis/wyoming-satellite/tree/dev and replaced the custom component with https://github.com/llluis/homeassistant-wyoming/tree/dev.

In Node-Red, using a timestamp to inject tts.speak to tts.piper {"message": "Help computer.","media_player_entity_id":"media_player.screen_satellite_satellite_speaker"} I don't see any events when listening to wyoming-satellite-pipeline.

llluis commented 7 months ago

> Just to verify, I did a git pull of https://github.com/llluis/wyoming-satellite/tree/dev and replaced the custom component with https://github.com/llluis/homeassistant-wyoming/tree/dev.
>
> In Node-Red, using a timestamp to inject tts.speak to tts.piper {"message": "Help computer.","media_player_entity_id":"media_player.screen_satellite_satellite_speaker"} I don't see any events when listening to wyoming-satellite-pipeline.

I named the event differently as it's an event from the satellite and not a "pipeline" event like the others. Check the screenshot. So the name is just wyoming-satellite.

geman220 commented 7 months ago

> > Just to verify, I did a git pull of https://github.com/llluis/wyoming-satellite/tree/dev and replaced the custom component with https://github.com/llluis/homeassistant-wyoming/tree/dev. In Node-Red, using a timestamp to inject tts.speak to tts.piper {"message": "Help computer.","media_player_entity_id":"media_player.screen_satellite_satellite_speaker"} I don't see any events when listening to wyoming-satellite-pipeline.
>
> I named the event differently as it's an event from the satellite and not a "pipeline" event like the others. Check the screenshot. So the name is just wyoming-satellite.

Same result, listening to wyoming-satellite and using timestamp to fire a TTS doesn't show any events.

evgenyyy commented 7 months ago

I'm also trying this out as I want to play responses from Pi on a different speaker and need to capture TTS response. The original satellite.py that you linked above is not there anymore, so I tried replacing it with the satellite.py here: https://github.com/llluis/wyoming-satellite/blob/master/wyoming_satellite/satellite.py

As soon as I do this and reboot the Pi, wyoming-satellite.service fails to start

llluis commented 7 months ago

> I'm also trying this out as I want to play responses from Pi on a different speaker and need to capture TTS response. The original satellite.py that you linked above is not there anymore, so I tried replacing it with the satellite.py here: https://github.com/llluis/wyoming-satellite/blob/master/wyoming_satellite/satellite.py
>
> As soon as I do this and reboot the Pi, wyoming-satellite.service fails to start

The master branch will not work. You have to point to the dev branch.

llluis commented 7 months ago

Sorry folks. There's a commit missing in the HA part. I'll get it published tonight when I'm back home.

evgenyyy commented 7 months ago

> > I'm also trying this out as I want to play responses from Pi on a different speaker and need to capture TTS response. The original satellite.py that you linked above is not there anymore, so I tried replacing it with the satellite.py here: https://github.com/llluis/wyoming-satellite/blob/master/wyoming_satellite/satellite.py As soon as I do this and reboot the Pi, wyoming-satellite.service fails to start
>
> The master branch will not work. You have to point to the dev branch.

Silly me, confirming the dev branch satellite.py works and I can now trigger detection with a button. Will test the event listening once you've done the commit.

llluis commented 7 months ago

Hello! Commit published. Please update the file and try again. :)

geman220 commented 7 months ago

image

Works now. I'll have to work on implementing this into node-red so that it waits for this event before starting to listen.

I'm also working on this. This will send a custom TTS when the assistant "wakes up". You can save this as awake.sh and then, in your satellite service (or however you are running it), add --detection-command '/home/user/location/to/awake.sh'

awake.sh

#!/bin/bash
echo $(date) "[awake.sh] ...Starting awake.sh script"

# Home Assistant Server Address
HA_SERVER="YOURSERVER"

# Home Assistant Long-Lived Access Token
HA_TOKEN="YOURTOKEN"

# TTS service endpoint
TTS_SERVICE_ENDPOINT="/api/services/tts/speak"

# Entity ID of the media player and TTS entity
MEDIA_PLAYER_ENTITY_ID="media_player.YOURMEDIAPLAYER"
TTS_ENTITY_ID="tts.piper"

# Message to be spoken
TTS_MESSAGE="How can I help?"

# Cache setting
TTS_CACHE="false"

# JSON payload with flat structure
PAYLOAD=$(cat <<EOF
{
  "media_player_entity_id": "$MEDIA_PLAYER_ENTITY_ID",
  "message": "$TTS_MESSAGE",
  "cache": $TTS_CACHE,
  "entity_id": "$TTS_ENTITY_ID"
}
EOF
)

# Send the TTS command to Home Assistant
curl -X POST -H "Authorization: Bearer $HA_TOKEN" -H "Content-Type: application/json" -d "$PAYLOAD" "$HA_SERVER$TTS_SERVICE_ENDPOINT"

echo $(date) "[awake.sh] ...TTS command sent"

The challenge here is that, in this use case, it will say "Do you want me to dim the lights up?" and then "How can I help?", since that's the --detection-command. So I am going to try to implement logic to bypass --detection-command if the wake word is bypassed.

llluis commented 7 months ago

> The challenge here is that, in this use case, it will say "Do you want me to dim the lights up?" and then "How can I help?", since that's the --detection-command. So I am going to try to implement logic to bypass --detection-command if the wake word is bypassed.

Easiest way would be to create a new event in wyoming protocol to use when the wakeword is bypassed. I talked about it with synesthesiam. That would also open other doors with controlling the pipeline execution. For now I would like to focus on getting these new features tested and integrated in the main branch before going too far in the side track. :) But I'll try to think about it...

geman220 commented 7 months ago

image

Works as expected. I changed the node Ask to dim lights up to a very long sentence, it didn't call button.press until it was finished reading the long message. So it is successfully awaiting the TTS to complete dynamically. Now I need to figure out how to have it delay itself so it doesn't hear "How can I help?" as set by --detection-command

llluis commented 6 months ago

I published new features in my branches. Including this one:

image

However, it's a breaking change: I removed both buttons in the interface and replaced them with a service. It's much cleaner, as its use is always in an automation.

I'll create a doc page to publish those changes and avoid linking all issues to this one...

geman220 commented 6 months ago

> I published new features in my branches. Including this one:
>
> image
>
> However, it's a breaking change: I removed both buttons in the interface and replaced them with a service. It's much cleaner, as its use is always in an automation.
>
> I'll create a doc page to publish those changes and avoid linking all issues to this one...

So if I understand correctly, you're making the "listen" trigger by a service call instead of a button.press?

llluis commented 6 months ago

> > I published new features in my branches. Including this one: image However, it's a breaking change: I removed both buttons in the interface and replaced them with a service. It's much cleaner, as its use is always in an automation. I'll create a doc page to publish those changes and avoid linking all issues to this one...
>
> So if I understand correctly, you're making the "listen" trigger by a service call instead of a button.press?

Exactly:

image

geman220 commented 6 months ago

Sweet, is this an update to both the satellite and the addon, or just the addon? I can update and adjust my node-red to validate if you need.

llluis commented 6 months ago

Both.

geman220 commented 6 months ago

Cool, I'll update from the dev branch for the addon, I'm assuming I still only need to replace satellite.py on the satellite?

llluis commented 6 months ago

Awesome, thank you! Yup, that's it.

geman220 commented 6 months ago

Works great. It is, essentially, a drop-in replacement for the previous implementation. You just need to change the entity to send the TTS, and call the service wyoming.remote_trigger on the satellite entity, instead of button.press.

llluis commented 6 months ago

Exactly. And you need to set the Question ID to whatever you like so the satellite stops after the STT without trying to match an intent and send a reply. The plan is to use that ID in the future to match with the question but that's not in place yet.

And you probably noted I changed the name of the speaker. Just trying to improve step by step. :)

geman220 commented 6 months ago

I did notice the speaker name change, I didn't change or add a question_id. I am still listening to the wyoming-pipeline from the node-red flow askToDimUp. Which is how it knows what question is being asked and how to handle the response. Are you saying I can just set a question_id on the call service for wyoming.remote_trigger?