synesthesiam / rhasspy

Rhasspy voice assistant for offline home automation
https://rhasspy.readthedocs.io
MIT License

HomeAssistantIntentHandler -- not firing after transcription #114

Closed · yukdumboobumm closed this 4 years ago

yukdumboobumm commented 4 years ago

HomeAssistantIntentHandler POSTs correctly when using the "Get Intent" + "Send to Home Assistant" functionality on the "Speech" tab:

[INFO:12229981] quart.serving: 192.168.50.97:51450 POST /api/text-to-intent 1.1 200 779 49250
[DEBUG:12229969] HomeAssistantIntentHandler: POSTed intent to http://192.168.50.2:8123/api/events/rhasspy_ChangeLightState
[DEBUG:12229962] urllib3.connectionpool: http://192.168.50.2:8123 "POST /api/events/rhasspy_ChangeLightState HTTP/1.1" 200 54
[DEBUG:12229949] urllib3.connectionpool: Starting new HTTP connection (1): 192.168.50.2:8123
[DEBUG:12229940] __main__: {"intent": {"name": "ChangeLightState", "confidence": 1.0}, "entities": [{"entity": "name", "value": "bulb_3", "raw_value": "living room wall", "start": 5, "raw_start": 5, "end": 11, "raw_end": 21, "tokens": ["bulb_3"], "raw_tokens": ["living", "room", "wall"]}, {"entity": "state", "value": "on", "raw_value": "on", "start": 12, "raw_start": 22, "end": 14, "raw_end": 24, "tokens": ["on"], "raw_tokens": ["on"]}], "text": "turn bulb_3 on", "raw_text": "turn living room wall on", "recognize_seconds": 0.001688663032837212, "tokens": ["turn", "bulb_3", "on"], "raw_tokens": ["turn", "living", "room", "wall", "on"], "speech_confidence": 1, "slots": {"name": "bulb_3", "state": "on"}, "wakeId": "", "siteId": "default", "time_sec": 0.005942821502685547}

But not when speaking:

[INFO:12163825] quart.serving: 192.168.50.30:49352 POST /api/text-to-intent 1.1 200 681 6952
[DEBUG:12163822] __main__: {"intent": {"name": "ChangeLightState", "confidence": 1.0}, "entities": [{"entity": "name", "value": "bulb_3", "raw_value": "living room wall", "start": 5, "raw_start": 5, "end": 11, "raw_end": 21, "tokens": ["bulb_3"], "raw_tokens": ["living", "room", "wall"]}, {"entity": "state", "value": "on", "raw_value": "on", "start": 12, "raw_start": 22, "end": 14, "raw_end": 24, "tokens": ["on"], "raw_tokens": ["on"]}], "text": "turn bulb_3 on", "raw_text": "turn living room wall on", "recognize_seconds": 0.0007086760015226901, "tokens": ["turn", "bulb_3", "on"], "raw_tokens": ["turn", "living", "room", "wall", "on"], "speech_confidence": 1, "slots": {"name": "bulb_3", "state": "on"}, "wakeId": "", "siteId": "default", "time_sec": 0.002857685089111328}
[INFO:12163784] quart.serving: 192.168.50.30:49350 POST /api/speech-to-text 1.1 200 24 834723
[DEBUG:12163780] PocketsphinxDecoder: turn living room wall on
[DEBUG:12163779] PocketsphinxDecoder: Transcription confidence: 0.0011084542951863563
[DEBUG:12163775] PocketsphinxDecoder: Decoded WAV in 0.8002548217773438 second(s)
[DEBUG:12162974] PocketsphinxDecoder: rate=16000, width=2, channels=1.

I'm also listening on my HA server for events. I see the event when using text-to-intent, but nothing from speech-to-intent.
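
For reference, the working path can be reproduced outside the web UI. A minimal sketch in Python (assuming the server's default HTTP port 12101 and my test sentence; this is an illustration, not Rhasspy's own code):

import requests

# POST a plain-text sentence to the Rhasspy server. Without a nohass flag
# this should both recognize the intent and forward it to Home Assistant,
# matching the successful log above.
response = requests.post(
    "http://192.168.50.2:12101/api/text-to-intent",
    data="turn living room wall on",
)
print(response.json())  # intent JSON like the __main__ line in the log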

I'm using a server/client setup as outlined in your docs.

Server (also where my HA instance is):

{
    "command": {
        "system": "dummy"
    },
    "handle": {
        "system": "hass"
    },
    "home_assistant": {
        "access_token": "REMOVED",
        "url": "http://192.168.50.2:8123"
    },
    "microphone": {
        "system": "dummy"
    },
    "rhasspy": {
        "listen_on_start": false
    },
    "sounds": {
        "system": "dummy"
    },
    "text_to_speech": {
        "system": "dummy"
    }
}

Is there a more verbose log setting that I can enable to help debug?

frkos commented 4 years ago

Could it be related to the disabled command system?

Dummy: "Disables voice command listening." https://rhasspy.readthedocs.io/en/latest/command-listener/#dummy

I have HA as well, and it works great both ways.

yukdumboobumm commented 4 years ago

@frkos are you using the latest (12/23/2019) Docker image?

The config I listed is for my server instance; the client machine does the voice detection and wake word.

Both are included below for reference. Server:

{
    "command": {
        "system": "dummy"
    },
    "handle": {
        "system": "hass"
    },
    "home_assistant": {
        "access_token": "REMOVED",
        "url": "http://192.168.50.2:8123"
    },
    "microphone": {
        "system": "dummy"
    },
    "rhasspy": {
        "listen_on_start": false
    },
    "sounds": {
        "system": "dummy"
    },
    "text_to_speech": {
        "system": "dummy"
    }
}

Client:

{
    "intent": {
        "remote": {
            "url": "http://192.168.50.2:12101/api/text-to-intent"
        },
        "system": "remote"
    },
    "microphone": {
        "pyaudio": {
            "device": "2"
        }
    },
    "sounds": {
        "aplay": {
            "device": "default:CARD=seeed2micvoicec"
        }
    },
    "speech_to_text": {
        "remote": {
            "url": "http://192.168.50.2:12101/api/speech-to-text"
        },
        "system": "remote"
    },
    "text_to_speech": {
        "system": "picotts"
    },
    "wake": {
        "snowboy": {
            "model": "snowboy/computer.pmdl",
            "sensitivity": "0.5"
        },
        "system": "snowboy"
    }
}

The client is successfully sending the WAV file to the server, but the server is not POSTing the intent to the HA API.

frkos commented 4 years ago

I'm using the latest, 2.4.14.5, but I have an all-in-one installation (the Hass.io add-on)...

According to the document:

(A text/plain response with the transcription is expected back. An additional profile query argument is sent with the current profile name, so the POST URL is effectively something like http://remote-server:12101/api/speech-to-text?profile=en)

So if I'm right, we have this logic:
1) client sends WAV to server
2) server converts it to text and sends it back to the client
3) client sends the received text back to the server
4) server converts the text to an intent and processes it

But when you send the text manually, it works, so steps 3 and 4 are OK. Could you check your client's logs? At step 2 it should receive the text back from the server.
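
A sketch of that round trip in Python, with the endpoints and profile parameter taken from the logs in this thread (the WAV filename is hypothetical; this illustrates the flow, it is not Rhasspy's client code):

import requests

SERVER = "http://192.168.50.2:12101"

# 1) client sends the recorded WAV to the server...
with open("command.wav", "rb") as f:  # hypothetical recorded command
    resp = requests.post(
        f"{SERVER}/api/speech-to-text",
        params={"profile": "en"},
        data=f.read(),
        headers={"Content-Type": "audio/wav"},
    )
text = resp.text  # 2) ...and gets back a text/plain transcription

# 3) client sends the received text back to the server...
intent = requests.post(
    f"{SERVER}/api/text-to-intent",
    params={"profile": "en"},
    data=text,
).json()  # 4) ...and the server turns it into an intent and processes it
print(intent)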

As for me, an RPi 3B+ is powerful enough to be a server, not just a client =)

yukdumboobumm commented 4 years ago

Client-side:

[DEBUG:139461741] SnowboyWakeListener: loaded -> listening
[DEBUG:139461739] DialogueManager: ready -> asleep
[INFO:139461738] DialogueManager: Automatically listening for wake word
[DEBUG:139461735] DialogueManager: handling -> ready
[DEBUG:139461734] WebSocketObserver: {"entities": [{"end": 11, "entity": "name", "raw_end": 21, "raw_start": 5, "raw_tokens": ["living", "room", "wall"], "raw_value": "living room wall", "start": 5, "tokens": ["bulb_3"], "value": "bulb_3"}, {"end": 14, "entity": "state", "raw_end": 24, "raw_start": 22, "raw_tokens": ["on"], "raw_value": "on", "start": 12, "tokens": ["on"], "value": "on"}], "intent": {"confidence": 1.0, "name": "ChangeLightState"}, "raw_text": "turn living room wall on", "raw_tokens": ["turn", "living", "room", "wall", "on"], "recognize_seconds": 0.0007333040121011436, "siteId": "default", "slots": {"name": "bulb_3", "state": "on"}, "speech_confidence": 1, "text": "turn bulb_3 on", "time_sec": 0.0028085708618164062, "tokens": ["turn", "bulb_3", "on"], "wakeId": "snowboy/computer.pmdl"}
[DEBUG:139461732] DialogueManager: recognizing -> handling
[DEBUG:139461730] DialogueManager: {'entities': [{'end': 11, 'entity': 'name', 'raw_end': 21, 'raw_start': 5, 'raw_tokens': ['living', 'room', 'wall'], 'raw_value': 'living room wall', 'start': 5, 'tokens': ['bulb_3'], 'value': 'bulb_3'}, {'end': 14, 'entity': 'state', 'raw_end': 24, 'raw_start': 22, 'raw_tokens': ['on'], 'raw_value': 'on', 'start': 12, 'tokens': ['on'], 'value': 'on'}], 'intent': {'confidence': 1.0, 'name': 'ChangeLightState'}, 'raw_text': 'turn living room wall on', 'raw_tokens': ['turn', 'living', 'room', 'wall', 'on'], 'recognize_seconds': 0.0007333040121011436, 'siteId': 'default', 'slots': {'name': 'bulb_3', 'state': 'on'}, 'speech_confidence': 1, 'text': 'turn bulb_3 on', 'time_sec': 0.0028085708618164062, 'tokens': ['turn', 'bulb_3', 'on'], 'wakeId': 'snowboy/computer.pmdl'}
[DEBUG:139461721] urllib3.connectionpool: http://192.168.50.2:12101 "POST /api/text-to-intent?profile=en&nohass=True HTTP/1.1" 200 682
[DEBUG:139461705] urllib3.connectionpool: Starting new HTTP connection (1): 192.168.50.2:12101
[DEBUG:139461691] DialogueManager: decoding -> recognizing
[DEBUG:139461690] DialogueManager: turn living room wall on (confidence=1)
[DEBUG:139461681] urllib3.connectionpool: http://192.168.50.2:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 24
[DEBUG:139460831] urllib3.connectionpool: Starting new HTTP connection (1): 192.168.50.2:12101
[DEBUG:139460796] RemoteDecoder: POSTing 221804 byte(s) of WAV data to http://192.168.50.2:12101/api/speech-to-text
[DEBUG:139460795] APlayAudioPlayer: ['aplay', '-q', '-D', 'default:CARD=seeed2micvoicec', '/usr/share/rhasspy/etc/wav/beep_lo.wav']
[DEBUG:139460793] DialogueManager: awake -> decoding
[DEBUG:139460789] WebrtcvadCommandListener: listening -> loaded
[DEBUG:139460786] WebrtcvadCommandListener: Voice command finished
[DEBUG:139453827] WebrtcvadCommandListener: Voice command started
[DEBUG:139453392] WebrtcvadCommandListener: loaded -> listening
[DEBUG:139453372] WebrtcvadCommandListener: Will timeout in 30 second(s)
[DEBUG:139453371] APlayAudioPlayer: ['aplay', '-q', '-D', 'default:CARD=seeed2micvoicec', '/usr/share/rhasspy/etc/wav/beep_hi.wav']
[DEBUG:139453367] SnowboyWakeListener: listening -> loaded
[DEBUG:139453364] DialogueManager: asleep -> awake
[DEBUG:139453363] DialogueManager: Awake!
[DEBUG:139453358] SnowboyWakeListener: Hotword(s) detected: ['snowboy/computer.pmdl']

And server-side:

[INFO:93975] quart.serving: 192.168.50.30:50756 POST /api/text-to-intent 1.1 200 682 6972
[DEBUG:93972] __main__: {"intent": {"name": "ChangeLightState", "confidence": 1.0}, "entities": [{"entity": "name", "value": "bulb_3", "raw_value": "living room wall", "start": 5, "raw_start": 5, "end": 11, "raw_end": 21, "tokens": ["bulb_3"], "raw_tokens": ["living", "room", "wall"]}, {"entity": "state", "value": "on", "raw_value": "on", "start": 12, "raw_start": 22, "end": 14, "raw_end": 24, "tokens": ["on"], "raw_tokens": ["on"]}], "text": "turn bulb_3 on", "raw_text": "turn living room wall on", "recognize_seconds": 0.0007333040121011436, "tokens": ["turn", "bulb_3", "on"], "raw_tokens": ["turn", "living", "room", "wall", "on"], "speech_confidence": 1, "slots": {"name": "bulb_3", "state": "on"}, "wakeId": "", "siteId": "default", "time_sec": 0.0028085708618164062}
[INFO:93932] quart.serving: 192.168.50.30:50754 POST /api/speech-to-text 1.1 200 24 829149
[DEBUG:93928] PocketsphinxDecoder: turn living room wall on
[DEBUG:93928] PocketsphinxDecoder: Transcription confidence: 0.0012184474722329018
[DEBUG:93924] PocketsphinxDecoder: Decoded WAV in 0.7617373466491699 second(s)
[DEBUG:93160] PocketsphinxDecoder: rate=16000, width=2, channels=1.
[INFO:79785] quart.serving: 192.168.50.30:50752 POST /api/text-to-intent 1.1 200 158 7081

I'm having a hard time piecing the logs together, but it seems like the client is POSTing to both speech-to-text and text-to-intent and receiving responses to both, although the text-to-intent response appears twice: once in DialogueManager and once in WebSocketObserver. One thing I do notice is that the client's text-to-intent POST carries nohass=True in the query string.
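
If that nohass parameter does what its name suggests, the two behaviors should be reproducible directly. A sketch (the nohass semantics here are assumed from the query string in the log above, not verified against the source):

import requests

URL = "http://192.168.50.2:12101/api/text-to-intent"
SENTENCE = "turn living room wall on"

# What the "Speech" tab test effectively does: recognize AND handle,
# so the server POSTs the event to Home Assistant.
requests.post(URL, data=SENTENCE)

# What the remote client sends (note nohass=True in the log above):
# recognize only, leaving intent handling to the caller.
requests.post(URL, data=SENTENCE, params={"nohass": "true"})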

Agreed about the 3B+ being able to do the work. I'm using a server/client setup because my server is located in a cabinet on the other side of the house. If I can't get this to work, I'll just run everything on the client and send the intent from that machine. Still, it'd be neat to get this setup working.

yukdumboobumm commented 4 years ago

Finally understood what you were trying to say.

I added Intent Handling to the client (in addition to the server), so the intent is still transcribed by the server, but the handling to HA is done by the client. Everything works fine. I think there is some ambiguity in the docs that could be cleared up to make this more obvious; I'll submit a request there. Thanks for the help and Merry Xmas!
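
For anyone else hitting this, the change was adding the same handle/home_assistant sections from my server profile to the client profile, roughly (access token redacted):

{
    "handle": {
        "system": "hass"
    },
    "home_assistant": {
        "access_token": "REMOVED",
        "url": "http://192.168.50.2:8123"
    }
}

With handling enabled locally, the client still uses the server for speech-to-text and intent recognition (its remote call carries nohass=True, as in the logs above), but it POSTs the resulting event to Home Assistant itself, which, as far as I can tell, is why it now works.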

-JD