synesthesiam / rhasspy

Rhasspy voice assistant for offline home automation
https://rhasspy.readthedocs.io
MIT License
950 stars 101 forks source link

wrong recognition via microphone but perfect via wav upload #144

Closed timaschew closed 4 years ago

timaschew commented 4 years ago

Related to #141 So I'm running client/server setup (from the tutorial) with a Raspberry 3B with docker. I have this soundcard and this microphone.

I've tried to speak different sentences and it was never recognized correctly.

  1. what time is it -> turn on the bedroom light
  2. tell me the time -> turn on the living room lamp
  3. whats the temperature -> turn on bedroom light
  4. how cold is it -> how hot is it (this seems to be close, but it's just random, see next attempt)
  5. how hot is it -> turn on the bedroom light

Also the hotword doesn't work.

See logs at the bottom. When I record on the same Raspberry wav file with this command:

arecord -r 16000 -f S16_LE -c 1 -t wav -d 4 -D hw:CARD=Device,DEV=0 how-cold-is-it.wav

and upload the file to rhasspy, all my 5 sentences are recognized correctly.
Of course I use the same microphone the same distance, voice, ...

Rhasspy logs during the recording. First two were recorded via Hold to record, the others via Tap to record.

# server start

DEBUG:__main__:Namespace(host='0.0.0.0', log_level='DEBUG', port=12101, profile='en', set=[], ssl=None, system_profiles='/usr/share/rhasspy/profiles', user_profiles='/profiles')
DEBUG:RhasspyCore:Loaded profile from /profiles/en/profile.json
DEBUG:RhasspyCore:Profile files will be written to /profiles/en
DEBUG:root:Loading default profile settings from /usr/share/rhasspy/profiles/defaults.json
DEBUG:WebSocketObserver: -> started
DEBUG:DialogueManager: -> started
DEBUG:DialogueManager:started -> loading
DEBUG:DialogueManager:Loading actors
DEBUG:DialogueManager:Actors created. Waiting for ['recorder', 'player', 'speech', 'wake', 'command', 'decoder', 'recognizer', 'handler', 'speech_trainer', 'intent_trainer', 'word_pronouncer'] to start.
DEBUG:ARecordAudioRecorder: -> started
DEBUG:DummyAudioPlayer: -> started
DEBUG:DummySentenceSpeaker: -> started
DEBUG:RemoteRecognizer: -> started
DEBUG:SnowboyWakeListener: -> started
DEBUG:WebrtcvadCommandListener: -> started
DEBUG:RemoteDecoder: -> started
DEBUG:DummyIntentHandler: -> started
DEBUG:DummySpeechTrainer: -> started
DEBUG:DummyIntentTrainer: -> started
DEBUG:PhonetisaurusPronounce: -> started
DEBUG:DialogueManager:recorder started
DEBUG:WebrtcvadCommandListener:started -> loaded
DEBUG:DialogueManager:player started
DEBUG:DialogueManager:speech started
DEBUG:DialogueManager:recognizer started
DEBUG:DialogueManager:decoder started
DEBUG:DialogueManager:handler started
DEBUG:DialogueManager:speech_trainer started
DEBUG:DialogueManager:intent_trainer started
DEBUG:DialogueManager:word_pronouncer started
DEBUG:DialogueManager:command started
DEBUG:SnowboyWakeListener:Loading snowboy model from /profiles/en/snowboy/snowboy.umdl
DEBUG:SnowboyWakeListener:Loaded snowboy model snowboy/snowboy.umdl ({'sensitivity': '0.5', 'audio_gain': 1.0, 'apply_frontend': False})
DEBUG:SnowboyWakeListener:started -> loaded
DEBUG:DialogueManager:wake started
DEBUG:DialogueManager:Actors loaded
DEBUG:DialogueManager:loading -> ready
INFO:DialogueManager:Automatically listening for wake word
DEBUG:DialogueManager:ready -> asleep
DEBUG:SnowboyWakeListener:loaded -> listening
DEBUG:ARecordAudioRecorder:started -> recording
INFO:__main__:Started
DEBUG:ARecordAudioRecorder:['arecord', '-q', '-r', '16000', '-f', 'S16_LE', '-c', '1', '-t', 'raw', '-D', 'hw:CARD=Device,DEV=0']
DEBUG:__main__:Starting web server at http://0.0.0.0:12101
DEBUG:ARecordAudioRecorder:Recording from microphone (arecord)
Running on https://0.0.0.0:12101 (CTRL + C to quit)
[2020-01-05 11:33:58,771] ASGI Framework Lifespan error, continuing without Lifespan support
WARNING:quart.serving:ASGI Framework Lifespan error, continuing without Lifespan support
[2020-01-05 11:34:02,684] 192.168.178.21:64160 GET / 1.1 200 1029 24961
INFO:quart.serving:192.168.178.21:64160 GET / 1.1 200 1029 24961
[2020-01-05 11:34:02,701] 192.168.178.21:64137 GET /api/events/log 1.1 101 - 3636574
INFO:quart.serving:192.168.178.21:64137 GET /api/events/log 1.1 101 - 3636574
ERROR:__main__:api_events_log
Traceback (most recent call last):
  File "app.py", line 1123, in api_events_log
    text = await q.get()
  File "/usr/lib/python3.6/asyncio/queues.py", line 167, in get
    yield from getter
concurrent.futures._base.CancelledError
ERROR:quart.serving:Error in ASGI Framework
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 91, in handle_asgi_app
    await asgi_instance(self.asgi_receive, self.asgi_send)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 106, in __call__
    await _cancel_tasks(pending)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 189, in _cancel_tasks
    raise task.exception()
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 157, in handle_websocket
    'code': 1000,
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 149, in asgi_send
    await self.asend(CloseConnection(code=int(message["code"])))
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asyncio/wsproto.py", line 93, in asend
    self.write(self.connection.send(event))
  File "/usr/local/lib/python3.6/dist-packages/wsproto/__init__.py", line 64, in send
    data += self.connection.send(event)
  File "/usr/local/lib/python3.6/dist-packages/wsproto/connection.py", line 101, in send
    "Connection cannot be closed in state %s" % self.state
wsproto.utilities.LocalProtocolError: Connection cannot be closed in state ConnectionState.CLOSED
[2020-01-05 11:34:02,782] Error in ASGI Framework
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 91, in handle_asgi_app
    await asgi_instance(self.asgi_receive, self.asgi_send)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 106, in __call__
    await _cancel_tasks(pending)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 189, in _cancel_tasks
    raise task.exception()
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 157, in handle_websocket
    'code': 1000,
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 149, in asgi_send
    await self.asend(CloseConnection(code=int(message["code"])))
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asyncio/wsproto.py", line 93, in asend
    self.write(self.connection.send(event))
  File "/usr/local/lib/python3.6/dist-packages/wsproto/__init__.py", line 64, in send
    data += self.connection.send(event)
  File "/usr/local/lib/python3.6/dist-packages/wsproto/connection.py", line 101, in send
    "Connection cannot be closed in state %s" % self.state
wsproto.utilities.LocalProtocolError: Connection cannot be closed in state ConnectionState.CLOSED
[2020-01-05 11:34:02,991] 192.168.178.21:64160 GET /api/sentences 1.1 200 581 14765
INFO:quart.serving:192.168.178.21:64160 GET /api/sentences 1.1 200 581 14765
INFO:quart.serving:192.168.178.21:64165 GET /api/events/log 1.1 101 - 94782
[2020-01-05 11:34:03,026] 192.168.178.21:64165 GET /api/events/log 1.1 101 - 94782
DEBUG:__main__:Loading phoneme examples from /usr/share/rhasspy/profiles/en/phoneme_examples.txt
[2020-01-05 11:34:03,071] 192.168.178.21:64160 GET /api/profiles 1.1 200 144 67862
INFO:quart.serving:192.168.178.21:64160 GET /api/profiles 1.1 200 144 67862
INFO:quart.serving:192.168.178.21:64166 GET /api/slots 1.1 200 2 55695
[2020-01-05 11:34:03,078] 192.168.178.21:64166 GET /api/slots 1.1 200 2 55695
INFO:quart.serving:192.168.178.21:64167 GET /api/phonemes 1.1 200 1526 56969
[2020-01-05 11:34:03,079] 192.168.178.21:64167 GET /api/phonemes 1.1 200 1526 56969
[2020-01-05 11:34:03,081] 192.168.178.21:64172 GET /api/profile 1.1 200 8697 56679
INFO:quart.serving:192.168.178.21:64172 GET /api/profile 1.1 200 8697 56679
INFO:quart.serving:192.168.178.21:64160 GET /api/profile 1.1 200 6825 28885
[2020-01-05 11:34:03,113] 192.168.178.21:64160 GET /api/profile 1.1 200 6825 28885
INFO:quart.serving:192.168.178.21:64166 GET /api/unknown-words 1.1 200 2 19630
[2020-01-05 11:34:03,116] 192.168.178.21:64166 GET /api/unknown-words 1.1 200 2 19630
INFO:quart.serving:192.168.178.21:64168 GET /api/microphones 1.1 200 1010 117473
[2020-01-05 11:34:03,140] 192.168.178.21:64168 GET /api/microphones 1.1 200 1010 117473
[2020-01-05 11:34:03,142] 192.168.178.21:64169 GET /api/speakers 1.1 200 2 118582
INFO:quart.serving:192.168.178.21:64169 GET /api/speakers 1.1 200 2 118582
[2020-01-05 11:34:03,144] 192.168.178.21:64167 GET /api/problems 1.1 200 271 47056
INFO:quart.serving:192.168.178.21:64167 GET /api/problems 1.1 200 271 47056
[2020-01-05 11:34:03,148] 192.168.178.21:64172 GET /img/favicon.png 1.1 200 802 50172
INFO:quart.serving:192.168.178.21:64172 GET /img/favicon.png 1.1 200 802 50172

#################  recording ###########################

speak: what time is it

INFO:quart.serving:192.168.178.21:64263 POST /api/start-recording 1.1 200 2 10670
[2020-01-05 11:34:17,455] 192.168.178.21:64263 POST /api/start-recording 1.1 200 2 10670
DEBUG:__main__:Recorded 154604 byte(s) of audio data
DEBUG:RemoteDecoder:POSTing 154604 byte(s) of WAV data to http://192.168.178.21:12101/api/speech-to-text
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 14
DEBUG:__main__:make the on it
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/text-to-intent?profile=en&nohass=True HTTP/1.1" 200 582
DEBUG:__main__:{"entities": [{"end": 7, "entity": "state", "raw_end": 7, "raw_start": 5, "raw_value": "on", "start": 5, "value": "on"}, {"end": 25, "entity": "name", "raw_end": 25, "raw_start": 12, "raw_value": "bedroom light", "start": 12, "value": "bedroom light"}], "intent": {"confidence": 0.86, "name": "ChangeLightState"}, "raw_text": "turn on the bedroom light", "raw_tokens": ["turn", "on", "the", "bedroom", "light"], "siteId": "default", "slots": {"state": "on", "name": "bedroom light"}, "speech_confidence": 1, "text": "turn on the bedroom light", "time_sec": 0.11514592170715332, "tokens": ["turn", "on", "the", "bedroom", "light"], "wakeId": ""}
[2020-01-05 11:34:20,244] 192.168.178.21:64263 POST /api/stop-recording 1.1 200 582 1026068
INFO:quart.serving:192.168.178.21:64263 POST /api/stop-recording 1.1 200 582 1026068

speak: tell me the time
[2020-01-05 11:34:35,228] 192.168.178.21:64370 POST /api/start-recording 1.1 200 2 10497
INFO:quart.serving:192.168.178.21:64370 POST /api/start-recording 1.1 200 2 10497
DEBUG:__main__:Recorded 143084 byte(s) of audio data
DEBUG:RemoteDecoder:POSTing 143084 byte(s) of WAV data to http://192.168.178.21:12101/api/speech-to-text
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 23
DEBUG:__main__:tell room lamp to light
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/text-to-intent?profile=en&nohass=True HTTP/1.1" 200 606
DEBUG:__main__:{"entities": [{"end": 7, "entity": "state", "raw_end": 7, "raw_start": 5, "raw_value": "on", "start": 5, "value": "on"}, {"end": 28, "entity": "name", "raw_end": 28, "raw_start": 12, "raw_value": "living room lamp", "start": 12, "value": "living room lamp"}], "intent": {"confidence": 0.67, "name": "ChangeLightState"}, "raw_text": "turn on the living room lamp", "raw_tokens": ["turn", "on", "the", "living", "room", "lamp"], "siteId": "default", "slots": {"state": "on", "name": "living room lamp"}, "speech_confidence": 1, "text": "turn on the living room lamp", "time_sec": 0.1124122142791748, "tokens": ["turn", "on", "the", "living", "room", "lamp"], "wakeId": ""}
[2020-01-05 11:34:37,758] 192.168.178.21:64370 POST /api/stop-recording 1.1 200 606 887518
INFO:quart.serving:192.168.178.21:64370 POST /api/stop-recording 1.1 200 606 887518

speak: whats the temperature
[2020-01-05 11:34:58,226] 192.168.178.21:64526 POST /api/start-recording 1.1 200 2 10570
INFO:quart.serving:192.168.178.21:64526 POST /api/start-recording 1.1 200 2 10570
DEBUG:__main__:Recorded 165164 byte(s) of audio data
DEBUG:RemoteDecoder:POSTing 165164 byte(s) of WAV data to http://192.168.178.21:12101/api/speech-to-text
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 34
DEBUG:__main__:blue temperature turn on how light
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/text-to-intent?profile=en&nohass=True HTTP/1.1" 200 560
DEBUG:__main__:{"entities": [{"end": 7, "entity": "state", "raw_end": 7, "raw_start": 5, "raw_value": "on", "start": 5, "value": "on"}, {"end": 21, "entity": "name", "raw_end": 21, "raw_start": 8, "raw_value": "bedroom light", "start": 8, "value": "bedroom light"}], "intent": {"confidence": 0.86, "name": "ChangeLightState"}, "raw_text": "turn on bedroom light", "raw_tokens": ["turn", "on", "bedroom", "light"], "siteId": "default", "slots": {"state": "on", "name": "bedroom light"}, "speech_confidence": 1, "text": "turn on bedroom light", "time_sec": 0.10992002487182617, "tokens": ["turn", "on", "bedroom", "light"], "wakeId": ""}
[2020-01-05 11:35:01,208] 192.168.178.21:64526 POST /api/stop-recording 1.1 200 560 1126380
INFO:quart.serving:192.168.178.21:64526 POST /api/stop-recording 1.1 200 560 1126380

speak: how cold is it
[2020-01-05 11:35:20,878] 192.168.178.21:64677 POST /api/start-recording 1.1 200 2 10701
INFO:quart.serving:192.168.178.21:64677 POST /api/start-recording 1.1 200 2 10701
DEBUG:__main__:Recorded 154604 byte(s) of audio data
DEBUG:RemoteDecoder:POSTing 154604 byte(s) of WAV data to http://192.168.178.21:12101/api/speech-to-text
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 24
DEBUG:__main__:room bedroom light to it
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/text-to-intent?profile=en&nohass=True HTTP/1.1" 200 283
DEBUG:__main__:{"entities": [], "intent": {"confidence": 0.86, "name": "GetTemperature"}, "raw_text": "how hot is it", "raw_tokens": ["how", "hot", "is", "it"], "siteId": "default", "slots": {}, "speech_confidence": 1, "text": "how hot is it", "time_sec": 0.13121891021728516, "tokens": ["how", "hot", "is", "it"], "wakeId": ""}
[2020-01-05 11:35:23,855] 192.168.178.21:64677 POST /api/stop-recording 1.1 200 283 1212675
INFO:quart.serving:192.168.178.21:64677 POST /api/stop-recording 1.1 200 283 1212675

speak: how hot is it
[2020-01-05 11:35:41,422] 192.168.178.21:64823 POST /api/start-recording 1.1 200 2 10871
INFO:quart.serving:192.168.178.21:64823 POST /api/start-recording 1.1 200 2 10871
DEBUG:__main__:Recorded 165164 byte(s) of audio data
DEBUG:RemoteDecoder:POSTing 165164 byte(s) of WAV data to http://192.168.178.21:12101/api/speech-to-text
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 11
DEBUG:__main__:turn on set
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.178.21:12101
DEBUG:urllib3.connectionpool:http://192.168.178.21:12101 "POST /api/text-to-intent?profile=en&nohass=True HTTP/1.1" 200 582
DEBUG:__main__:{"entities": [{"end": 7, "entity": "state", "raw_end": 7, "raw_start": 5, "raw_value": "on", "start": 5, "value": "on"}, {"end": 25, "entity": "name", "raw_end": 25, "raw_start": 12, "raw_value": "bedroom light", "start": 12, "value": "bedroom light"}], "intent": {"confidence": 0.86, "name": "ChangeLightState"}, "raw_text": "turn on the bedroom light", "raw_tokens": ["turn", "on", "the", "bedroom", "light"], "siteId": "default", "slots": {"state": "on", "name": "bedroom light"}, "speech_confidence": 1, "text": "turn on the bedroom light", "time_sec": 0.10613441467285156, "tokens": ["turn", "on", "the", "bedroom", "light"], "wakeId": ""}
[2020-01-05 11:35:44,458] 192.168.178.21:64823 POST /api/stop-recording 1.1 200 582 1175194
INFO:quart.serving:192.168.178.21:64823 POST /api/stop-recording 1.1 200 582 1175194

my profile.json

{
    "intent": {
        "remote": {
            "url": "http://192.168.178.21:12101/api/text-to-intent"
        },
        "system": "remote"
    },
    "microphone": {
        "arecord": {
            "device": "hw:CARD=Device,DEV=0"
        },
        "pyaudio": {
            "device": "2"
        },
        "system": "arecord"
    },
    "sounds": {
        "system": "dummy"
    },
    "speech_to_text": {
        "remote": {
            "url": "http://192.168.178.21:12101/api/speech-to-text"
        },
        "system": "remote"
    },
    "text_to_speech": {
        "system": "dummy"
    },
    "wake": {
        "system": "snowboy"
    }
}
frkos commented 4 years ago

Hi @timaschew

Am I right you run arecord outside of docker? If you have custom asound.conf it may be the reason

Try to run arecord from docker container and check the result

For reference please check the link below. It's for hassio addon but it will work for you too https://community.rhasspy.org/t/hass-io-addon-make-rhasspy-use-custom-asound-conf/106

frkos commented 4 years ago

And for hotword please try https://github.com/synesthesiam/rhasspy/issues/117#issuecomment-569540334

I believe it's hardware or docker misconfuguration...

timaschew commented 4 years ago

Am I right you run arecord outside of docker?

No, I've can run the command in the host and in the docker container. In both cases the wav files is good. This is how I did it:

  1. disabled audio recording in the settings (otherwise there is a "device busy error") and restart rhasspy
  2. copy hash from docker ps and start a bash inside the container: docker exec -it 02d69699c7fd bash
  3. then execute the record command in the profiles directory (file is mounted to the host automatically).

If you have custom asound.conf it may be the reason

This is my /home/pi/.asoundrc file:

pcm.!default {
        type hw
        card 1
}

ctl.!default {
        type hw
        card 1
}

I've copied the file to the container: sudo docker cp /home/pi/.asoundrc 02d69699c7fd:/root/.asoundrc

and restarted rhasspy and nothing has changed.

Changing the snowboy settings did also not help, but I don't expect that this will work before the manual triggered recording is working.

timaschew commented 4 years ago

I've changed the code to use wav instead of raw but it didn't help.

@synesthesiam Any chance to save the recorded wav files to listen them and find out what's wrong with them?

synesthesiam commented 4 years ago

Yes, the plan is to add an intent tests page where you can record examples and use them as unit tests for your assistant.

In the mean time, something could be added to have Rhasspy write recorded commands (WAVs) out to a directory. Would this be helpful?

timaschew commented 4 years ago

Yes, that would be great!

synesthesiam commented 4 years ago

v2.4 and v2.5 both have playback buttons for the last recorded voice command