synesthesiam / rhasspy

Rhasspy voice assistant for offline home automation
https://rhasspy.readthedocs.io
MIT License
944 stars 101 forks

Microphone settings are not working #141

Closed timaschew closed 4 years ago

timaschew commented 4 years ago

First of all, thank you for this great project!

I have this soundcard and this microphone. I was using Snowboy, Google, and wit.ai last year but stopped due to privacy concerns, and now I want to switch to Rhasspy :). The microphone is attached to my Raspberry Pi 3B.

I spent three days figuring out that the German profile leads to performance issues (high memory and CPU usage) in both the Docker and virtualenv setups; the Raspberry Pi becomes unusable after a few minutes and then crashes. Somehow this doesn't happen with the English profile, but that's another story (and an issue I will create later).

I followed the tutorial for the client/server setup.

However, automatic microphone selection doesn't work with either PyAudio or ALSA.

Choosing PyAudio leads to this error:

ERROR:DialogueManager:get_microphones
Traceback (most recent call last):
  File "/usr/share/rhasspy/rhasspy/dialogue.py", line 754, in handle_forward
    mics = recorder_class.get_microphones()
  File "/usr/share/rhasspy/rhasspy/audio_recorder.py", line 261, in get_microphones
    default_name = audio.get_default_input_device_info().get("name")
  File "/usr/local/lib/python3.6/dist-packages/pyaudio.py", line 949, in get_default_input_device_info
    device_index = pa.get_default_input_device()
OSError: No Default Input Device Available
[2020-01-04 23:10:21,557] 192.168.178.21:53797 GET /api/microphones 1.1 200 2 29239
INFO:quart.serving:192.168.178.21:53797 GET /api/microphones 1.1 200 2 29239

ERROR:DialogueManager:on_receive
Traceback (most recent call last):
  File "/usr/share/rhasspy/rhasspy/actor.py", line 169, in on_receive
    self._state_method(message, sender)
  File "/usr/share/rhasspy/rhasspy/dialogue.py", line 395, in in_asleep
    self.handle_any(message, sender)
  File "/usr/share/rhasspy/rhasspy/dialogue.py", line 733, in handle_any
    self.handle_forward(message, sender)
  File "/usr/share/rhasspy/rhasspy/dialogue.py", line 771, in handle_forward
    test_mics = recorder_class.test_microphones(chunk_size)  # type: ignore
  File "/usr/share/rhasspy/rhasspy/audio_recorder.py", line 285, in test_microphones
    default_name = audio.get_default_input_device_info().get("name")
  File "/usr/local/lib/python3.6/dist-packages/pyaudio.py", line 949, in get_default_input_device_info
    device_index = pa.get_default_input_device()
OSError: No Default Input Device Available

BTW: when running Rhasspy via virtualenv, PyAudio even showed me one device that could be auto-selected via the Test button, but recording was still not possible; there was also an (error) suffix after clicking Test. When recording afterwards, the logs show this:

ERROR:PyAudioRecorder:to_recording
Traceback (most recent call last):
  File "/home/pi/rhasspy/rhasspy/audio_recorder.py", line 170, in to_recording
    frames_per_buffer=self.frames_per_buffer,
  File "/home/pi/rhasspy/.venv/lib/python3.6/site-packages/pyaudio.py", line 750, in open
    stream = Stream(self, *args, **kwargs)
  File "/home/pi/rhasspy/.venv/lib/python3.6/site-packages/pyaudio.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
DEBUG:PyAudioRecorder:recording -> started

[Screenshot: 2020-01-05 at 00:11:41]

Using ALSA gives this feedback:

[Screenshot: 2020-01-05 at 00:11:48]
[2020-01-04 23:11:22,581] 192.168.178.21:54172 GET / 1.1 200 1029 20816
INFO:quart.serving:192.168.178.21:54172 GET / 1.1 200 1029 20816
ERROR:__main__:api_events_log
Traceback (most recent call last):
  File "app.py", line 1123, in api_events_log
    text = await q.get()
  File "/usr/lib/python3.6/asyncio/queues.py", line 167, in get
    yield from getter
concurrent.futures._base.CancelledError
[2020-01-04 23:11:22,617] Error in ASGI Framework
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 91, in handle_asgi_app
    await asgi_instance(self.asgi_receive, self.asgi_send)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 106, in __call__
    await _cancel_tasks(pending)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 189, in _cancel_tasks
    raise task.exception()
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 157, in handle_websocket
    'code': 1000,
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 149, in asgi_send
    await self.asend(CloseConnection(code=int(message["code"])))
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asyncio/wsproto.py", line 93, in asend
    self.write(self.connection.send(event))
  File "/usr/local/lib/python3.6/dist-packages/wsproto/__init__.py", line 64, in send
    data += self.connection.send(event)
  File "/usr/local/lib/python3.6/dist-packages/wsproto/connection.py", line 101, in send
    "Connection cannot be closed in state %s" % self.state
wsproto.utilities.LocalProtocolError: Connection cannot be closed in state ConnectionState.CLOSED
ERROR:quart.serving:Error in ASGI Framework
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 91, in handle_asgi_app
    await asgi_instance(self.asgi_receive, self.asgi_send)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 106, in __call__
    await _cancel_tasks(pending)
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 189, in _cancel_tasks
    raise task.exception()
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 157, in handle_websocket
    'code': 1000,
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asgi/wsproto.py", line 149, in asgi_send
    await self.asend(CloseConnection(code=int(message["code"])))
  File "/usr/local/lib/python3.6/dist-packages/hypercorn/asyncio/wsproto.py", line 93, in asend
    self.write(self.connection.send(event))
  File "/usr/local/lib/python3.6/dist-packages/wsproto/__init__.py", line 64, in send
    data += self.connection.send(event)
  File "/usr/local/lib/python3.6/dist-packages/wsproto/connection.py", line 101, in send
    "Connection cannot be closed in state %s" % self.state
wsproto.utilities.LocalProtocolError: Connection cannot be closed in state ConnectionState.CLOSED
INFO:quart.serving:192.168.178.21:54172 GET /api/sentences 1.1 200 581 11891
[2020-01-04 23:11:22,883] 192.168.178.21:54172 GET /api/sentences 1.1 200 581 11891
[2020-01-04 23:11:22,910] 192.168.178.21:54181 GET /api/events/log 1.1 101 - 95379
INFO:quart.serving:192.168.178.21:54181 GET /api/events/log 1.1 101 - 95379
DEBUG:__main__:Loading phoneme examples from /usr/share/rhasspy/profiles/en/phoneme_examples.txt
[2020-01-04 23:11:22,988] 192.168.178.21:54182 GET /api/slots 1.1 200 2 86413
INFO:quart.serving:192.168.178.21:54182 GET /api/slots 1.1 200 2 86413
[2020-01-04 23:11:22,991] 192.168.178.21:54185 GET /api/phonemes 1.1 200 1526 89388
INFO:quart.serving:192.168.178.21:54185 GET /api/phonemes 1.1 200 1526 89388
[2020-01-04 23:11:23,014] 192.168.178.21:54172 GET /api/profiles 1.1 200 144 71627
INFO:quart.serving:192.168.178.21:54172 GET /api/profiles 1.1 200 144 71627
INFO:quart.serving:192.168.178.21:54188 GET /api/profile 1.1 200 8697 55089
[2020-01-04 23:11:23,030] 192.168.178.21:54188 GET /api/profile 1.1 200 8697 55089
[2020-01-04 23:11:23,087] 192.168.178.21:54182 GET /api/profile 1.1 200 6825 62603
INFO:quart.serving:192.168.178.21:54182 GET /api/profile 1.1 200 6825 62603
[2020-01-04 23:11:23,110] 192.168.178.21:54185 GET /api/unknown-words 1.1 200 2 84565
INFO:quart.serving:192.168.178.21:54185 GET /api/unknown-words 1.1 200 2 84565
[2020-01-04 23:11:23,137] 192.168.178.21:54186 GET /api/microphones 1.1 200 1010 233182
INFO:quart.serving:192.168.178.21:54186 GET /api/microphones 1.1 200 1010 233182
[2020-01-04 23:11:23,143] 192.168.178.21:54187 GET /api/speakers 1.1 200 2 237203
INFO:quart.serving:192.168.178.21:54187 GET /api/speakers 1.1 200 2 237203
[2020-01-04 23:11:23,155] 192.168.178.21:54172 GET /api/problems 1.1 200 271 84954
INFO:quart.serving:192.168.178.21:54172 GET /api/problems 1.1 200 271 84954
[2020-01-04 23:11:23,181] 192.168.178.21:54188 GET /img/favicon.png 1.1 200 802 99545
INFO:quart.serving:192.168.178.21:54188 GET /img/favicon.png 1.1 200 802 99545
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
arecord: main:788: audio open error: Device or resource busy
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
arecord: main:788: audio open error: Invalid argument
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
arecord: main:788: audio open error: Device or resource busy
arecord: main:788: audio open error: Device or resource busy
[2020-01-04 23:11:47,355] 192.168.178.21:54339 GET /api/test-microphones 1.1 200 1164 570993
INFO:quart.serving:192.168.178.21:54339 GET /api/test-microphones 1.1 200 1164 570993

But when I try to record, nothing is recorded, and I cannot even stop the recording via the web interface. The log shows just this, even when I click Tap to stop recording:

[2020-01-04 23:25:45,678] 192.168.178.21:59590 POST /api/start-recording 1.1 200 2 8592
INFO:quart.serving:192.168.178.21:59590 POST /api/start-recording 1.1 200 2 8592

I know the correct device is hw:CARD=Device,DEV=0, even though Rhasspy shows (no sound) after clicking Test. So I just selected it, and recording via the Rhasspy web interface finally started to work, but:

  1. the hotword (Snowboy) doesn't work, even with the sensitivity set to 1.
  2. it never recognizes my speech correctly; it always recognizes something about the light in the bedroom instead (but I said "what time is it", "tell me the time", "what's the temperature")

I saw this in the rhasspy logs:

DEBUG:ARecordAudioRecorder:['arecord', '-q', '-r', '16000', '-f', 'S16_LE', '-c', '1', '-t', 'raw', '-D', 'hw:CARD=Device,DEV=0']

When I run the command arecord -D hw:1,0 -d 5 -f cd test.wav -c 1 on the host or in Docker, the resulting file is fine, and I was even able to upload it via the Rhasspy web interface: all four of my different examples were recognized correctly. I had almost given up on Rhasspy, but this proves it really works.
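The difference between the working manual command and Rhasspy's own invocation is just the argument list. A minimal sketch that rebuilds the logged command in Python for comparison; `build_arecord_cmd` is a hypothetical helper, not part of Rhasspy:

```python
import subprocess

def build_arecord_cmd(device, rate=16000, fmt="raw", duration=None, out=None):
    """Build an arecord argument list like the one in Rhasspy's debug log.

    Hypothetical helper, not Rhasspy code. Rhasspy streams headerless
    audio ("raw") because it handles the format itself; pass fmt="wav"
    to get a directly playable file instead.
    """
    cmd = ["arecord", "-q", "-r", str(rate), "-f", "S16_LE",
           "-c", "1", "-t", fmt, "-D", device]
    if duration is not None:
        cmd += ["-d", str(duration)]
    if out is not None:
        cmd.append(out)
    return cmd

# To actually record 5 seconds (requires ALSA and this device name):
# subprocess.run(build_arecord_cmd("hw:CARD=Device,DEV=0", fmt="wav",
#                                  duration=5, out="test.wav"), check=True)
print(build_arecord_cmd("hw:CARD=Device,DEV=0"))
```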

But how can I fix it now?

Oh no, I've just realized that even with the English profile the Raspberry Pi becomes unusable after 2-4 hours. CPU is at 100%; it shows me this on an attached monitor, but I cannot ssh in or even ping it.

timaschew commented 4 years ago

~~Just restarted the Raspberry Pi. Two minutes after boot, Docker had consumed all the memory.
I guess the high CPU usage is because it starts using the swap file.~~

// edit: the problem was a wrong device for the microphone!

synesthesiam commented 4 years ago

Can you post your profiles? I'm not seeing this behavior at all. Thanks.

timaschew commented 4 years ago

Sure.

{
    "intent": {
        "remote": {
            "url": "http://192.168.178.21:12101/api/text-to-intent"
        },
        "system": "remote"
    },
    "microphone": {
        "arecord": {
            "device": "hw:CARD=Device,DEV=0"
        },
        "pyaudio": {
            "device": "2"
        },
        "system": "arecord"
    },
    "sounds": {
        "system": "dummy"
    },
    "speech_to_text": {
        "remote": {
            "url": "http://192.168.178.21:12101/api/speech-to-text"
        },
        "system": "remote"
    },
    "text_to_speech": {
        "system": "dummy"
    },
    "wake": {
        "system": "snowboy"
    }
}

Maybe I know when the performance issue occurs: I had left the wrong microphone device in the Rhasspy config. This was in the settings:

"microphone": {
    "arecord": {
        "device": "null"
    },
    "pyaudio": {
        "device": "2"
    },
    "system": "arecord"
},

Now I switched to the proper soundcard and the memory usage is almost constant at 38%.
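A quick sanity check could catch a setting like this before it eats the Pi's memory. A minimal sketch, assuming the profile JSON layout shown above; `check_microphone` is a hypothetical helper, not part of Rhasspy:

```python
import json

# Excerpt of the profile with the bad setting, as posted above.
PROFILE = """
{
    "microphone": {
        "arecord": {"device": "null"},
        "pyaudio": {"device": "2"},
        "system": "arecord"
    }
}
"""

def check_microphone(profile):
    """Warn about microphone devices that will record nothing (or garbage)."""
    mic = profile.get("microphone", {})
    system = mic.get("system", "")
    device = mic.get(system, {}).get("device", "")
    if device in ("", "null"):
        return f"suspicious {system} device: {device!r}"
    return "ok"

print(check_microphone(json.loads(PROFILE)))  # suspicious arecord device: 'null'
```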

synesthesiam commented 4 years ago

Ok, thank you. This must be related to the microphone recording a bunch of fake data very quickly when the device is wrong. I protect against empty data being returned, but maybe I need to rate limit it or something.
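Such a rate limit could look like this. A minimal sketch with hypothetical `read_chunk`/`handle_chunk` callables, not Rhasspy's actual recorder: when a read returns no data, back off instead of spinning.

```python
import time

def read_audio(read_chunk, handle_chunk, backoff=0.5, max_failures=20):
    """Read audio chunks, backing off when the device returns no data.

    read_chunk and handle_chunk are hypothetical callables. A broken
    device that returns b"" forever no longer busy-loops the CPU; the
    sketch gives up after max_failures consecutive empty reads.
    """
    failures = 0
    while failures < max_failures:
        chunk = read_chunk()
        if not chunk:
            failures += 1
            time.sleep(backoff)  # rate limit: don't spin on a dead device
            continue
        failures = 0
        handle_chunk(chunk)

# Simulated broken microphone: always returns empty data, so the
# loop backs off three times and returns instead of pegging the CPU.
read_audio(lambda: b"", lambda c: None, backoff=0.0, max_failures=3)
```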

timaschew commented 4 years ago

Okay, yes so it's my fault, sorry for the confusion.

Then let's go back to the original error. I've just successfully recorded a wav in the Docker container with arecord -D hw:1,0 -d 5 -f cd test.wav -c 1, and I can play it and it's correct.

But then I tried this command: arecord -q -r 16000 -f S16_LE -c 1 -t raw -D hw:CARD=Device,DEV=0 test.wav -d 5 and the result is this wav file, which seems to be broken:

https://send.firefox.com/download/8ab4527d05737f61/#vLTPKDoKFgr4QmgQVYP_8A

I think it would be nice if the profile allowed controlling the whole record command (including all the args).

synesthesiam commented 4 years ago

The -t raw is what's making it seem broken. WAV files normally have a header, but Rhasspy already knows the format when recording. Change this to -t wav on the command line.
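The header difference is easy to demonstrate with Python's standard `wave` module. A minimal sketch on synthetic silence, not Rhasspy code:

```python
import io
import wave

# One second of 16 kHz, 16-bit, mono silence: this is what `-t raw`
# emits, bare PCM samples with no header.
raw_pcm = b"\x00\x00" * 16000

# `-t wav` wraps the same samples in a RIFF/WAVE header that tells
# players the sample rate, sample width, and channel count.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # S16_LE = 2 bytes per sample
    w.setframerate(16000)
    w.writeframes(raw_pcm)

wav_bytes = buf.getvalue()
print(wav_bytes[:4])                   # b'RIFF' -- the magic raw output lacks
print(len(wav_bytes) - len(raw_pcm))   # 44 header bytes
```

Players guess wrongly (or refuse) when handed headerless PCM, which is why the raw recording "seems broken" even though the samples are fine.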

timaschew commented 4 years ago

You're right! This works: arecord -r 16000 -f S16_LE -c 1 -t wav -d 3 -D hw:1,0 r16-fs16-twav.wav

I'm getting a warning that 16000 Hz is not supported, only 44100 Hz, but it still works.

Alright, so how can this be solved? Do you have a plan?

timaschew commented 4 years ago

Ah, I think you've missed my point: recognition has never worked via the microphone directly through Rhasspy. It always recognizes a totally different sentence. But when I upload the wave files via the web interface, it works perfectly.

timaschew commented 4 years ago

Actually, this time (I think I tried it before with an untrained model) the recognized intent is an empty string. I tried four times with different sentences, but the response is always:

DEBUG:__main__:{"entities": [], "intent": {"confidence": 0, "name": ""}, "siteId": "default", "slots": {}, "speech_confidence": 1, "text": "", "time_sec": 0.0024182796478271484, "wakeId": ""}

// edit: I had disabled the microphone; that's why the intent is empty. Still getting wrong sentences.
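For what it's worth, an empty result like the one above is easy to detect client-side. A minimal sketch using the field names from the log line; `intent_is_empty` is a hypothetical helper:

```python
import json

# The response body from the debug log above.
response = json.loads(
    '{"entities": [], "intent": {"confidence": 0, "name": ""}, '
    '"siteId": "default", "slots": {}, "speech_confidence": 1, '
    '"text": "", "time_sec": 0.0024182796478271484, "wakeId": ""}'
)

def intent_is_empty(r):
    """True when no speech was transcribed and no intent was matched."""
    return not r["text"] and not r["intent"]["name"]

print(intent_is_empty(response))  # True -> nothing was actually heard
```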

This issue got a bit messy, so I've created a new one: #144