rhasspy / rhasspy3

An open source voice assistant toolkit for many human languages
MIT License

satellite configuration #26

Open efschu opened 1 year ago

efschu commented 1 year ago

I guess I'm missing something.

On the server I have configured the following:

cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/pipeline_run.py --loop --debug
config:

pipelines:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    vad:
      name: silero
    asr:
      name: faster-whisper.client
    handle:
      name: home_assistant
cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/server_run.py asr faster-whisper
config:

servers:
    faster-whisper:
      command: |
        script/server --language ${language} --device ${device} "${model}"
      template_args:
        language: "de"
        model: "${data_dir}/large-v2"
        device: "cuda"  # cpu or cuda

starting http_server with:

/root/rhasspy3/script/http_server --debug

On the satellite:

cmd:
/root/rhasspy3/script/run /root/rhasspy3/bin/satellite_run.py
config:

satellites:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    remote:
      name: websocket
    snd:
      name: aplay

  remote:
    websocket:
      command: |
        script/run "${uri}"
      template_args:
        uri: "ws://192.168.0.109:13331/pipeline/asr-tts"
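
As pasted, the `remote:` block above sits at the same indentation as `default:` under `satellites:`, whereas the satellite configuration posted later in this thread defines it under `programs:`. That may only be a copy/paste artifact, but it is cheap to rule out. The following is a minimal sketch that checks whether every program a satellite refers to is actually defined. It works on a plain dict (for example, what a YAML loader returns) and assumes the `programs: <domain>: <name>` and `satellites: <satellite>: <domain>: name` layout seen in this thread. The function name `missing_programs` is hypothetical and not part of rhasspy3.

```python
def missing_programs(config):
    """Return (satellite, domain, name) triples that have no programs entry.

    Assumed layout (taken from the configs in this thread):
      programs:   {domain: {program_name: {...}}}
      satellites: {satellite_name: {domain: {"name": program_name}}}
    """
    programs = config.get("programs") or {}
    missing = []
    for sat_name, domains in (config.get("satellites") or {}).items():
        for domain, ref in (domains or {}).items():
            # A valid reference is a mapping with a "name" key.
            name = ref.get("name") if isinstance(ref, dict) else None
            if name is None or name not in (programs.get(domain) or {}):
                missing.append((sat_name, domain, name))
    return missing
```

Run it on the merged configuration (defaults plus user file). On a user file alone it will also flag programs such as `aplay` that come from the default configuration.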

"Local" processing works fine, but the satellite does not. Debug output:

DEBUG:rhasspy3.core:Loading config from /root/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Skipping /root/rhasspy3/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -D pulse -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', '/root/rhasspy3/config/data/wake/porcupine1/resources/keyword_files_de/linux/ananas_linux.ppn', '--lang_model', '/root/rhasspy3/config/data/wake/porcupine1/lib/common/porcupine_params_de.pv']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='ananas_linux', timestamp=88896097256797)
DEBUG:rhasspy3.program:script/run ['ws://192.168.0.109:13331/pipeline/asr-tts']

After that nothing happens; I have to kill the process.

Any ideas?
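
Since the satellite log stops right after launching the websocket client, one basic thing to rule out is whether the satellite can reach the server's port at all. Below is a minimal sketch using only the Python standard library. The helper names `parse_ws_uri` and `can_connect` are made up for illustration and are not part of rhasspy3, and the check covers plain TCP reachability only, not the websocket handshake or the pipeline itself.

```python
import socket
from urllib.parse import parse_qs, urlsplit


def parse_ws_uri(uri):
    """Split a ws:// or wss:// URI into (host, port, path, query dict)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"not a websocket URI: {uri}")
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "wss" else 80)
    return parts.hostname, port, parts.path, parse_qs(parts.query)


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the satellite, `host, port, path, query = parse_ws_uri("ws://192.168.0.109:13331/pipeline/asr-tts")` followed by `can_connect(host, port)` should return `True` if the HTTP server is reachable. A `False` result would point at networking or firewalling rather than at the pipeline.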

pipsen commented 1 year ago

Oh dear, I saw this issue too late and created the same one 20 minutes ago. I have the same problem. Did you solve it meanwhile?

efschu commented 1 year ago

No, still hoping a contributor gets some free time for us 😉

pipsen commented 1 year ago

Are you German, by any chance? Another question: does your server also have its own microphone, or is it a pure websocket server without audio devices?

efschu commented 1 year ago

Yes, I am. Sorry for the bad grammar. ;)

No, I run a PulseAudio TCP server on a Raspberry Pi with a PS3 Eye camera and connect it to the CUDA container on my "big" server.

pipsen commented 1 year ago

Can we interconnect somewhere directly?

efschu commented 1 year ago

Can we interconnect somewhere directly?

githubconnection@fukaru.com

efschu commented 1 year ago

... Did you solve it meanwhile?

Instead of solving it, I am running another "full" instance for now, thanks to my P40 having enough VRAM to run this multiple times 😅

Shulyaka commented 1 year ago

Hi! Please try this fix: https://github.com/rhasspy/rhasspy3/pull/30.

Could be the same problem as mine.

ethereal-engineer commented 1 year ago

Same issue here. Doesn't seem to be fixed by #30.

Satellite output after I have said the wake word and begun to talk:

tk421➜  rhasspy3-satellite : master ✘ :✭ ᐅ script/run bin/satellite_run.py --debug
DEBUG:rhasspy3.core:Loading config from /home/doc/rhasspy3-satellite/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Loading config from /home/doc/rhasspy3-satellite/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', 'computer_linux.ppn']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='computer_linux', timestamp=495552149037)
DEBUG:rhasspy3.program:script/run ['ws://localhost:13331/pipeline/asr-tts?pipeline=voidline']

Server output of the same:

tk421➜  rhasspy3 : master ✘ :✭ ᐅ script/http_server --debug --server asr faster-whisper --server tts piper
DEBUG:rhasspy3.core:Loading config from /home/doc/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Loading config from /home/doc/rhasspy3/config/configuration.yaml
DEBUG:rhasspy:['server_run.py', '--config', '/home/doc/rhasspy3/config', 'asr', 'faster-whisper']
INFO:rhasspy:Starting asr faster-whisper
DEBUG:rhasspy:['server_run.py', '--config', '/home/doc/rhasspy3/config', 'tts', 'piper']
DEBUG:asyncio:Using selector: EpollSelector
INFO:rhasspy:Starting tts piper
[2023-08-28 06:58:48 +1000] [5533] [INFO] Running on http://0.0.0.0:13331 (CTRL + C to quit)
INFO:hypercorn.error:Running on http://0.0.0.0:13331 (CTRL + C to quit)
INFO:piper_server:Ready
Load time: 0.226994 sec
Output directory: "/tmp/tmp0t8q910n"
INFO:faster_whisper_server:Ready
DEBUG:rhasspy3.program:client_unix_socket.py ['var/run/faster-whisper.socket']
DEBUG:rhasspy3.program:vad_adapter_raw.py ['--rate', '16000', '--width', '2', '--channels', '1', '--samples-per-chunk', '512', 'script/speech_prob "share/silero_vad.onnx"']
DEBUG:rhasspy3_http_api.pipeline:stream-to-stream: voice started

So it would seem that silero never reports "voice ended".
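
For context on what "voice ended" depends on: a VAD front end typically produces a speech probability per audio chunk, and a separate segmenter turns those probabilities into "started" and "ended" events. Below is a minimal sketch of that idea, not rhasspy3's actual implementation. The class name `Segmenter`, the 0.5 threshold, and the chunk counts are assumptions chosen for illustration. Only the chunk size (512 samples at 16 kHz, so 32 ms per chunk) comes from the server log above.

```python
class Segmenter:
    """Turn per-chunk speech probabilities into "started"/"ended" events.

    All defaults are illustrative assumptions. With 32 ms chunks,
    10 chunks is 320 ms of speech and 16 chunks is 512 ms of silence.
    """

    def __init__(self, threshold=0.5, speech_chunks=10, silence_chunks=16):
        self.threshold = threshold
        self.speech_chunks = speech_chunks
        self.silence_chunks = silence_chunks
        self.in_command = False
        self._run = 0  # length of the current run of speech (or silence)

    def process(self, probability):
        is_speech = probability >= self.threshold
        if not self.in_command:
            # Waiting for enough consecutive speech chunks.
            self._run = self._run + 1 if is_speech else 0
            if self._run >= self.speech_chunks:
                self.in_command = True
                self._run = 0
                return "started"
        else:
            # Waiting for enough consecutive silent chunks.
            self._run = self._run + 1 if not is_speech else 0
            if self._run >= self.silence_chunks:
                self.in_command = False
                self._run = 0
                return "ended"
        return None
```

With this kind of logic, "ended" only fires after an unbroken run of low-probability chunks. A stream that never goes quiet, or that stops delivering chunks altogether, would leave the pipeline waiting after "voice started", which matches the symptom in the log.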

ethereal-engineer commented 1 year ago

Server configuration:

servers:
  tts:
    piper:
      template_args:
        model: "${data_dir}/en-us-libritts-high.onnx"

pipelines:

  voidline:
    inherit: default
    wake:
      name: porcupine1
      template_args:
        model: "grasshopper_linux.ppn"
    tts:
      name: piper.client
    asr:
      name: faster-whisper.client

Satellite configuration:

programs:
  mic:
    arecord:
      command: |
        arecord -q -r 16000 -c 1 -f S16_LE -t raw -
      adapter: |
        mic_adapter_raw.py --samples-per-chunk 1024 --rate 16000 --width 2 --channels 1

  wake:
    porcupine1:
      command: |
        .venv/bin/python3 bin/porcupine_stream.py --model "${model}"
      template_args:
        model: "computer_linux.ppn"

  remote:
    websocket:
      command: |
        script/run "${uri}"
      template_args:
        uri: "ws://localhost:13331/pipeline/asr-tts?pipeline=voidline"
#        uri: "ws://localhost:13331/pipeline/asr-tts"

satellites:
  default:
    mic:
      name: arecord
    wake:
      name: porcupine1
    remote:
      name: websocket
    snd:
      name: aplay

ethereal-engineer commented 1 year ago

I am slowly setting up Python debugging and learning Python to fix this issue as I have available time and brainpower. It would be nice for someone who knows the code to get to it first but I'm not sitting around. Besides, I wanted to eventually learn Python anyway...

paddsen commented 11 months ago

I am also getting this behaviour. No fix has worked so far. Switching from silero to webrtcvad had no effect. It really looks like the VAD never signals "end of voice".
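
Because two different VADs show the same behaviour, it may be worth checking the audio itself before suspecting the VAD. The sketch below is a rough diagnostic, not a fix: it computes the RMS level per chunk of raw 16-bit little-endian mono audio (the format `arecord` is configured to produce above), so you can see whether the level actually drops when you stop talking. The function names are hypothetical, and the 2048-byte default assumes the satellite's 1024-sample chunks at 2 bytes per sample.

```python
import array
import math
import sys


def chunk_rms(chunk):
    """RMS amplitude of a chunk of 16-bit little-endian mono PCM."""
    samples = array.array("h")
    # Drop a trailing odd byte so the length is a multiple of 2.
    samples.frombytes(chunk[: len(chunk) - len(chunk) % 2])
    if sys.byteorder == "big":
        samples.byteswap()
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def levels(stream, chunk_bytes=2048):
    """Yield one RMS value per chunk read from a binary stream."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk_rms(chunk)
```

One way to use it is to pipe the `arecord` command from the satellite configuration into a small script that loops over `levels(sys.stdin.buffer)` and prints each value. If the printed level stays high during silence, the problem is more likely in the capture path than in the VAD.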