rhasspy / wyoming-satellite

Remote voice satellite using Wyoming protocol
MIT License

TTS Response is clipped at the beginning #122

Open mdvickst opened 4 months ago

mdvickst commented 4 months ago

I've got Wyoming Satellite running on an Ubuntu VM (Proxmox) with a USB speakerphone connected for mic and speaker. When it plays back the TTS response, the first 1-2 seconds are cut off. The awake and done WAV sounds play as expected.

Satellite Service:

[Unit]
Description=Wyoming Satellite
After=multi-user.target

[Service]
WorkingDirectory=/home/satellite/wyoming-satellite
ExecStart=/usr/bin/env python3 script/run   --name 'my satellite'   --uri 'tcp://0.0.0.0:10700'   --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw'   --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw'   --wake-uri 'tcp://127.0.0.1:10400'   --wake-word-name 'hey_jarvis' --done-wav 'awake.wav'
Type=simple
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Local wake word service:

[Unit]
Description=Start OpenWakeWord Service
After=multi-user.target

[Service]
WorkingDirectory=/home/satellite/wyoming-openwakeword
ExecStart=/usr/bin/env python3 script/run --uri 'tcp://0.0.0.0:10400' --preload-model 'hey_jarvis' --threshold .99
Type=simple

[Install]
WantedBy=multi-user.target
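
One way to check whether the output device itself swallows the start of a stream, independent of Home Assistant and the TTS engine, is to play a known test signal through the same `aplay` command used in `--snd-command`. The sketch below is only an illustration and not part of wyoming-satellite: the function names, the beep pattern, and the `LEAD_IN_SECONDS` knob are all assumptions made for this example. It writes raw PCM in the format the service file declares (22050 Hz, mono, S16_LE).

```python
import math
import struct
import sys

RATE = 22050          # matches `-r 22050` in the --snd-command above
SAMPLE_WIDTH = 2      # S16_LE: 16-bit signed little-endian samples
LEAD_IN_SECONDS = 0.0  # hypothetical knob: raise (e.g. 2.0) to prepend silence


def silence(seconds: float) -> bytes:
    """Return `seconds` of digital silence as raw S16_LE mono PCM."""
    return b"\x00" * (round(RATE * seconds) * SAMPLE_WIDTH)


def beep(seconds: float, freq_hz: float = 880.0, volume: float = 0.3) -> bytes:
    """Return a sine tone of the given length as raw S16_LE mono PCM."""
    count = round(RATE * seconds)
    samples = [
        int(volume * 32767 * math.sin(2 * math.pi * freq_hz * i / RATE))
        for i in range(count)
    ]
    return struct.pack(f"<{count}h", *samples)


def build_clip(lead_in_seconds: float = 0.0, beeps: int = 5) -> bytes:
    """Optional leading silence, then `beeps` short tones 0.5 s apart."""
    pattern = (beep(0.2) + silence(0.3)) * beeps
    return silence(lead_in_seconds) + pattern


if __name__ == "__main__":
    # Raw PCM goes to stdout so it can be piped straight into aplay.
    sys.stdout.buffer.write(build_clip(LEAD_IN_SECONDS))
```

Saved under a name such as `test_clip.py` (hypothetical), it can be run as `python3 test_clip.py | aplay -r 22050 -c 1 -f S16_LE -t raw`. Hearing fewer than five beeps would point at the sound device or VM audio path rather than the TTS response. If all five return once `LEAD_IN_SECONDS` is raised, leading silence might be worth trying as a workaround, although nothing in this thread confirms that.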

mdvickst commented 4 months ago

Here's a sample where the response was a single short word ("Opened") and nothing was played.

stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:30:54.760654+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:30:54.760748+00:00"
  - type: stt-vad-start
    data:
      timestamp: 325
    timestamp: "2024-02-22T13:30:55.459661+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1485
    timestamp: "2024-02-22T13:30:57.765833+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Raise Girls Room shade.
    timestamp: "2024-02-22T13:30:57.926834+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Raise Girls Room shade.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:30:57.926956+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Opened
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Girls Room Shade
                type: entity
                id: cover.girls_room_shade
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:30:57.952652+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Opened
    timestamp: "2024-02-22T13:30:57.952700+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:30:57.953188+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:30:57.953247+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Raise Girls Room shade.
intent:
  engine: homeassistant
  language: en
  intent_input: Raise Girls Room shade.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Opened
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Girls Room Shade
            type: entity
            id: cover.girls_room_shade
        failed: []
    conversation_id: null
tts:
  engine: tts.home_assistant_cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Opened
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
    mime_type: audio/x-wav
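
The event timestamps in the run above can be turned into per-stage durations with a few lines of Python. This is a minimal sketch for reading the trace, with the timestamps copied by hand from the log; the helper name `stage_gaps` is made up for this example.

```python
from datetime import datetime

# Event names and timestamps copied from the pipeline run above.
EVENTS = [
    ("run-start", "2024-02-22T13:30:54.760654+00:00"),
    ("stt-start", "2024-02-22T13:30:54.760748+00:00"),
    ("stt-vad-start", "2024-02-22T13:30:55.459661+00:00"),
    ("stt-vad-end", "2024-02-22T13:30:57.765833+00:00"),
    ("stt-end", "2024-02-22T13:30:57.926834+00:00"),
    ("intent-start", "2024-02-22T13:30:57.926956+00:00"),
    ("intent-end", "2024-02-22T13:30:57.952652+00:00"),
    ("tts-start", "2024-02-22T13:30:57.952700+00:00"),
    ("tts-end", "2024-02-22T13:30:57.953188+00:00"),
    ("run-end", "2024-02-22T13:30:57.953247+00:00"),
]


def stage_gaps(events):
    """Return (from_event, to_event, seconds) for each consecutive pair."""
    parsed = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    return [
        (prev_name, name, (ts - prev_ts).total_seconds())
        for (prev_name, prev_ts), (name, ts) in zip(parsed, parsed[1:])
    ]


gaps = stage_gaps(EVENTS)
for start, end, seconds in gaps:
    print(f"{start:>14} -> {end:<14} {seconds * 1000:10.3f} ms")
```

In this trace `tts-end` is logged well under a millisecond after `tts-start`, and `run-end` follows immediately. That suggests the trace stops before any audio is played on the satellite, so the clipping probably has to be investigated on the satellite side rather than from this log.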

mdvickst commented 4 months ago

And here is another with a longer response, where all I heard was "rned off the lights".

stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:33:42.117326+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:33:42.117494+00:00"
  - type: stt-vad-start
    data:
      timestamp: 275
    timestamp: "2024-02-22T13:33:42.688250+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1125
    timestamp: "2024-02-22T13:33:44.417176+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Turn off living room lights.
    timestamp: "2024-02-22T13:33:44.591799+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Turn off living room lights.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:33:44.591861+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Turned off the lights
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Living Room
                type: area
                id: 86726e558f304c699f0015d0f229a901
              - name: Living Room Can Lights Basic
                type: entity
                id: light.living_room_can_lights_basic
              - name: "Living Room Can Lights "
                type: entity
                id: light.living_room_can_lights
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:33:44.736401+00:00"
  - type: tts-start
    data:
      engine: cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Turned off the lights
    timestamp: "2024-02-22T13:33:44.736437+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:33:44.736757+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:33:44.736789+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Turn off living room lights.
intent:
  engine: homeassistant
  language: en
  intent_input: Turn off living room lights.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Turned off the lights
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Living Room
            type: area
            id: 86726e558f304c699f0015d0f229a901
          - name: Living Room Can Lights Basic
            type: entity
            id: light.living_room_can_lights_basic
          - name: "Living Room Can Lights "
            type: entity
            id: light.living_room_can_lights
        failed: []
    conversation_id: null
tts:
  engine: cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Turned off the lights
  done: true
  tts_output:
    media_id: >-
      media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
    mime_type: audio/x-wav
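
The `media_id` in the trace above encodes the audio format that was requested from the TTS engine. The sketch below pulls those parameters out and prints them next to the rate from the `aplay` command in the satellite service file. The helper name `tts_request_params` is hypothetical. Whether the satellite resamples between the two rates is not established in this thread, so the comparison is a starting point for checking, not a diagnosis.

```python
from urllib.parse import parse_qs

# Copied from tts_output.media_id in the run above.
MEDIA_ID = (
    "media-source://tts/cloud?message=Turned+off+the+lights"
    "&language=en-GB&voice=EthanNeural&preferred_format=wav"
    "&preferred_sample_rate=16000&preferred_sample_channels=1"
)

# Rate passed to aplay via `-r` in the --snd-command of the service file.
SND_COMMAND_RATE = 22050


def tts_request_params(media_id: str) -> dict:
    """Return the query parameters of a media_id as a flat dict of strings."""
    _, _, query = media_id.partition("?")
    return {key: values[0] for key, values in parse_qs(query).items()}


params = tts_request_params(MEDIA_ID)
requested_rate = int(params["preferred_sample_rate"])

print(f"TTS text:        {params['message']}")
print(f"Requested rate:  {requested_rate} Hz")
print(f"Playback rate:   {SND_COMMAND_RATE} Hz")
print(f"Rates match:     {requested_rate == SND_COMMAND_RATE}")
```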

khalob commented 4 months ago

Check whether lowering or turning off this setting helps: https://github.com/rhasspy/wyoming-satellite/pull/121

I had a similar issue.

motoridersd commented 2 weeks ago

Considering doing this in a Proxmox box. Were you able to resolve the issue? Has it been working well for you?

regnighc commented 1 week ago

I'm also having this issue, and the suggestion in #121 didn't resolve it for me.