tetele / onju-voice-satellite

An ESPHome config for the Onju Voice which makes it a Home Assistant voice satellite
MIT License

No audio playback #34

Closed: ther3zz closed this issue 5 months ago

ther3zz commented 5 months ago

Flavor

OpenWakeWord or no wake word

Checklist

Describe the issue

No audio plays back, either via the voice assistant or from the media player.

Reproduction steps

  1. Use the wake word
  2. Say anything
  3. See error in logs ... or ...
  4. Go into Media in Home Assistant
  5. Select the Onju device to play back on
  6. Click play for the media file

Debug logs

[17:01:50][D][voice_assistant:523]: Event Type: 3
[17:01:50][D][voice_assistant:537]: STT started
[17:01:50][D][light:036]: 'top_led' Setting:
[17:01:50][D][light:051]:   Brightness: 100%
[17:01:50][D][light:059]:   Red: 100%, Green: 100%, Blue: 100%
[17:01:50][D][light:109]:   Effect: 'listening'
[17:01:51][D][voice_assistant:523]: Event Type: 11
[17:01:51][D][voice_assistant:677]: Starting STT by VAD
[17:01:52][D][voice_assistant:523]: Event Type: 12
[17:01:52][D][voice_assistant:681]: STT by VAD end
[17:01:52][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[17:01:52][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[17:01:52][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[17:01:52][D][light:036]: 'top_led' Setting:
[17:01:52][D][light:051]:   Brightness: 70%
[17:01:52][D][light:059]:   Red: 0%, Green: 20%, Blue: 100%
[17:01:52][D][light:109]:   Effect: 'processing'
[17:01:52][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[17:01:54][D][voice_assistant:523]: Event Type: 4
[17:01:54][D][voice_assistant:551]: Speech recognised as: " What time is it?"
[17:01:54][D][voice_assistant:523]: Event Type: 5
[17:01:54][D][voice_assistant:556]: Intent started
[17:01:54][D][voice_assistant:523]: Event Type: 6
[17:01:54][D][voice_assistant:523]: Event Type: 7
[17:01:54][D][voice_assistant:579]: Response: "Sorry, I couldn't understand that"
[17:01:54][D][voice_assistant:523]: Event Type: 8
[17:01:54][D][voice_assistant:599]: Response URL: "https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3"
[17:01:54][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[17:01:54][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[17:01:54][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[17:01:54][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[17:01:54][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[17:01:54][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[17:01:54][D][light:036]: 'top_led' Setting:
[17:01:54][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[17:01:54][D][light:109]:   Effect: 'speaking'
[17:01:54][D][voice_assistant:523]: Event Type: 2
[17:01:54][D][voice_assistant:613]: Assist Pipeline ended
[17:01:54][W][component:232]: Component i2s_audio.media_player took a long time for an operation (522 ms).
[17:01:54][W][component:233]: Components should block for at most 30 ms.
[17:01:55][W][component:232]: Component i2s_audio.media_player took a long time for an operation (504 ms).
[17:01:55][W][component:233]: Components should block for at most 30 ms.
[17:01:55][D][light:036]: 'top_led' Setting:
[17:01:55][D][light:051]:   Brightness: 60%
[17:01:55][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[17:01:55][D][light:109]:   Effect: 'listening_ww'
[17:01:56][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to IDLE
[17:01:56][D][voice_assistant:422]: Desired state set to IDLE
[17:01:56][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[17:01:56][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[17:01:56][D][voice_assistant:118]: microphone not running
[17:01:56][D][voice_assistant:202]: Requesting start...
[17:01:56][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE

tetele commented 5 months ago

3. see error in logs

What error? I don't see any error in the logs.

Also, please provide the full logs. Power cycle your device, reproduce the issue and then copy and paste the ESPHome logs from the device boot up to that point.

ther3zz commented 5 months ago

Here are the full logs:

INFO Successful handshake with office-onju-2a44d8 @ 192.168.2.   in 0.058s
[17:08:48][I][app:102]: ESPHome version 2024.3.2 compiled on Apr 15 2024, 16:49:05
[17:08:48][I][app:104]: Project tetele.onju_voice_satellite version 1.0.0
[17:08:48][C][wifi:580]: WiFi:
[17:08:48][C][wifi:408]:   Local MAC: DC
[17:08:48][C][wifi:413]:   SSID: [redacted]
[17:08:48][C][wifi:416]:   IP Address: 192.168.2.
[17:08:48][C][wifi:420]:   BSSID: [redacted]
[17:08:48][C][wifi:421]:   Hostname: 'office-onju-2a44d8'
[17:08:48][C][wifi:423]:   Signal strength: -48 dB ▂▄▆█
[17:08:48][C][wifi:427]:   Channel: 11
[17:08:48][C][wifi:428]:   Subnet: 255.255.255.0
[17:08:48][C][wifi:429]:   Gateway: 192.168.2.1
[17:08:48][C][wifi:430]:   DNS1: 0.0.0.0
[17:08:48][C][wifi:431]:   DNS2: 0.0.0.0
[17:08:48][C][logger:166]: Logger:
[17:08:48][C][logger:167]:   Level: DEBUG
[17:08:48][C][logger:169]:   Log Baud Rate: 115200
[17:08:48][C][logger:170]:   Hardware UART: USB_CDC
[17:08:48][C][template.number:050]: Template Number 'Touch threshold percentage'
[17:08:48][C][template.number:051]:   Optimistic: YES
[17:08:48][C][template.number:052]:   Update Interval: never
[17:08:48][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[17:08:48][C][esp32_rmt_led_strip:176]:   Pin: 11
[17:08:48][C][esp32_rmt_led_strip:177]:   Channel: 0
[17:08:48][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[17:08:48][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[17:08:48][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[17:08:48][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[17:08:48][C][gpio.binary_sensor:016]:   Pin: GPIO38
[17:08:49][C][light:103]: Light 'leds'
[17:08:49][C][light:105]:   Default Transition Length: 0.0s
[17:08:49][C][light:106]:   Gamma Correct: 2.80
[17:08:49][C][light:103]: Light 'left_led'
[17:08:49][C][light:105]:   Default Transition Length: 0.1s
[17:08:49][C][light:106]:   Gamma Correct: 2.80
[17:08:49][C][light:103]: Light 'top_led'
[17:08:49][C][light:105]:   Default Transition Length: 0.1s
[17:08:49][C][light:106]:   Gamma Correct: 2.80
[17:08:49][C][light:103]: Light 'right_led'
[17:08:49][C][light:105]:   Default Transition Length: 0.1s
[17:08:49][C][light:106]:   Gamma Correct: 2.80
[17:08:49][C][template.switch:068]: Template Switch 'Use Wake Word'
[17:08:49][C][template.switch:091]:   Restore Mode: restore defaults to ON
[17:08:49][C][template.switch:057]:   Optimistic: YES
[17:08:49][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[17:08:49][C][esp32_touch:074]:   Meas cycle: 0.80ms
[17:08:49][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[17:08:49][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[17:08:49][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[17:08:49][C][esp32_touch:135]:   Voltage Attenuation: 0V
[17:08:49][C][esp32_touch:169]:   Filter mode: IIR_16
[17:08:49][C][esp32_touch:170]:   Debounce count: 2
[17:08:49][C][esp32_touch:171]:   Noise threshold coefficient: 0
[17:08:49][C][esp32_touch:172]:   Jitter filter step size: 0
[17:08:49][C][esp32_touch:191]:   Smooth level: IIR_2
[17:08:49][C][esp32_touch:213]:   Denoise grade: BIT8
[17:08:49][C][esp32_touch:245]:   Denoise capacitance level: L0
[17:08:49][C][esp32_touch:260]:   Touch Pad 'volume_down'
[17:08:49][C][esp32_touch:261]:     Pad: T4
[17:08:49][C][esp32_touch:262]:     Threshold: 529174
[17:08:49][C][esp32_touch:260]:   Touch Pad 'volume_up'
[17:08:49][C][esp32_touch:261]:     Pad: T2
[17:08:49][C][esp32_touch:262]:     Threshold: 501357
[17:08:49][C][esp32_touch:260]:   Touch Pad 'action'
[17:08:49][C][esp32_touch:261]:     Pad: T3
[17:08:49][C][esp32_touch:262]:     Threshold: 690433
[17:08:49][C][captive_portal:088]: Captive Portal:
[17:08:49][C][mdns:115]: mDNS:
[17:08:49][C][mdns:116]:   Hostname: office-onju-2a44d8
[17:08:49][C][ota:096]: Over-The-Air Updates:
[17:08:49][C][ota:097]:   Address: 192.168.2.   :3232
[17:08:49][C][ota:100]:   Using Password.
[17:08:49][C][ota:103]:   OTA version: 2.
[17:08:49][C][api:139]: API Server:
[17:08:49][C][api:140]:   Address: 192.168.2.176:6053
[17:08:49][C][api:142]:   Using noise encryption: YES
[17:08:49][C][improv_serial:032]: Improv Serial:
[17:08:49][C][audio:203]: Audio:
[17:08:49][C][audio:225]:   External DAC channels: 1
[17:08:49][C][audio:226]:   I2S DOUT Pin: 12
[17:08:49][C][audio:227]:   Mute Pin: GPIO21
[17:08:49][D][voice_assistant:523]: Event Type: 0
[17:08:49][D][voice_assistant:523]: Event Type: 2
[17:08:49][D][voice_assistant:613]: Assist Pipeline ended
[17:08:49][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to IDLE
[17:08:49][D][voice_assistant:422]: Desired state set to IDLE
[17:08:49][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[17:08:49][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[17:08:49][D][voice_assistant:202]: Requesting start...
[17:08:49][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[17:08:49][D][voice_assistant:437]: Client started, streaming microphone
[17:08:49][D][voice_assistant:416]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[17:08:49][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[17:08:49][D][voice_assistant:523]: Event Type: 1
[17:08:49][D][voice_assistant:526]: Assist Pipeline running
[17:08:49][D][voice_assistant:523]: Event Type: 9
[17:08:49][D][light:036]: 'top_led' Setting:
[17:08:49][D][light:051]:   Brightness: 60%
[17:08:49][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[17:08:49][D][light:085]:   Transition length: 0.1s
[17:08:51][D][voice_assistant:523]: Event Type: 10
[17:08:51][D][voice_assistant:532]: Wake word detected
[17:08:51][D][voice_assistant:523]: Event Type: 3
[17:08:51][D][voice_assistant:537]: STT started
[17:08:51][D][light:036]: 'top_led' Setting:
[17:08:51][D][light:051]:   Brightness: 100%
[17:08:51][D][light:059]:   Red: 100%, Green: 100%, Blue: 100%
[17:08:51][D][light:109]:   Effect: 'listening'
[17:08:53][D][voice_assistant:523]: Event Type: 11
[17:08:53][D][voice_assistant:677]: Starting STT by VAD
[17:08:53][D][voice_assistant:523]: Event Type: 12
[17:08:53][D][voice_assistant:681]: STT by VAD end
[17:08:53][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[17:08:53][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[17:08:53][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[17:08:53][D][light:036]: 'top_led' Setting:
[17:08:53][D][light:051]:   Brightness: 70%
[17:08:53][D][light:059]:   Red: 0%, Green: 20%, Blue: 100%
[17:08:53][D][light:109]:   Effect: 'processing'
[17:08:53][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[17:08:55][D][voice_assistant:523]: Event Type: 4
[17:08:55][D][voice_assistant:551]: Speech recognised as: " What time is it?"
[17:08:55][D][voice_assistant:523]: Event Type: 5
[17:08:55][D][voice_assistant:556]: Intent started
[17:08:55][D][voice_assistant:523]: Event Type: 6
[17:08:55][D][voice_assistant:523]: Event Type: 7
[17:08:55][D][voice_assistant:579]: Response: "Sorry, I couldn't understand that"
[17:08:55][D][voice_assistant:523]: Event Type: 8
[17:08:55][D][voice_assistant:599]: Response URL: "https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3"
[17:08:55][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[17:08:55][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[17:08:55][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[17:08:55][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[17:08:55][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[17:08:55][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[17:08:55][D][light:036]: 'top_led' Setting:
[17:08:55][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[17:08:55][D][light:109]:   Effect: 'speaking'
[17:08:55][D][voice_assistant:523]: Event Type: 2
[17:08:55][D][voice_assistant:613]: Assist Pipeline ended
[17:08:56][W][component:232]: Component i2s_audio.media_player took a long time for an operation (521 ms).
[17:08:56][W][component:233]: Components should block for at most 30 ms.
[17:08:56][W][component:232]: Component i2s_audio.media_player took a long time for an operation (504 ms).
[17:08:56][W][component:233]: Components should block for at most 30 ms.
[17:08:56][D][light:036]: 'top_led' Setting:
[17:08:56][D][light:051]:   Brightness: 60%
[17:08:56][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[17:08:56][D][light:109]:   Effect: 'listening_ww'
[17:08:58][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to IDLE
[17:08:58][D][voice_assistant:422]: Desired state set to IDLE
[17:08:58][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[17:08:58][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[17:08:58][D][voice_assistant:118]: microphone not running
[17:08:58][D][voice_assistant:202]: Requesting start...
[17:08:58][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[17:08:58][D][voice_assistant:118]: microphone not running
[17:08:58][D][voice_assistant:437]: Client started, streaming microphone
[17:08:58][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[17:08:58][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[17:08:58][D][voice_assistant:155]: Starting Microphone
[17:08:58][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[17:08:58][D][voice_assistant:523]: Event Type: 1
[17:08:58][D][voice_assistant:526]: Assist Pipeline running
[17:08:58][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[17:08:58][D][voice_assistant:523]: Event Type: 9

The specific error I see popping up is:

    [17:08:56][W][component:232]: Component i2s_audio.media_player took a long time for an operation (521 ms).
    [17:08:56][W][component:233]: Components should block for at most 30 ms.
    [17:08:56][W][component:232]: Component i2s_audio.media_player took a long time for an operation (504 ms).
    [17:08:56][W][component:233]: Components should block for at most 30 ms.

It appears just after the media URLs are printed in the logs.
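
For a long log like the one above, it can help to pull out just the voice assistant state transitions. The following is a minimal Python sketch, not part of ESPHome; the helper name `state_transitions` is made up for illustration, and it assumes the `State changed from X to Y` wording seen in these logs.

```python
import re

# Assumes the "State changed from X to Y" wording seen in the logs in this thread.
STATE_CHANGE = re.compile(r"State changed from (\w+) to (\w+)")


def state_transitions(log_text):
    """Return (old_state, new_state) pairs in the order they appear."""
    return [match.groups() for match in STATE_CHANGE.finditer(log_text)]


# Three lines copied from the full log above.
sample = """\
[17:08:55][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[17:08:55][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[17:08:58][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to IDLE
"""

for old, new in state_transitions(sample):
    print(f"{old} -> {new}")
```

In the log above, the pipeline reaches STREAMING_RESPONSE at 17:08:55 and returns to IDLE at 17:08:58, so the state machine itself completes even though nothing is heard.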

tetele commented 5 months ago

The specific error I see popping up is:

That's a common warning (not error), and I don't think it's the culprit.

Just making sure: are you certain you've properly inserted the speaker connector back into the PCB?
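
On the warning-versus-error point: in ESPHome log lines the bracketed letter after the timestamp is the severity, for example `[D]` debug, `[C]` config, `[I]` info, `[W]` warning and `[E]` error. Below is a rough Python sketch for tallying severities in a pasted log. `count_levels` is a hypothetical helper, and the regular expression assumes the line format shown in this thread.

```python
import re
from collections import Counter

# Matches lines such as "[17:08:56][W][component:232]: message".
# This pattern is an assumption based on the log lines in this thread.
LOG_LINE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\[([A-Z]+)\]\[([^:\]]+):(\d+)\]: (.*)$"
)


def count_levels(log_text):
    """Count log lines per severity letter (D, C, I, W, E, ...)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


# Lines copied from the full log in this thread.
sample = """\
[17:08:55][D][voice_assistant:613]: Assist Pipeline ended
[17:08:56][W][component:232]: Component i2s_audio.media_player took a long time for an operation (521 ms).
[17:08:56][W][component:233]: Components should block for at most 30 ms.
"""

levels = count_levels(sample)
print(dict(levels))
print("errors:", levels["E"])
```

A tally with `W` entries but no `E` entries matches what is described above: the log contains warnings, not errors.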

ther3zz commented 5 months ago

Gotcha, that was the only thing that stood out to me in the logs...

Yeah, I actually just opened it back up to confirm... It's plugged in correctly.

ther3zz commented 5 months ago

Here's the config I tried:

substitutions:
  name: "office-onju"
  friendly_name: "Office Onju"
  project_version: "1.0.0"
  device_description: "Onju Voice Satellite with ESPHome software and microWakeWord"

esphome:
  name: "${name}"
  friendly_name: "${friendly_name}"
  comment: "${device_description}"
  name_add_mac_suffix: true
  project:
    name: tetele.onju_voice_satellite
    version: "${project_version}"
  min_version: 2024.3.0
  platformio_options:
    board_build.flash_mode: dio
    build_flags: "-DBOARD_HAS_PSRAM"
    board_build.arduino.memory_type: qio_opi
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led

dashboard_import:
  package_import_url: github://tetele/onju-voice-satellite/esphome/onju-voice-microwakeword.yaml@main

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

psram:
  mode: octal
  speed: 80MHz

# Enable logging
logger:

# Allow OTA updates
ota:
  password: "some password"

# Allow provisioning Wi-Fi via serial
improv_serial:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.1.   
    gateway: 192.168.2.1
    subnet: 255.255.255.0
  ap:
    ssid: "Office-Onju Fallback Hotspot"
    password: "some password"

# In combination with the `ap` this allows the user
# to provision wifi credentials to the device via WiFi AP.
captive_portal:

api:
  encryption:
    key: "some key"
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop
    - service: notification_on
      then:
        - script.execute: turn_on_notification
    - service: notification_clear
      then:
        - script.execute: clear_notification

globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false
  - id: notification
    type: bool
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2

i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

micro_wake_word:
  #model: okay_nabu
  model: hey_jarvis
  # model: alexa
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

speaker:
  - platform: i2s_audio
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false

voice_assistant:
  id: va
  microphone: onju_microphone
  speaker: onju_out
  use_wake_word: false
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 500ms
    - wait_until:
        not:
          speaker.is_playing: onju_out
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - delay: 200ms
          - micro_wake_word.start
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - micro_wake_word.start:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
          - micro_wake_word.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led

number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 0.75
    min_value: 0.25
    max_value: 5
    step: 0.05
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V

  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2

  denoise_grade: BIT8
  denoise_cap_level: L0

binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000

  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000

  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: speaker.is_playing
                then:
                  - speaker.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: speaker.is_playing
                then:
                  - speaker.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:

  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word

light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms

script:
  - id: reset_led
    then:
      - if:
          condition:
            - lambda: return id(notification);
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 100%
                effect: slow_pulse
          else:
            - if:
                condition:
                  and:
                    - switch.is_on: use_wake_word
                    - binary_sensor.is_off: mute_switch
                then:
                  - light.turn_on:
                      id: top_led
                      blue: 100%
                      red: 100%
                      green: 0%
                      brightness: 60%
                      effect: listening_ww
                else:
                  - light.turn_off: top_led

  - id: turn_on_notification
    then:
      - lambda: id(notification) = true;
      - script.execute: reset_led

  - id: clear_notification
    then:
      - lambda: id(notification) = false;
      - script.execute: reset_led

  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - micro_wake_word.start
            - if:
                condition:
                  speaker.is_playing:
                then:
                  - speaker.stop:
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"

  - id: turn_off_wake_word
    then:
      - micro_wake_word.stop
      - script.execute: reset_led

  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static uint8_t thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static uint8_t qsizes[3] = {0, 0, 0};
          static uint16_t consecutive_anomalies_per_button[3] = {0, 0, 0};

          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

          if(newval == 0) return;

          //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
          //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);

          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            } 
          }

          //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
          consecutive_anomalies_per_button[button] = 0;

          if(qsizes[button] == 5) {
            //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
            sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
            qsizes[button]--;
          }
          *(calibration_values+thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;

          //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);

          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word
  - platform: gpio
    id: dac_mute
    restore_mode: ALWAYS_OFF
    pin:
      number: GPIO21
      inverted: True

And here's the current config I'm using:

packages:
    esphome.voice-assistant: github://tetele/onju-voice-satellite/esphome/onju-voice.yaml@main

esphome:
  name: office-onju
  friendly_name: Office Onju

#micro_wake_word:
#    model: hey_jarvis

#esp32:
#  board: esp32-s3-devkitc-1
#  framework:
#    type: arduino

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "some key"

ota:
  password: "some password"

wifi:
  ssid: "some wifi"
  password: "some password"

  #ssid: !secret wifi_ssid
  #password: !secret wifi_password
  #manual_ip:
  #  static_ip: 192.168.2.   
  #  gateway: 192.168.2.1
  #  subnet: 255.255.255.0
  manual_ip:
    static_ip: 192.168.1.   
    gateway: 192.168.1.1
    subnet: 255.255.255.0

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Office-Onju Fallback Hotspot"
    password: "some password"

captive_portal:

cowboyrushforth commented 5 months ago

Hi,

Just got my boards in the mail and am having the same experience.

I have tried both the micro wake word branch and the normal one.

With the micro wake word branch, no matter what I do, it never picks up the wake word; it only works on "touch".

Hoping the non-wake-word version would be more stable, I installed that. It does pick up the "ok nabu" wake word using the pipeline, and occasionally sound does come out.

When the TTS works I see:

[20:24:54][D][voice_assistant:579]: Response: "20:24 Mountain Time, sir."
[20:24:54][D][voice_assistant:523]: Event Type: 8
[20:24:54][D][voice_assistant:599]: Response URL: "http://10.19.15.100:8123/api/tts_proxy/23a1fb0f92188d42f4f8333babbd21b0fa62e1ce_en-gb_add2e9951e_tts.piper.mp3"
[20:24:54][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[20:24:54][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[20:24:54][D][media_player:059]: 'onju-office' - Setting
[20:24:54][D][media_player:066]:   Media URL: http://10.19.15.100:8123/api/tts_proxy/23a1fb0f92188d42f4f8333babbd21b0fa62e1ce_en-gb_add2e9951e_tts.piper.mp3
[20:24:54][D][media_player:059]: 'onju-office' - Setting
[20:24:54][D][media_player:066]:   Media URL: http://10.19.15.100:8123/api/tts_proxy/23a1fb0f92188d42f4f8333babbd21b0fa62e1ce_en-gb_add2e9951e_tts.piper.mp3
[20:24:54][D][light:036]: 'top_led' Setting:
[20:24:54][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[20:24:54][D][light:109]:   Effect: 'speaking'
[20:24:54][D][voice_assistant:523]: Event Type: 2
[20:24:54][D][voice_assistant:613]: Assist Pipeline ended
[20:24:54][D][light:036]: 'top_led' Setting:
[20:24:54][D][light:109]:   Effect: 'show_volume'
[20:24:54][W][component:232]: Component i2s_audio.media_player took a long time for an operation (549 ms).
[20:24:54][W][component:233]: Components should block for at most 30 ms.
[20:24:54][W][component:232]: Component i2s_audio.media_player took a long time for an operation (66 ms).
[20:24:54][W][component:233]: Components should block for at most 30 ms.
[20:24:55][W][component:232]: Component i2s_audio.media_player took a long time for an operation (56 ms).
[20:24:55][W][component:233]: Components should block for at most 30 ms.
[20:24:55][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:24:55][W][component:233]: Components should block for at most 30 ms.
[20:24:55][W][component:232]: Component i2s_audio.media_player took a long time for an operation (56 ms).
[20:24:55][W][component:233]: Components should block for at most 30 ms.

When it doesn't work I see:

[20:25:18][D][voice_assistant:579]: Response: "I'm quite well, thank you for asking! How can I assist you today?"
[20:25:18][D][voice_assistant:523]: Event Type: 8
[20:25:18][D][voice_assistant:599]: Response URL: "http://10.19.15.100:8123/api/tts_proxy/86d0bfd2e659d831fd2a25f256e719186a9ad4a0_en-gb_add2e9951e_tts.piper.mp3"
[20:25:18][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[20:25:18][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[20:25:18][D][media_player:059]: 'onju-office' - Setting
[20:25:18][D][media_player:066]:   Media URL: http://10.19.15.100:8123/api/tts_proxy/86d0bfd2e659d831fd2a25f256e719186a9ad4a0_en-gb_add2e9951e_tts.piper.mp3
[20:25:18][D][media_player:059]: 'onju-office' - Setting
[20:25:18][D][media_player:066]:   Media URL: http://10.19.15.100:8123/api/tts_proxy/86d0bfd2e659d831fd2a25f256e719186a9ad4a0_en-gb_add2e9951e_tts.piper.mp3
[20:25:18][D][light:036]: 'top_led' Setting:
[20:25:18][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[20:25:18][D][light:109]:   Effect: 'speaking'
[20:25:18][D][voice_assistant:523]: Event Type: 2
[20:25:18][D][voice_assistant:613]: Assist Pipeline ended
[20:25:19][W][component:232]: Component i2s_audio.media_player took a long time for an operation (533 ms).
[20:25:19][W][component:233]: Components should block for at most 30 ms.
[20:25:19][W][component:232]: Component i2s_audio.media_player took a long time for an operation (66 ms).
[20:25:19][W][component:233]: Components should block for at most 30 ms.
[20:25:19][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:19][W][component:233]: Components should block for at most 30 ms.
[20:25:19][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:19][W][component:233]: Components should block for at most 30 ms.
[20:25:20][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:20][W][component:233]: Components should block for at most 30 ms.
[20:25:20][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:20][W][component:233]: Components should block for at most 30 ms.
[20:25:20][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:20][W][component:233]: Components should block for at most 30 ms.
[20:25:21][W][component:232]: Component i2s_audio.media_player took a long time for an operation (56 ms).
[20:25:21][W][component:233]: Components should block for at most 30 ms.
[20:25:21][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:21][W][component:233]: Components should block for at most 30 ms.
[20:25:21][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:21][W][component:233]: Components should block for at most 30 ms.
[20:25:21][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:21][W][component:233]: Components should block for at most 30 ms.
[20:25:22][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:22][W][component:233]: Components should block for at most 30 ms.
[20:25:22][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:22][W][component:233]: Components should block for at most 30 ms.
[20:25:22][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:22][W][component:233]: Components should block for at most 30 ms.
[20:25:23][W][component:232]: Component i2s_audio.media_player took a long time for an operation (57 ms).
[20:25:23][W][component:233]: Components should block for at most 30 ms.
[20:25:23][W][component:232]: Component i2s_audio.media_player took a long time for an operation (58 ms).
[20:25:23][W][component:233]: Components should block for at most 30 ms.
[20:25:24][W][component:232]: Component i2s_audio.media_player took a long time for an operation (511 ms).
[20:25:24][W][component:233]: Components should block for at most 30 ms.
[20:25:24][W][component:232]: Component i2s_audio.media_player took a long time for an operation (509 ms).
[20:25:24][W][component:233]: Components should block for at most 30 ms.

Looking at the YAML file (the non-wake-word one), this is the only obviously suspicious thing. Since I can get it to work sometimes, I am confident that the speaker works and is

cowboyrushforth commented 5 months ago

No smoking gun, but for what it's worth, I edited the same YAML until it was just a media_player: no touch, no LEDs, no voice assistant, etc. Once it was in this state, I could finally stream a WAV file to it without error.

I will try to add the YAML back piece by piece until I figure out what it does not like. But for what it's worth, I had even taken the voice_assistant functionality out and left the media_player there with all of the peripherals, and it still refused to play media.
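
For anyone who wants to repeat this isolation test, a stripped-down config along these lines might be a starting point. This is only a sketch, not the exact file used here: the pin numbers are borrowed from the speaker test config posted later in this thread, the entity name is a made-up placeholder, and it assumes the Arduino framework plus the usual `esphome:`, `esp32:`, `wifi:` and `api:` sections already exist.

```yaml
# Sketch only: a media_player with nothing else attached.
# Pins are assumed from the speaker test config later in this thread.
i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    name: "Onju media test"   # hypothetical name
    dac_type: external
    i2s_dout_pin: GPIO12
    mode: mono

# The later test config also drives GPIO21 as a "speaker enable" switch;
# it is included here on the assumption that the amplifier needs it.
switch:
  - platform: gpio
    pin: GPIO21
    name: "speaker enable"
    restore_mode: ALWAYS_ON
```

If media plays with only this much configured, components can be added back one at a time until playback breaks.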

tetele commented 5 months ago

@ther3zz why did you select the "OpenWakeWord" flavor when you added the issue if you're using the microWakeWord config? The difference between the 2 is quite significant.

Also, how are you playing anything else apart from voice responses with the microWakeWord config? That version does not expose a media_player. Are you sure you're using the config you pasted?

@cowboyrushforth thanks for debugging this! I'm not sure it's the same issue, as you're also reporting the wake word not working. How confident are you in your network setup and the fact that your satellite has good WiFi signal?

cowboyrushforth commented 5 months ago

@tetele yes, I believe it's the same issue. Let's focus on the non-micro wake word config:

  1. Yes, it does pick up the wake word, but not consistently.
  2. I am very sure the networking and WiFi are working, and working well with low latency, etc. The device is sitting 4 feet from the router, and I have a lot of experience with ESPHome and similar devices.

Later I will try to provide more details, try additional Onju boards, and isolate the issue further. I see that in the past there were non-wake-word versions that don't have a media_player but a "speaker", so I will probably experiment with that too.

ther3zz commented 5 months ago

Apologies, I should have been clearer... I've tried both micro wake word and open wake word configs. I started with the micro wake word config and ran into the same issues described by @cowboyrushforth but since it was listed as beta, I switched to the open wake word config.

I'm also seeing the same issues with the open wake word config that @cowboyrushforth mentioned (wake word not picked up, no audio).

In regards to the wake word issue, it'll work one time and then stop. Basically I either have to toggle the physical mute switch or toggle the "use wake word" switch in home assistant to get it to start listening again.

During this test I was using the wake word and it wasn't being recognized until I toggled the physical mute switch (I don't really see anything that stands out there; the logs just don't show the wake word being triggered):

[08:26:56][D][light:036]: 'top_led' Setting:
[08:26:56][D][light:051]:   Brightness: 60%
[08:26:56][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[08:26:56][D][light:109]:   Effect: 'listening_ww'
[08:26:57][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to IDLE
[08:26:57][D][voice_assistant:422]: Desired state set to IDLE
[08:26:57][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[08:26:57][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[08:26:57][D][voice_assistant:118]: microphone not running
[08:26:57][D][voice_assistant:202]: Requesting start...
[08:26:57][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:26:57][D][voice_assistant:118]: microphone not running
[08:26:57][D][voice_assistant:437]: Client started, streaming microphone
[08:26:57][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[08:26:57][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:26:57][D][voice_assistant:155]: Starting Microphone
[08:26:57][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:26:57][D][voice_assistant:523]: Event Type: 1
[08:26:57][D][voice_assistant:526]: Assist Pipeline running
[08:26:57][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[08:26:57][D][voice_assistant:523]: Event Type: 9
[08:30:18][I][ota:117]: Boot seems successful, resetting boot loop counter.
[08:30:18][D][esp32.preferences:114]: Saving 1 preferences to flash...
[08:30:18][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[08:30:45][D][binary_sensor:036]: 'Disable wake word': Sending state ON
[08:30:45][D][voice_assistant:516]: Signaling stop...
[08:30:45][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:30:45][D][voice_assistant:422]: Desired state set to IDLE
[08:30:45][D][light:036]: 'top_led' Setting:
[08:30:45][D][light:047]:   State: OFF
[08:30:45][D][light:085]:   Transition length: 0.1s
[08:30:45][D][light:091]:   Effect: 'None'
[08:30:45][D][voice_assistant:523]: Event Type: 0
[08:30:45][E][voice_assistant:653]: Error: no_wake_word - No wake word detected
[08:30:45][D][voice_assistant:516]: Signaling stop...
[08:30:45][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOP_MICROPHONE
[08:30:45][D][voice_assistant:422]: Desired state set to IDLE
[08:30:45][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:30:45][D][light:036]: 'top_led' Setting:
[08:30:45][D][light:047]:   State: ON
[08:30:45][D][light:059]:   Red: 100%, Green: 0%, Blue: 0%
[08:30:45][D][light:085]:   Transition length: 0.1s
[08:30:45][D][voice_assistant:523]: Event Type: 2
[08:30:45][D][voice_assistant:613]: Assist Pipeline ended
[08:30:45][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to IDLE
[08:30:45][D][light:036]: 'top_led' Setting:
[08:30:45][D][light:047]:   State: OFF
[08:30:45][D][light:085]:   Transition length: 0.1s
[08:30:46][D][light:036]: 'top_led' Setting:
[08:30:46][D][light:085]:   Transition length: 0.1s
[08:30:50][D][binary_sensor:036]: 'Disable wake word': Sending state OFF
[08:30:50][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[08:30:50][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[08:30:50][D][light:036]: 'top_led' Setting:
[08:30:50][D][light:047]:   State: ON
[08:30:50][D][light:051]:   Brightness: 60%
[08:30:50][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[08:30:50][D][light:109]:   Effect: 'listening_ww'
[08:30:50][D][voice_assistant:118]: microphone not running
[08:30:50][D][voice_assistant:202]: Requesting start...
[08:30:50][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:30:50][D][voice_assistant:437]: Client started, streaming microphone
[08:30:50][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[08:30:50][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:30:50][D][voice_assistant:155]: Starting Microphone
[08:30:50][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:30:50][D][voice_assistant:523]: Event Type: 1
[08:30:50][D][voice_assistant:526]: Assist Pipeline running
[08:30:50][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[08:30:50][D][voice_assistant:523]: Event Type: 9
[08:30:53][D][voice_assistant:523]: Event Type: 10
[08:30:53][D][voice_assistant:532]: Wake word detected
[08:30:53][D][voice_assistant:523]: Event Type: 3
[08:30:53][D][voice_assistant:537]: STT started
[08:30:53][D][light:036]: 'top_led' Setting:
[08:30:53][D][light:051]:   Brightness: 100%
[08:30:53][D][light:059]:   Red: 100%, Green: 100%, Blue: 100%
[08:30:53][D][light:109]:   Effect: 'listening'
[08:30:54][D][voice_assistant:523]: Event Type: 11
[08:30:54][D][voice_assistant:677]: Starting STT by VAD
[08:30:55][D][voice_assistant:523]: Event Type: 12
[08:30:55][D][voice_assistant:681]: STT by VAD end
[08:30:55][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:30:55][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[08:30:55][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:30:55][D][light:036]: 'top_led' Setting:
[08:30:55][D][light:051]:   Brightness: 70%
[08:30:55][D][light:059]:   Red: 0%, Green: 20%, Blue: 100%
[08:30:55][D][light:109]:   Effect: 'processing'
[08:30:55][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[08:30:56][D][voice_assistant:523]: Event Type: 4
[08:30:56][D][voice_assistant:551]: Speech recognised as: " What time is it?"
[08:30:56][D][voice_assistant:523]: Event Type: 5
[08:30:56][D][voice_assistant:556]: Intent started
[08:30:56][D][voice_assistant:523]: Event Type: 6
[08:30:56][D][voice_assistant:523]: Event Type: 7
[08:30:56][D][voice_assistant:579]: Response: "Sorry, I couldn't understand that"
[08:30:56][D][voice_assistant:523]: Event Type: 8
[08:30:56][D][voice_assistant:599]: Response URL: "https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3"
[08:30:56][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[08:30:56][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[08:30:56][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[08:30:56][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[08:30:56][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[08:30:56][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[08:30:56][D][light:036]: 'top_led' Setting:
[08:30:56][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[08:30:56][D][light:109]:   Effect: 'speaking'
[08:30:56][D][voice_assistant:523]: Event Type: 2
[08:30:57][D][voice_assistant:613]: Assist Pipeline ended
[08:30:57][W][component:232]: Component i2s_audio.media_player took a long time for an operation (522 ms).
[08:30:57][W][component:233]: Components should block for at most 30 ms.
[08:30:58][W][component:232]: Component i2s_audio.media_player took a long time for an operation (504 ms).
[08:30:58][W][component:233]: Components should block for at most 30 ms.
[08:30:58][D][light:036]: 'top_led' Setting:
[08:30:58][D][light:051]:   Brightness: 60%
[08:30:58][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[08:30:58][D][light:109]:   Effect: 'listening_ww'
[08:30:59][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to IDLE
[08:30:59][D][voice_assistant:422]: Desired state set to IDLE
[08:30:59][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[08:30:59][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[08:30:59][D][voice_assistant:118]: microphone not running
[08:30:59][D][voice_assistant:202]: Requesting start...
[08:30:59][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:30:59][D][voice_assistant:118]: microphone not running
[08:30:59][D][voice_assistant:118]: microphone not running
[08:30:59][D][voice_assistant:437]: Client started, streaming microphone
[08:30:59][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[08:30:59][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:30:59][D][voice_assistant:155]: Starting Microphone
[08:30:59][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:30:59][D][voice_assistant:523]: Event Type: 1
[08:30:59][D][voice_assistant:526]: Assist Pipeline running
[08:30:59][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[08:30:59][D][voice_assistant:523]: Event Type: 9
cowboyrushforth commented 5 months ago

After more hours of debugging than I would like to admit, I have come to a conclusion...

  1. During debugging and assembly I mostly had the unit fully assembled. But at times, to triple- or quadruple-check things or to use a serial cable (for example, to go back and forth between Arduino and ESP-IDF, which you can't do over OTA), I ran the unit in various states of assembly and disassembly.

  2. At one point I started to think that perhaps I had a bad AC adapter, because I had a refurbished unit.

  3. After more debugging I realized that it works on the AC adapter OR on USB, BUT ONLY IF IT'S NOT ASSEMBLED.

  4. After more debugging I decided to look at the inside of a new unit (I had bought a couple; luckily one was brand new and one was refurbished from Amazon).

  5. In the new unit the plate that covers the PCBA is PLASTIC. In the refurbished ones, the PLATE IS METAL.

  6. Finally, my eureka moment: I reassembled it with the plastic plate, or with the metal plate removed, and it all works.

So in short: if you have a metal plate that holds the PCB down, just remove it and reassemble. It must quietly short something in the audio circuitry. The reason I thought it was working with some code and not other code had nothing to do with the code; it had everything to do with the state of the physical assembly and whether that silly metal plate, which appears to short something, was installed.

Hope this helps someone!

tetele commented 5 months ago

Very interesting, thanks for taking the time to debug this @cowboyrushforth!

The only issue I was aware of regarding the metal plate is that it would sometimes make the unit not boot at all. Apparently, there's a bit of conductive foam that's causing all the headache.

However, if that indeed is the problem that @ther3zz is facing, I can add a note to the README

[Image: IMG_1014_1]

ther3zz commented 5 months ago

Mine doesn't have that foam, and the plate is plastic instead of metal. I also disassembled it and have the board and the speaker connected together (the board is still within the top part, though), and I'm still not hearing any playback. I tried powering it through USB and through the AC adapter, and it still doesn't work.

cowboyrushforth commented 5 months ago

Just to be thorough: you also have the secondary PCB connected (where the mute switch is), yes? If that is not connected, the device will be set to mute.

Also, you have "V3" of the Onju device PCBA? Did you get it from PCBWay?

Finally, you could try something super simple like this to see if it plays (you will need to create a startup.h sound file for this, per the instructions here: https://esphome.io/guides/audio_clips_for_i2s.html).

If you run this, you will end up with a button on the device page to play a sound. This is the simplest bare-bones way to see if you can produce any sound.

substitutions:
  name: onju-voice-office
  friendly_name: onju-office

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
esphome:
  name: "${name}"
  name_add_mac_suffix: false
  friendly_name: "${friendly_name}"
  min_version: 2023.11.6
  includes:
    - startup.h
  on_boot:
    priority: 600
    then:
      - output.turn_off: set_low_speaker
  platformio_options:
    board_build.flash_mode: dio
    build_flags: "-DBOARD_HAS_PSRAM"
    board_build.arduino.memory_type: qio_opi
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

switch:
  - platform: gpio
    pin: GPIO21
    name: "speaker enable"
    id: speakeren
    restore_mode: ALWAYS_ON
psram:
  mode: octal
  speed: 80MHz
logger:
    level: VERY_VERBOSE
ota:
improv_serial:
captive_portal:

api:
  encryption:
    key: YOUR_ENCRYPTION_KEY

i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18
    id: theaudio

output:
  - platform: gpio
    pin: 
      number: GPIO12
      allow_other_uses: true
    id: set_low_speaker

speaker:
  - platform: i2s_audio
    dac_type: external
    i2s_audio_id: theaudio
    i2s_dout_pin:
      number: GPIO12 
      allow_other_uses: true
    id: foobar
    mode: mono

button:
  - platform: template
    name: Play Sound
    id: playsound
    icon: "mdi:emoticon-outline"
    on_press:
      - logger.log: "Button pressed"
      - speaker.play:
          id: foobar
          data: !lambda return startup_raw;
ther3zz commented 5 months ago

Yeah, I made sure to plug in the mute board and power it through the AC adapter. Yup, I got it through PCBWay and it's the v3 board...

I'm having difficulty getting the whole conversion working properly on Windows... would you mind sharing the file you've used? Thank you!

ther3zz commented 5 months ago

Never mind, I got something to work. I can confirm that the audio does indeed play back on the speaker when I click the button.

ther3zz commented 5 months ago

Here are some verbose logs of when it's supposed to be playing back the audio:

[12:02:02][VV][api.service:964]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
  event_type: VOICE_ASSISTANT_STT_END
  data: VoiceAssistantEventData {
  name: 'text'
  value: ' What time is it?'
}
}
[12:02:02][D][voice_assistant:563]: Event Type: 4
[12:02:02][D][voice_assistant:591]: Speech recognised as: " What time is it?"
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=0)
[12:02:02][VV][scheduler:226]: Running timeout '' with interval=0 last_execution=40940 (now=40942)
[12:02:02][VV][api.service:964]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
  event_type: VOICE_ASSISTANT_INTENT_START
}
[12:02:02][D][voice_assistant:563]: Event Type: 5
[12:02:02][D][voice_assistant:596]: Intent started
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=0)
[12:02:02][VV][scheduler:226]: Running timeout '' with interval=0 last_execution=40949 (now=40951)
[12:02:02][VV][esp32_rmt_led_strip:095]: Writing RGB values to bus...
[12:02:02][VV][api.service:964]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
  event_type: VOICE_ASSISTANT_INTENT_END
  data: VoiceAssistantEventData {
  name: 'conversation_id'
  value: ''
}
}
[12:02:02][D][voice_assistant:563]: Event Type: 6
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=0)
[12:02:02][VV][scheduler:226]: Running timeout '' with interval=0 last_execution=41032 (now=41035)
[12:02:02][VV][api.service:964]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
  event_type: VOICE_ASSISTANT_TTS_START
  data: VoiceAssistantEventData {
  name: 'text'
  value: 'Sorry, I couldn't understand that'
}
}
[12:02:02][D][voice_assistant:563]: Event Type: 7
[12:02:02][D][voice_assistant:619]: Response: "Sorry, I couldn't understand that"
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=0)
[12:02:02][VV][scheduler:226]: Running timeout '' with interval=0 last_execution=41042 (now=41045)
[12:02:02][VV][api.service:964]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
  event_type: VOICE_ASSISTANT_TTS_END
  data: VoiceAssistantEventData {
  name: 'url'
  value: 'https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3'
}
}
[12:02:02][D][voice_assistant:563]: Event Type: 8
[12:02:02][D][voice_assistant:639]: Response URL: "https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3"
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=0)
[12:02:02][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[12:02:02][D][voice_assistant:445]: Desired state set to STREAMING_RESPONSE
[12:02:02][VV][scheduler:226]: Running timeout '' with interval=0 last_execution=41053 (now=41058)
[12:02:02][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[12:02:02][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[12:02:02][D][media_player:059]: 'Office Onju 2a44d8' - Setting
[12:02:02][D][media_player:066]:   Media URL: https://homeassistant.my.domain/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_7238ee98e6_marytts.mp3
[12:02:02][D][light:036]: 'top_led' Setting:
[12:02:02][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[12:02:02][D][light:109]:   Effect: 'speaking'
[12:02:02][VV][api.service:964]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
  event_type: VOICE_ASSISTANT_RUN_END
}
[12:02:02][D][voice_assistant:563]: Event Type: 2
[12:02:02][D][voice_assistant:653]: Assist Pipeline ended
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=0)
[12:02:02][VV][api.service:324]: send_media_player_state_response: MediaPlayerStateResponse {
  key: 3307342432
  state: MEDIA_PLAYER_STATE_PLAYING
  volume: 1
  muted: NO
}
[12:02:02][W][component:237]: Component i2s_audio.media_player took a long time for an operation (543 ms).
[12:02:02][W][component:238]: Components should block for at most 30 ms.
[12:02:02][VV][scheduler:226]: Running timeout '' with interval=0 last_execution=41078 (now=41626)
[12:02:02][VV][scheduler:032]: set_timeout(name='', timeout=100)
[12:02:02][VV][scheduler:226]: Running interval 'update' with interval=1000 last_execution=40185 (now=41626)
[12:02:02][VV][esp32_rmt_led_strip:095]: Writing RGB values to bus...
[12:02:02][VV][scheduler:032]: set_timeout(name='playing', timeout=2000)
[12:02:02][VV][esp32_rmt_led_strip:095]: Writing RGB values to bus...
[12:02:02][VV][scheduler:032]: set_timeout(name='playing', timeout=2000)
[12:02:03][VV][api.service:324]: send_media_player_state_response: MediaPlayerStateResponse {
  key: 3307342432
  state: MEDIA_PLAYER_STATE_IDLE
  volume: 1
  muted: NO
}
[12:02:03][W][component:237]: Component i2s_audio.media_player took a long time for an operation (472 ms).
[12:02:03][W][component:238]: Components should block for at most 30 ms.
[12:02:03][VV][scheduler:226]: Running timeout '' with interval=100 last_execution=41628 (now=42115)
cowboyrushforth commented 5 months ago

That's great that you got something! So maybe it's not a hardware issue. By any chance, is "homeassistant.my.domain" the actual hostname in your logs, and is it internally resolvable? In my ESPHome config I use the IP and port of my Home Assistant server, so for me it's 10.19.15.100:8123, not homeassistant.my.domain. That's the only thing that looks suspicious in your logs to me.

ther3zz commented 5 months ago

> That's great that you got something! So maybe it's not a hardware issue. By any chance, is "homeassistant.my.domain" the actual hostname in your logs, and is it internally resolvable? In my ESPHome config I use the IP and port of my Home Assistant server, so for me it's 10.19.15.100:8123, not homeassistant.my.domain. That's the only thing that looks suspicious in your logs to me.

Yeah, I was really worried I had messed the hardware up during the replacement process!

Yeah, that's just a placeholder. The real logs contain the actual domain, which is internally accessible. That being said... IT'S ALWAYS DNS!!! Looks like ESPHome isn't using my 2 internal DNS servers (Home Assistant itself is). When I manually specified the DNS servers in the Onju config, it started working.
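
For anyone hitting the same thing, here is a rough sketch of what manually specifying DNS servers could look like in the ESPHome YAML. The thread does not show the exact config that was used, so this is an assumption: it uses the `manual_ip` block of the `wifi` component, which also requires a static address, and every IP address below is a made-up placeholder.

```yaml
# Sketch only: all addresses are placeholders, not values from this thread.
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    # manual_ip needs a static address, gateway and subnet for the device
    static_ip: 192.168.1.50
    gateway: 192.168.1.1
    subnet: 255.255.255.0
    # The internal DNS servers that can resolve the Home Assistant hostname
    dns1: 192.168.1.2
    dns2: 192.168.1.3
```

With something like this in place, the device should resolve the TTS URL's hostname through the internal DNS servers instead of whatever it picked up otherwise.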

Thank you for your help!

ther3zz commented 5 months ago

So it looks like I can get responses back when asking it to turn things on/off, but it's not playing back TTS generated from the Media section, and it does not seem to play media files (though I only tried with an .m4a).

EDIT: So I just figured this out. I noticed that while speaking to the assistant I would get responses, but if I attempted to play back an MP3 or TTS via media it would not work. You have to turn off the Wake Word switch, and then you can play back stuff.

If you have an automation that plays back specific TTS or MP3s, you can set up the actions like this: turn off the wake word switch -> delay 1 second -> play back the TTS/MP3 -> delay for the length of the audio played -> turn the switch back on.
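
The sequence above could be sketched as a Home Assistant script along these lines. This is only an illustration: the entity IDs, the script name, the media URL and the 5 second playback delay are all hypothetical and need to be swapped for your own. It uses the older `service:` key, which newer Home Assistant versions also accept as `action:`.

```yaml
# Sketch only: entity IDs, URL and delays are placeholders.
script:
  onju_play_announcement:
    sequence:
      # 1. Turn off the wake word switch so the media player can play
      - service: switch.turn_off
        target:
          entity_id: switch.office_onju_use_wake_word
      # 2. Give the device a moment to stop listening
      - delay:
          seconds: 1
      # 3. Play the MP3 (plain HTTP URL)
      - service: media_player.play_media
        target:
          entity_id: media_player.office_onju
        data:
          media_content_id: "http://10.19.15.100:8123/local/announcement.mp3"
          media_content_type: music
      # 4. Wait roughly as long as the audio lasts
      - delay:
          seconds: 5
      # 5. Turn the wake word switch back on
      - service: switch.turn_on
        target:
          entity_id: switch.office_onju_use_wake_word
```

The fixed delay in step 4 is the weak spot: it has to be at least as long as the clip, otherwise the wake word comes back on mid-playback.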

cowboyrushforth commented 5 months ago

Ya, in my understanding of the current ESPHome audio frameworks, a lot of codecs and things are not supported. I think basic MP3 and WAV are the only reliable formats, and from my understanding HTTPS doesn't work at all. There are some upstream patches in the esphome/esphome repo, if you dig around, that look to improve a lot of that, but they take some work to integrate.