Closed witold-gren closed 1 day ago
You should serve the wake-up sound over http, not https, AND avoid the DNS lookup by using the IP address, since the lookup costs you some milliseconds. Do this and the wake-up response will be immediate!
Regarding waiting for the response, that is a combination of things. The first is https again, but here you can't do anything, as the firmware of the ESP32 is pointing to your external URL (really bad from the HA Team!); it should at least point to the internal URL. Https and downloading an mp3 is not really a good combination.
Second part: I'm about to test something this weekend; hopefully this will improve the response time.
@TheStigh Thanks for your answer, I will try to check your recommendation. PS. My DNS name of HA is completely different but I had to hide it so that the domain was not publicly available :)
I can confirm that when I replaced https with http, it started working really fast. I also removed the https connection from the internal path and changed it to use the internal IP directly instead of the domain name, and now the Piper response is also really fast. Home Assistant configuration file:
homeassistant:
  name: Mieszkanie
  ...
  external_url: "https://MY_EXTERNAL_DOMAIN"
  internal_url: "http://10.20.1.4:8123"
  ...
http:
  server_port: 8123
  use_x_forwarded_for: true
  trusted_proxies:
    - 127.0.0.1
    - 10.20.0.0/23
  # IMPORTANT: removed these SSL certificate settings
  # ssl_certificate: /ssl/fullchain.pem
  # ssl_key: /ssl/privkey.pem
  ip_ban_enabled: true
  login_attempts_threshold: 10
Configuration of Onju Voice:
substitutions:
  name: salon-onju-voice
  friendly_name: Salon Onju Voice
  project_version: "1.1.0"
  device_description: "Onju Voice Satellite with ESPHome software and microWakeWord"
  wakeup_sound_url: "http://10.20.1.4:8123/local/sounds/wakeup.mp3"
  error_sound_url: "http://10.20.1.4:8123/local/sounds/error.mp3"
  timer_finished_sound_url: "http://10.20.1.4:8123/local/sounds/timer_finished.mp3"
...
Below you can find logs from my Onju Voice device:
INFO ESPHome 2024.9.0
INFO Reading configuration /config/esphome/kuchnia-onju-voice.yaml...
INFO Starting log output from 10.20.0.77 using esphome API
INFO Successfully connected to kuchnia-onju-voice @ 10.20.0.77 in 0.192s
INFO Successful handshake with kuchnia-onju-voice @ 10.20.0.77 in 0.072s
[21:35:15][I][app:100]: ESPHome version 2024.9.0 compiled on Sep 20 2024, 20:47:51
[21:35:15][I][app:102]: Project tetele.onju_voice_satellite version 1.1.0
[21:35:15][C][wifi:600]: WiFi:
[21:35:15][C][wifi:428]: Local MAC: 64:E8:33:47:7A:98
[21:35:15][C][wifi:433]: SSID: [redacted]
[21:35:15][C][wifi:436]: IP Address: 10.20.0.77
[21:35:15][C][wifi:440]: BSSID: [redacted]
[21:35:15][C][wifi:441]: Hostname: 'kuchnia-onju-voice'
[21:35:15][C][wifi:443]: Signal strength: -67 dB ▂▄▆█
[21:35:15][C][wifi:447]: Channel: 4
[21:35:15][C][wifi:448]: Subnet: 255.255.254.0
[21:35:15][C][wifi:449]: Gateway: 10.20.0.1
[21:35:15][C][wifi:450]: DNS1: 10.20.0.1
[21:35:15][C][wifi:451]: DNS2: 0.0.0.0
[21:35:15][C][logger:185]: Logger:
[21:35:15][C][logger:186]: Level: DEBUG
[21:35:15][C][logger:188]: Log Baud Rate: 115200
[21:35:15][C][logger:189]: Hardware UART: USB_SERIAL_JTAG
[21:35:15][C][template.number:050]: Template Number 'Touch threshold percentage'
[21:35:15][C][template.number:051]: Optimistic: YES
[21:35:15][C][template.number:052]: Update Interval: never
[21:35:15][C][esp32_rmt_led_strip:187]: ESP32 RMT LED Strip:
[21:35:15][C][esp32_rmt_led_strip:188]: Pin: 11
[21:35:15][C][esp32_rmt_led_strip:189]: Channel: 0
[21:35:15][C][esp32_rmt_led_strip:214]: RGB Order: GRB
[21:35:15][C][esp32_rmt_led_strip:215]: Max refresh rate: 0
[21:35:15][C][esp32_rmt_led_strip:216]: Number of LEDs: 6
[21:35:15][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[21:35:15][C][switch.gpio:091]: Restore Mode: always OFF
[21:35:15][C][switch.gpio:031]: Pin: GPIO21
[21:35:15][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[21:35:15][C][gpio.binary_sensor:016]: Pin: GPIO38
[21:35:15][C][light:103]: Light 'leds'
[21:35:15][C][light:105]: Default Transition Length: 0.0s
[21:35:15][C][light:106]: Gamma Correct: 2.80
[21:35:15][C][light:103]: Light 'left_led'
[21:35:15][C][light:105]: Default Transition Length: 0.1s
[21:35:15][C][light:106]: Gamma Correct: 2.80
[21:35:15][C][light:103]: Light 'top_led'
[21:35:15][C][light:105]: Default Transition Length: 0.1s
[21:35:15][C][light:106]: Gamma Correct: 2.80
[21:35:15][C][light:103]: Light 'right_led'
[21:35:15][C][light:105]: Default Transition Length: 0.1s
[21:35:15][C][light:106]: Gamma Correct: 2.80
[21:35:15][C][template.switch:068]: Template Switch 'Use Wake Word'
[21:35:15][C][template.switch:091]: Restore Mode: restore defaults to ON
[21:35:15][C][template.switch:057]: Optimistic: YES
[21:35:15][C][template.switch:068]: Template Switch 'Wake Word Listening Light'
[21:35:15][C][template.switch:091]: Restore Mode: restore defaults to ON
[21:35:15][C][template.switch:057]: Optimistic: YES
[21:35:15][C][psram:020]: PSRAM:
[21:35:15][C][psram:021]: Available: YES
[21:35:15][C][psram:024]: Size: 8191 KB
[21:35:15][C][i2s_audio:028]: I2SController:
[21:35:15][C][i2s_audio:029]: AccessMode: duplex
[21:35:15][C][i2s_audio:030]: Port: 0
[21:35:15][C][i2s_audio:032]: Reader registered.
[21:35:15][C][i2s_audio:035]: Writer registered.
[21:35:15][C][i2s_audio:139]: I2S-Writer (Fixed-CFG):
[21:35:15][C][i2s_audio:141]: sample-rate: 16000 bits_per_sample: 32
[21:35:15][C][i2s_audio:142]: channel_fmt: 4 channels: 1
[21:35:15][C][i2s_audio:143]: use_apll: no, use_pdm: no
[21:35:15][C][i2s_audio:136]: I2S-Reader (Fixed-CFG):
[21:35:15][C][i2s_audio:141]: sample-rate: 16000 bits_per_sample: 32
[21:35:15][C][i2s_audio:142]: channel_fmt: 4 channels: 1
[21:35:15][C][i2s_audio:143]: use_apll: no, use_pdm: no
[21:35:15][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[21:35:15][C][esp32_touch:074]: Meas cycle: 0.80ms
[21:35:15][C][esp32_touch:075]: Sleep cycle: 2.00ms
[21:35:15][C][esp32_touch:095]: Low Voltage Reference: 0.8V
[21:35:15][C][esp32_touch:115]: High Voltage Reference: 2.4V
[21:35:15][C][esp32_touch:135]: Voltage Attenuation: 0V
[21:35:15][C][esp32_touch:169]: Filter mode: IIR_16
[21:35:15][C][esp32_touch:170]: Debounce count: 2
[21:35:15][C][esp32_touch:171]: Noise threshold coefficient: 0
[21:35:15][C][esp32_touch:172]: Jitter filter step size: 0
[21:35:15][C][esp32_touch:191]: Smooth level: IIR_2
[21:35:15][C][esp32_touch:213]: Denoise grade: BIT8
[21:35:15][C][esp32_touch:245]: Denoise capacitance level: L0
[21:35:15][C][esp32_touch:260]: Touch Pad 'volume_down'
[21:35:15][C][esp32_touch:261]: Pad: T4
[21:35:15][C][esp32_touch:262]: Threshold: 477347
[21:35:15][C][esp32_touch:260]: Touch Pad 'volume_up'
[21:35:15][C][esp32_touch:261]: Pad: T2
[21:35:15][C][esp32_touch:262]: Threshold: 545831
[21:35:15][C][esp32_touch:260]: Touch Pad 'action'
[21:35:15][C][esp32_touch:261]: Pad: T3
[21:35:15][C][esp32_touch:262]: Threshold: 702088
[21:35:15][C][captive_portal:089]: Captive Portal:
[21:35:15][C][mdns:116]: mDNS:
[21:35:15][C][mdns:117]: Hostname: kuchnia-onju-voice
[21:35:15][C][esphome.ota:073]: Over-The-Air updates:
[21:35:15][C][esphome.ota:074]: Address: kuchnia-onju-voice.local:3232
[21:35:15][C][esphome.ota:075]: Version: 2
[21:35:15][C][safe_mode:018]: Safe Mode:
[21:35:15][C][safe_mode:020]: Boot considered successful after 60 seconds
[21:35:15][C][safe_mode:021]: Invoke after 10 boot attempts
[21:35:15][C][safe_mode:023]: Remain in safe mode for 300 seconds
[21:35:15][C][api:139]: API Server:
[21:35:15][C][api:140]: Address: kuchnia-onju-voice.local:6053
[21:35:15][C][api:142]: Using noise encryption: YES
[21:35:15][C][improv_serial:032]: Improv Serial:
[21:35:15][C][micro_wake_word:051]: microWakeWord:
[21:35:15][C][micro_wake_word:052]: models:
[21:35:15][C][micro_wake_word:015]: - Wake Word: hey jarvis
[21:35:15][C][micro_wake_word:016]: Probability cutoff: 0.970
[21:35:15][C][micro_wake_word:017]: Sliding window size: 5
[21:35:15][C][micro_wake_word:021]: - VAD Model
[21:35:15][C][micro_wake_word:022]: Probability cutoff: 0.500
[21:35:15][C][micro_wake_word:023]: Sliding window size: 5
[21:35:15][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[21:35:15][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[21:35:15][C][adf_media_player:018]: MP_ANNOUNCE enabled
[21:35:15][C][adf_media_player:024]: Number of ADFComponents: 3
[21:35:28][D][micro_wake_word:162]: The 'hey jarvis' model sliding average probability is 0.987 and most recent probability is 1.000
[21:35:28][D][micro_wake_word:123]: Wake Word 'hey jarvis' Detected
[21:35:28][D][micro_wake_word:195]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:35:28][D][micro_wake_word:129]: Stopping Microphone
[21:35:28][D][esp_adf_pipeline:070]: Called 'stop' while in RUNNING state.
[21:35:28][D][micro_wake_word:195]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:35:28][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from RUNNING to ABORTING. (REQ: 1)
[21:35:28][D][adf_audio_element:324]: [i2s_in] Checking State for stopping, got 3
[21:35:28][D][esp-idf:000][i2s_in]: W (1916424) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[21:35:28][D][esp-idf:000][i2s_in]: W (1916427) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[21:35:28][D][esp-idf:000][i2s_in]: W (1916430) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[21:35:28][D][esp-idf:000][i2s_in]: W (1916433) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[21:35:28][D][esp-idf:000][i2s_in]: W (1916436) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[21:35:28][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from ABORTING to STOPPED. (REQ: 1)
[21:35:28][D][micro_wake_word:195]: State changed from STOPPING_MICROPHONE to IDLE
[21:35:28][D][media_player:061]: 'Kuchnia Onju Voice' - Setting
[21:35:28][D][media_player:068]: Media URL: http://10.20.1.4:8123/local/sounds/wakeup.mp3
[21:35:28][D][esp_audio_sources:098]: Set new uri: http://10.20.1.4:8123/local/sounds/wakeup.mp3
[21:35:28][D][adf_media_player:057]: Got control call in state IDLE
[21:35:28][D][adf_media_player:058]: req_track stream uri: http://10.20.1.4:8123/local/sounds/wakeup.mp3
[21:35:28][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[21:35:28][D][voice_assistant:514]: State changed from IDLE to START_MICROPHONE
[21:35:28][D][voice_assistant:520]: Desired state set to START_PIPELINE
[21:35:28][D][voice_assistant:226]: Starting Microphone
[21:35:28][D][esp_adf_pipeline.microphone:025]: start request while ine state 0
[21:35:28][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[21:35:28][D][voice_assistant:514]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:35:28][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[21:35:28][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[21:35:28][I][adf_media_player:192]: got new pipeline state: 3, while in MP state IDLE
[21:35:28][D][adf_i2s_out:141]: Set final i2s settings: 16000
[21:35:28][D][esp_audio_processors:124]: Current settings: SRC: rate: 22050, ch: 1 bits: 16, DST: rate: 16000, ch: 1, bits 16
[21:35:28][I][adf_media_player:256]: current mp state: PLAYING
[21:35:28][I][adf_media_player:257]: anouncement: false
[21:35:28][I][adf_media_player:258]: play_intent: false
[21:35:28][I][adf_media_player:259]: current_uri_: yes
[21:35:28][D][adf_audio_element:108]: Preparing [i2s_in]...
[21:35:28][D][esp_audio_sources:103]: Prepare elements called (initial_call)!
[21:35:28][D][esp_audio_sources:137]: Use fixed settings: no
[21:35:28][D][esp_audio_sources:138]: Streamer status: 6
[21:35:28][D][esp_audio_sources:139]: decoder status: 6
[21:35:28][D][esp_audio_sources:140]: stream uri: http://10.20.1.4:8123/local/sounds/wakeup.mp3
[21:35:28][D][adf_audio_element:108]: Preparing [http]...
[21:35:28][D][adf_audio_element:108]: Preparing [decoder]...
[21:35:28][D][adf_audio_element:108]: Preparing [pcm_reader]...
[21:35:28][D][adf_audio_element:108]: Preparing [resampler]...
[21:35:28][D][adf_audio_element:108]: Preparing [i2s_out]...
[21:35:28][D][esp_adf_pipeline:342]: wait for preparation, done
[21:35:28][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[21:35:28][D][adf_audio_element:165]: Resuming [i2s_in]...
[21:35:28][D][adf_audio_element:172]: [i2s_in] Sending resume command.
[21:35:28][D][esp-idf:000][i2s_in]: I (1916574) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1
[21:35:28][D][adf_audio_element:191]: [i2s_in] Checking State, got 78
[21:35:28][I][esp_adf_pipeline:132]: [ i2s_in ] status: 12
[21:35:28][D][adf_audio_element:165]: Resuming [http]...
[21:35:28][D][adf_audio_element:172]: [http] Sending resume command.
[21:35:28][D][adf_audio_element:165]: Resuming [decoder]...
[21:35:28][D][adf_audio_element:172]: [decoder] Sending resume command.
[21:35:28][D][adf_audio_element:191]: [pcm_reader] Checking State, got 65
[21:35:28][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from STARTING to RUNNING. (REQ: 0)
[21:35:28][D][voice_assistant:514]: State changed from STARTING_MICROPHONE to START_PIPELINE
[21:35:28][D][voice_assistant:280]: Requesting start...
[21:35:28][D][voice_assistant:514]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:35:28][D][adf_audio_element:191]: [http] Checking State, got 79
[21:35:28][D][adf_audio_element:191]: [decoder] Checking State, got 79
[21:35:28][D][voice_assistant:535]: Client started, streaming microphone
[21:35:28][D][voice_assistant:514]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:35:28][D][voice_assistant:520]: Desired state set to STREAMING_MICROPHONE
[21:35:28][I][HTTPStreamReader:230]: Codec Format reported: 3.
[21:35:28][D][voice_assistant:637]: Event Type: 1
[21:35:28][D][voice_assistant:640]: Assist Pipeline running
[21:35:28][D][voice_assistant:637]: Event Type: 3
[21:35:28][D][voice_assistant:651]: STT started
[21:35:28][D][light:036]: 'top_led' Setting:
[21:35:28][D][light:051]: Brightness: 100%
[21:35:28][D][light:059]: Red: 100%, Green: 100%, Blue: 100%
[21:35:28][D][light:109]: Effect: 'listening'
[21:35:28][I][HTTPStreamReader:240]: [ * ] Receive music info from decoder, sample_rates=44100, bits=16, ch=2
[21:35:28][I][HTTPStreamReader:243]: [ * ] Receive music info from decoder, codec_fmt=3, bps=192000, duration=0, bytes=-93
[21:35:28][D][adf_i2s_out:141]: Set final i2s settings: 16000
[21:35:28][D][esp_audio_processors:108]: Received request from: HTTPStreamReader
[21:35:28][D][esp_audio_processors:113]: New settings: SRC: rate: 44100, ch: 2 bits: 16, DST: rate: 16000, ch: 1, bits 16
[21:35:28][D][esp_audio_processors:124]: Current settings: SRC: rate: 44100, ch: 2 bits: 16, DST: rate: 16000, ch: 1, bits 16
[21:35:28][D][adf_audio_element:108]: Preparing [http]...
[21:35:28][D][adf_audio_element:108]: Preparing [decoder]...
[21:35:28][D][esp-idf:000][decoder]: W (1916719) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[21:35:28][D][esp-idf:000][decoder]: W (1916722) MP3_DECODER: output aborted -3
[21:35:28][D][esp-idf:000][decoder]: I (1916728) MP3_DECODER: Closed
[21:35:28][D][esp_audio_sources:193]: Preparation done!
[21:35:28][D][esp_adf_pipeline:342]: wait for preparation, done
[21:35:28][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[21:35:28][I][adf_media_player:192]: got new pipeline state: 5, while in MP state PLAYING
[21:35:28][I][adf_media_player:256]: current mp state: PLAYING
[21:35:28][I][adf_media_player:257]: anouncement: false
[21:35:28][I][adf_media_player:258]: play_intent: false
[21:35:28][I][adf_media_player:259]: current_uri_: yes
[21:35:28][D][adf_audio_element:165]: Resuming [http]...
[21:35:28][D][adf_audio_element:172]: [http] Sending resume command.
[21:35:28][D][adf_audio_element:165]: Resuming [decoder]...
[21:35:28][D][adf_audio_element:172]: [decoder] Sending resume command.
[21:35:28][D][esp-idf:000][decoder]: I (1916774) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:5
[21:35:29][D][esp-idf:000][decoder]: I (1916778) MP3_DECODER: MP3 opened
[21:35:29][D][esp-idf:000][http]: I (1917080) HTTP_CLIENT: Body received in fetch header state, 0x3fccab9e, 1702
[21:35:29][D][esp-idf:000][http]: I (1917085) HTTP_STREAM: total_bytes=19527
[21:35:29][I][HTTPStreamReader:230]: Codec Format reported: 3.
[21:35:29][I][esp_adf_pipeline:132]: [ http ] status: 12
[21:35:29][I][esp_adf_pipeline:132]: [ decoder ] status: 12
[21:35:29][I][HTTPStreamReader:240]: [ * ] Receive music info from decoder, sample_rates=44100, bits=16, ch=2
[21:35:29][I][HTTPStreamReader:243]: [ * ] Receive music info from decoder, codec_fmt=3, bps=192000, duration=0, bytes=-93
[21:35:29][D][esp-idf:000][http]: W (1917524) HTTP_STREAM: No more data,errno:0, total_bytes:19527, rlen = 0
[21:35:29][I][esp_audio_sources:033][http]: Receive http event: 7
[21:35:29][D][esp-idf:000][http]: I (1917533) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0
[21:35:29][I][esp_adf_pipeline:123]: [ http ] byte_pos: 19527, total: 19527
[21:35:29][I][esp_adf_pipeline:132]: [ http ] status: 15
[21:35:29][I][esp_adf_pipeline:135]: current state: RUNNING
[21:35:29][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from RUNNING to FINISHING. (REQ: 0)
[21:35:29][I][adf_media_player:192]: got new pipeline state: 7, while in MP state PLAYING
[21:35:29][I][adf_media_player:256]: current mp state: PLAYING
[21:35:29][I][adf_media_player:257]: anouncement: false
[21:35:29][I][adf_media_player:258]: play_intent: false
[21:35:29][I][adf_media_player:259]: current_uri_: yes
[21:35:29][D][esp-idf:000][decoder]: I (1917684) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2
[21:35:29][D][esp-idf:000][decoder]: I (1917770) MP3_DECODER: Closed
[21:35:29][I][esp_adf_pipeline:123]: [ decoder ] byte_pos: 0, total: -93
[21:35:29][I][esp_adf_pipeline:132]: [ decoder ] status: 15
[21:35:29][I][esp_adf_pipeline:135]: current state: FINISHING
[21:35:29][D][esp-idf:000][resampler]: I (1917801) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2
[21:35:29][I][esp_adf_pipeline:132]: [ resampler ] status: 15
[21:35:29][I][esp_adf_pipeline:135]: current state: FINISHING
[21:35:29][D][esp-idf:000][i2s_out]: I (1917880) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2
[21:35:30][I][esp_adf_pipeline:123]: [ i2s_out ] byte_pos: 0, total: 0
[21:35:30][I][esp_adf_pipeline:132]: [ i2s_out ] status: 15
[21:35:30][I][esp_adf_pipeline:135]: current state: FINISHING
[21:35:30][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from FINISHING to STOPPED. (REQ: 1)
[21:35:30][I][adf_media_player:192]: got new pipeline state: 4, while in MP state PLAYING
[21:35:30][I][adf_media_player:256]: current mp state: IDLE
[21:35:30][I][adf_media_player:257]: anouncement: false
[21:35:30][I][adf_media_player:258]: play_intent: false
[21:35:30][I][adf_media_player:259]: current_uri_: yes
[21:35:30][D][voice_assistant:637]: Event Type: 11
[21:35:30][D][voice_assistant:793]: Starting STT by VAD
[21:35:32][D][voice_assistant:637]: Event Type: 12
[21:35:32][D][voice_assistant:797]: STT by VAD end
[21:35:32][D][voice_assistant:514]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:35:32][D][voice_assistant:520]: Desired state set to AWAITING_RESPONSE
[21:35:32][D][esp_adf_pipeline:070]: Called 'stop' while in RUNNING state.
[21:35:32][D][voice_assistant:514]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:35:32][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from RUNNING to ABORTING. (REQ: 1)
[21:35:32][D][light:036]: 'top_led' Setting:
[21:35:32][D][light:051]: Brightness: 70%
[21:35:32][D][light:059]: Red: 0%, Green: 20%, Blue: 100%
[21:35:32][D][light:109]: Effect: 'processing'
[21:35:32][D][adf_audio_element:324]: [i2s_in] Checking State for stopping, got 3
[21:35:32][D][adf_audio_element:324]: [pcm_reader] Checking State for stopping, got 3
[21:35:32][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from ABORTING to STOPPED. (REQ: 1)
[21:35:32][D][voice_assistant:514]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:35:32][D][voice_assistant:637]: Event Type: 4
[21:35:32][D][voice_assistant:665]: Speech recognised as: "zgaś światło w kuchni"
[21:35:32][D][voice_assistant:637]: Event Type: 5
[21:35:32][D][voice_assistant:670]: Intent started
[21:35:32][D][voice_assistant:637]: Event Type: 6
[21:35:32][D][voice_assistant:637]: Event Type: 7
[21:35:32][D][voice_assistant:693]: Response: "Wyłączono światło"
[21:35:32][D][voice_assistant:637]: Event Type: 8
[21:35:32][D][voice_assistant:715]: Response URL: "http://10.20.1.4:8123/api/tts_proxy/721138292ba7c094c6131830fda7bea4a1865fb4_pl-pl_6d43988cf6_tts.piper.mp3"
[21:35:32][D][voice_assistant:514]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:35:32][D][voice_assistant:520]: Desired state set to STREAMING_RESPONSE
[21:35:32][D][media_player:061]: 'Kuchnia Onju Voice' - Setting
[21:35:32][D][media_player:068]: Media URL: http://10.20.1.4:8123/api/tts_proxy/721138292ba7c094c6131830fda7bea4a1865fb4_pl-pl_6d43988cf6_tts.piper.mp3
[21:35:32][D][media_player:074]: Announcement: yes
[21:35:32][D][adf_media_player:057]: Got control call in state IDLE
[21:35:32][D][adf_media_player:058]: req_track stream uri: http://10.20.1.4:8123/api/tts_proxy/721138292ba7c094c6131830fda7bea4a1865fb4_pl-pl_6d43988cf6_tts.piper.mp3
[21:35:32][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[21:35:32][D][light:036]: 'top_led' Setting:
[21:35:32][D][light:059]: Red: 20%, Green: 100%, Blue: 0%
[21:35:32][D][light:109]: Effect: 'speaking'
[21:35:32][D][voice_assistant:637]: Event Type: 2
[21:35:32][D][voice_assistant:729]: Assist Pipeline ended
[21:35:32][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[21:35:32][I][adf_media_player:192]: got new pipeline state: 3, while in MP state IDLE
[21:35:32][D][adf_i2s_out:141]: Set final i2s settings: 16000
[21:35:32][D][esp_audio_processors:124]: Current settings: SRC: rate: 44100, ch: 2 bits: 16, DST: rate: 16000, ch: 1, bits 16
[21:35:32][I][adf_media_player:256]: current mp state: ANNOUNCING
[21:35:32][I][adf_media_player:257]: anouncement: yes
[21:35:32][I][adf_media_player:258]: play_intent: false
[21:35:32][I][adf_media_player:259]: current_uri_: yes
[21:35:32][D][light:036]: 'top_led' Setting:
[21:35:32][D][light:051]: Brightness: 60%
[21:35:32][D][light:059]: Red: 100%, Green: 0%, Blue: 100%
[21:35:32][D][light:109]: Effect: 'listening_ww'
[21:35:32][D][micro_wake_word:399]: Resetting buffers and probabilities
[21:35:32][D][micro_wake_word:195]: State changed from IDLE to START_MICROPHONE
[21:35:32][D][micro_wake_word:107]: Starting Microphone
[21:35:32][D][esp_adf_pipeline.microphone:025]: start request while ine state 0
[21:35:32][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[21:35:32][D][micro_wake_word:195]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:35:32][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[21:35:32][D][esp_audio_sources:103]: Prepare elements called (initial_call)!
[21:35:32][D][esp_audio_sources:137]: Use fixed settings: no
[21:35:32][D][esp_audio_sources:138]: Streamer status: 6
[21:35:32][D][esp_audio_sources:139]: decoder status: 6
[21:35:32][D][esp_audio_sources:140]: stream uri: http://10.20.1.4:8123/api/tts_proxy/721138292ba7c094c6131830fda7bea4a1865fb4_pl-pl_6d43988cf6_tts.piper.mp3
[21:35:32][D][adf_audio_element:108]: Preparing [http]...
[21:35:32][D][adf_audio_element:108]: Preparing [decoder]...
[21:35:32][D][adf_audio_element:108]: Preparing [i2s_in]...
[21:35:32][D][adf_audio_element:108]: Preparing [resampler]...
[21:35:32][D][adf_audio_element:108]: Preparing [pcm_reader]...
[21:35:32][D][adf_audio_element:108]: Preparing [i2s_out]...
[21:35:32][D][esp_adf_pipeline:342]: wait for preparation, done
[21:35:32][D][esp_adf_pipeline:448]: [ADFMicrophone] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[21:35:32][D][adf_audio_element:165]: Resuming [i2s_in]...
[21:35:32][D][adf_audio_element:172]: [i2s_in] Sending resume command.
[21:35:32][D][esp-idf:000][i2s_in]: I (1920721) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1
[21:35:32][I][esp_audio_sources:033][http]: Receive http event: 1
[21:35:32][D][micro_wake_word:195]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[21:35:32][D][adf_audio_element:191]: [http] Checking State, got 79
[21:35:32][D][adf_audio_element:191]: [decoder] Checking State, got 79
[21:35:32][I][esp_audio_sources:033][http]: Receive http event: 2
[21:35:32][I][esp_audio_sources:033][http]: Receive http event: 4
[21:35:32][D][esp-idf:000][http]: I (1920778) HTTP_CLIENT: Body received in fetch header state, 0x3fcc8573, 1841
[21:35:32][D][esp-idf:000][http]: I (1920782) HTTP_STREAM: total_bytes=14223
[21:35:32][I][HTTPStreamReader:230]: Codec Format reported: 3.
[21:35:32][I][HTTPStreamReader:240]: [ * ] Receive music info from decoder, sample_rates=22050, bits=16, ch=1
[21:35:32][I][HTTPStreamReader:243]: [ * ] Receive music info from decoder, codec_fmt=3, bps=75000, duration=1384, bytes=-1147
[21:35:32][D][adf_i2s_out:141]: Set final i2s settings: 16000
[21:35:32][D][esp_audio_processors:108]: Received request from: HTTPStreamReader
[21:35:32][D][esp_audio_processors:113]: New settings: SRC: rate: 22050, ch: 1 bits: 16, DST: rate: 16000, ch: 1, bits 16
[21:35:32][D][esp_audio_processors:124]: Current settings: SRC: rate: 22050, ch: 1 bits: 16, DST: rate: 16000, ch: 1, bits 16
[21:35:32][D][adf_audio_element:108]: Preparing [http]...
[21:35:32][D][adf_audio_element:108]: Preparing [decoder]...
[21:35:32][D][esp-idf:000][decoder]: W (1920854) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[21:35:32][D][esp-idf:000][decoder]: W (1920858) MP3_DECODER: output aborted -3
[21:35:32][D][esp-idf:000][decoder]: I (1920861) MP3_DECODER: Closed
[21:35:32][D][esp_audio_sources:193]: Preparation done!
[21:35:32][D][esp_adf_pipeline:342]: wait for preparation, done
[21:35:32][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[21:35:32][I][adf_media_player:192]: got new pipeline state: 5, while in MP state ANNOUNCING
[21:35:32][I][adf_media_player:256]: current mp state: ANNOUNCING
[21:35:32][I][adf_media_player:257]: anouncement: yes
[21:35:32][I][adf_media_player:258]: play_intent: false
[21:35:32][I][adf_media_player:259]: current_uri_: yes
[21:35:32][D][adf_audio_element:165]: Resuming [http]...
[21:35:32][D][adf_audio_element:172]: [http] Sending resume command.
[21:35:32][D][adf_audio_element:165]: Resuming [decoder]...
[21:35:33][D][adf_audio_element:172]: [decoder] Sending resume command.
[21:35:33][D][esp-idf:000][decoder]: I (1920988) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[21:35:33][D][esp-idf:000][decoder]: I (1921217) MP3_DECODER: MP3 opened
[21:35:33][D][esp-idf:000][http]: I (1921234) HTTP_CLIENT: Body received in fetch header state, 0x3fcca637, 1841
[21:35:33][D][esp-idf:000][http]: I (1921238) HTTP_STREAM: total_bytes=14223
[21:35:33][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from STARTING to RUNNING. (REQ: 0)
[21:35:33][I][adf_media_player:192]: got new pipeline state: 6, while in MP state ANNOUNCING
[21:35:33][I][adf_media_player:256]: current mp state: ANNOUNCING
[21:35:33][I][adf_media_player:257]: anouncement: yes
[21:35:33][I][adf_media_player:258]: play_intent: false
[21:35:33][I][adf_media_player:259]: current_uri_: yes
[21:35:33][I][HTTPStreamReader:230]: Codec Format reported: 3.
[21:35:33][I][esp_adf_pipeline:132]: [ http ] status: 12
[21:35:33][I][esp_adf_pipeline:132]: [ decoder ] status: 12
[21:35:33][I][HTTPStreamReader:240]: [ * ] Receive music info from decoder, sample_rates=22050, bits=16, ch=1
[21:35:33][I][HTTPStreamReader:243]: [ * ] Receive music info from decoder, codec_fmt=3, bps=75000, duration=1384, bytes=-1147
[21:35:33][D][esp-idf:000][http]: W (1921648) HTTP_STREAM: No more data,errno:0, total_bytes:14223, rlen = 0
[21:35:33][I][esp_audio_sources:033][http]: Receive http event: 7
[21:35:33][D][esp-idf:000][http]: I (1921659) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0
[21:35:33][I][esp_adf_pipeline:123]: [ http ] byte_pos: 0, total: 14223
[21:35:33][I][esp_adf_pipeline:132]: [ http ] status: 15
[21:35:33][I][esp_adf_pipeline:135]: current state: RUNNING
[21:35:33][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from RUNNING to FINISHING. (REQ: 0)
[21:35:33][I][adf_media_player:192]: got new pipeline state: 7, while in MP state ANNOUNCING
[21:35:33][I][adf_media_player:256]: current mp state: ANNOUNCING
[21:35:33][I][adf_media_player:257]: anouncement: yes
[21:35:33][I][adf_media_player:258]: play_intent: false
[21:35:33][I][adf_media_player:259]: current_uri_: yes
[21:35:33][D][esp-idf:000][decoder]: I (1921996) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2
[21:35:34][D][esp-idf:000][decoder]: I (1922440) MP3_DECODER: Closed
[21:35:34][I][esp_adf_pipeline:123]: [ decoder ] byte_pos: 0, total: -1147
[21:35:34][I][esp_adf_pipeline:132]: [ decoder ] status: 15
[21:35:34][I][esp_adf_pipeline:135]: current state: FINISHING
[21:35:34][D][esp-idf:000][resampler]: I (1922568) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2
[21:35:34][I][esp_adf_pipeline:132]: [ resampler ] status: 15
[21:35:34][I][esp_adf_pipeline:135]: current state: FINISHING
[21:35:34][D][esp-idf:000][i2s_out]: I (1922631) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2
[21:35:34][I][esp_adf_pipeline:123]: [ i2s_out ] byte_pos: 0, total: 0
[21:35:34][I][esp_adf_pipeline:132]: [ i2s_out ] status: 15
[21:35:34][I][esp_adf_pipeline:135]: current state: FINISHING
[21:35:34][D][esp_adf_pipeline:448]: [MediaPlayer] Pipeline changed from FINISHING to STOPPED. (REQ: 1)
[21:35:34][I][adf_media_player:192]: got new pipeline state: 4, while in MP state ANNOUNCING
[21:35:34][I][adf_media_player:256]: current mp state: IDLE
[21:35:34][I][adf_media_player:257]: anouncement: false
[21:35:34][I][adf_media_player:258]: play_intent: false
[21:35:34][I][adf_media_player:259]: current_uri_: yes
[21:35:36][D][voice_assistant:514]: State changed from STREAMING_RESPONSE to IDLE
[21:35:36][D][voice_assistant:520]: Desired state set to IDLE
@TheStigh I have one more question. Currently, it works so quickly that the prefix "bo" is added to each sentence; it is definitely the wake-up signal being transcribed. Can I somehow delay listening to the command by 1 second? 😀
Currently we have: "bo zgaś światło w sypialni" but it should be: "zgaś światło w sypialni" ("turn off the light in the bedroom") 😀 I see this "bo" prefix in really all sentences.
I found a solution: I just added delay: 500ms when the wake word is detected:
micro_wake_word:
  models:
    # - model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/alexa.json
    # - model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/okay_nabu.json
    - model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/hey_jarvis.json
    # - model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/hey_mycroft.json
  vad:
    model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/vad.json
  on_wake_word_detected:
    - if:
        condition: media_player.is_playing
        then:
          - media_player.pause
    - media_player.play_media: "${wakeup_sound_url}"
    - delay: 500ms
    - wait_until:
        not:
          media_player.is_playing: onju_out
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
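A possible variation on the fixed delay, offered only as an untested sketch. It assumes the "bo" prefix appears because the `wait_until` check runs before the media player has actually reported that it is playing, so the microphone starts while the wake-up sound is still audible. Under that assumption, one could first wait (with a timeout) for playback to begin and only then wait for it to end. The 1s timeout is an arbitrary illustrative value; `onju_out` and `${wakeup_sound_url}` are the names already used in the configuration above.

```yaml
micro_wake_word:
  # models: and vad: stay exactly as in the configuration above
  on_wake_word_detected:
    - if:
        condition: media_player.is_playing
        then:
          - media_player.pause
    - media_player.play_media: "${wakeup_sound_url}"
    # Assumption: give the player up to 1 s to report that playback has started,
    # instead of sleeping for a fixed 500 ms.
    - wait_until:
        condition:
          media_player.is_playing: onju_out
        timeout: 1s
    # Then wait for the wake-up sound to finish before starting to listen.
    - wait_until:
        not:
          media_player.is_playing: onju_out
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
```

Whether this behaves better than the plain `delay: 500ms` depends on how quickly the media player updates its state on this hardware, so it would need to be checked against the device logs.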
All problems have been solved 😀 I think it is worth adding such information to the README.md file:
- If possible, use an IP address instead of a domain name for an internal address
- In the internal network, do not use an https connection, only http (an encrypted connection delays communication very much)

cc: @tetele
Great that it worked out :) Though, how many seconds does it take from when you've finished talking until you hear the sound? Here I think there is more time to cut, as we download the response as an mp3 file, and this is also done through the external URL, which is https (as you still use).
I think I've found a way to cut at least 1 second, but I will need help from somebody better with the ESPHome code than me.
EDIT: @witold-gren I just looked through your logs and see the response is played from your INTERNAL URL and not your EXTERNAL one? I'm baffled... So I changed my configuration.yaml according to yours and tested with http: and 8123, and now it points to my INTERNAL URL? But I don't understand WHY it changed from EXTERNAL to INTERNAL?
I posted a new video showing how fast it works: https://youtu.be/h63s-1HTkN8?feature=shared
Unfortunately, due to the added 500ms delay, the LED lighting now behaves a bit strangely, as you can see in the video. I don't know how to properly deal with the listening delay; so far I haven't found any reasonable solution.
Flavor
MicroWakeWord
Checklist
Describe the issue
Hi, I don't really understand why all the responses with my configuration are very slow. Every time I ask a question I have to wait a few seconds for an answer. You can see it very clearly in the video at this link: https://www.youtube.com/watch?v=TcIm_3co8XQ Is there anything I can do to make the sound effect trigger faster and the response be generated faster?
I have Home Assistant installed on Proxmox running on an Intel NUC13ANHi7 13th Gen Core i7-1360P. I have no problem generating voices when I do it in the browser. You can also see in the photos that intent processing works really fast. Below is the configuration I use in ESPHome:
Reproduction steps
Just run a voice command with Onju Voice.
Debug logs