media_player functionality using this config stopped working as of esphome 2024.5

rccoleman commented 5 months ago

Flavor

MicroWakeWord

Checklist

[X] This issue only contains 1 issue (if you have multiple issues, open one issue for each issue).
[X] This issue is not a duplicate of any current or previous issue.
[X] I am able to consult debug logging for my ESPHome instance.

Describe the issue

I've been happily using the media_player support introduced with https://github.com/tetele/onju-voice-satellite/pull/39 for a while now to replace my Alexa devices for TTS, but it no longer works with the just-released ESPHome 2024.5. I noticed this during the 2024.5 beta, but didn't have a chance to fully investigate. Media player support starts working again when I go back to 2024.4.

The URL that's generated and shows up in the logs is playable in a browser without any trouble, so it's only the ESPHome device that's having issues with it. This probably belongs in the ESPHome repo, but since the media_player support was developed as part of this config and I don't have another ESPHome device that exposes a media_player to test basic functionality, I'm starting here.

Reproduction steps

Flash an Onju voice device with the config from here and this diff to pull in the latest microwakeword changes:

diff --git a/esphome/onju-voice-microwakeword.yaml b/esphome/onju-voice-microwakeword.yaml
index a2322fe..c530026 100644
--- a/esphome/onju-voice-microwakeword.yaml
+++ b/esphome/onju-voice-microwakeword.yaml
@@ -9,6 +9,10 @@ external_components:
       url: https://github.com/gnumpi/esphome_audio
       ref: main
     components: [ adf_pipeline, i2s_audio ]
+    refresh: 0s
+  - source: github://pr#6653
+    components: [micro_wake_word]
+    refresh: 0s

 esphome:
   name: "${name}"
@@ -196,9 +200,6 @@ media_player:
             old_volume = new_volume;

 micro_wake_word:
-  model: okay_nabu
-  # model: hey_jarvis
-  # model: alexa
   on_wake_word_detected:
     - if:
         condition: media_player.is_playing
@@ -206,6 +207,14 @@ micro_wake_word:
           - media_player.pause
     - voice_assistant.start:
         wake_word: !lambda return wake_word;
+  vad_model:
+    model: https://github.com/kahrendt/microWakeWord/releases/download/model/vad_model.json
+    sliding_window_average_size: 2
+    threshold:
+      upper: 0.95
+      lower: 0.5
+  models:
+    - model: okay_nabu

 voice_assistant:
   id: va

Debug logs

This is an attempt to use Nabu Casa cloud to generate an MP3 for TTS and then play that MP3 over the media_player entity exposed by the Onju (via the "browse media" popup in HA). I then tried to play a different media file the same way through the media browser in HA. The Onju device is able to play responses to commands, but I noticed that those come as WAV files and not MP3, so perhaps that's part of it.

I also understand from kahrendt that warnings like this one aren't meaningful, so I'm ignoring them:

[10:04:00][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.

[10:00:00][I][app:100]: ESPHome version 2024.5.0 compiled on May 15 2024, 09:54:07
[10:00:00][I][app:102]: Project tetele.onju_voice_satellite version 1.0.0
[10:00:00][C][wifi:580]: WiFi:
[10:00:00][C][wifi:408]:   Local MAC: 80:65:99:A2:AD:A4
[10:00:00][C][wifi:413]:   SSID: [redacted]
[10:00:00][C][wifi:416]:   IP Address: 192.168.1.131
[10:00:00][C][wifi:420]:   BSSID: [redacted]
[10:00:00][C][wifi:421]:   Hostname: 'onju-voice-dr'
[10:00:00][C][wifi:423]:   Signal strength: -42 dB ▂▄▆█
[10:00:00][C][wifi:427]:   Channel: 6
[10:00:00][C][wifi:428]:   Subnet: 255.255.255.0
[10:00:00][C][wifi:429]:   Gateway: 192.168.1.1
[10:00:00][C][wifi:430]:   DNS1: 192.168.1.1
[10:00:00][C][wifi:431]:   DNS2: 0.0.0.0
[10:00:00][C][logger:185]: Logger:
[10:00:00][C][logger:186]:   Level: DEBUG
[10:00:00][C][logger:188]:   Log Baud Rate: 115200
[10:00:00][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[10:00:00][C][template.number:050]: Template Number 'Touch threshold percentage'
[10:00:00][C][template.number:051]:   Optimistic: YES
[10:00:00][C][template.number:052]:   Update Interval: never
[10:00:00][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[10:00:00][C][esp32_rmt_led_strip:176]:   Pin: 11
[10:00:00][C][esp32_rmt_led_strip:177]:   Channel: 0
[10:00:00][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[10:00:00][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[10:00:00][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[10:00:00][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[10:00:00][C][switch.gpio:091]:   Restore Mode: always OFF
[10:00:00][C][switch.gpio:031]:   Pin: GPIO21
[10:00:00][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[10:00:00][C][gpio.binary_sensor:016]:   Pin: GPIO38
[10:00:00][C][light:103]: Light 'leds'
[10:00:00][C][light:105]:   Default Transition Length: 0.0s
[10:00:00][C][light:106]:   Gamma Correct: 2.80
[10:00:00][C][light:103]: Light 'left_led'
[10:00:00][C][light:105]:   Default Transition Length: 0.1s
[10:00:00][C][light:106]:   Gamma Correct: 2.80
[10:00:00][C][light:103]: Light 'top_led'
[10:00:00][C][light:105]:   Default Transition Length: 0.1s
[10:00:00][C][light:106]:   Gamma Correct: 2.80
[10:00:00][C][light:103]: Light 'right_led'
[10:00:00][C][light:105]:   Default Transition Length: 0.1s
[10:00:00][C][light:106]:   Gamma Correct: 2.80
[10:00:00][C][template.switch:068]: Template Switch 'Use Wake Word'
[10:00:00][C][template.switch:091]:   Restore Mode: restore defaults to ON
[10:00:00][C][template.switch:057]:   Optimistic: YES
[10:00:00][C][psram:020]: PSRAM:
[10:00:00][C][psram:021]:   Available: YES
[10:00:00][C][psram:024]:   Size: 8191 KB
[10:00:00][C][i2s_audio:028]: I2SController:
[10:00:00][C][i2s_audio:029]:   AccessMode: duplex
[10:00:00][C][i2s_audio:030]:   Port: 0
[10:00:00][C][i2s_audio:032]:   Reader registered.
[10:00:00][C][i2s_audio:035]:   Writer registered.
[10:00:00][C][i2s_audio:138]: I2S-Writer (Fixed-CFG):
[10:00:00][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[10:00:00][C][i2s_audio:141]:   channel_fmt: 0 channels: 2
[10:00:00][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[10:00:00][C][i2s_audio:135]: I2S-Reader (Fixed-CFG):
[10:00:00][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[10:00:00][C][i2s_audio:141]:   channel_fmt: 3 channels: 1
[10:00:00][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[10:00:00][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=640, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:00:00][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[10:00:00][C][esp32_touch:074]:   Meas cycle: 0.80ms
[10:00:00][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[10:00:00][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[10:00:00][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[10:00:00][C][esp32_touch:135]:   Voltage Attenuation: 0V
[10:00:00][C][esp32_touch:169]:   Filter mode: IIR_16
[10:00:00][C][esp32_touch:170]:   Debounce count: 2
[10:00:00][C][esp32_touch:171]:   Noise threshold coefficient: 0
[10:00:00][C][esp32_touch:172]:   Jitter filter step size: 0
[10:00:00][C][esp32_touch:191]:   Smooth level: IIR_2
[10:00:00][C][esp32_touch:213]:   Denoise grade: BIT8
[10:00:00][C][esp32_touch:245]:   Denoise capacitance level: L0
[10:00:00][C][esp32_touch:260]:   Touch Pad 'volume_down'
[10:00:00][C][esp32_touch:261]:     Pad: T4
[10:00:00][C][esp32_touch:262]:     Threshold: 500556
[10:00:00][C][esp32_touch:260]:   Touch Pad 'volume_up'
[10:00:00][C][esp32_touch:261]:     Pad: T2
[10:00:00][C][esp32_touch:262]:     Threshold: 559929
[10:00:00][C][esp32_touch:260]:   Touch Pad 'action'
[10:00:00][C][esp32_touch:261]:     Pad: T3
[10:00:00][C][esp32_touch:262]:     Threshold: 682873
[10:00:01][C][restart.button:017]: Restart Button 'Restart'
[10:00:01][C][captive_portal:088]: Captive Portal:
[10:00:01][C][mdns:115]: mDNS:
[10:00:01][C][mdns:116]:   Hostname: onju-voice-dr
[10:00:01][C][ota:096]: Over-The-Air Updates:
[10:00:01][C][ota:097]:   Address: onju-voice-dr.local:3232
[10:00:01][C][ota:100]:   Using Password.
[10:00:01][C][ota:103]:   OTA version: 2.
[10:00:01][C][api:139]: API Server:
[10:00:01][C][api:140]:   Address: onju-voice-dr.local:6053
[10:00:01][C][api:144]:   Using noise encryption: NO
[10:00:01][C][improv_serial:032]: Improv Serial:
[10:00:01][C][micro_wake_word:058]: microWakeWord models:
[10:00:01][C][micro_wake_word:023]:   - Wake Word: okay nabu
[10:00:01][C][micro_wake_word:024]:     Probability cutoff: 0.500
[10:00:01][C][micro_wake_word:025]:     Sliding window size: 10
[10:00:01][C][micro_wake_word:029]:   - VAD Model
[10:00:01][C][micro_wake_word:030]:     Upper threshold: 0.950
[10:00:01][C][micro_wake_word:031]:     Lower threshold: 0.500
[10:00:01][C][micro_wake_word:032]:     Sliding window size: 2
[10:00:01][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[10:00:01][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[10:00:01][C][adf_media_player:018]:   Number of ASPComponents: 3
[10:04:00][D][media_player:061]: 'Onju Voice Satellite Dining Room' - Setting
[10:04:00][D][media_player:068]:   Media URL: https://xxx/api/tts_proxy/56170f5429b35dea081bb659b884b475ca9329a9_en-us_5c97d21c48_cloud.mp3
[10:04:00][D][adf_media_player:030]: Got control call in state 1
[10:04:00][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[10:04:00][D][esp-idf:000]: I (248346) MP3_DECODER: MP3 init

[10:04:00][D][esp_adf_pipeline:358]: pipeline tag 0, http
[10:04:00][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[10:04:00][D][esp_adf_pipeline:358]: pipeline tag 2, resampler
[10:04:00][D][esp_adf_pipeline:358]: pipeline tag 3, i2s_out
[10:04:00][D][esp-idf:000]: I (248358) AUDIO_PIPELINE: link el->rb, el:0x3d81f0e0, tag:http, rb:0x3d81f768

[10:04:00][D][esp-idf:000]: I (248360) AUDIO_PIPELINE: link el->rb, el:0x3d81f2cc, tag:decoder, rb:0x3d8207a8

[10:04:00][D][esp-idf:000]: I (248362) AUDIO_PIPELINE: link el->rb, el:0x3d81f468, tag:resampler, rb:0x3d8217e8

[10:04:00][D][esp_adf_pipeline:370]: Setting up event listener.
[10:04:00][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[10:04:00][I][adf_media_player:135]: got new pipeline state: 1
[10:04:00][D][adf_i2s_out:127]: Set final i2s settings: 16000
[10:04:00][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2 
[10:04:00][D][esp-idf:000]: I (248396) AUDIO_THREAD: The http task allocate stack on external memory

[10:04:00][D][esp-idf:000]: I (248398) AUDIO_ELEMENT: [http-0x3d81f0e0] Element task created

[10:04:00][D][esp-idf:000]: I (248401) AUDIO_THREAD: The decoder task allocate stack on external memory

[10:04:00][D][esp-idf:000]: I (248403) AUDIO_ELEMENT: [decoder-0x3d81f2cc] Element task created

[10:04:00][D][esp-idf:000][http]: I (248405) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[10:04:00][D][esp-idf:000][decoder]: I (248409) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[10:04:00][D][esp_audio_sources:097]: Streamer status: 2
[10:04:00][D][esp_audio_sources:098]: decoder status: 2
[10:04:00][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:00][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:00][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=640, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:01][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=896, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:01][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:02][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:02][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=896, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:02][D][esp-idf:000][http]: I (251090) HTTP_STREAM: total_bytes=10751

[10:04:02][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[10:04:02][D][adf_i2s_out:127]: Set final i2s settings: 16000
[10:04:02][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 2 
[10:04:02][D][esp_audio_processors:088]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 2 
[10:04:02][D][esp-idf:000][decoder]: W (251184) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[10:04:02][D][esp-idf:000][decoder]: W (251188) MP3_DECODER: output aborted -3

[10:04:02][D][esp-idf:000][decoder]: I (251193) MP3_DECODER: Closed

[10:04:02][D][esp-idf:000][http]: W (251197) AUDIO_ELEMENT: OUT-[http] AEL_IO_ABORT

[10:04:02][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[10:04:02][I][adf_media_player:135]: got new pipeline state: 2
[10:04:02][D][adf_i2s_out:127]: Set final i2s settings: 16000
[10:04:02][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 2 
[10:04:02][D][esp-idf:000]: I (251224) AUDIO_THREAD: The resampler task allocate stack on external memory

[10:04:02][D][esp-idf:000]: I (251226) AUDIO_ELEMENT: [resampler-0x3d81f468] Element task created

[10:04:02][D][esp-idf:000]: I (251228) AUDIO_ELEMENT: [i2s_out-0x3d81f620] Element task created

[10:04:02][D][esp-idf:000]: I (251230) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8368303 Bytes, Inter:136672 Bytes, Dram:136672 Bytes

[10:04:03][D][esp-idf:000][http]: I (251234) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[10:04:03][D][esp-idf:000][decoder]: I (251237) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[10:04:03][D][esp-idf:000][decoder]: I (251239) MP3_DECODER: MP3 opened

[10:04:03][D][esp-idf:000][resampler]: I (251284) RSP_FILTER: sample rate of source data : 24000, channel of source data : 1, sample rate of destination data : 16000, channel of destination data : 2

[10:04:03][I][esp_adf_pipeline:214]: [ http ] status: 14
[10:04:03][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[10:04:03][D][esp_adf_pipeline:131]: Check element [http] status, 2
[10:04:03][I][esp_adf_pipeline:214]: [ resampler ] status: 12
[10:04:03][D][esp_adf_pipeline:131]: Check element [http] status, 2
[10:04:03][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=640, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:03][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:03][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:04][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=640, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:04][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=640, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:05][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:05][D][esp-idf:000][http]: I (254037) HTTP_STREAM: total_bytes=10751

[10:04:05][I][esp_adf_pipeline:214]: [ http ] status: 12
[10:04:05][D][esp_adf_pipeline:131]: Check element [http] status, 3
[10:04:05][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[10:04:05][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[10:04:05][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[10:04:05][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[10:04:05][I][adf_media_player:135]: got new pipeline state: 3
[10:04:05][D][adf_i2s_out:127]: Set final i2s settings: 16000
[10:04:05][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 2 
[10:04:05][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=768, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:05][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[10:04:05][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[10:04:05][D][adf_i2s_out:127]: Set final i2s settings: 16000
[10:04:05][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 2 
[10:04:06][D][esp-idf:000][http]: W (254502) HTTP_STREAM: No more data,errno:0, total_bytes:10751, rlen = 0

[10:04:06][D][esp-idf:000][http]: I (254506) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[10:04:06][I][esp_adf_pipeline:214]: [ http ] status: 15
[10:04:06][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[10:04:06][I][adf_media_player:135]: got new pipeline state: 4
[10:04:06][W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=640, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
[10:04:06][D][esp-idf:000][decoder]: I (255188) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[10:04:07][D][esp-idf:000][decoder]: I (255553) MP3_DECODER: Closed

[10:04:07][D][esp-idf:000][resampler]: I (255649) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2

[10:04:07][D][esp-idf:000][i2s_out]: I (255696) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[10:04:07][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[10:04:07][I][adf_media_player:135]: got new pipeline state: 5
[10:04:53][I][ota:117]: Boot seems successful, resetting boot loop counter.
[10:04:53][D][esp32.preferences:114]: Saving 1 preferences to flash...
[10:04:53][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

nyok92 commented 5 months ago

Same issue here

rccoleman commented 5 months ago

I tried playing a simple WAV file and it also failed. Looks like it's trying to decode it as an MP3?

[08:43:48][D][media_player:061]: 'Onju Voice Satellite Dining Room' - Setting
[08:43:48][D][media_player:068]:   Media URL: https://xxx/media/local/broadcast.wav?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIxNGUwYmQzMmVkZWM0ZjA4YWQwNjlkNjMyNzE4YWI5OSIsInBhdGgiOiIvbWVkaWEvbG9jYWwvYnJvYWRjYXN0LndhdiIsInBhcmFtcyI6W10sImlhdCI6MTcxNTg3NDIyOCwiZXhwIjoxNzE1OTYwNjI4fQ.fJvCswILFkyuPwHHoJU-shUs79ZXdUCV6gaVtNf4aTw
[08:43:48][D][adf_media_player:030]: Got control call in state 1
[08:43:48][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[08:43:48][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[08:43:48][I][adf_media_player:135]: got new pipeline state: 1
[08:43:48][D][adf_i2s_out:127]: Set final i2s settings: 16000
[08:43:48][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 2 
[08:43:48][D][esp-idf:000][http]: I (109097) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[08:43:48][D][esp-idf:000][decoder]: I (109099) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[08:43:48][D][esp_audio_sources:097]: Streamer status: 2
[08:43:48][D][esp_audio_sources:098]: decoder status: 2
[08:43:51][D][esp-idf:000][http]: I (112010) HTTP_CLIENT: Body received in fetch header state, 0x3fcc91cc, 1708

[08:43:51][D][esp-idf:000][http]: I (112014) HTTP_STREAM: total_bytes=44966

[08:43:51][D][esp-idf:000][http]: W (112124) HTTP_STREAM: No more data,errno:0, total_bytes:44966, rlen = 0

[08:43:51][D][esp-idf:000][http]: I (112127) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[08:43:51][D][esp-idf:000][decoder]: I (112132) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[08:43:51][D][esp-idf:000][decoder]: E (112136) MPEG_READER: resync error (line 405)

[08:43:51][D][esp-idf:000][decoder]: E (112140) MP3_DECODER: Encountered error reading when MP3 init

[08:43:51][D][esp-idf:000][decoder]: E (112144) AUDIO_ELEMENT: [decoder] AEL_STATUS_ERROR_OPEN,-1

[08:43:51][D][esp-idf:000][decoder]: W (112148) AUDIO_ELEMENT: [decoder] audio_element_on_cmd_error,7

[08:43:51][D][esp-idf:000][decoder]: I (112153) MP3_DECODER: Closed

[08:43:58][I][esp_adf_pipeline:185]: Pipeline preparation timeout!
[08:43:58][D][esp-idf:000]: W (119081) AUDIO_PIPELINE: Without stop, st:1

[08:43:58][D][esp_adf_pipeline:302]: State changed from PREPARING to STOPPING
[08:43:58][I][adf_media_player:135]: got new pipeline state: 4
[08:43:58][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[08:43:58][I][adf_media_player:135]: got new pipeline state: 5

As before, the URL plays fine in a browser. Probably related to https://github.com/esphome/issues/issues/5791.

tbrasser commented 5 months ago

I changed the output from 32-bit to 16-bit and the sample rate to 44100, matched that in the Music Assistant player settings, and now audio output is back!

using non-mww
using new esphome audio from gnumpi
using dev-next branch
latest MA / Esphome

Somehow the media_player doesn't know it's playing, but it is and also volume control works perfectly.

I'll try 24 bits per sample and a 48000 sample rate next.
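
For illustration only, the change described above would correspond to something like the following audio_out entry. This is not the poster's actual config: the keys, ids, and pin are borrowed from the Onju audio_out block posted later in this thread, and only sample_rate and bits_per_sample reflect the values reported here. It also assumes `16bit` is accepted in the same form as `32bit`.

```yaml
# Sketch, untested: Onju audio_out entry with the reported 44100 Hz / 16-bit output.
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    sample_rate: 44100        # value reported in the comment above
    bits_per_sample: 16bit    # value reported in the comment above (assumed spelling)
```

The player on the Music Assistant side would need to be set to the same rate and bit depth for the two ends to match.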

nyok92 commented 5 months ago

Hi @tbrasser, could you share your ESP config? Thanks.

tbrasser commented 5 months ago

https://github.com/tbrasser/config/blob/main/esphome/includes/onju-voice.yaml

nyok92 commented 5 months ago

OK, thanks @tbrasser. What does "using non-mww" mean?

tbrasser commented 5 months ago

Openwakeword instead of microwakeword

tbrasser commented 5 months ago

Basically I've grown to like using a custom wakeword but still have to train it longer, before thinking about trying mww. Also I have tablets running android ip camera apps rtsp streaming (via frigate) into streamAssist, so 1 or 2 extra openwakeword streams behind a local vad (or triggered via frigate motion/sound detection) doesn't hurt much.

rccoleman commented 5 months ago

This fixes it for me: https://github.com/esphome/issues/issues/5791#issuecomment-2118573396. Looks like bumping esp-idf to 4.4.7 is problematic.

tbrasser commented 5 months ago

So I cannot get sound output on .7 only on recommended version. Wthell...

Probably because of the esphome_audio branch I'm using. @tetele, any ideas?

Also, what sample rates and bits per sample does this support? I understand it (tries to) resample when there's a mismatch, but since I'm using Music Assistant and can match the format on that end, I'd like to configure everything nicely.

Until now I only got (correct) output when setting 44100 Hz & 16bit.

rccoleman commented 5 months ago

4.4.7 is the new recommended version as of ESPHome 2024.5, and I needed to downgrade to 4.4.6 to play audio through the media_player entity. I've been able to play 24KHz/32-bit mono TTS and 44.1KHz/32-bit stereo music MP3 content with just this change:

 esp32:
   board: esp32-s3-devkitc-1
   framework:
     type: esp-idf
-    version: recommended
+    version: 4.4.6
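
For anyone copying this rather than applying the diff, the resulting block would presumably look like the sketch below. The board name and pinned version are taken from the diff above; nothing else is assumed.

```yaml
# esp32 block after the diff above: pins ESP-IDF to 4.4.6 instead of
# following "recommended" (4.4.7 as of ESPHome 2024.5).
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    version: 4.4.6
```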

gnumpi commented 5 months ago

Some background info: The cause of all these problems is the messy i2s driver of IDF-4, which has no way of setting different channel formats for RX and TX. This will change in IDF-5 (while probably introducing a bunch of new problems). That's why the duplex mode of https://github.com/gnumpi/esphome_audio was meant to be configured so that both TX and RX are set to the same maximum needed quality (e.g. stereo, 48kHz, 16bit), letting the re-samplers do the work of converting the actual audio to the needed format.

But, as discussed in https://github.com/gnumpi/esphome_audio/issues/17, mWW seems to rely on the volume increase introduced by the microphone component when converting 32bit audio to the desired 16bit. This volume increase doesn't happen when the re-sampler converts it beforehand. A dirty hack was to simply set both the TX and RX i2s settings to the ones needed by the microphone and remove the re-sampler from the microphone pipeline. With IDF-4.4.7, the impact of setting the channel format on the TX direction changed, which is why the hack no longer works.

What you can try is setting the channel config of audio_in to all_right instead of right. But if that works, it's just another hack. Unfortunately, I haven't fully understood the impact of the change in 4.4.7 yet, and there is no documentation at all. To get a better understanding what's going on, @rccoleman could you share your config and logs when playing 24kHz/32-bit mono TTS with 4.4.6 and 4.4.7? Thanks!
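
A minimal sketch of the all_right experiment mentioned above, for anyone who wants to try it. It assumes the audio_in entry otherwise matches the Onju config used in this thread (same ids, pin, sample rate, and bit depth); only the channel value is changed. It is untested and, as noted, still a hack if it works.

```yaml
# Sketch, untested: Onju audio_in entry with the channel switched to all_right.
adf_pipeline:
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: all_right   # was: right
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true
```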

s00500 commented 5 months ago

Just ran into this very same issue, trying the downgrade to .6 today, might later be able to do some testing of the channel config, thanks for all your awesome work here!

rccoleman commented 5 months ago

To get a better understanding what's going on, @rccoleman could you share your config and logs when playing 24kHz/32-bit mono TTS with 4.4.6 and 4.4.7? Thanks!

Sure, will try to collect those later today. BTW, the sample that I'm using for that is just the TTS output from Nabu Casa Cloud. Hopefully wifi logs are enough? Getting serial logs from my Onju devices requires disassembling them, which I'd prefer to avoid.

Edit: See attached. logs_onju-voice-dr_logs-4.4.7.txt logs_onju-voice-dr_logs-4.4.6.txt onju-voice-microwakeword.yaml.txt

gnumpi commented 5 months ago

Thanks a lot for doing the tests and sharing the logs. I think I do understand now what's going on here. The onju only has one speaker, right? So setting the output also to mono will be sufficient, right? Can anyone with an onju try to test setting the channel of the adf-output also to right:

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    sample_rate: 16000
    adf_alc: true
    bits_per_sample: 32bit
    fixed_settings: true
    channel: right

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true

Unfortunately, there was also a bug in the ADF-I2SReader, so you might also want to try using the dev version of the ADF-Pipeline:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next
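
If the component list and refresh setting from the reproduction diff at the top of this issue are kept, the full entry might look like the sketch below. Combining them with dev-next is an assumption: the `components` and `refresh` values are copied from that diff, not confirmed for the dev-next branch.

```yaml
# Sketch: dev-next source combined with the component list from the original config.
external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next
    components: [ adf_pipeline, i2s_audio ]
    refresh: 0s   # re-fetch the branch on every build
```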

nyok92 commented 5 months ago

With the previous config (esp-idf version 4.4.7) plus the dev-next and channel: right changes, STT and intents are working on my setup, but audio out (TTS or music from Music Assistant) is not working.

Here are the logs when trying to play music from MA (enforce MP3 is set):

INFO ESPHome 2024.5.2
INFO Reading configuration /config/esphome/nest-mini.yaml...
INFO Starting log output from XXX using esphome API
INFO Successfully connected to nest-mini @ XXX in 0.142s
INFO Successful handshake with nest-mini @ XXX in 0.068s
[14:25:18][I][app:100]: ESPHome version 2024.5.2 compiled on May 24 2024, 14:14:49
[14:25:18][I][app:102]: Project tetele.onju_voice_satellite version 1.0.0
[14:25:18][C][wifi:580]: WiFi:
[14:25:18][C][wifi:408]:   Local MAC: XXX
[14:25:18][C][wifi:413]:   SSID: [redacted]
[14:25:18][C][wifi:416]:   IP Address: XXX
[14:25:18][C][wifi:420]:   BSSID: [redacted]
[14:25:18][C][wifi:421]:   Hostname: 'nest-mini'
[14:25:18][C][wifi:423]:   Signal strength: -59 dB ▂▄▆█
[14:25:18][V][wifi:425]:   Priority: 0.0
[14:25:18][C][wifi:427]:   Channel: 6
[14:25:18][C][wifi:428]:   Subnet: 255.255.255.0
[14:25:18][C][wifi:429]:   Gateway: 192.168.1.254
[14:25:18][C][wifi:430]:   DNS1: 1.1.1.1
[14:25:18][C][wifi:431]:   DNS2: 1.0.0.1
[14:25:18][C][logger:185]: Logger:
[14:25:18][C][logger:186]:   Level: VERBOSE
[14:25:18][C][logger:188]:   Log Baud Rate: 115200
[14:25:18][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[14:25:18][C][logger:193]:   Level for 'micro_wake_word': DEBUG
[14:25:18][C][template.number:050]: Template Number 'Touch threshold percentage'
[14:25:18][C][template.number:051]:   Optimistic: YES
[14:25:18][C][template.number:052]:   Update Interval: never
[14:25:18][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[14:25:18][C][esp32_rmt_led_strip:176]:   Pin: 11
[14:25:18][C][esp32_rmt_led_strip:177]:   Channel: 0
[14:25:18][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[14:25:18][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[14:25:18][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[14:25:18][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[14:25:18][C][switch.gpio:091]:   Restore Mode: always OFF
[14:25:18][C][switch.gpio:031]:   Pin: GPIO21
[14:25:18][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[14:25:18][C][gpio.binary_sensor:016]:   Pin: GPIO38
[14:25:18][C][light:103]: Light 'leds'
[14:25:18][C][light:105]:   Default Transition Length: 0.0s
[14:25:18][C][light:106]:   Gamma Correct: 2.80
[14:25:18][C][light:103]: Light 'left_led'
[14:25:18][C][light:105]:   Default Transition Length: 0.1s
[14:25:18][C][light:106]:   Gamma Correct: 2.80
[14:25:18][C][light:103]: Light 'top_led'
[14:25:18][C][light:105]:   Default Transition Length: 0.1s
[14:25:18][C][light:106]:   Gamma Correct: 2.80
[14:25:18][C][light:103]: Light 'right_led'
[14:25:18][C][light:105]:   Default Transition Length: 0.1s
[14:25:18][C][light:106]:   Gamma Correct: 2.80
[14:25:18][C][template.switch:068]: Template Switch 'Use Wake Word'
[14:25:18][C][template.switch:091]:   Restore Mode: restore defaults to ON
[14:25:18][C][template.switch:057]:   Optimistic: YES
[14:25:18][C][psram:020]: PSRAM:
[14:25:18][C][psram:021]:   Available: YES
[14:25:18][C][psram:024]:   Size: 8191 KB
[14:25:18][C][i2s_audio:028]: I2SController:
[14:25:18][C][i2s_audio:029]:   AccessMode: duplex
[14:25:18][C][i2s_audio:030]:   Port: 0
[14:25:18][C][i2s_audio:032]:   Reader registered.
[14:25:18][C][i2s_audio:035]:   Writer registered.
[14:25:18][C][i2s_audio:139]: I2S-Writer (Fixed-CFG):
[14:25:18][C][i2s_audio:141]:   sample-rate: 16000 bits_per_sample: 32
[14:25:18][C][i2s_audio:142]:   channel_fmt: 3 channels: 1
[14:25:18][C][i2s_audio:143]:   use_apll: no, use_pdm: no
[14:25:18][C][i2s_audio:136]: I2S-Reader (Fixed-CFG):
[14:25:18][C][i2s_audio:141]:   sample-rate: 16000 bits_per_sample: 32
[14:25:18][C][i2s_audio:142]:   channel_fmt: 3 channels: 1
[14:25:18][C][i2s_audio:143]:   use_apll: no, use_pdm: no
[14:25:18][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[14:25:18][C][esp32_touch:074]:   Meas cycle: 0.80ms
[14:25:18][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[14:25:18][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[14:25:18][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[14:25:18][C][esp32_touch:135]:   Voltage Attenuation: 0V
[14:25:18][C][esp32_touch:169]:   Filter mode: IIR_16
[14:25:18][C][esp32_touch:170]:   Debounce count: 2
[14:25:18][C][esp32_touch:171]:   Noise threshold coefficient: 0
[14:25:18][C][esp32_touch:172]:   Jitter filter step size: 0
[14:25:18][C][esp32_touch:191]:   Smooth level: IIR_2
[14:25:18][C][esp32_touch:213]:   Denoise grade: BIT8
[14:25:18][C][esp32_touch:245]:   Denoise capacitance level: L0
[14:25:18][C][esp32_touch:260]:   Touch Pad 'volume_down'
[14:25:18][C][esp32_touch:261]:     Pad: T4
[14:25:18][C][esp32_touch:262]:     Threshold: 467415
[14:25:18][C][esp32_touch:260]:   Touch Pad 'volume_up'
[14:25:18][C][esp32_touch:261]:     Pad: T2
[14:25:18][C][esp32_touch:262]:     Threshold: 501065
[14:25:18][C][esp32_touch:260]:   Touch Pad 'action'
[14:25:18][C][esp32_touch:261]:     Pad: T3
[14:25:18][C][esp32_touch:262]:     Threshold: 611391
[14:25:19][C][captive_portal:088]: Captive Portal:
[14:25:19][C][mdns:115]: mDNS:
[14:25:19][C][mdns:116]:   Hostname: nest-mini
[14:25:19][V][mdns:117]:   Services:
[14:25:19][V][mdns:119]:   - _esphomelib, _tcp, 6053
[14:25:19][V][mdns:121]:     TXT: friendly_name = Onju Voice Satellite
[14:25:19][V][mdns:121]:     TXT: version = 2024.5.2
[14:25:19][V][mdns:121]:     TXT: mac = XXX
[14:25:19][V][mdns:121]:     TXT: platform = ESP32
[14:25:19][V][mdns:121]:     TXT: board = esp32-s3-devkitc-1
[14:25:19][V][mdns:121]:     TXT: network = wifi
[14:25:19][V][mdns:121]:     TXT: api_encryption = Noise_NNpsk0_25519_ChaChaPoly_SHA256
[14:25:19][V][mdns:121]:     TXT: project_name = tetele.onju_voice_satellite
[14:25:19][V][mdns:121]:     TXT: project_version = 1.0.0
[14:25:19][V][mdns:121]:     TXT: package_import_url = github://tetele/onju-voice-satellite/esphome/onju-voice-microwakeword.yaml@main
[14:25:19][C][ota:096]: Over-The-Air Updates:
[14:25:19][C][ota:097]:   Address: nest-mini.local:3232
[14:25:19][C][ota:100]:   Using Password.
[14:25:19][C][ota:103]:   OTA version: 2.
[14:25:19][C][api:139]: API Server:
[14:25:19][C][api:140]:   Address: nest-mini.local:6053
[14:25:19][C][api:142]:   Using noise encryption: YES
[14:25:19][C][improv_serial:032]: Improv Serial:
[14:25:19][C][micro_wake_word:058]: microWakeWord models:
[14:25:19][C][micro_wake_word:023]:   - Wake Word: hey mycroft
[14:25:19][C][micro_wake_word:024]:     Probability cutoff: 0.840
[14:25:19][C][micro_wake_word:025]:     Sliding window size: 10
[14:25:19][C][micro_wake_word:029]:   - VAD Model
[14:25:19][C][micro_wake_word:030]:     Upper threshold: 0.950
[14:25:19][C][micro_wake_word:031]:     Lower threshold: 0.500
[14:25:19][C][micro_wake_word:032]:     Sliding window size: 2
[14:25:19][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[14:25:19][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[14:25:19][C][adf_media_player:018]:   MP_ANNOUNCE enabled
[14:25:19][C][adf_media_player:021]:   Number of ADFComponents: 3
[14:26:48][D][media_player:061]: 'Onju Voice Satellite' - Setting
[14:26:48][D][media_player:068]:   Media URL: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/dc5aaf30037145ccaef0908456ac87cc.mp3?ts=1716553608
[14:26:48][D][esp_audio_sources:058]: Set new uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/dc5aaf30037145ccaef0908456ac87cc.mp3?ts=1716553608
[14:26:48][D][adf_media_player:054]: Got control call in state IDLE
[14:26:48][D][adf_media_player:055]: req_track stream uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/dc5aaf30037145ccaef0908456ac87cc.mp3?ts=1716553608
[14:26:48][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[14:26:48][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[14:26:48][I][adf_media_player:189]: got new pipeline state: 3, while in MP state IDLE
[14:26:48][D][adf_i2s_out:127]: Set final i2s settings: 16000
[14:26:48][I][adf_media_player:252]: current mp state: PLAYING
[14:26:48][I][adf_media_player:253]: anouncement: false
[14:26:48][I][adf_media_player:254]: play_intent: false
[14:26:48][I][adf_media_player:255]: current_uri_: yes
[14:26:48][D][esp_audio_sources:063]: Prepare elements called (initial_call)!
[14:26:48][D][esp_audio_sources:097]: Use fixed settings: no
[14:26:48][D][esp_audio_sources:098]: Streamer status: 6
[14:26:48][D][esp_audio_sources:099]: decoder status: 6
[14:26:48][D][esp_audio_sources:100]: stream uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/dc5aaf30037145ccaef0908456ac87cc.mp3?ts=1716553608
[14:26:48][D][adf_audio_element:108]: Preparing [http]...
[14:26:48][D][adf_audio_element:108]: Preparing [decoder]...
[14:26:48][D][adf_audio_element:108]: Preparing [resampler]...
[14:26:48][D][adf_audio_element:108]: Preparing [i2s_out]...
[14:26:48][D][adf_audio_element:165]: Resuming [http]...
[14:26:48][D][adf_audio_element:172]: [http] Sending resume command.
[14:26:48][V][adf_audio_element:035]: [http]evt internal cmd = 5
[14:26:48][D][adf_audio_element:165]: Resuming [decoder]...
[14:26:48][D][adf_audio_element:172]: [decoder] Sending resume command.
[14:26:48][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[14:26:48][V][esp-idf:000][decoder]: I (393518) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[14:26:48][V][esp-idf:000][decoder]: I (393521) MP3_DECODER: MP3 opened

[14:26:48][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:48][D][adf_audio_element:191]: [decoder] Checking State, got 79
[14:26:50][I][HTTPStreamReader:193]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[14:26:50][D][adf_i2s_out:127]: Set final i2s settings: 16000
[14:26:50][D][esp_audio_processors:090]: Received request from: HTTPStreamReader
[14:26:50][D][esp_audio_processors:095]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 1 
[14:26:50][D][adf_audio_element:108]: Preparing [http]...
[14:26:50][D][adf_audio_element:108]: Preparing [decoder]...
[14:26:50][V][esp-idf:000][decoder]: W (395431) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[14:26:50][V][esp-idf:000][decoder]: W (395434) MP3_DECODER: output aborted -3

[14:26:50][V][esp-idf:000][decoder]: I (395436) MP3_DECODER: Closed

[14:26:50][D][esp_audio_sources:153]: Preparation done!
[14:26:51][D][esp_adf_pipeline:334]: wait for preparation, done
[14:26:51][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[14:26:51][I][adf_media_player:189]: got new pipeline state: 5, while in MP state PLAYING
[14:26:51][I][adf_media_player:252]: current mp state: PLAYING
[14:26:51][I][adf_media_player:253]: anouncement: false
[14:26:51][I][adf_media_player:254]: play_intent: false
[14:26:51][I][adf_media_player:255]: current_uri_: yes
[14:26:51][D][adf_audio_element:165]: Resuming [http]...
[14:26:51][D][adf_audio_element:172]: [http] Sending resume command.
[14:26:51][V][adf_audio_element:035]: [http]evt internal cmd = 5
[14:26:51][D][adf_audio_element:165]: Resuming [decoder]...
[14:26:51][D][adf_audio_element:172]: [decoder] Sending resume command.
[14:26:51][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[14:26:51][V][esp-idf:000][decoder]: I (395593) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[14:26:51][V][esp-idf:000][resampler]: I (395596) AUDIO_ELEMENT: [resampler] AEL_MSG_CMD_RESUME,state:1

[14:26:51][D][adf_audio_element:165]: Resuming [i2s_out]...
[14:26:51][D][adf_audio_element:172]: [i2s_out] Sending resume command.
[14:26:51][V][adf_audio_element:035]: [i2s_out]evt internal cmd = 5
[14:26:51][V][esp-idf:000][i2s_out]: I (395608) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

[14:26:51][V][esp-idf:000][i2s_out]: I (395610) I2S_STREAM: AUDIO_STREAM_WRITER

[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][D][adf_audio_element:191]: [resampler] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [i2s_out] Checking State, got 75
[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][I][esp_adf_pipeline:124]: [ i2s_out ] status: 12
[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 79
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 15
[14:26:51][V][esp-idf:000][resampler]: I (395797) RSP_FILTER: sample rate of source data : 44100, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[14:26:51][V][esp-idf:000][decoder]: I (395803) MP3_DECODER: MP3 opened

[14:26:51][I][esp_adf_pipeline:124]: [ resampler ] status: 12
[14:26:51][V][esp-idf:000][http]: I (395810) HTTP_STREAM: total_bytes=0

[14:26:51][D][adf_audio_element:191]: [http] Checking State, got 78
[14:26:51][D][adf_audio_element:191]: [decoder] Checking State, got 79
[14:26:51][I][esp_adf_pipeline:124]: [ http ] status: 12
[14:26:51][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from STARTING to RUNNING. (REQ: 0)
[14:26:51][I][adf_media_player:189]: got new pipeline state: 6, while in MP state PLAYING
[14:26:51][I][adf_media_player:252]: current mp state: PLAYING
[14:26:51][I][adf_media_player:253]: anouncement: false
[14:26:51][I][adf_media_player:254]: play_intent: false
[14:26:51][I][adf_media_player:255]: current_uri_: yes
[14:26:53][I][esp_adf_pipeline:124]: [ decoder ] status: 12
[14:26:53][I][HTTPStreamReader:193]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2

Then with cloud TTS from Nabu Casa:

[14:30:21][D][media_player:061]: 'Onju Voice Satellite' - Setting
[14:30:21][D][media_player:068]:   Media URL: http://192.168.1.5:8123/api/tts_proxy/08429f50ce351cb272692f18bc3917d8caf9b842_fr-fr_e43c31a13b_cloud.mp3
[14:30:21][D][esp_audio_sources:058]: Set new uri: http://192.168.1.5:8123/api/tts_proxy/08429f50ce351cb272692f18bc3917d8caf9b842_fr-fr_e43c31a13b_cloud.mp3
[14:30:21][D][adf_media_player:054]: Got control call in state IDLE
[14:30:21][D][adf_media_player:055]: req_track stream uri: http://192.168.1.5:8123/api/tts_proxy/08429f50ce351cb272692f18bc3917d8caf9b842_fr-fr_e43c31a13b_cloud.mp3
[14:30:21][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[14:30:21][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[14:30:21][I][adf_media_player:189]: got new pipeline state: 3, while in MP state IDLE
[14:30:21][D][adf_i2s_out:127]: Set final i2s settings: 16000
[14:30:21][I][adf_media_player:252]: current mp state: PLAYING
[14:30:21][I][adf_media_player:253]: anouncement: false
[14:30:21][I][adf_media_player:254]: play_intent: false
[14:30:21][I][adf_media_player:255]: current_uri_: yes
[14:30:21][D][esp_audio_sources:063]: Prepare elements called (initial_call)!
[14:30:21][D][esp_audio_sources:097]: Use fixed settings: no
[14:30:21][D][esp_audio_sources:098]: Streamer status: 5
[14:30:21][D][esp_audio_sources:099]: decoder status: 5
[14:30:21][D][esp_audio_sources:100]: stream uri: http://192.168.1.5:8123/api/tts_proxy/08429f50ce351cb272692f18bc3917d8caf9b842_fr-fr_e43c31a13b_cloud.mp3
[14:30:21][D][adf_audio_element:108]: Preparing [http]...
[14:30:21][D][adf_audio_element:108]: Preparing [decoder]...
[14:30:21][D][adf_audio_element:108]: Preparing [resampler]...
[14:30:21][D][adf_audio_element:108]: Preparing [i2s_out]...
[14:30:21][D][adf_audio_element:165]: Resuming [http]...
[14:30:21][D][adf_audio_element:172]: [http] Sending resume command.
[14:30:21][V][adf_audio_element:035]: [http]evt internal cmd = 5
[14:30:21][D][adf_audio_element:165]: Resuming [decoder]...
[14:30:21][D][adf_audio_element:172]: [decoder] Sending resume command.
[14:30:21][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[14:30:21][V][esp-idf:000][decoder]: I (606223) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[14:30:21][V][esp-idf:000][decoder]: I (606230) MP3_DECODER: MP3 opened

[14:30:21][V][esp-idf:000][http]: I (606236) HTTP_CLIENT: Body received in fetch header state, 0x3fcc8947, 1841

[14:30:21][V][esp-idf:000][http]: I (606238) HTTP_STREAM: total_bytes=13065

[14:30:21][D][adf_audio_element:191]: [http] Checking State, got 78
[14:30:21][D][adf_audio_element:191]: [decoder] Checking State, got 79
[14:30:21][I][HTTPStreamReader:193]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[14:30:21][D][adf_i2s_out:127]: Set final i2s settings: 16000
[14:30:21][D][esp_audio_processors:090]: Received request from: HTTPStreamReader
[14:30:21][D][esp_audio_processors:095]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 16000, ch: 1 
[14:30:21][D][adf_audio_element:108]: Preparing [http]...
[14:30:21][D][adf_audio_element:108]: Preparing [decoder]...
[14:30:21][V][esp-idf:000][decoder]: W (606286) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[14:30:21][V][esp-idf:000][decoder]: W (606289) MP3_DECODER: output aborted -3

[14:30:21][V][esp-idf:000][decoder]: I (606291) MP3_DECODER: Closed

[14:30:21][D][esp_audio_sources:153]: Preparation done!
[14:30:21][D][esp_adf_pipeline:334]: wait for preparation, done
[14:30:21][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[14:30:21][I][adf_media_player:189]: got new pipeline state: 5, while in MP state PLAYING
[14:30:21][I][adf_media_player:252]: current mp state: PLAYING
[14:30:21][I][adf_media_player:253]: anouncement: false
[14:30:21][I][adf_media_player:254]: play_intent: false
[14:30:21][I][adf_media_player:255]: current_uri_: yes
[14:30:21][D][adf_audio_element:165]: Resuming [http]...
[14:30:21][D][adf_audio_element:172]: [http] Sending resume command.
[14:30:21][V][adf_audio_element:035]: [http]evt internal cmd = 5
[14:30:21][D][adf_audio_element:165]: Resuming [decoder]...
[14:30:21][D][adf_audio_element:172]: [decoder] Sending resume command.
[14:30:21][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[14:30:21][V][esp-idf:000][decoder]: I (606448) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[14:30:21][V][esp-idf:000][decoder]: I (606460) MP3_DECODER: MP3 opened

[14:30:21][D][adf_audio_element:165]: Resuming [i2s_out]...
[14:30:21][D][adf_audio_element:172]: [i2s_out] Sending resume command.
[14:30:21][V][adf_audio_element:035]: [i2s_out]evt internal cmd = 5
[14:30:21][I][esp_adf:000]]: [ resampler ] status: 12
ERROR Fatal error: protocol.data_received() call failed.
protocol: <aioesphomeapi._frame_helper.noise.APINoiseFrameHelper object at 0xffff7df0f690>
transport: <_SelectorSocketTransport fd=6 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 1009, in _read_ready__data_received
    self._protocol.data_received(data)
  File "aioesphomeapi/_frame_helper/noise.py", line 136, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 163, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 319, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper._handle_frame
  File "/usr/local/lib/python3.11/dist-packages/noise/state.py", line 74, in decrypt_with_ad
    plaintext = self.cipher.decrypt(self.k, self.n, ad, ciphertext)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/noise/backends/default/ciphers.py", line 13, in decrypt
    return self.cipher.decrypt(nonce=self.format_nonce(n), data=ciphertext, associated_data=ad)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/chacha20poly1305_reuseable/__init__.py", line 127, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 147, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 263, in chacha20poly1305_reuseable._decrypt_with_fixed_nonce_len
  File "src/chacha20poly1305_reuseable/__init__.py", line 273, in chacha20poly1305_reuseable._decrypt_data
cryptography.exceptions.InvalidTag
WARNING nest-mini @ 192.168.1.132: Connection error occurred: nest-mini @ 192.168.1.132: Invalid encryption key: received_name=nest-mini
INFO Processing unexpected disconnect from ESPHome API for nest-mini @ 192.168.1.132
WARNING Disconnected from API
INFO Successfully connected to nest-mini @ 192.168.1.132 in 0.006s
INFO Successful handshake with nest-mini @ 192.168.1.132 in 0.087s
[14:30:22][V][esp-idf:000][http]: W (606848) HTTP_STREAM: No more data,errno:0, total_bytes:13065, rlen = 0

[14:30:22][V][esp-idf:000][http]: I (606850) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[14:30:22][I][esp_adf_pipeline:124]: [ http ] status: 15
[14:30:22][I][esp_adf_pipeline:127]: current state: RUNNING
[14:30:22][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from RUNNING to FINISHING. (REQ: 0)
[14:30:22][I][adf_media_player:189]: got new pipeline state: 7, while in MP state PLAYING
[14:30:22][I][adf_media_player:252]: current mp state: PLAYING
[14:30:22][I][adf_media_player:253]: anouncement: false
[14:30:22][I][adf_media_player:254]: play_intent: false
[14:30:22][I][adf_media_player:255]: current_uri_: yes
[14:30:22][V][esp-idf:000][decoder]: I (607198) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[14:30:22][V][esp-idf:000][decoder]: I (607389) MP3_DECODER: Closed

[14:30:22][I][esp_adf_pipeline:124]: [ decoder ] status: 15
[14:30:22][I][esp_adf_pipeline:127]: current state: FINISHING
[14:30:22][V][esp-idf:000][resampler]: I (607436) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2

[14:30:22][I][esp_adf_pipeline:124]: [ resampler ] status: 15
[14:30:22][I][esp_adf_pipeline:127]: current state: FINISHING
[14:30:22][V][esp-idf:000][i2s_out]: I (607484) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[14:30:23][I][esp_adf_pipeline:124]: [ i2s_out ] status: 15
[14:30:23][I][esp_adf_pipeline:127]: current state: FINISHING
[14:30:23][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from FINISHING to STOPPED. (REQ: 1)
[14:30:23][I][adf_media_player:189]: got new pipeline state: 4, while in MP state PLAYING
[14:30:23][I][adf_media_player:252]: current mp state: IDLE
[14:30:23][I][adf_media_player:253]: anouncement: false
[14:30:23][I][adf_media_player:254]: play_intent: false
[14:30:23][I][adf_media_player:255]: current_uri_: yes

So it is not working (no output signal) with:

out: Channel right & 44100Hz 16bits
out: Channel left_right & 16000Hz 16bits

Working, but the output signal is sped up and randomly drops out, with:

out: Channel left & 16000Hz 32bits
in: Mic is working correctly

What does work is:

out: Channel left & 16000Hz 16bits
in: Mic is not working anymore

--> It seems that the bits-per-sample settings need to match between in and out.

Output working but still no mic:

out: Channel left & 16000Hz 16bits
in: 16000 & 16bits

With the output in mono mode (left) at 16 kHz 32 bits, the output signal is not correct but the mic works. Lowering the output to 16 bits makes the output signal correct, but the mic stops working. Mono output only works with the LEFT channel at 16 bits. If IN and OUT have different bits per sample, the mic stops working, and the mic does not work with any settings other than 16 kHz 32 bits. Is it because of duplex mode?

==> Increasing the mic frequency to 44.1 kHz or lowering bits per sample to 16 bits gives a buffer warning when enabling the wake word: [W][micro_wake_word:205]: Not enough free bytes in ring buffer to store incoming audio data (free bytes=896, incoming bytes=1024). Resetting the ring buffer. Wake word detection accuracy will be reduced.
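
To make the warning above easier to reason about, here is a minimal sketch of the policy it describes: when an incoming chunk does not fit, the buffered audio is dropped and the chunk is stored in an empty buffer. This is a toy model, not the micro_wake_word implementation; the class name `ResettingRingBuffer` and the 4096-byte capacity are hypothetical, and only the 896 and 1024 byte figures come from the log.

```python
class ResettingRingBuffer:
    """Toy byte buffer that discards its contents when a write does not fit.

    Hypothetical illustration only; not the ESPHome ring buffer.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = bytearray()
        self.resets = 0

    @property
    def free_bytes(self) -> int:
        return self.capacity - len(self._data)

    def write(self, chunk: bytes) -> bool:
        """Store chunk; return False if the buffer had to be reset first."""
        fitted = len(chunk) <= self.free_bytes
        if not fitted:
            # Policy described by the warning: drop the buffered audio.
            self._data.clear()
            self.resets += 1
        self._data.extend(chunk[: self.capacity])
        return fitted

    def read(self, n: int) -> bytes:
        """Remove and return up to n bytes from the front of the buffer."""
        out = bytes(self._data[:n])
        del self._data[:n]
        return out


buf = ResettingRingBuffer(capacity=4096)  # capacity is an arbitrary example value
for _ in range(3):
    buf.write(bytes(1024))                # producer keeps writing, consumer is stalled
buf.write(bytes(128))                     # leaves 896 free bytes, as in the log
ok = buf.write(bytes(1024))               # 1024 > 896, so the buffer is reset
print(ok, buf.resets, buf.free_bytes)
```

In this model a reset happens whenever the consumer drains the buffer more slowly than the producer fills it, which is why every reset costs the detector some audio.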

With wake word disabled and a manual voice start:

[16:31:01][D][binary_sensor:036]: 'action': Sending state ON
[16:31:01][D][binary_sensor:036]: 'action': Sending state OFF
[16:31:01][D][action_click:368]: Voice assistant is running: no
[16:31:01][D][voice_assistant:439]: State changed from IDLE to START_MICROPHONE
[16:31:01][D][voice_assistant:445]: Desired state set to START_PIPELINE
[16:31:01][D][voice_assistant:163]: Starting Microphone
[16:31:01][D][esp_adf_pipeline:060]: Starting request, current state STOPPED
[16:31:01][D][voice_assistant:439]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[16:31:01][D][esp_adf_pipeline:437]: [ADFMicrophone] Pipeline changed from STOPPED to PREPARING. (REQ: 0)
[16:31:01][D][adf_audio_element:108]: Preparing [i2s_in]...
[16:31:01][D][adf_audio_element:108]: Preparing [pcm_reader]...
[16:31:01][D][esp_adf_pipeline:334]: wait for preparation, done
[16:31:01][D][esp_adf_pipeline:437]: [ADFMicrophone] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[16:31:01][D][adf_audio_element:165]: Resuming [i2s_in]...
[16:31:01][D][adf_audio_element:172]: [i2s_in] Sending resume command.
[16:31:01][V][adf_audio_element:035]: [i2s_in]evt internal cmd = 5
[16:31:01][V][esp-idf:000][i2s_in]: I (135731) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[16:31:01][D][adf_audio_element:191]: [i2s_in] Checking State, got 78
[16:31:01][I][esp_adf_pipeline:124]: [ i2s_in ] status: 12
[16:31:01][D][esp_adf_pipeline:437]: [ADFMicrophone] Pipeline changed from STARTING to RUNNING. (REQ: 0)
[16:31:01][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to START_PIPELINE
[16:31:01][D][voice_assistant:210]: Requesting start...
[16:31:01][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:31:01][D][voice_assistant:460]: Client started, streaming microphone
[16:31:01][D][voice_assistant:439]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:31:01][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[16:31:01][D][voice_assistant:563]: Event Type: 1
[16:31:01][D][voice_assistant:566]: Assist Pipeline running
[16:31:01][D][voice_assistant:563]: Event Type: 3
[16:31:01][D][voice_assistant:577]: STT started
[16:31:01][D][light:036]: 'top_led' Setting:
[16:31:01][D][light:047]:   State: ON
[16:31:01][D][light:051]:   Brightness: 100%
[16:31:01][D][light:059]:   Red: 100%, Green: 100%, Blue: 100%
[16:31:01][D][light:109]:   Effect: 'listening'
[16:31:01][D][voice_assistant:563]: Event Type: 11
[16:31:01][D][voice_assistant:717]: Starting STT by VAD
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:03][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][V][api.connection:1409]: Cannot send message because of TCP buffer space
[16:31:04][D][voice_assistant:563]: Event Type: 12
[16:31:04][D][voice_assistant:721]: STT by VAD end
[16:31:04][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[16:31:04][D][voice_assistant:445]: Desired state set to AWAITING_RESPONSE
[16:31:04][D][esp_adf_pipeline:070]: Called 'stop' while in RUNNING state.
[16:31:04][D][voice_assistant:439]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[16:31:04][D][esp_adf_pipeline:437]: [ADFMicrophone] Pipeline changed from RUNNING to ABORTING. (REQ: 1)
[16:31:04][D][light:036]: 'top_led' Setting:
[16:31:04][D][light:051]:   Brightness: 70%
[16:31:04][D][light:059]:   Red: 0%, Green: 20%, Blue: 100%
[16:31:04][D][light:109]:   Effect: 'processing'
[16:31:04][D][adf_audio_element:324]: [i2s_in] Checking State for stopping, got 3
[16:31:04][V][esp-idf:000][i2s_in]: W (138701) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[16:31:04][V][esp-idf:000][i2s_in]: W (138704) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[16:31:04][V][esp-idf:000][i2s_in]: W (138706) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[16:31:04][V][esp-idf:000][i2s_in]: W (138709) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[16:31:04][V][esp-idf:000][i2s_in]: W (138711) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[16:31:04][D][esp_adf_pipeline:437]: [ADFMicrophone] Pipeline changed from ABORTING to STOPPED. (REQ: 1)
[16:31:04][D][voice_assistant:439]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[16:31:04][D][voice_assistant:563]: Event Type: 0
[16:31:04][E][voice_assistant:693]: Error: stt-no-text-recognized - No text recognized
[16:31:04][D][voice_assistant:556]: Signaling stop...
[16:31:04][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STOP_MICROPHONE
[16:31:04][D][voice_assistant:445]: Desired state set to IDLE
[16:31:04][D][voice_assistant:439]: State changed from STOP_MICROPHONE to IDLE
[16:31:04][D][light:036]: 'top_led' Setting:
[16:31:04][D][light:059]:   Red: 100%, Green: 0%, Blue: 0%
[16:31:04][D][light:085]:   Transition length: 0.1s
[16:31:04][D][light:091]:   Effect: 'None'
[16:31:04][D][voice_assistant:563]: Event Type: 2
[16:31:04][D][voice_assistant:653]: Assist Pipeline ended
[16:31:04][D][light:036]: 'top_led' Setting:
[16:31:04][D][light:047]:   State: OFF
[16:31:04][D][light:085]:   Transition length: 0.1s
[16:31:05][D][light:036]: 'top_led' Setting:
[16:31:05][D][light:085]:   Transition length: 0.1s

Is the buffer limited because of hardware specs, or could we increase its size?

Hope this helps to debug the issue. By the way, thanks for all this work :)
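
One rough way to think about the buffer question is to convert byte counts into audio time. The sketch below does that arithmetic for the 1024-byte incoming chunk from the warning. The helper name `chunk_duration_ms` is hypothetical, and which format the wake word component actually receives is an assumption, so several candidate formats are shown.

```python
def chunk_duration_ms(num_bytes: int, sample_rate: int,
                      bits_per_sample: int, channels: int = 1) -> float:
    """Return how many milliseconds of PCM audio num_bytes represents."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    frames = num_bytes / bytes_per_frame
    return 1000.0 * frames / sample_rate


# Candidate mono formats mentioned in this thread (assumed, not confirmed).
for rate, bits in [(16000, 16), (16000, 32), (44100, 16)]:
    ms = chunk_duration_ms(1024, rate, bits)
    print(f"{rate} Hz, {bits} bit: {ms:.2f} ms per 1024-byte chunk")
```

The same chunk size covers less time as the sample rate or bit depth goes up, so a fixed-size buffer fills faster at those settings.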

gnumpi commented 5 months ago

Thanks for testing, it showed that I missed adapting the bit-depth of the re-sampler correctly! I just uploaded a fix (dev-next branch).

Yes, you are right: in duplex mode the output and input sample_rate, bit_depth and number of channels must be the same.
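
The duplex rule above can be sketched as a small consistency check to run against a config before flashing. This is only an illustration: `I2SSettings` and `duplex_mismatches` are hypothetical names, not part of esphome_audio, and the example values are the 16-bit output with 32-bit input combination reported earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I2SSettings:
    """Hypothetical container for the settings that must match in duplex mode."""
    sample_rate: int
    bits_per_sample: int
    channels: int


def duplex_mismatches(out: I2SSettings, inp: I2SSettings) -> list[str]:
    """Return the names of the settings that differ between output and input."""
    fields = ("sample_rate", "bits_per_sample", "channels")
    return [name for name in fields if getattr(out, name) != getattr(inp, name)]


audio_out = I2SSettings(sample_rate=16000, bits_per_sample=16, channels=1)
audio_in = I2SSettings(sample_rate=16000, bits_per_sample=32, channels=1)

mismatches = duplex_mismatches(audio_out, audio_in)
print(mismatches or "output and input settings match")
```

An empty result means the two sides agree on all three settings; any listed name points at the setting to align first.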

rccoleman commented 5 months ago

Thanks a lot for doing the tests and sharing the logs. I think I do understand now what's going on here. The onju only has one speaker, right? So setting the output also to mono will be sufficient, right? Can anyone with an onju try to test setting the channel of the adf-output also to right:
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    sample_rate: 16000
    adf_alc: true
    bits_per_sample: 32bit
    fixed_settings: true
    channel: right

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true
Unfortunately, there was also a bug in the ADF-I2SReader, so you might also want to try using the dev version of the ADF-Pipeline:
external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next

I made those changes to my Onju config along with moving back to the recommended version of esp-idf and now I get no audio at all. The lighting pattern appears to indicate that it's not even trying to speak the response (it just goes right back to the "listening for the wakeword" pattern), but from the logs it looks like it is trying to speak.

logs_onju-voice-dr_logs.txt onju-voice-microwakeword.yaml.txt

I just went back to the main branch of your PR and now it transitions to the "speaking" stage based on the light pattern, but I don't get any audio.

logs_onju-voice-dr_logs (1).txt

When I go back to 4.4.6, with channel: right the audio response is very fast and high-pitched. When I remove channel: right, audio plays normally.

nyok92 commented 5 months ago

@rccoleman You have to select channel: left for audio_out

@gnumpi I cleaned the build files and installed again, but the output sound is still accelerated with the following config:

onju-voice.txt

INFO OTA successful
INFO Successfully uploaded program.
INFO Starting log output from 192.168.1.132 using esphome API
INFO Successfully connected to nest-mini @ 192.168.1.132 in 7.211s
INFO Successful handshake with nest-mini @ 192.168.1.132 in 0.081s
[17:35:19][I][app:100]: ESPHome version 2024.5.2 compiled on May 24 2024, 17:29:59
[17:35:19][I][app:102]: Project tetele.onju_voice_satellite version 1.0.0
[17:35:19][C][wifi:580]: WiFi:
[17:35:19][C][wifi:408]:   Local MAC: XXX
[17:35:19][C][wifi:413]:   SSID: [redacted]
[17:35:19][C][wifi:416]:   IP Address: 192.168.1.132
[17:35:19][C][wifi:420]:   BSSID: [redacted]
[17:35:19][C][wifi:421]:   Hostname: 'nest-mini'
[17:35:19][C][wifi:423]:   Signal strength: -70 dB ▂▄▆█
[17:35:19][V][wifi:425]:   Priority: 0.0
[17:35:19][C][wifi:427]:   Channel: 6
[17:35:19][C][wifi:428]:   Subnet: 255.255.255.0
[17:35:19][C][wifi:429]:   Gateway: 192.168.1.254
[17:35:19][C][wifi:430]:   DNS1: 1.1.1.1
[17:35:19][C][wifi:431]:   DNS2: 1.0.0.1
[17:35:19][C][logger:185]: Logger:
[17:35:19][C][logger:186]:   Level: VERBOSE
[17:35:19][C][logger:188]:   Log Baud Rate: 115200
[17:35:19][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[17:35:19][C][logger:193]:   Level for 'micro_wake_word': DEBUG
[17:35:19][C][template.number:050]: Template Number 'Touch threshold percentage'
[17:35:19][C][template.number:051]:   Optimistic: YES
[17:35:19][C][template.number:052]:   Update Interval: never
[17:35:19][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[17:35:19][C][esp32_rmt_led_strip:176]:   Pin: 11
[17:35:19][C][esp32_rmt_led_strip:177]:   Channel: 0
[17:35:19][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[17:35:19][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[17:35:19][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[17:35:19][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[17:35:19][C][switch.gpio:091]:   Restore Mode: always OFF
[17:35:19][C][switch.gpio:031]:   Pin: GPIO21
[17:35:20][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[17:35:20][C][gpio.binary_sensor:016]:   Pin: GPIO38
[17:35:20][C][light:103]: Light 'leds'
[17:35:20][C][light:105]:   Default Transition Length: 0.0s
[17:35:20][C][light:106]:   Gamma Correct: 2.80
[17:35:20][C][light:103]: Light 'left_led'
[17:35:20][C][light:105]:   Default Transition Length: 0.1s
[17:35:20][C][light:106]:   Gamma Correct: 2.80
[17:35:20][C][light:103]: Light 'top_led'
[17:35:20][C][light:105]:   Default Transition Length: 0.1s
[17:35:20][C][light:106]:   Gamma Correct: 2.80
[17:35:20][C][light:103]: Light 'right_led'
[17:35:20][C][light:105]:   Default Transition Length: 0.1s
[17:35:20][C][light:106]:   Gamma Correct: 2.80
[17:35:20][C][template.switch:068]: Template Switch 'Use Wake Word'
[17:35:20][C][template.switch:091]:   Restore Mode: restore defaults to ON
[17:35:20][C][template.switch:057]:   Optimistic: YES
[17:35:20][C][psram:020]: PSRAM:
[17:35:20][C][psram:021]:   Available: YES
[17:35:20][C][psram:024]:   Size: 8191 KB
[17:35:20][C][i2s_audio:028]: I2SController:
[17:35:20][C][i2s_audio:029]:   AccessMode: duplex
[17:35:20][C][i2s_audio:030]:   Port: 0
[17:35:20][C][i2s_audio:032]:   Reader registered.
[17:35:20][C][i2s_audio:035]:   Writer registered.
[17:35:20][C][i2s_audio:139]: I2S-Writer (Fixed-CFG):
[17:35:20][C][i2s_audio:141]:   sample-rate: 16000 bits_per_sample: 32
[17:35:20][C][i2s_audio:142]:   channel_fmt: 4 channels: 1
[17:35:20][C][i2s_audio:143]:   use_apll: no, use_pdm: no
[17:35:20][C][i2s_audio:136]: I2S-Reader (Fixed-CFG):
[17:35:20][C][i2s_audio:141]:   sample-rate: 16000 bits_per_sample: 32
[17:35:20][C][i2s_audio:142]:   channel_fmt: 4 channels: 1
[17:35:20][C][i2s_audio:143]:   use_apll: no, use_pdm: no
[17:35:20][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[17:35:20][C][esp32_touch:074]:   Meas cycle: 0.80ms
[17:35:20][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[17:35:20][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[17:35:20][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[17:35:20][C][esp32_touch:135]:   Voltage Attenuation: 0V
[17:35:20][C][esp32_touch:169]:   Filter mode: IIR_16
[17:35:20][C][esp32_touch:170]:   Debounce count: 2
[17:35:20][C][esp32_touch:171]:   Noise threshold coefficient: 0
[17:35:20][C][esp32_touch:172]:   Jitter filter step size: 0
[17:35:20][C][esp32_touch:191]:   Smooth level: IIR_2
[17:35:20][C][esp32_touch:213]:   Denoise grade: BIT8
[17:35:20][C][esp32_touch:245]:   Denoise capacitance level: L0
[17:35:20][C][esp32_touch:261]:     Pad: T2
[17:35:20][C][esp32_touch:262]:     Threshold: 502900
[17:35:20][C][esp32_touch:260]:   Touch Pad 'action'
[17:35:20][C][esp32_touch:261]:     Pad: T3
[17:35:20][C][esp32_touch:262]:     Threshold: 611335
[17:35:20][C][captive_portal:088]: Captive Portal:
[17:35:20][C][mdns:115]: mDNS:
[17:35:20][C][mdns:116]:   Hostname: nest-mini
[17:35:20][V][mdns:117]:   Services:
[17:35:20][V][mdns:119]:   - _esphomelib, _tcp, 6053
[17:35:20][V][mdns:121]:     TXT: friendly_name = Onju Voice Satellite
[17:35:20][V][mdns:121]:     TXT: version = 2024.5.2
[17:35:20][V][mdns:121]:     TXT: mac = XXX
[17:35:20][V][mdns:121]:     TXT: platform = ESP32
[17:35:20][V][mdns:121]:     TXT: board = esp32-s3-devkitc-1
[17:35:20][V][mdns:121]:     TXT: network = wifi
[17:35:20][V][mdns:121]:     TXT: api_encryption = Noise_NNpsk0_25519_ChaChaPoly_SHA256
[17:35:20][V][mdns:121]:     TXT: project_name = tetele.onju_voice_satellite
[17:35:20][V][mdns:121]:     TXT: project_version = 1.0.0
[17:35:20][V][mdns:121]:     TXT: package_import_url = github://tetele/onju-voice-satellite/esphome/onju-voice-microwakeword.yaml@main
[17:35:20][C][ota:096]: Over-The-Air Updates:
[17:35:20][C][ota:097]:   Address: nest-mini.local:3232
[17:35:20][C][ota:100]:   Using Password.
[17:35:20][C][ota:103]:   OTA version: 2.
[17:35:20][C][api:139]: API Server:
[17:35:20][C][api:140]:   Address: nest-mini.local:6053
[17:35:20][C][api:142]:   Using noise encryption: YES
[17:35:20][C][improv_serial:032]: Improv Serial:
[17:35:20][C][micro_wake_word:058]: microWakeWord models:
[17:35:20][C][micro_wake_word:023]:   - Wake Word: hey mycroft
[17:35:20][C][micro_wake_word:024]:     Probability cutoff: 0.840
[17:35:20][C][micro_wake_word:025]:     Sliding window size: 10
[17:35:20][C][micro_wake_word:029]:   - VAD Model
[17:35:20][C][micro_wake_word:030]:     Upper threshold: 0.950
[17:35:20][C][micro_wake_word:031]:     Lower threshold: 0.500
[17:35:20][C][micro_wake_word:032]:     Sliding window size: 2
[17:35:20][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[17:35:20][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[17:35:20][C][adf_media_player:018]:   MP_ANNOUNCE enabled
[17:35:20][C][adf_media_player:021]:   Number of ADFComponents: 3
[17:35:20][D][light:036]: 'top_led' Setting:
[17:35:20][D][light:047]:   State: OFF
[17:35:20][D][light:085]:   Transition length: 0.1s
[17:35:20][D][media_player:061]: 'Onju Voice Satellite' - Setting
[17:35:20][D][media_player:071]:   Volume: 0.35
[17:35:40][D][media_player:061]: 'Onju Voice Satellite' - Setting
[17:35:40][D][media_player:068]:   Media URL: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/2ce7a8299d48445caafc28956f912de9.mp3?ts=1716564940
[17:35:40][D][esp_audio_sources:058]: Set new uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/2ce7a8299d48445caafc28956f912de9.mp3?ts=1716564940
[17:35:40][D][adf_media_player:054]: Got control call in state IDLE
[17:35:40][D][adf_media_player:055]: req_track stream uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/2ce7a8299d48445caafc28956f912de9.mp3?ts=1716564940
[17:35:40][D][esp_adf_pipeline:060]: Starting request, current state UNINITIALIZED
[17:35:40][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from UNINITIALIZED to INITIALIZING. (REQ: 0)
[17:35:40][I][adf_media_player:189]: got new pipeline state: 1, while in MP state IDLE
[17:35:40][I][adf_media_player:252]: current mp state: IDLE
[17:35:40][I][adf_media_player:253]: anouncement: false
[17:35:40][I][adf_media_player:254]: play_intent: false
[17:35:40][I][adf_media_player:255]: current_uri_: yes
[17:35:40][V][esp-idf:000]: I (29074) MP3_DECODER: MP3 init

[17:35:40][D][i2s_audio:067]: Install driver requested by Writer
[17:35:40][V][esp-idf:000]: I (29079) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[17:35:40][V][esp-idf:000]: I (29081) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[17:35:40][D][i2s_audio:073]: Installing driver : yes
[17:35:40][D][esp_adf_pipeline:486]: pipeline tag 0, http
[17:35:40][D][esp_adf_pipeline:486]: pipeline tag 1, decoder
[17:35:40][D][esp_adf_pipeline:486]: pipeline tag 2, resampler
[17:35:40][D][esp_adf_pipeline:486]: pipeline tag 3, i2s_out
[17:35:40][V][esp-idf:000]: I (29092) AUDIO_PIPELINE: link el->rb, el:0x3d8094ac, tag:http, rb:0x3d809bc8

[17:35:40][V][esp-idf:000]: I (29094) AUDIO_PIPELINE: link el->rb, el:0x3d8096ac, tag:decoder, rb:0x3d80ac08

[17:35:40][V][esp-idf:000]: I (29097) AUDIO_PIPELINE: link el->rb, el:0x3d809848, tag:resampler, rb:0x3d80bc48

[17:35:40][D][esp_adf_pipeline:496]: Setting up event listener.
[17:35:40][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from INITIALIZING to CREATED. (REQ: 0)
[17:35:40][I][adf_media_player:189]: got new pipeline state: 2, while in MP state IDLE
[17:35:40][I][adf_media_player:252]: current mp state: IDLE
[17:35:40][I][adf_media_player:253]: anouncement: false
[17:35:40][I][adf_media_player:254]: play_intent: false
[17:35:40][I][adf_media_player:255]: current_uri_: yes
[17:35:40][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from CREATED to PREPARING. (REQ: 0)
[17:35:40][I][adf_media_player:189]: got new pipeline state: 3, while in MP state IDLE
[17:35:40][D][adf_i2s_out:127]: Set final i2s settings: 16000
[17:35:40][I][adf_media_player:252]: current mp state: PLAYING
[17:35:40][I][adf_media_player:253]: anouncement: false
[17:35:40][I][adf_media_player:254]: play_intent: false
[17:35:40][I][adf_media_player:255]: current_uri_: yes
[17:35:40][D][esp_audio_sources:063]: Prepare elements called (initial_call)!
[17:35:40][D][esp_audio_sources:097]: Use fixed settings: no
[17:35:40][D][esp_audio_sources:098]: Streamer status: 1
[17:35:40][D][esp_audio_sources:099]: decoder status: 1
[17:35:40][D][esp_audio_sources:100]: stream uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/2ce7a8299d48445caafc28956f912de9.mp3?ts=1716564940
[17:35:40][D][adf_audio_element:108]: Preparing [http]...
[17:35:40][V][esp-idf:000]: I (29138) AUDIO_THREAD: The http task allocate stack on external memory

[17:35:40][V][esp-idf:000]: I (29140) AUDIO_ELEMENT: [http-0x3d8094ac] Element task created

[17:35:40][D][adf_audio_element:108]: Preparing [decoder]...
[17:35:40][V][esp-idf:000]: I (29143) AUDIO_THREAD: The decoder task allocate stack on external memory

[17:35:40][V][esp-idf:000]: I (29145) AUDIO_ELEMENT: [decoder-0x3d8096ac] Element task created

[17:35:40][D][adf_audio_element:108]: Preparing [resampler]...
[17:35:40][V][esp-idf:000]: I (29149) AUDIO_THREAD: The resampler task allocate stack on external memory

[17:35:40][V][esp-idf:000]: I (29151) AUDIO_ELEMENT: [resampler-0x3d809848] Element task created

[17:35:40][D][adf_audio_element:108]: Preparing [i2s_out]...
[17:35:40][V][esp-idf:000]: I (29164) AUDIO_ELEMENT: [i2s_out-0x3d809a00] Element task created

[17:35:40][D][adf_audio_element:165]: Resuming [http]...
[17:35:40][D][adf_audio_element:172]: [http] Sending resume command.
[17:35:40][V][adf_audio_element:035]: [http]evt internal cmd = 5
[17:35:40][D][adf_audio_element:165]: Resuming [decoder]...
[17:35:40][D][adf_audio_element:172]: [decoder] Sending resume command.
[17:35:40][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[17:35:40][V][esp-idf:000][decoder]: I (29209) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[17:35:40][V][esp-idf:000][decoder]: I (29212) MP3_DECODER: MP3 opened

[17:35:40][V][esp-idf:000][http]: I (29224) HTTP_STREAM: total_bytes=0

[17:35:40][D][adf_audio_element:191]: [http] Checking State, got 74
[17:35:40][D][adf_audio_element:191]: [decoder] Checking State, got 72
[17:35:42][I][HTTPStreamReader:193]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[17:35:42][D][adf_i2s_out:127]: Set final i2s settings: 16000
[17:35:42][D][esp_audio_processors:090]: Received request from: HTTPStreamReader
[17:35:42][D][esp_audio_processors:095]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 1 
[17:35:42][D][adf_audio_element:108]: Preparing [http]...
[17:35:42][D][adf_audio_element:108]: Preparing [decoder]...
[17:35:42][V][esp-idf:000][decoder]: W (31302) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[17:35:42][V][esp-idf:000][decoder]: W (31305) MP3_DECODER: output aborted -3

[17:35:42][V][esp-idf:000][decoder]: I (31307) MP3_DECODER: Closed

[17:35:42][D][esp_audio_sources:153]: Preparation done!
[17:35:42][D][esp_adf_pipeline:334]: wait for preparation, done
[17:35:42][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[17:35:42][I][adf_media_player:189]: got new pipeline state: 5, while in MP state PLAYING
[17:35:42][I][adf_media_player:252]: current mp state: PLAYING
[17:35:42][I][adf_media_player:253]: anouncement: false
[17:35:42][I][adf_media_player:254]: play_intent: false
[17:35:42][I][adf_media_player:255]: current_uri_: yes
[17:35:42][D][adf_audio_element:165]: Resuming [http]...
[17:35:42][D][adf_audio_element:172]: [http] Sending resume command.
[17:35:42][V][adf_audio_element:035]: [http]evt internal cmd = 5
[17:35:42][D][adf_audio_element:165]: Resuming [decoder]...
[17:35:42][D][adf_audio_element:172]: [decoder] Sending resume command.
[17:35:42][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[17:35:42][V][esp-idf:000][decoder]: I (31449) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[17:35:42][V][esp-idf:000][decoder]: I (31625) MP3_DECODER: MP3 opened

[17:35:42][V][esp-idf:000][http]: I (31633) HTTP_STREAM: total_bytes=0

[17:35:42][D][adf_audio_element:191]: [http] Checking State, got 78
[17:35:42][D][adf_audio_element:191]: [decoder] Checking State, got 79
[17:35:42][I][esp_adf_pipeline:124]: [ http ] status: 12
[17:35:42][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from STARTING to RUNNING. (REQ: 0)
[17:35:42][I][adf_media_player:189]: got new pipeline state: 6, while in MP state PLAYING
[17:35:42][I][adf_media_player:252]: current mp state: PLAYING
[17:35:42][I][adf_media_player:253]: anouncement: false
[17:35:42][I][adf_media_player:254]: play_intent: false
[17:35:42][I][adf_media_player:255]: current_uri_: yes
[17:35:44][I][esp_adf_pipeline:124]: [ decoder ] status: 12
[17:35:44][I][HTTPStreamReader:193]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[17:35:46][D][media_player:061]: 'Onju Voice Satellite' - Setting
[17:35:47][D][media_player:065]:   Command: STOP
[17:35:47][D][esp_adf_pipeline:070]: Called 'stop' while in RUNNING state.
[17:35:47][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from RUNNING to ABORTING. (REQ: 1)
[17:35:47][I][adf_media_player:189]: got new pipeline state: 10, while in MP state PLAYING
[17:35:47][I][adf_media_player:252]: current mp state: PLAYING
[17:35:47][I][adf_media_player:253]: anouncement: false
[17:35:47][I][adf_media_player:254]: play_intent: false
[17:35:47][I][adf_media_player:255]: current_uri_: false
[17:35:47][D][adf_audio_element:324]: [http] Checking State for stopping, got 3
[17:35:47][D][adf_audio_element:324]: [decoder] Checking State for stopping, got 3
[17:35:47][V][esp-idf:000][resampler]: W (35755) AUDIO_ELEMENT: IN-[resampler] AEL_IO_ABORT

[17:35:47][V][esp-idf:000][decoder]: I (35758) MP3_DECODER: Closed

[17:35:47][D][adf_audio_element:324]: [i2s_out] Checking State for stopping, got 3
[17:35:47][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from ABORTING to STOPPED. (REQ: 1)
[17:35:47][I][adf_media_player:189]: got new pipeline state: 4, while in MP state PLAYING
[17:35:47][I][adf_media_player:252]: current mp state: IDLE
[17:35:47][I][adf_media_player:253]: anouncement: false
[17:35:47][I][adf_media_player:254]: play_intent: false
[17:35:47][I][adf_media_player:255]: current_uri_: false

gnumpi commented 5 months ago

@nyok92 it seems it did not load the version with the latest fixes. Can you add a refresh: 0s to the external source?
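
A sketch of what that looks like, assuming the external source is declared the same way as in the diff at the top of this issue. The `dev-next` ref is taken from the `Updating https://github.com/gnumpi/esphome_audio@dev-next` line in the log below; use whichever ref your config actually points at:

```yaml
external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next          # assumption: the branch shown in the compile log
    components: [ adf_pipeline, i2s_audio ]
    refresh: 0s              # re-fetch the repository on every compile instead of using the cached copy
```

Without `refresh`, ESPHome reuses its cached copy of the repository for a while, so a fresh build can still compile older component code.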

nyok92 commented 5 months ago

@gnumpi OK, it seems to have been updated now. But with the newer version I don't get any audio output anymore:

INFO ESPHome 2024.5.2
INFO Reading configuration /config/esphome/nest-mini.yaml...
INFO Updating https://github.com/gnumpi/esphome_audio@dev-next
INFO Starting log output from 192.168.1.132 using esphome API
INFO Successfully connected to nest-mini @ 192.168.1.132 in 0.215s
INFO Successful handshake with nest-mini @ 192.168.1.132 in 0.087s
[18:30:32][I][app:100]: ESPHome version 2024.5.2 compiled on May 24 2024, 18:18:50
[18:30:32][I][app:102]: Project tetele.onju_voice_satellite version 1.0.0
[18:30:32][C][wifi:580]: WiFi:
[18:30:32][C][wifi:408]:   Local MAC: XXX
[18:30:32][C][wifi:413]:   SSID: [redacted]
[18:30:32][C][wifi:416]:   IP Address: 192.168.1.132
[18:30:32][C][wifi:420]:   BSSID: [redacted]
[18:30:32][C][wifi:421]:   Hostname: 'nest-mini'
[18:30:32][C][wifi:423]:   Signal strength: -63 dB ▂▄▆█
[18:30:32][V][wifi:425]:   Priority: 0.0
[18:30:32][C][wifi:427]:   Channel: 6
[18:30:32][C][wifi:428]:   Subnet: 255.255.255.0
[18:30:32][C][wifi:429]:   Gateway: 192.168.1.254
[18:30:32][C][wifi:430]:   DNS1: 1.1.1.1
[18:30:32][C][wifi:431]:   DNS2: 1.0.0.1
[18:30:32][C][logger:185]: Logger:
[18:30:32][C][logger:186]:   Level: VERBOSE
[18:30:32][C][logger:188]:   Log Baud Rate: 115200
[18:30:32][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[18:30:32][C][logger:193]:   Level for 'micro_wake_word': DEBUG
[18:30:32][C][template.number:050]: Template Number 'Touch threshold percentage'
[18:30:32][C][template.number:051]:   Optimistic: YES
[18:30:32][C][template.number:052]:   Update Interval: never
[18:30:32][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[18:30:32][C][esp32_rmt_led_strip:176]:   Pin: 11
[18:30:32][C][esp32_rmt_led_strip:177]:   Channel: 0
[18:30:32][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[18:30:32][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[18:30:32][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[18:30:32][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[18:30:32][C][switch.gpio:091]:   Restore Mode: always OFF
[18:30:32][C][switch.gpio:031]:   Pin: GPIO21
[18:30:32][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[18:30:32][C][gpio.binary_sensor:016]:   Pin: GPIO38
[18:30:32][C][light:103]: Light 'leds'
[18:30:32][C][light:105]:   Default Transition Length: 0.0s
[18:30:32][C][light:106]:   Gamma Correct: 2.80
[18:30:32][C][light:103]: Light 'left_led'
[18:30:32][C][light:105]:   Default Transition Length: 0.1s
[18:30:32][C][light:106]:   Gamma Correct: 2.80
[18:30:32][C][light:103]: Light 'top_led'
[18:30:32][C][light:105]:   Default Transition Length: 0.1s
[18:30:32][C][light:106]:   Gamma Correct: 2.80
[18:30:32][C][light:103]: Light 'right_led'
[18:30:32][C][light:105]:   Default Transition Length: 0.1s
[18:30:32][C][light:106]:   Gamma Correct: 2.80
[18:30:32][C][template.switch:068]: Template Switch 'Use Wake Word'
[18:30:32][C][template.switch:091]:   Restore Mode: restore defaults to ON
[18:30:32][C][template.switch:057]:   Optimistic: YES
[18:30:32][C][psram:020]: PSRAM:
[18:30:32][C][psram:021]:   Available: YES
[18:30:32][C][psram:024]:   Size: 8191 KB
[18:30:32][C][i2s_audio:028]: I2SController:
[18:30:32][C][i2s_audio:029]:   AccessMode: duplex
[18:30:32][C][i2s_audio:030]:   Port: 0
[18:30:32][C][i2s_audio:032]:   Reader registered.
[18:30:32][C][i2s_audio:035]:   Writer registered.
[18:30:32][C][i2s_audio:139]: I2S-Writer (Fixed-CFG):
[18:30:32][C][i2s_audio:141]:   sample-rate: 16000 bits_per_sample: 32
[18:30:32][C][i2s_audio:142]:   channel_fmt: 4 channels: 1
[18:30:32][C][i2s_audio:143]:   use_apll: no, use_pdm: no
[18:30:32][C][i2s_audio:136]: I2S-Reader (Fixed-CFG):
[18:30:32][C][i2s_audio:141]:   sample-rate: 16000 bits_per_sample: 32
[18:30:32][C][i2s_audio:142]:   channel_fmt: 4 channels: 1
[18:30:32][C][i2s_audio:143]:   use_apll: no, use_pdm: no
[18:30:32][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[18:30:32][C][esp32_touch:074]:   Meas cycle: 0.80ms
[18:30:32][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[18:30:32][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[18:30:32][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[18:30:32][C][esp32_touch:135]:   Voltage Attenuation: 0V
[18:30:32][C][esp32_touch:169]:   Filter mode: IIR_16
[18:30:32][C][esp32_touch:170]:   Debounce count: 2
[18:30:32][C][esp32_touch:171]:   Noise threshold coefficient: 0
[18:30:32][C][esp32_touch:172]:   Jitter filter step size: 0
[18:30:32][C][esp32_touch:191]:   Smooth level: IIR_2
[18:30:32][C][esp32_touch:213]:   Denoise grade: BIT8
[18:30:32][C][esp32_touch:245]:   Denoise capacitance level: L0
[18:30:32][C][esp32_touch:260]:   Touch Pad 'volume_down'
[18:30:32][C][esp32_touch:261]:     Pad: T4
[18:30:32][C][esp32_touch:262]:     Threshold: 466323
[18:30:32][C][esp32_touch:260]:   Touch Pad 'volume_up'
[18:30:32][C][esp32_touch:261]:     Pad: T2
[18:30:32][C][esp32_touch:262]:     Threshold: 502038
[18:30:32][C][esp32_touch:260]:   Touch Pad 'action'
[18:30:32][C][esp32_touch:261]:     Pad: T3
[18:30:32][C][esp32_touch:262]:     Threshold: 610780
[18:30:32][C][captive_portal:088]: Captive Portal:
[18:30:32][C][mdns:115]: mDNS:
[18:30:32][C][mdns:116]:   Hostname: nest-mini
[18:30:32][V][mdns:117]:   Services:
[18:30:32][V][mdns:119]:   - _esphomelib, _tcp, 6053
[18:30:32][V][mdns:121]:     TXT: friendly_name = Onju Voice Satellite
[18:30:32][V][mdns:121]:     TXT: version = 2024.5.2
[18:30:32][V][mdns:121]:     TXT: mac = XXX
[18:30:32][V][mdns:121]:     TXT: platform = ESP32
[18:30:32][V][mdns:121]:     TXT: board = esp32-s3-devkitc-1
[18:30:32][V][mdns:121]:     TXT: network = wifi
[18:30:32][V][mdns:121]:     TXT: api_encryption = Noise_NNpsk0_25519_ChaChaPoly_SHA256
[18:30:32][V][mdns:121]:     TXT: project_name = tetele.onju_voice_satellite
[18:30:32][V][mdns:121]:     TXT: project_version = 1.0.0
[18:30:32][V][mdns:121]:     TXT: package_import_url = github://tetele/onju-voice-satellite/esphome/onju-voice-microwakeword.yaml@main
[18:30:32][C][ota:096]: Over-The-Air Updates:
[18:30:32][C][ota:097]:   Address: nest-mini.local:3232
[18:30:32][C][ota:100]:   Using Password.
[18:30:32][C][ota:103]:   OTA version: 2.
[18:30:32][C][api:139]: API Server:
[18:30:32][C][api:140]:   Address: nest-mini.local:6053
[18:30:32][C][api:142]:   Using noise encryption: YES
[18:30:32][C][improv_serial:032]: Improv Serial:
[18:30:32][C][micro_wake_word:058]: microWakeWord models:
[18:30:32][C][micro_wake_word:023]:   - Wake Word: hey mycroft
[18:30:32][C][micro_wake_word:024]:     Probability cutoff: 0.840
[18:30:32][C][micro_wake_word:025]:     Sliding window size: 10
[18:30:32][C][micro_wake_word:029]:   - VAD Model
[18:30:32][C][micro_wake_word:030]:     Upper threshold: 0.950
[18:30:32][C][micro_wake_word:031]:     Lower threshold: 0.500
[18:30:32][C][micro_wake_word:032]:     Sliding window size: 2
[18:30:32][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[18:30:32][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[18:30:32][C][adf_media_player:018]:   MP_ANNOUNCE enabled
[18:30:32][C][adf_media_player:021]:   Number of ADFComponents: 3
[18:30:42][D][media_player:061]: 'Onju Voice Satellite' - Setting
[18:30:42][D][media_player:068]:   Media URL: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/ecf53884115c4117a37bda1e80a9055b.mp3?ts=1716568242
[18:30:42][D][esp_audio_sources:058]: Set new uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/ecf53884115c4117a37bda1e80a9055b.mp3?ts=1716568242
[18:30:42][D][adf_media_player:054]: Got control call in state IDLE
[18:30:42][D][adf_media_player:055]: req_track stream uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/ecf53884115c4117a37bda1e80a9055b.mp3?ts=1716568242
[18:30:42][D][esp_adf_pipeline:060]: Starting request, current state UNINITIALIZED
[18:30:42][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from UNINITIALIZED to INITIALIZING. (REQ: 0)
[18:30:42][I][adf_media_player:189]: got new pipeline state: 1, while in MP state IDLE
[18:30:42][I][adf_media_player:252]: current mp state: IDLE
[18:30:42][I][adf_media_player:253]: anouncement: false
[18:30:42][I][adf_media_player:254]: play_intent: false
[18:30:42][I][adf_media_player:255]: current_uri_: yes
[18:30:42][V][esp-idf:000]: I (39733) MP3_DECODER: MP3 init

[18:30:42][D][i2s_audio:067]: Install driver requested by Writer
[18:30:42][V][esp-idf:000]: I (39739) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[18:30:42][V][esp-idf:000]: I (39741) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[18:30:42][D][i2s_audio:073]: Installing driver : yes
[18:30:42][D][esp_adf_pipeline:486]: pipeline tag 0, http
[18:30:42][D][esp_adf_pipeline:486]: pipeline tag 1, decoder
[18:30:42][D][esp_adf_pipeline:486]: pipeline tag 2, resampler
[18:30:42][D][esp_adf_pipeline:486]: pipeline tag 3, i2s_out
[18:30:42][V][esp-idf:000]: I (39750) AUDIO_PIPELINE: link el->rb, el:0x3d8094ac, tag:http, rb:0x3d809bc8

[18:30:42][V][esp-idf:000]: I (39752) AUDIO_PIPELINE: link el->rb, el:0x3d8096ac, tag:decoder, rb:0x3d80ac08

[18:30:42][V][esp-idf:000]: I (39755) AUDIO_PIPELINE: link el->rb, el:0x3d809848, tag:resampler, rb:0x3d80bc48

[18:30:42][D][esp_adf_pipeline:496]: Setting up event listener.
[18:30:42][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from INITIALIZING to CREATED. (REQ: 0)
[18:30:42][I][adf_media_player:189]: got new pipeline state: 2, while in MP state IDLE
[18:30:42][I][adf_media_player:252]: current mp state: IDLE
[18:30:42][I][adf_media_player:253]: anouncement: false
[18:30:42][I][adf_media_player:254]: play_intent: false
[18:30:42][I][adf_media_player:255]: current_uri_: yes
[18:30:42][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from CREATED to PREPARING. (REQ: 0)
[18:30:42][I][adf_media_player:189]: got new pipeline state: 3, while in MP state IDLE
[18:30:42][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:30:42][I][adf_media_player:252]: current mp state: PLAYING
[18:30:42][I][adf_media_player:253]: anouncement: false
[18:30:42][I][adf_media_player:254]: play_intent: false
[18:30:42][I][adf_media_player:255]: current_uri_: yes
[18:30:42][D][esp_audio_sources:063]: Prepare elements called (initial_call)!
[18:30:42][D][esp_audio_sources:097]: Use fixed settings: no
[18:30:42][D][esp_audio_sources:098]: Streamer status: 1
[18:30:42][D][esp_audio_sources:099]: decoder status: 1
[18:30:42][D][esp_audio_sources:100]: stream uri: http://192.168.1.5:8097/single/media_player.nest_mini_onju_voice_satellite/ecf53884115c4117a37bda1e80a9055b.mp3?ts=1716568242
[18:30:42][D][adf_audio_element:108]: Preparing [http]...
[18:30:42][V][esp-idf:000]: I (39796) AUDIO_THREAD: The http task allocate stack on external memory

[18:30:42][V][esp-idf:000]: I (39798) AUDIO_ELEMENT: [http-0x3d8094ac] Element task created

[18:30:42][D][adf_audio_element:108]: Preparing [decoder]...
[18:30:42][V][esp-idf:000]: I (39802) AUDIO_THREAD: The decoder task allocate stack on external memory

[18:30:42][V][esp-idf:000]: I (39804) AUDIO_ELEMENT: [decoder-0x3d8096ac] Element task created

[18:30:42][D][adf_audio_element:108]: Preparing [resampler]...
[18:30:42][V][esp-idf:000]: I (39808) AUDIO_THREAD: The resampler task allocate stack on external memory

[18:30:42][V][esp-idf:000]: I (39810) AUDIO_ELEMENT: [resampler-0x3d809848] Element task created

[18:30:42][D][adf_audio_element:108]: Preparing [i2s_out]...
[18:30:42][V][esp-idf:000]: I (39824) AUDIO_ELEMENT: [i2s_out-0x3d809a00] Element task created

[18:30:42][D][adf_audio_element:165]: Resuming [http]...
[18:30:42][D][adf_audio_element:172]: [http] Sending resume command.
[18:30:42][V][adf_audio_element:035]: [http]evt internal cmd = 5
[18:30:42][D][adf_audio_element:165]: Resuming [decoder]...
[18:30:42][D][adf_audio_element:172]: [decoder] Sending resume command.
[18:30:42][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[18:30:42][V][esp-idf:000][decoder]: I (39871) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:30:42][V][esp-idf:000][decoder]: I (39877) MP3_DECODER: MP3 opened

[18:30:42][D][adf_audio_element:191]: [http] Checking State, got 74
[18:30:42][D][adf_audio_element:191]: [decoder] Checking State, got 72
[18:30:44][I][HTTPStreamReader:193]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:30:44][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:30:44][D][esp_audio_processors:104]: Received request from: HTTPStreamReader
[18:30:44][D][esp_audio_processors:109]: New settings: SRC: rate: 44100, ch: 2 bits: 16, DST: rate: 16000, ch: 1, bits 32
[18:30:44][D][adf_audio_element:108]: Preparing [http]...
[18:30:44][D][adf_audio_element:108]: Preparing [decoder]...
[18:30:44][V][esp-idf:000][decoder]: W (41732) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[18:30:44][V][esp-idf:000][decoder]: W (41735) MP3_DECODER: output aborted -3

[18:30:44][V][esp-idf:000][decoder]: I (41737) MP3_DECODER: Closed

[18:30:44][D][esp_audio_sources:153]: Preparation done!
[18:30:44][D][esp_adf_pipeline:334]: wait for preparation, done
[18:30:44][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from PREPARING to STARTING. (REQ: 0)
[18:30:44][I][adf_media_player:189]: got new pipeline state: 5, while in MP state PLAYING
[18:30:44][I][adf_media_player:252]: current mp state: PLAYING
[18:30:44][I][adf_media_player:253]: anouncement: false
[18:30:44][I][adf_media_player:254]: play_intent: false
[18:30:44][I][adf_media_player:255]: current_uri_: yes
[18:30:44][D][adf_audio_element:165]: Resuming [http]...
[18:30:44][D][adf_audio_element:172]: [http] Sending resume command.
[18:30:44][V][adf_audio_element:035]: [http]evt internal cmd = 5
[18:30:44][D][adf_audio_element:165]: Resuming [decoder]...
[18:30:44][D][adf_audio_element:172]: [decoder] Sending resume command.
[18:30:44][V][adf_audio_element:035]: [decoder]evt internal cmd = 5
[18:30:44][V][esp-idf:000][decoder]: I (41896) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:30:44][V][esp-idf:000][decoder]: I (41898) MP3_DECODER: MP3 opened

[18:30:44][D][adf_audio_element:165]: Resuming [i2s_out]...
[18:30:44][D][adf_audio_element:172]: [i2s_out] Sending resume command.
[18:30:44][V][adf_audio_element:035]: [i2s_out]evt internal cmd = 5
[18:30:44][I][esp_adf_pipeline:124]: [ resampler ] status: 1
[18:30:44][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from STARTING to ABORTING. (REQ: 1)
[18:30:44][I][adf_media_player:189]: got new pipeline state: 10, while in MP state PLAYING
[18:30:44][I][adf_media_player:252]: current mp state: PLAYING
[18:30:44][I][adf_media_player:253]: anouncement: false
[18:30:44][I][adf_media_player:254]: play_intent: false
[18:30:44][I][adf_media_player:255]: current_uri_: yes
[18:30:44][D][adf_audio_element:324]: [http] Checking State for stopping, got 2
[18:30:44][D][adf_audio_element:324]: [decoder] Checking State for stopping, got 2
[18:30:44][E][esp_adf_pipeline:215]: HTTPStreamReader got in error state while STOPPING. Stopping pipeline!
[18:30:44][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from ABORTING to DESTROYING. (REQ: 4)
[18:30:44][I][adf_media_player:189]: got new pipeline state: 11, while in MP state PLAYING
[18:30:44][I][adf_media_player:252]: current mp state: PLAYING
[18:30:44][I][adf_media_player:253]: anouncement: false
[18:30:44][I][adf_media_player:254]: play_intent: false
[18:30:44][I][adf_media_player:255]: current_uri_: yes
[18:30:44][E][esp_adf_pipeline:238]: Timeout while STOPPING. Stopping pipeline!
[18:30:44][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from DESTROYING to DESTROYING. (REQ: 4)
[18:30:44][I][adf_media_player:189]: got new pipeline state: 11, while in MP state PLAYING
[18:30:44][I][adf_media_player:252]: current mp state: PLAYING
[18:30:44][I][adf_media_player:253]: anouncement: false
[18:30:44][I][adf_media_player:254]: play_intent: false
[18:30:44][I][adf_media_player:255]: current_uri_: yes
[18:30:44][D][esp_adf_pipeline:507]: Called deinit_all
[18:30:44][V][esp-idf:000][i2s_out]: W (41966) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_ABORT

[18:30:44][V][esp-idf:000][http]: I (41969) HTTP_STREAM: total_bytes=0

[18:30:44][V][esp-idf:000][decoder]: W (41968) AUDIO_ELEMENT: IN-[decoder] AEL_IO_ABORT

[18:30:44][V][esp-idf:000][decoder]: E (41975) MP3_DECODER: failed to read audio data (line 117)

[18:30:44][V][esp-idf:000][decoder]: W (41977) AUDIO_ELEMENT: [decoder] AEL_IO_ABORT, -3

[18:30:44][V][esp-idf:000][decoder]: I (41979) MP3_DECODER: Closed

[18:30:46][V][esp-idf:000][http]: W (43655) HTTP_STREAM: No output due to stopping

[18:30:46][V][esp-idf:000]: I (43660) AUDIO_PIPELINE: audio_pipeline_unlinked

[18:30:46][V][esp-idf:000]: W (43663) AUDIO_ELEMENT: [http] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:30:46][V][esp-idf:000]: W (43664) AUDIO_ELEMENT: [decoder] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:30:46][V][esp-idf:000]: W (43666) AUDIO_ELEMENT: [resampler] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:30:46][V][esp-idf:000]: W (43668) AUDIO_ELEMENT: [i2s_out] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:30:46][V][esp-idf:000]: I (43671) I2S: DMA queue destroyed

[18:30:46][V][esp-idf:000]: I (43673) I2S: DMA queue destroyed

[18:30:46][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from DESTROYING to UNINITIALIZED. (REQ: 4)
[18:30:46][I][adf_media_player:189]: got new pipeline state: 0, while in MP state PLAYING
[18:30:46][I][adf_media_player:252]: current mp state: IDLE
[18:30:46][I][adf_media_player:253]: anouncement: false
[18:30:46][I][adf_media_player:254]: play_intent: false
[18:30:46][I][adf_media_player:255]: current_uri_: yes
[18:30:46][D][esp_adf_pipeline:437]: [MediaPlayer] Pipeline changed from UNINITIALIZED to UNINITIALIZED. (REQ: 4)
[18:30:46][I][adf_media_player:189]: got new pipeline state: 0, while in MP state IDLE
[18:30:46][I][adf_media_player:252]: current mp state: IDLE
[18:30:46][I][adf_media_player:253]: anouncement: false
[18:30:46][I][adf_media_player:254]: play_intent: false
[18:30:46][I][adf_media_player:255]: current_uri_: yes
[18:30:46][W][component:237]: Component adf_pipeline.media_player took a long time for an operation (1737 ms).
[18:30:46][W][component:238]: Components should block for at most 30 ms.

gnumpi commented 5 months ago

Hmm, I searched and found that the ADF resampler only supports 16-bit output, which is why it ends up in an error state. So we need another solution: probably a dirty hack as before, and then wait for IDF 5 support in ESPHome.

nyok92 commented 5 months ago

OK, can't we run the mic at 44.1 kHz, 16-bit, and set a bigger buffer size?

gnumpi commented 5 months ago

It depends on which mic/ADC the Onju uses. You can try it; you will probably need to add the resampler to the mic pipeline again, as the microphone component expects 16 kHz input signals.
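
A rough, untested sketch of what that could look like, reusing the IDs and pin from the Onju config in this thread. The `16bit` value and the `resampler` entry in the microphone pipeline are assumptions, and whether the Onju's mic/ADC accepts these settings at all is exactly the open question:

```yaml
# Hypothetical variant: I2S input at 44.1 kHz / 16-bit (not confirmed to work on the Onju)
adf_pipeline:
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: left
    pdm: false
    sample_rate: 44100        # assumption: mic/ADC can run at this rate
    bits_per_sample: 16bit    # assumption: mic/ADC can deliver 16-bit samples

microphone:
  - platform: adf_pipeline
    id: onju_microphone
    pipeline:
      - adf_i2s_in
      - resampler             # assumed to convert back down to the 16 kHz the microphone component expects
      - self
```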

gnumpi commented 5 months ago

I was just told that this config should work with IDF 4.4.7 and the dev-next branch:

https://github.com/tbrasser/config/blob/main/esphome%2Fincludes%2Fonju-voice.yaml

tbrasser commented 5 months ago

Kind of. I've mostly been testing audio output, which seems to work at 16-bit 44100 Hz from MA (I also selected 16/44.1 there). The ESPHome media_player shows as playing and volume control works, but on the Music Assistant side it doesn't show as playing.

Reading through the above, it seems I misconfigured my mic input, so I'll try to test some more this weekend.

gnumpi commented 4 months ago

It seems to work now with the latest fixes in the dev-next branch and this config:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next
    components: [ adf_pipeline, i2s_audio ]
    refresh: 0s

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    sample_rate: 16000
    adf_alc: true
    bits_per_sample: 32bit
    fixed_settings: true
    channel: left

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: left
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true


microphone:
  - platform: adf_pipeline
    id: onju_microphone
    keep_pipeline_alive: true
    gain_log2: 3
    pipeline:
      - adf_i2s_in
      - self

media_player:
  - platform: adf_pipeline
    id: onju_out
    name: None
    internal: false
    keep_pipeline_alive: true
    pipeline:
      - self
      - resampler
      - adf_i2s_out

nyok92 commented 4 months ago

Thanks, confirmed: MWW/voice assist and media player are working correctly in duplex mode! :)
