Closed craigcabrey closed 12 months ago
The problem
I have set up an Atom Echo as a voice assistant interface following the tutorial (https://www.home-assistant.io/projects/thirteen-usd-voice-remote/).
This works, however generated responses are not played back. Looking at the ESP logs, the Piper response URL sometimes ends in .raw, sometimes .mp3. The response needs to be in wav format, which I can manually do by changing the URL (see attached log snippet).
Unclear if related to #92528.
What version of Home Assistant Core has the issue?
core-2023.5.2
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Container
Integration causing the issue
wyoming
Link to integration documentation on our website
No response
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
[00:11:41][D][voice_assistant:112]: Response: "Sorry, I couldn't understand that"
[00:11:41][D][voice_assistant:127]: Response URL: "http://ha.k8s.services.lan/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_47f5ba5b18_tts.piper.raw"
[00:11:41][D][voice_assistant:132]: Assist Pipeline ended
Additional information
No response
Two additional notes:
So I suspect the pipeline itself is somehow corrupting the response audio stream.
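The manual workaround described in the report (changing the suffix of the response URL) can be sketched in a few lines of Python. This is only an illustration: the helper name with_extension is hypothetical, and whether the tts_proxy endpoint actually serves a file for the rewritten URL depends on the Home Assistant version in use.

```python
from urllib.parse import urlsplit, urlunsplit


def with_extension(url: str, new_ext: str) -> str:
    """Return url with the final path suffix replaced by new_ext (e.g. "wav")."""
    parts = urlsplit(url)
    # Only the path is touched, so dots in the host name are left alone.
    stem, dot, _old_ext = parts.path.rpartition(".")
    if not dot:
        raise ValueError(f"no file extension in {url!r}")
    return urlunsplit(parts._replace(path=f"{stem}.{new_ext}"))


# URL taken from the log snippet in the report.
raw_url = (
    "http://ha.k8s.services.lan/api/tts_proxy/"
    "dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_47f5ba5b18_tts.piper.raw"
)
wav_url = with_extension(raw_url, "wav")
print(wav_url)
```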
Hi all, exactly the same here. I think esphome cannot handle the ".raw" audio file.
Manually playing back a Piper TTS (via the media entity) works fine.
What exactly are you playing? I tried to play back media-source://tts/tts.piper?message=Office+door+is+on&language=en-us&voice=en-us-ryan-medium&audio_output=mp3 (from the pipeline debug and from the ESPHome logs) and it doesn't work. I can't play that anywhere (Google Home, ESPHome media player, browser, VLC etc.). What does work is media-source://tts/tts.piper?message=Office+door+is+on&language=en-us&voice=en-us-ryan-medium&audio_output=wav (notice the wav value of audio_output at the end).
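To make the difference between the two media IDs concrete, here is a small sketch that rewrites the audio_output parameter using only the Python standard library. The function name set_audio_output is made up for this example; the media ID itself is the one quoted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_audio_output(media_id: str, fmt: str) -> str:
    """Return media_id with its audio_output query parameter set to fmt."""
    parts = urlsplit(media_id)
    # Drop any existing audio_output value, keep every other parameter in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "audio_output"
    ]
    query.append(("audio_output", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


mp3_id = (
    "media-source://tts/tts.piper?message=Office+door+is+on"
    "&language=en-us&voice=en-us-ryan-medium&audio_output=mp3"
)
wav_id = set_audio_output(mp3_id, "wav")
print(wav_id)
```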
However, it looks like the Nabu Casa cloud TTS generated MP3s work on all devices above. I think that the format is messed up, it's not that the pipelines do anything to alter the format.
To me it looks like Piper is declaring one format and rendering another, thus creating invalid files. Also, I can't figure out which system states that the audio_output should be mp3. It seems that Wyoming's default is wav.
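One way to check the "declares one format, renders another" theory is to look at the first bytes of a downloaded response and compare them with the file extension. The sketch below is a heuristic, not a full parser: sniff_audio_format is a hypothetical helper, and headerless PCM can in rare cases start with bytes that look like an MP3 frame header.

```python
def sniff_audio_format(head: bytes) -> str:
    """Guess the container from the first bytes of a file (heuristic only)."""
    # WAV: RIFF chunk whose form type is WAVE.
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 tag in front of the audio frames.
    if head[:3] == b"ID3":
        return "mp3"
    # Bare MPEG frame: 11 sync bits set, layer bits 01 (Layer III).
    if (
        len(head) >= 2
        and head[0] == 0xFF
        and (head[1] & 0xE0) == 0xE0
        and (head[1] & 0x06) == 0x02
    ):
        return "mp3"
    # No recognizable header: could be headerless PCM (".raw") or something else.
    return "raw-or-unknown"


print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_audio_format(b"\x00\x00\x01\x00\xff\xff"))
```

Running this on a file saved from a .mp3 tts_proxy URL would show whether the payload really is MP3 or is PCM served under the wrong name.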
It is fixed with esphome 2023.5.0. There is now speaker integration that supports Piper's raw stream.
I just upgraded to 2023.5.0 and reflashed my atom echo. It’s the same behavior — a raw stream does not play anything. Manually playing through the piper TTS integration produces a wav that plays fine.
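For context on why a .raw response is silent on players that expect a container: headerless PCM carries no sample rate, sample width, or channel count, while a WAV file is essentially the same samples behind a small header. The sketch below wraps PCM bytes in a WAV container with Python's standard wave module. The parameters (22050 Hz, 16-bit, mono) are assumptions for illustration and are not stated anywhere in this thread; the real values depend on the Piper voice.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 22050, width: int = 2, channels: int = 1) -> bytes:
    """Wrap headerless PCM samples in a WAV container.

    rate, width (bytes per sample), and channels are assumed defaults.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(width)
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm)
    return buf.getvalue()


# 100 silent 16-bit samples stand in for a real Piper response.
silence = b"\x00\x00" * 100
wav_bytes = pcm_to_wav(silence)
print(len(silence), len(wav_bytes))
```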
You have to use the new "speaker" integration, not "media player". Here is my working YAML for homebrew hardware with an INMP441 and a MAX98357A. The only issue is the missing volume control for the speaker. I hope it will come in the future.
i2s_audio:
  i2s_lrclk_pin: GPIO18
  i2s_bclk_pin: GPIO05

voice_assistant:
  microphone: mic01
  speaker: speaker01

microphone:
  - platform: i2s_audio
    id: mic01
    i2s_din_pin: GPIO21
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: speaker01
    dac_type: external
    i2s_dout_pin: GPIO17
    mode: mono
@anekinloewe don't you get audio dropouts? I have the same issue on both Piper and Nabu Casa cloud TTS.
I think that is a problem of Piper voice quality. I'm waiting for medium or high quality voices.
Nope. Piper works great in my browser, for instance. And I use en-us-ryan-medium.
Do you mind posting your entire config? It seems like the example config is completely broken :/
You mean the ESPHome config? I have a Muse Luxe, not an Atom Echo, but it has the same issue.
Here is the config https://gist.github.com/tetele/5cac735174527c3b373b10db8d9c8d77
I'm having this exact same problem, with both the Muse Luxe and the Atom Echo (the Luxe for a few weeks, and the Echo only after it arrived today). No TTS speech comes out of the ESPHome media_player, and the wyoming-piper integration is generating .mp3 and .raw files (seemingly at random), and never .wav.
Here is a snippet of my ESPHome log from the Echo:
[23:24:43][D][binary_sensor:036]: 'Button': Sending state ON
[23:24:43][D][voice_assistant:105]: Requesting start...
[23:24:43][D][voice_assistant:085]: Starting...
[23:24:43][D][voice_assistant:123]: Assist Pipeline running
[23:24:43][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:43][D][light:046]: State: ON
[23:24:43][D][light:058]: Red: 0%, Green: 0%, Blue: 100%
[23:24:45][D][binary_sensor:036]: 'Button': Sending state OFF
[23:24:45][D][voice_assistant:113]: Signaling stop...
[23:24:46][D][voice_assistant:137]: Speech recognised as: " Turn off the family room ceiling."
[23:24:46][D][voice_assistant:152]: Response: "Turned off light"
[23:24:46][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:46][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:24:46][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3"
[23:24:46][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:24:46][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3
[23:24:46][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:46][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:24:46][D][light:108]: Effect: 'Pulse'
[23:24:46][D][voice_assistant:172]: Assist Pipeline ended
[23:24:51][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:51][D][light:046]: State: OFF
[23:24:51][D][light:108]: Effect: 'None'
[23:25:01][D][binary_sensor:036]: 'Button': Sending state ON
[23:25:01][D][voice_assistant:105]: Requesting start...
[23:25:01][D][voice_assistant:085]: Starting...
[23:25:01][D][voice_assistant:123]: Assist Pipeline running
[23:25:01][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:01][D][light:046]: State: ON
[23:25:01][D][light:058]: Red: 0%, Green: 0%, Blue: 100%
[23:25:03][D][binary_sensor:036]: 'Button': Sending state OFF
[23:25:03][D][voice_assistant:113]: Signaling stop...
[23:25:04][D][voice_assistant:137]: Speech recognised as: " Turn on the family room feeling."
[23:25:04][D][voice_assistant:152]: Response: "Sorry, I couldn't understand that"
[23:25:04][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:04][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:25:04][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.mp3"
[23:25:04][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:25:04][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.mp3
[23:25:04][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:04][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:25:04][D][light:108]: Effect: 'Pulse'
[23:25:04][D][voice_assistant:172]: Assist Pipeline ended
. . .
[23:25:41][D][binary_sensor:036]: 'Button': Sending state ON
[23:25:41][D][voice_assistant:105]: Requesting start...
[23:25:41][D][voice_assistant:085]: Starting...
[23:25:41][D][voice_assistant:123]: Assist Pipeline running
[23:25:41][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:41][D][light:046]: State: ON
[23:25:42][D][light:058]: Red: 0%, Green: 0%, Blue: 100%
[23:25:44][D][binary_sensor:036]: 'Button': Sending state OFF
[23:25:44][D][voice_assistant:113]: Signaling stop...
[23:25:44][D][voice_assistant:137]: Speech recognised as: " Turn on the family room ceiling."
[23:25:44][D][voice_assistant:152]: Response: "Turned on light"
[23:25:44][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:44][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:25:44][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-us_718a3c601f_tts.piper.mp3"
[23:25:44][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:25:45][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-us_718a3c601f_tts.piper.mp3
[23:25:45][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:45][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:25:45][D][light:108]: Effect: 'Pulse'
[23:25:45][D][voice_assistant:172]: Assist Pipeline ended
[23:25:50][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:50][D][light:046]: State: OFF
[23:25:50][D][light:108]: Effect: 'None'
[23:25:54][D][binary_sensor:036]: 'Button': Sending state ON
[23:25:54][D][voice_assistant:105]: Requesting start...
[23:25:54][D][voice_assistant:085]: Starting...
[23:25:54][D][voice_assistant:123]: Assist Pipeline running
[23:25:54][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:54][D][light:046]: State: ON
[23:25:54][D][light:058]: Red: 0%, Green: 0%, Blue: 100%
[23:25:57][D][binary_sensor:036]: 'Button': Sending state OFF
[23:25:57][D][voice_assistant:113]: Signaling stop...
[23:25:57][D][voice_assistant:137]: Speech recognised as: " Turn off the family room soon."
[23:25:57][D][voice_assistant:152]: Response: "Sorry, I couldn't understand that"
[23:25:57][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:57][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:25:57][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.raw"
[23:25:57][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:25:57][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.raw
[23:25:57][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:57][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:25:57][D][light:108]: Effect: 'Pulse'
[23:25:58][D][voice_assistant:172]: Assist Pipeline ended
[23:25:59][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:59][D][light:046]: State: OFF
[23:25:59][D][light:108]: Effect: 'None'
[23:26:02][D][binary_sensor:036]: 'Button': Sending state ON
[23:26:03][D][voice_assistant:105]: Requesting start...
[23:26:03][D][voice_assistant:085]: Starting...
[23:26:03][D][voice_assistant:123]: Assist Pipeline running
[23:26:03][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:03][D][light:046]: State: ON
[23:26:03][D][light:058]: Red: 0%, Green: 0%, Blue: 100%
[23:26:05][D][binary_sensor:036]: 'Button': Sending state OFF
[23:26:05][D][voice_assistant:113]: Signaling stop...
[23:26:06][D][voice_assistant:137]: Speech recognised as: " Turn off the family room ceiling."
[23:26:06][D][voice_assistant:152]: Response: "Turned off light"
[23:26:06][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:06][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:26:06][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.raw"
[23:26:06][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:26:06][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.raw
[23:26:06][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:06][D][light:058]: Red: 0%, Green: 100%, Blue: 0%
[23:26:06][D][light:108]: Effect: 'Pulse'
[23:26:06][D][voice_assistant:172]: Assist Pipeline ended
[23:26:07][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:07][D][light:046]: State: OFF
[23:26:07][D][light:108]: Effect: 'None'
Using the Developer Tools in Home Assistant to generate piper TTS speech and sending it to either the Muse Luxe or the Atom Echo works fine. It's audible. With both the Luxe and the Echo it appears STT is working fine, as it does recognize my speech. But using the voice assistant pipeline TTS results in silence.
It appears that the ESPHome media_player is trying to play something - when I release the button, the green light pulses for a few seconds before the state reverts back to idle.
And for the record, I'm using a rhasspy/wyoming-piper container in a containerized (unsupervised) instance of Home Assistant, currently version 2023.5.2. I'm running the standard Atom Echo voice assistant yaml configuration, with ESPHome 2023.5.2. As I noted above, the mic works, and the media_player works standalone, so I think my devices are OK.
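To quantify the "seemingly at random" mix of extensions, the Response URL lines can be tallied from a saved ESPHome log. A rough sketch follows: the regular expression matches the log format shown above, the function name tally_extensions is hypothetical, and the two sample lines are copied from the log in this comment.

```python
import re
from collections import Counter

# Matches the voice_assistant debug line that carries the TTS URL.
RESPONSE_URL = re.compile(r'Response URL: "(?P<url>[^"]+)"')


def tally_extensions(log_text: str) -> Counter:
    """Count the file extensions of all Response URLs found in log_text."""
    counts: Counter = Counter()
    for match in RESPONSE_URL.finditer(log_text):
        extension = match.group("url").rsplit(".", 1)[-1]
        counts[extension] += 1
    return counts


sample_log = (
    '[23:24:46][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/'
    'db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3"\n'
    '[23:25:57][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/'
    'dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.raw"\n'
)
counts = tally_extensions(sample_log)
print(dict(counts))
```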
Hello there! Yes, it now works when I use the speaker component instead of the media_player. However, I have lost the ability to use my speaker as a media player from Home Assistant: I no longer see any entity in Home Assistant, neither media player nor speaker. It would be good if voice_assistant could also use the media_player component to play the replies, not only the speaker component. Hopefully this will get fixed in the future.
Hey there @balloob, @synesthesiam, mind taking a look at this issue as it has been labeled with an integration (wyoming) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
This should be fixed now in the latest ESPHome.
@synesthesiam latest = 2023.7?
There should be a new ESPHome version out today.
It's still not working for me when I use a media_player for output, as the default voice assistant configuration does. A speaker works fine but the volume can't be controlled.
And as noted in my comment above, two things: (1) no TTS comes out of the voice assistant, but I can send audio (including piper and other TTS!) directly from Home Assistant to the ESPHome media_player and it works; and (2) the ESP device thinks it's playing audio, as the green pulsating LED continues pulsating for some period and then stops, as if it were speaking.
This is all using ESPHome 2023.7.0 and HA 2023.7.2.
Here are some logs from ESPHome 2023.7.0 on an Atom Echo:
[19:20:24][I][app:102]: ESPHome version 2023.7.0 compiled on Jul 19 2023, 19:13:04
[19:20:24][I][app:104]: Project m5stack.atom-echo version 1.0
[19:20:24][C][wifi:543]: WiFi:
[19:20:24][C][wifi:379]: Local MAC: 64:B7:08:80:31:68
[19:20:24][C][wifi:380]: SSID: [redacted]
[19:20:24][C][wifi:381]: IP Address: 192.168.11.16
[19:20:24][C][wifi:383]: BSSID: [redacted]
[19:20:24][C][wifi:384]: Hostname: 'atomecho-voice-assist-1'
[19:20:24][C][wifi:386]: Signal strength: -67 dB ▂▄▆█
[19:20:24][C][wifi:390]: Channel: 1
[19:20:24][C][wifi:391]: Subnet: 255.255.0.0
[19:20:24][C][wifi:392]: Gateway: 192.168.17.1
[19:20:24][C][wifi:393]: DNS1: 192.168.17.1
[19:20:24][C][wifi:394]: DNS2: 0.0.0.0
[19:20:24][C][logger:301]: Logger:
[19:20:24][C][logger:302]: Level: DEBUG
[19:20:24][C][logger:303]: Log Baud Rate: 115200
[19:20:24][C][logger:305]: Hardware UART: UART0
[19:20:24][C][esp32_rmt_led_strip:171]: ESP32 RMT LED Strip:
[19:20:24][C][esp32_rmt_led_strip:172]: Pin: 27
[19:20:24][C][esp32_rmt_led_strip:173]: Channel: 0
[19:20:24][C][esp32_rmt_led_strip:198]: RGB Order: GRB
[19:20:24][C][esp32_rmt_led_strip:199]: Max refresh rate: 0
[19:20:24][C][esp32_rmt_led_strip:200]: Number of LEDs: 1
[19:20:25][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Button'
[19:20:25][C][gpio.binary_sensor:016]: Pin: GPIO39
[19:20:25][C][light:103]: Light 'atomecho-voice-assist-1'
[19:20:25][C][light:105]: Default Transition Length: 0.0s
[19:20:25][C][light:106]: Gamma Correct: 2.80
[19:20:25][C][captive_portal:088]: Captive Portal:
[19:20:25][C][mdns:112]: mDNS:
[19:20:25][C][mdns:113]: Hostname: atomecho-voice-assist-1
[19:20:25][C][ota:093]: Over-The-Air Updates:
[19:20:25][C][ota:094]: Address: atomecho-voice-assist-1.local:3232
[19:20:25][C][api:138]: API Server:
[19:20:25][C][api:139]: Address: atomecho-voice-assist-1.local:6053
[19:20:25][C][api:141]: Using noise encryption: YES
[19:20:25][C][improv_serial:032]: Improv Serial:
[19:20:25][C][audio:203]: Audio:
[19:20:25][C][audio:225]: External DAC channels: 1
[19:20:25][C][audio:226]: I2S DOUT Pin: 22
[19:20:30][D][binary_sensor:036]: 'Button': Sending state ON
[19:20:31][D][voice_assistant:132]: Requesting start...
[19:20:31][D][voice_assistant:111]: Starting...
[19:20:32][D][voice_assistant:154]: Assist Pipeline running
[19:20:32][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:32][D][light:047]: State: ON
[19:20:32][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[19:20:33][D][binary_sensor:036]: 'Button': Sending state OFF
[19:20:33][D][voice_assistant:144]: Signaling stop...
[19:20:34][D][voice_assistant:168]: Speech recognised as: " Set the office desk lamp to white."
[19:20:34][D][voice_assistant:144]: Signaling stop...
[19:20:34][D][voice_assistant:192]: Response: "Color set"
[19:20:34][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:34][D][light:059]: Red: 0%, Green: 100%, Blue: 0%
[19:20:34][D][voice_assistant:207]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.mp3"
[19:20:34][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[19:20:34][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.mp3
[19:20:34][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:34][D][light:059]: Red: 0%, Green: 100%, Blue: 0%
[19:20:34][D][light:109]: Effect: 'Pulse'
[19:20:34][W][component:204]: Component api took a long time for an operation (0.05 s).
[19:20:34][W][component:205]: Components should block for at most 20-30ms.
[19:20:35][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.55 s).
[19:20:35][W][component:205]: Components should block for at most 20-30ms.
[19:20:35][D][voice_assistant:218]: Assist Pipeline ended
[19:20:54][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.06 s).
[19:20:54][W][component:205]: Components should block for at most 20-30ms.
[19:20:55][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.46 s).
[19:20:55][W][component:205]: Components should block for at most 20-30ms.
[19:20:55][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:55][D][light:047]: State: OFF
[19:20:55][D][light:109]: Effect: 'None'
[19:21:07][D][binary_sensor:036]: 'Button': Sending state ON
[19:21:07][D][voice_assistant:132]: Requesting start...
[19:21:07][D][voice_assistant:111]: Starting...
[19:21:08][D][voice_assistant:154]: Assist Pipeline running
[19:21:08][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:08][D][light:047]: State: ON
[19:21:08][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[19:21:09][D][binary_sensor:036]: 'Button': Sending state OFF
[19:21:09][D][voice_assistant:144]: Signaling stop...
[19:21:10][D][voice_assistant:168]: Speech recognised as: " Set the office desk lamp to red."
[19:21:10][D][voice_assistant:144]: Signaling stop...
[19:21:10][D][voice_assistant:192]: Response: "Color set"
[19:21:10][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:10][D][light:059]: Red: 0%, Green: 100%, Blue: 0%
[19:21:10][D][voice_assistant:207]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.raw"
[19:21:10][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[19:21:10][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.raw
[19:21:10][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:10][D][light:059]: Red: 0%, Green: 100%, Blue: 0%
[19:21:10][D][light:109]: Effect: 'Pulse'
[19:21:10][W][component:204]: Component api took a long time for an operation (0.05 s).
[19:21:10][W][component:205]: Components should block for at most 20-30ms.
[19:21:11][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.54 s).
[19:21:11][W][component:205]: Components should block for at most 20-30ms.
[19:21:11][D][voice_assistant:218]: Assist Pipeline ended
[19:21:11][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.46 s).
[19:21:11][W][component:205]: Components should block for at most 20-30ms.
[19:21:12][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:12][D][light:047]: State: OFF
[19:21:12][D][light:109]: Effect: 'None'
[19:21:28][D][binary_sensor:036]: 'Button': Sending state ON
[19:21:29][D][voice_assistant:132]: Requesting start...
[19:21:29][D][voice_assistant:111]: Starting...
[19:21:29][D][voice_assistant:154]: Assist Pipeline running
[19:21:29][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:29][D][light:047]: State: ON
[19:21:29][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[19:21:30][D][binary_sensor:036]: 'Button': Sending state OFF
[19:21:30][D][voice_assistant:144]: Signaling stop...
[19:21:31][D][voice_assistant:168]: Speech recognised as: " Turn off the office desk lamp."
[19:21:31][D][voice_assistant:144]: Signaling stop...
[19:21:31][D][voice_assistant:192]: Response: "Turned off light"
[19:21:31][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:31][D][light:059]: Red: 0%, Green: 100%, Blue: 0%
[19:21:31][D][voice_assistant:207]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3"
[19:21:31][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[19:21:31][D][media_player:066]: Media URL: http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3
[19:21:31][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:31][D][light:059]: Red: 0%, Green: 100%, Blue: 0%
[19:21:31][D][light:109]: Effect: 'Pulse'
[19:21:32][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.54 s).
[19:21:32][W][component:205]: Components should block for at most 20-30ms.
[19:21:32][D][voice_assistant:218]: Assist Pipeline ended
[19:21:39][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.06 s).
[19:21:39][W][component:205]: Components should block for at most 20-30ms.
[19:21:39][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.47 s).
[19:21:39][W][component:205]: Components should block for at most 20-30ms.
[19:21:39][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:39][D][light:047]: State: OFF
[19:21:39][D][light:109]: Effect: 'None'
I'm also getting the .raw format when using either media_player or speaker in ESPHome; however, the speaker component does play back the audio, whereas the media player does not...
media_player log:
[10:36:24][D][voice_assistant:192]: Response: "Turned off light"
[10:36:24][D][voice_assistant:207]: Response URL: "http://192.168.1.9:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-gb_71e0aacf05_tts.piper.raw"
[10:36:24][D][media_player:059]: 'Media Player' - Setting
[10:36:24][D][media_player:066]: Media URL: http://192.168.1.9:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-gb_71e0aacf05_tts.piper.raw
speaker log:
[10:49:05][D][voice_assistant:192]: Response: "Turned on light"
[10:49:05][D][voice_assistant:207]: Response URL: "http://192.168.1.9:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-gb_4a8f4e3e86_tts.piper.raw"
[10:49:05][D][voice_assistant:218]: Assist Pipeline ended
I can confirm, @grahambrown11, I have the same issue.
I can also confirm that I have the same issue. It is problematic because such messages cannot be played on other media_player entities.
Same problems here:
Maybe a solution/workaround would be to add raw format support to ESP32-audioI2S? This way, the media player would be able to play raw files, eliminating the need for the speaker component altogether.
I see a few references to CODEC_MP3 in https://github.com/esphome/ESP32-audioI2S/blob/07cb6eb71fbc47d45185270b5c84c762a126bbc3/src/Audio.cpp. Adding a new raw "codec" shouldn't be that hard, since it's exactly what the i2s function expects. I might give it a try soonish.
This was actually trivial to implement: https://github.com/esphome/ESP32-audioI2S/pull/12
You can test it with the following steps:
1. Get a local copy of the i2s_audio component:
   - Use login to get a shell, then enter the ESPHome container: docker exec -it addon_5c53de3b_esphome bash
   - mkdir -p /config/esphome/my_components/
   - Copy the i2s_audio component: cp -ar /esphome/esphome/components/i2s_audio /config/esphome/my_components/
2. Get the ESP32-audioI2S library:
   - mkdir -p /config/esphome/my_libs/
   - cd /config/esphome/my_libs/ && git clone https://github.com/robin-thoni/ESP32-audioI2S
3. Make the component use the local ESP32-audioI2S library:
   - You should now have the esphome/my_components/i2s_audio and esphome/my_libs/ESP32-audioI2S folders
   - Edit esphome/my_components/i2s_audio/media_player/__init__.py (NOT the __init__.py file at the root of the component, the one in the media_player folder)
   - Replace cg.add_library("esphome/ESP32-audioI2S", "2.0.7") by cg.add_library("file:///config/esphome/my_libs/ESP32-audioI2S", None)
4. Reference the local component in your device YAML:
external_components:
  - source:
      type: local
      path: my_components
    components: [i2s_audio]
It should now play the .raw files generated by HA, without eating the end of the file. Here's a quick demo: https://owncloud.rthoni.com/s/DfcrJXLZoRLpFpq
Oh please someone merge that PR!
Thanks a lot for the PR @synesthesiam
Your changes are merged to the dev branch.
So just using the dev version of HA, will it generate mp3 instead of raw or do I need to set parameters?
It will generate mp3 by default now, except for when ESPHome is streaming the response to the device (WAV in that case).
Thanks a lot for the PR, @synesthesiam. Will it also solve the ESPHome use case where the TTS source is an https URL, or do we still need to wait for a fix to that issue?
You're welcome @cociweb!
I don't think this will help Internet radio station playing. The fix only covers text to speech.
I mean tts from https source.
@synesthesiam, I tested the fix using the dev version of HA and it is working fine with an ESPHome media player.
thank you once again for the fix.