home-assistant / core

Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Wyoming integration returning incorrect URLs from piper #92969

Closed craigcabrey closed 12 months ago

craigcabrey commented 1 year ago

The problem

I have set up an Atom Echo as a voice assistant interface following the tutorial (https://www.home-assistant.io/projects/thirteen-usd-voice-remote/).

This works; however, generated responses are not played back. Looking at the ESP logs, the Piper response URL sometimes ends in .raw and sometimes in .mp3. The response needs to be in WAV format, which I can get manually by editing the URL (see the attached log snippet).

Unclear whether this is related to https://github.com/home-assistant/core/issues/92528.

What version of Home Assistant Core has the issue?

core-2023.5.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

wyoming

Link to integration documentation on our website

No response

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

[00:11:41][D][voice_assistant:112]: Response: "Sorry, I couldn't understand that"
[00:11:41][D][voice_assistant:127]: Response URL: "http://ha.k8s.services.lan/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_47f5ba5b18_tts.piper.raw"
[00:11:41][D][voice_assistant:132]: Assist Pipeline ended

Additional information

No response

craigcabrey commented 1 year ago

Two additional notes:

  1. Manually playing back a Piper TTS (via the media entity) works fine.
  2. Playing back the audio via the assist debugger results in an error.

So I suspect the pipeline itself is somehow corrupting the response audio stream.

anekinloewe commented 1 year ago

Hi all, exactly the same here. I think ESPHome cannot handle the ".raw" audio file.

tetele commented 1 year ago

Quoting @craigcabrey: "Manually playing back a Piper TTS (via the media entity) works fine."

What exactly are you playing? I tried to play back media-source://tts/tts.piper?message=Office+door+is+on&language=en-us&voice=en-us-ryan-medium&audio_output=mp3 (from the pipeline debug and from the ESPHome logs) and it doesn't work. I can't play that anywhere (Google Home, ESPHome media player, browser, VLC etc.). What does work is media-source://tts/tts.piper?message=Office+door+is+on&language=en-us&voice=en-us-ryan-medium&audio_output=wav (notice the wav value of audio_output at the end).
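To reproduce this workaround without hand-editing, the query string can be rewritten programmatically. The Python sketch below uses only the standard library's urllib.parse; the helper name with_audio_output is hypothetical, and it only swaps the audio_output value. It does not fix whatever produces the unplayable file.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_audio_output(url: str, fmt: str = "wav") -> str:
    """Return `url` with its audio_output query parameter set to `fmt`.

    Hypothetical helper. Other parameters (message, language, voice, ...)
    keep their original order; audio_output is appended if it was missing.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["audio_output"] = [fmt]  # replacing a key keeps its position
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

For example, passing the mp3 media-source URL above returns the same URL ending in audio_output=wav.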

However, the MP3s generated by Nabu Casa cloud TTS play on all of the devices above. I think the format itself is messed up; it's not that the pipelines do anything to alter it.

To me it looks like Piper is declaring one format and rendering another, thus creating invalid files. Also, I can't figure out which system states that the audio_output should be mp3. It seems that Wyoming's default is wav.
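One way to test the "declares one format, renders another" hypothesis is to download the generated file and inspect its first bytes instead of trusting the URL extension. This Python sketch is a heuristic that assumes only the common signatures: a RIFF/WAVE header for WAV, and an ID3 tag or an MPEG frame-sync word for MP3. Headerless raw PCM has no signature, so it is reported as "unknown". The function name is hypothetical.

```python
def sniff_audio_format(data: bytes) -> str:
    """Heuristically guess an audio container from its leading bytes.

    Returns "wav", "mp3", or "unknown". Raw PCM has no magic bytes, so it
    lands in "unknown" (and could, rarely, be misread as an MP3 frame).
    """
    # WAV: "RIFF" <4-byte little-endian size> "WAVE"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 preceded by an ID3v2 tag
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set (0xFF, then top 3 bits of next byte)
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

Comparing its result with the .mp3, .raw or .wav extension the pipeline reported would show whether the declared and actual formats disagree.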

anekinloewe commented 1 year ago

It is fixed with ESPHome 2023.5.0. There is now a speaker integration that supports Piper's raw stream.
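A ".raw" response is headerless PCM, so a player can only interpret it if it already knows the sample rate, sample width and channel count; that is presumably why a raw-aware speaker component can play it while a generic media player cannot. As a minimal sketch, Python's standard wave module can wrap such bytes in a WAV header. The defaults of 22050 Hz, 16-bit, mono are assumptions for illustration (check the Piper voice's configuration), and the helper name is hypothetical.

```python
import io
import wave


def raw_pcm_to_wav(pcm: bytes, rate: int = 22050, width: int = 2, channels: int = 1) -> bytes:
    """Wrap headerless PCM samples in a RIFF/WAVE container.

    rate, width (bytes per sample) and channels are ASSUMPTIONS: a raw
    stream carries no header, so they must match whatever produced it.
    """
    frame_size = width * channels
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:  # closing writes the final header sizes
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```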

craigcabrey commented 1 year ago

I just upgraded to 2023.5.0 and reflashed my Atom Echo. It's the same behavior: a raw stream does not play anything. Manually playing through the Piper TTS integration produces a WAV that plays fine.

anekinloewe commented 1 year ago

You have to use the new "speaker" integration, not "media player". Here is my working YAML for home-brew hardware with an INMP441 microphone and a MAX98357A amplifier. The only issue is the missing volume control for the speaker; I hope it will come in the future.

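i2s_audio:
  i2s_lrclk_pin: GPIO18  # word select (LRCLK/WS), shared by mic and amp
  i2s_bclk_pin: GPIO05   # bit clock (BCLK/SCK), shared by mic and amp

voice_assistant:
  microphone: mic01
  speaker: speaker01     # TTS reply goes to the speaker, not a media_player

microphone:
  - platform: i2s_audio
    id: mic01
    i2s_din_pin: GPIO21  # INMP441 data out (SD)
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: speaker01
    dac_type: external
    i2s_dout_pin: GPIO17  # MAX98357A data in (DIN)
    mode: mono

tetele commented 1 year ago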

@anekinloewe don't you get audio dropouts? I have the same issue on both Piper and Nabu Casa cloud TTS.

https://imgur.com/a/9dc6AyJ

anekinloewe commented 1 year ago

Quoting @tetele: "Don't you get audio dropouts? I have the same issue on both Piper and Nabu Casa cloud TTS."

I think that is a problem with Piper's voice quality. I'm waiting for medium- or high-quality voices.

tetele commented 1 year ago

Nope. Piper works great in my browser, for instance. And I use en-us-ryan-medium.

craigcabrey commented 1 year ago

Do you mind posting your entire config? It seems like the example config is completely broken :/

tetele commented 1 year ago

You mean the ESPHome config? I have a Muse Luxe, not an Atom Echo, but it has the same issue.

Here is the config https://gist.github.com/tetele/5cac735174527c3b373b10db8d9c8d77

wixoff commented 1 year ago

I'm having this exact same problem, with both the Muse Luxe and the Atom Echo (the Luxe for a few weeks, and the Echo only after it arrived today). No TTS speech comes out of the ESPHome media_player, and the wyoming-piper integration is generating .mp3 and .raw files (seemingly at random), and never .wav.

Here is a snippet of my ESPHome log from the Echo:

[23:24:43][D][binary_sensor:036]: 'Button': Sending state ON
[23:24:43][D][voice_assistant:105]: Requesting start...
[23:24:43][D][voice_assistant:085]: Starting...
[23:24:43][D][voice_assistant:123]: Assist Pipeline running
[23:24:43][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:43][D][light:046]:   State: ON
[23:24:43][D][light:058]:   Red: 0%, Green: 0%, Blue: 100%
[23:24:45][D][binary_sensor:036]: 'Button': Sending state OFF
[23:24:45][D][voice_assistant:113]: Signaling stop...
[23:24:46][D][voice_assistant:137]: Speech recognised as: " Turn off the family room ceiling."
[23:24:46][D][voice_assistant:152]: Response: "Turned off light"
[23:24:46][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:46][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:24:46][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3"
[23:24:46][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:24:46][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3
[23:24:46][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:46][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:24:46][D][light:108]:   Effect: 'Pulse'
[23:24:46][D][voice_assistant:172]: Assist Pipeline ended
[23:24:51][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:24:51][D][light:046]:   State: OFF
[23:24:51][D][light:108]:   Effect: 'None'
[23:25:01][D][binary_sensor:036]: 'Button': Sending state ON
[23:25:01][D][voice_assistant:105]: Requesting start...
[23:25:01][D][voice_assistant:085]: Starting...
[23:25:01][D][voice_assistant:123]: Assist Pipeline running
[23:25:01][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:01][D][light:046]:   State: ON
[23:25:01][D][light:058]:   Red: 0%, Green: 0%, Blue: 100%
[23:25:03][D][binary_sensor:036]: 'Button': Sending state OFF
[23:25:03][D][voice_assistant:113]: Signaling stop...
[23:25:04][D][voice_assistant:137]: Speech recognised as: " Turn on the family room feeling."
[23:25:04][D][voice_assistant:152]: Response: "Sorry, I couldn't understand that"
[23:25:04][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:04][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:25:04][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.mp3"
[23:25:04][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:25:04][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.mp3
[23:25:04][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:04][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:25:04][D][light:108]:   Effect: 'Pulse'
[23:25:04][D][voice_assistant:172]: Assist Pipeline ended

. . .

[23:25:41][D][binary_sensor:036]: 'Button': Sending state ON
[23:25:41][D][voice_assistant:105]: Requesting start...
[23:25:41][D][voice_assistant:085]: Starting...
[23:25:41][D][voice_assistant:123]: Assist Pipeline running
[23:25:41][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:41][D][light:046]:   State: ON
[23:25:42][D][light:058]:   Red: 0%, Green: 0%, Blue: 100%
[23:25:44][D][binary_sensor:036]: 'Button': Sending state OFF
[23:25:44][D][voice_assistant:113]: Signaling stop...
[23:25:44][D][voice_assistant:137]: Speech recognised as: " Turn on the family room ceiling."
[23:25:44][D][voice_assistant:152]: Response: "Turned on light"
[23:25:44][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:44][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:25:44][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-us_718a3c601f_tts.piper.mp3"
[23:25:44][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:25:45][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-us_718a3c601f_tts.piper.mp3
[23:25:45][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:45][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:25:45][D][light:108]:   Effect: 'Pulse'
[23:25:45][D][voice_assistant:172]: Assist Pipeline ended
[23:25:50][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:50][D][light:046]:   State: OFF
[23:25:50][D][light:108]:   Effect: 'None'
[23:25:54][D][binary_sensor:036]: 'Button': Sending state ON
[23:25:54][D][voice_assistant:105]: Requesting start...
[23:25:54][D][voice_assistant:085]: Starting...
[23:25:54][D][voice_assistant:123]: Assist Pipeline running
[23:25:54][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:54][D][light:046]:   State: ON
[23:25:54][D][light:058]:   Red: 0%, Green: 0%, Blue: 100%
[23:25:57][D][binary_sensor:036]: 'Button': Sending state OFF
[23:25:57][D][voice_assistant:113]: Signaling stop...
[23:25:57][D][voice_assistant:137]: Speech recognised as: " Turn off the family room soon."
[23:25:57][D][voice_assistant:152]: Response: "Sorry, I couldn't understand that"
[23:25:57][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:57][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:25:57][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.raw"
[23:25:57][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:25:57][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_718a3c601f_tts.piper.raw
[23:25:57][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:57][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:25:57][D][light:108]:   Effect: 'Pulse'
[23:25:58][D][voice_assistant:172]: Assist Pipeline ended
[23:25:59][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:25:59][D][light:046]:   State: OFF
[23:25:59][D][light:108]:   Effect: 'None'
[23:26:02][D][binary_sensor:036]: 'Button': Sending state ON
[23:26:03][D][voice_assistant:105]: Requesting start...
[23:26:03][D][voice_assistant:085]: Starting...
[23:26:03][D][voice_assistant:123]: Assist Pipeline running
[23:26:03][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:03][D][light:046]:   State: ON
[23:26:03][D][light:058]:   Red: 0%, Green: 0%, Blue: 100%
[23:26:05][D][binary_sensor:036]: 'Button': Sending state OFF
[23:26:05][D][voice_assistant:113]: Signaling stop...
[23:26:06][D][voice_assistant:137]: Speech recognised as: " Turn off the family room ceiling."
[23:26:06][D][voice_assistant:152]: Response: "Turned off light"
[23:26:06][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:06][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:26:06][D][voice_assistant:167]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.raw"
[23:26:06][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[23:26:06][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.raw
[23:26:06][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:06][D][light:058]:   Red: 0%, Green: 100%, Blue: 0%
[23:26:06][D][light:108]:   Effect: 'Pulse'
[23:26:06][D][voice_assistant:172]: Assist Pipeline ended
[23:26:07][D][light:035]: 'atomecho-voice-assist-1' Setting:
[23:26:07][D][light:046]:   State: OFF
[23:26:07][D][light:108]:   Effect: 'None'

Using the Developer Tools in Home Assistant to generate Piper TTS speech and send it to either the Muse Luxe or the Atom Echo works fine; it's audible. With both the Luxe and the Echo, STT appears to be working, since my speech is recognized. But TTS through the voice assistant pipeline results in silence.

It appears that the ESPHome media_player is trying to play something: when I release the button, the green light pulses for a few seconds before the state reverts to idle.

And for the record, I'm using a rhasspy/wyoming-piper container in a containerized (unsupervised) instance of Home Assistant, currently version 2023.5.2. I'm running the standard Atom Echo voice assistant yaml configuration, with ESPHome 2023.5.2. As I noted above, the mic works, and the media_player works standalone, so I think my devices are OK.
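The log above shows the pattern this thread describes: the same response text produces the same 40-character prefix in the tts_proxy file name (dae2cdcb... for "Sorry, I couldn't understand that", db98156b... for "Turned off light"), yet it is served as .mp3 one time and .raw another. The Python sketch below tallies that from a saved log. The field names (msg_hash, language, options, engine) are inferred from the URLs in this thread, not taken from Home Assistant's source, so treat the regex as an assumption.

```python
import re
from collections import Counter

# Matches tts_proxy file names as they appear in the logs in this thread, e.g.
#   .../api/tts_proxy/<40 hex>_en-us_718a3c601f_tts.piper.mp3
# Field meanings are inferred from the logs, not from Home Assistant's code.
PROXY_RE = re.compile(
    r"/api/tts_proxy/"
    r"(?P<msg_hash>[0-9a-f]{40})_"
    r"(?P<language>[a-z]{2}-[a-z]{2})_"
    r"(?P<options>[0-9a-f]+)_"
    r"(?P<engine>tts\.\w+)\."
    r"(?P<ext>\w+)"
)


def tally_response_formats(log_lines):
    """Count (short message hash, extension) pairs in 'Response URL' lines."""
    counts = Counter()
    for line in log_lines:
        if "Response URL" not in line:  # skip the duplicate 'Media URL' lines
            continue
        m = PROXY_RE.search(line)
        if m:
            counts[(m["msg_hash"][:8], m["ext"])] += 1
    return counts
```

Run against the full log above, both dae2cdcb and db98156b would be counted once with "mp3" and once with "raw".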

Blendi commented 1 year ago

Hello there! Yes, it works now when I use the speaker component instead of the media_player. However, I've lost the ability to use my speaker as a media player in Home Assistant: I no longer see any entity for it, neither a media player nor a speaker. It would be good if voice_assistant could also use the media_player component to play its replies, not only the speaker component. Hopefully this will get fixed in the future.

home-assistant[bot] commented 1 year ago

Hey there @balloob, @synesthesiam, mind taking a look at this issue as it has been labeled with an integration (wyoming) you are listed as a code owner for? Thanks!

synesthesiam commented 1 year ago

This should be fixed now in the latest ESPHome.

tetele commented 1 year ago

@synesthesiam latest = 2023.7?

synesthesiam commented 1 year ago

There should be a new ESPHome version out today.

wixoff commented 1 year ago

It's still not working for me when I use a media_player for output, as the default voice assistant configuration does. A speaker works fine but the volume can't be controlled.

And as noted in my comment above, two things: (1) no TTS comes out of the voice assistant, but I can send audio (including piper and other TTS!) directly from Home Assistant to the ESPHome media_player and it works; and (2) the ESP device thinks it's playing audio, as the green pulsating LED continues pulsating for some period and then stops, as if it were speaking.

This is all using ESPHome 2023.7.0 and HA 2023.7.2.

Here are some logs from ESPHome 2023.7.0 on an Atom Echo:

[19:20:24][I][app:102]: ESPHome version 2023.7.0 compiled on Jul 19 2023, 19:13:04
[19:20:24][I][app:104]: Project m5stack.atom-echo version 1.0
[19:20:24][C][wifi:543]: WiFi:
[19:20:24][C][wifi:379]:   Local MAC: 64:B7:08:80:31:68
[19:20:24][C][wifi:380]:   SSID: [redacted]
[19:20:24][C][wifi:381]:   IP Address: 192.168.11.16
[19:20:24][C][wifi:383]:   BSSID: [redacted]
[19:20:24][C][wifi:384]:   Hostname: 'atomecho-voice-assist-1'
[19:20:24][C][wifi:386]:   Signal strength: -67 dB ▂▄▆█
[19:20:24][C][wifi:390]:   Channel: 1
[19:20:24][C][wifi:391]:   Subnet: 255.255.0.0
[19:20:24][C][wifi:392]:   Gateway: 192.168.17.1
[19:20:24][C][wifi:393]:   DNS1: 192.168.17.1
[19:20:24][C][wifi:394]:   DNS2: 0.0.0.0
[19:20:24][C][logger:301]: Logger:
[19:20:24][C][logger:302]:   Level: DEBUG
[19:20:24][C][logger:303]:   Log Baud Rate: 115200
[19:20:24][C][logger:305]:   Hardware UART: UART0
[19:20:24][C][esp32_rmt_led_strip:171]: ESP32 RMT LED Strip:
[19:20:24][C][esp32_rmt_led_strip:172]:   Pin: 27
[19:20:24][C][esp32_rmt_led_strip:173]:   Channel: 0
[19:20:24][C][esp32_rmt_led_strip:198]:   RGB Order: GRB
[19:20:24][C][esp32_rmt_led_strip:199]:   Max refresh rate: 0
[19:20:24][C][esp32_rmt_led_strip:200]:   Number of LEDs: 1
[19:20:25][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Button'
[19:20:25][C][gpio.binary_sensor:016]:   Pin: GPIO39
[19:20:25][C][light:103]: Light 'atomecho-voice-assist-1'
[19:20:25][C][light:105]:   Default Transition Length: 0.0s
[19:20:25][C][light:106]:   Gamma Correct: 2.80
[19:20:25][C][captive_portal:088]: Captive Portal:
[19:20:25][C][mdns:112]: mDNS:
[19:20:25][C][mdns:113]:   Hostname: atomecho-voice-assist-1
[19:20:25][C][ota:093]: Over-The-Air Updates:
[19:20:25][C][ota:094]:   Address: atomecho-voice-assist-1.local:3232
[19:20:25][C][api:138]: API Server:
[19:20:25][C][api:139]:   Address: atomecho-voice-assist-1.local:6053
[19:20:25][C][api:141]:   Using noise encryption: YES
[19:20:25][C][improv_serial:032]: Improv Serial:
[19:20:25][C][audio:203]: Audio:
[19:20:25][C][audio:225]:   External DAC channels: 1
[19:20:25][C][audio:226]:   I2S DOUT Pin: 22
[19:20:30][D][binary_sensor:036]: 'Button': Sending state ON
[19:20:31][D][voice_assistant:132]: Requesting start...
[19:20:31][D][voice_assistant:111]: Starting...
[19:20:32][D][voice_assistant:154]: Assist Pipeline running
[19:20:32][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:32][D][light:047]:   State: ON
[19:20:32][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[19:20:33][D][binary_sensor:036]: 'Button': Sending state OFF
[19:20:33][D][voice_assistant:144]: Signaling stop...
[19:20:34][D][voice_assistant:168]: Speech recognised as: " Set the office desk lamp to white."
[19:20:34][D][voice_assistant:144]: Signaling stop...
[19:20:34][D][voice_assistant:192]: Response: "Color set"
[19:20:34][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:34][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[19:20:34][D][voice_assistant:207]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.mp3"
[19:20:34][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[19:20:34][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.mp3
[19:20:34][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:34][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[19:20:34][D][light:109]:   Effect: 'Pulse'
[19:20:34][W][component:204]: Component api took a long time for an operation (0.05 s).
[19:20:34][W][component:205]: Components should block for at most 20-30ms.
[19:20:35][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.55 s).
[19:20:35][W][component:205]: Components should block for at most 20-30ms.
[19:20:35][D][voice_assistant:218]: Assist Pipeline ended
[19:20:54][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.06 s).
[19:20:54][W][component:205]: Components should block for at most 20-30ms.
[19:20:55][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.46 s).
[19:20:55][W][component:205]: Components should block for at most 20-30ms.
[19:20:55][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:20:55][D][light:047]:   State: OFF
[19:20:55][D][light:109]:   Effect: 'None'
[19:21:07][D][binary_sensor:036]: 'Button': Sending state ON
[19:21:07][D][voice_assistant:132]: Requesting start...
[19:21:07][D][voice_assistant:111]: Starting...
[19:21:08][D][voice_assistant:154]: Assist Pipeline running
[19:21:08][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:08][D][light:047]:   State: ON
[19:21:08][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[19:21:09][D][binary_sensor:036]: 'Button': Sending state OFF
[19:21:09][D][voice_assistant:144]: Signaling stop...
[19:21:10][D][voice_assistant:168]: Speech recognised as: " Set the office desk lamp to red."
[19:21:10][D][voice_assistant:144]: Signaling stop...
[19:21:10][D][voice_assistant:192]: Response: "Color set"
[19:21:10][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:10][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[19:21:10][D][voice_assistant:207]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.raw"
[19:21:10][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[19:21:10][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/9a9f96af14ebec28fdc2f47c5ae5cfa7b4e512a4_en-us_718a3c601f_tts.piper.raw
[19:21:10][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:10][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[19:21:10][D][light:109]:   Effect: 'Pulse'
[19:21:10][W][component:204]: Component api took a long time for an operation (0.05 s).
[19:21:10][W][component:205]: Components should block for at most 20-30ms.
[19:21:11][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.54 s).
[19:21:11][W][component:205]: Components should block for at most 20-30ms.
[19:21:11][D][voice_assistant:218]: Assist Pipeline ended
[19:21:11][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.46 s).
[19:21:11][W][component:205]: Components should block for at most 20-30ms.
[19:21:12][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:12][D][light:047]:   State: OFF
[19:21:12][D][light:109]:   Effect: 'None'
[19:21:28][D][binary_sensor:036]: 'Button': Sending state ON
[19:21:29][D][voice_assistant:132]: Requesting start...
[19:21:29][D][voice_assistant:111]: Starting...
[19:21:29][D][voice_assistant:154]: Assist Pipeline running
[19:21:29][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:29][D][light:047]:   State: ON
[19:21:29][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[19:21:30][D][binary_sensor:036]: 'Button': Sending state OFF
[19:21:30][D][voice_assistant:144]: Signaling stop...
[19:21:31][D][voice_assistant:168]: Speech recognised as: " Turn off the office desk lamp."
[19:21:31][D][voice_assistant:144]: Signaling stop...
[19:21:31][D][voice_assistant:192]: Response: "Turned off light"
[19:21:31][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:31][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[19:21:31][D][voice_assistant:207]: Response URL: "http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3"
[19:21:31][D][media_player:059]: 'atomecho-voice-assist-1' - Setting
[19:21:31][D][media_player:066]:   Media URL: http://192.168.17.10:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-us_718a3c601f_tts.piper.mp3
[19:21:31][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:31][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[19:21:31][D][light:109]:   Effect: 'Pulse'
[19:21:32][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.54 s).
[19:21:32][W][component:205]: Components should block for at most 20-30ms.
[19:21:32][D][voice_assistant:218]: Assist Pipeline ended
[19:21:39][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.06 s).
[19:21:39][W][component:205]: Components should block for at most 20-30ms.
[19:21:39][W][component:204]: Component i2s_audio.media_player took a long time for an operation (0.47 s).
[19:21:39][W][component:205]: Components should block for at most 20-30ms.
[19:21:39][D][light:036]: 'atomecho-voice-assist-1' Setting:
[19:21:39][D][light:047]:   State: OFF
[19:21:39][D][light:109]:   Effect: 'None'

grahambrown11 commented 1 year ago

I'm also getting the .raw format when using either media_player or speaker in ESPHome; however, the speaker component does play back the audio, whereas the media player does not.

media_player log:

[10:36:24][D][voice_assistant:192]: Response: "Turned off light"
[10:36:24][D][voice_assistant:207]: Response URL: "http://192.168.1.9:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-gb_71e0aacf05_tts.piper.raw"
[10:36:24][D][media_player:059]: 'Media Player' - Setting
[10:36:24][D][media_player:066]:   Media URL: http://192.168.1.9:8123/api/tts_proxy/db98156bf572727274889253f275cea21c83824c_en-gb_71e0aacf05_tts.piper.raw

speaker log:

[10:49:05][D][voice_assistant:192]: Response: "Turned on light"
[10:49:05][D][voice_assistant:207]: Response URL: "http://192.168.1.9:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-gb_4a8f4e3e86_tts.piper.raw"
[10:49:05][D][voice_assistant:218]: Assist Pipeline ended

RDG88 commented 1 year ago

I can confirm, @grahambrown11; I have the same issue.

witold-gren commented 1 year ago

I can also confirm that I have the same issue. It is problematic because such messages cannot be played on other media players.

robin-thoni commented 1 year ago

Same problems here:

Maybe a solution/workaround would be to add raw format support to ESP32-audioI2S? This way, the media player would be able to play raw files, eliminating the need for the speaker component altogether.

I see a few references to CODEC_MP3 in https://github.com/esphome/ESP32-audioI2S/blob/07cb6eb71fbc47d45185270b5c84c762a126bbc3/src/Audio.cpp. Adding a new raw "codec" shouldn't be that hard, since it's exactly what the i2s function expects. I might give it a try soonish.
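Since raw PCM is already what the I2S write expects, a raw codec mostly has to pass bytes through in sample-aligned pieces. The actual change is C++ in the ESP32-audioI2S pull request linked in the next comment; the Python sketch here only illustrates the general technique and is not that code. It assumes 16-bit mono (2-byte frames), rounds the chunk size down to whole frames, and yields the final short chunk instead of dropping it, since discarding a partial last buffer is one plausible way a player ends up cutting off the end of a clip. The function name and chunk size are hypothetical.

```python
def pcm_chunks(pcm: bytes, chunk_size: int = 1024, frame_size: int = 2):
    """Yield frame-aligned chunks of raw PCM, including the final short one.

    Hypothetical helper. chunk_size is rounded down to whole frames so no
    sample is split across writes; a trailing partial frame is ignored.
    """
    step = max(frame_size, chunk_size - chunk_size % frame_size)
    usable = len(pcm) - len(pcm) % frame_size
    for start in range(0, usable, step):
        # The last slice may be shorter than `step`; it is still yielded.
        yield pcm[start:min(start + step, usable)]
```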

robin-thoni commented 1 year ago

This was actually trivial to implement: https://github.com/esphome/ESP32-audioI2S/pull/12

You can test it with the following steps:

  1. Copy the i2s_audio component
  2. Clone the patched ESP32-audioI2S library
  3. Configure the new ESP32-audioI2S library
  4. Rebuild

It should now play the .raw files generated by HA, without eating the end of the file. Here's a quick demo: https://owncloud.rthoni.com/s/DfcrJXLZoRLpFpq

mark4code commented 12 months ago

Oh please someone merge that PR!

amrutprabhu commented 12 months ago

Thanks a lot for the PR, @synesthesiam.

Your changes have been merged into the dev branch.

So if I just use the dev version of HA, will it generate MP3 instead of raw, or do I need to set parameters?

synesthesiam commented 12 months ago

It will generate mp3 by default now, except for when ESPHome is streaming the response to the device (WAV in that case).
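The rule described here can be written down as a tiny decision function. This Python sketch is illustrative only, not Home Assistant's implementation: pick_tts_format and tts_media_source are hypothetical helpers, and the media-source URL layout (message, language, voice, audio_output) is copied from the URL tetele posted earlier in the thread.

```python
from urllib.parse import urlencode


def pick_tts_format(streaming_to_esphome: bool) -> str:
    """Hypothetical rule: WAV when ESPHome streams the reply, else MP3."""
    return "wav" if streaming_to_esphome else "mp3"


def tts_media_source(message: str, language: str, voice: str,
                     streaming_to_esphome: bool = False) -> str:
    """Build a Piper media-source URL in the layout seen in this thread."""
    query = urlencode({
        "message": message,
        "language": language,
        "voice": voice,
        "audio_output": pick_tts_format(streaming_to_esphome),
    })
    return f"media-source://tts/tts.piper?{query}"
```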

cociweb commented 12 months ago

Thanks a lot for the PR, @synesthesiam. Will it solve the ESPHome use case where the TTS source is an HTTPS URL, or do we still need to wait for the HTTPS fix in that issue?

synesthesiam commented 12 months ago

You're welcome @cociweb!

I don't think this will help with playing Internet radio stations. The fix only covers text-to-speech.

cociweb commented 12 months ago

I mean tts from https source.

amrutprabhu commented 12 months ago

@synesthesiam, I tested the fix using the dev version of HA, and it is working fine with an ESPHome media player.

Thank you once again for the fix.