migrate mww to esphome_audio, bring back volume control and media_player

cowboyrushforth commented 1 month ago

NOTE - this is using some undocumented features of esphome_audio, thanks to its author pointing them out. (gain_log2: 3)

So, perhaps we make this another yaml as the previous one with "speaker" component is fairly stable, for me anyways. Open to suggestions.

Ultimately, this probably needs a lot of testing. I did have an issue once I added volume controls back in, but it seemed to vanish on its own. It's nice to have media_player back, and you can even detect a wake word while a song is playing. However, I can't seem to find the VAD timeout setting for microwakeword, so it takes a while to detect the wake word if a song is playing, but it does work. One more nuance: once the TTS response comes back, it stops the media player to play it, which makes sense, but we may want to configure it to resume the previous file from where it left off, if it was playing.

cowboyrushforth commented 1 month ago

Thanks for the feedback! Fixed most of it. Left you a comment above too.

cowboyrushforth commented 1 month ago

@tetele implemented the stopping of media player on wake word detection.

Separately, I have started another branch here that begins to implement audio ducking, i.e. the music would keep playing in the background, but more quietly, while the device/system processes the full request, with the end goal being that it resumes playing where it left off once the request is fully done. As I got into this I realized it was going to be a little complex, and I might need to make some changes to media_player in ESPHome to support it, so I will work on this separately if I get time.

tetele commented 1 month ago

I doubt it will be easy to do that on the ESPHome media_player or that it will be implemented any time soon. It would be lovely, but certainly not trivial (from the ESPHome point of view; from this config's point of view, it just needs upstream support).

That said, I wanted to thank you for your contributions! You, sir, have been amazing! With my current limited availability, I don't even know when I would have been able to test and implement the ADF media_player in the MWW config.

I'd like to get at least some testing feedback on this implementation and then merge it to the main config. I've found an issue with the volume and reported it upstream. There's a workaround, but I'd like for that change to be merged before proceeding, apart from some testing.

If all goes well, I'd also like to port this implementation to the non-MWW config as a longer term plan. None of that would have been possible without your help ❤️

cowboyrushforth commented 1 month ago

Also pushed a method to restore the volume between reboots as it was driving me crazy. Not sure if there is a better way to do it.
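For anyone curious what restoring volume across reboots can look like, here is a minimal sketch, not necessarily what the commit does. It assumes the media player ID is `onju_out` (as used elsewhere in this thread); the `saved_volume` ID, the name, and the step size are hypothetical. It stores the volume in a template number with `restore_value` and re-applies it late in boot. If your config already has an `esphome:` block, merge the `on_boot` part into it.

```yaml
# Sketch only: persist the last volume and re-apply it after a reboot.
number:
  - platform: template
    id: saved_volume            # hypothetical ID
    name: "Saved volume"        # hypothetical name
    min_value: 0
    max_value: 1
    step: 0.05
    initial_value: 0.5
    optimistic: true
    restore_value: true         # keep the value across reboots
    set_action:
      # Forward every change to the media player.
      - media_player.volume_set:
          id: onju_out
          volume: !lambda return x;

esphome:
  on_boot:
    priority: -100              # run late, once components are initialized
    then:
      # Re-apply the stored volume to the media player.
      - media_player.volume_set:
          id: onju_out
          volume: !lambda return id(saved_volume).state;
```

With this approach the volume has to be changed through the number entity for it to be saved; changes made directly on the media player would not be persisted.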

Ya, I noticed the speaker sounds bad over 50% of volume as well. But strangely only for music, not TTS. (for me, at least). So I do wonder whether there is something else going on.. Does TTS sound poor above 50% for you, or just music?

Lastly, re: audio ducking, I agree, there need to be changes in media_player. I plan to look at this when I get time, upstream in ESPHome, and perhaps in voice_assistant, too. Might be a fun challenge, but my goal is to fully replace Siri in my house, one day! :p

tetele commented 1 month ago

> Does TTS sound poor above 50% for you, or just music?

Both TTS and media sound just as loud above 50%. Music does sound a bit poorer, but maybe TTS does too, it's just that it's harder to trace.

tetele commented 1 month ago

Can you please hit me up on Discord, @cowboyrushforth? I'm @tetele on both the HA and ESPHome servers.

cowboyrushforth commented 1 month ago

Rolled that last commit back. Sent you a DM on Discord, I am on the ESPHome Discord under spot or spotman_, cheers

tbrasser commented 1 month ago

I've been following this and porting to non-mww. Hit me up (@discord) if I can help out on that front. 👍

s00500 commented 1 month ago

That took some legitimate long compile time haha, but super awesome work!!!

Could successfully test microwakeword with audio control and media player working!!! Works without disabling listening manually. Sound quality still seems to be missing higher frequencies... (does not bother me, just noticed it)

Also played a bit with adding a beep, this works really really well:

  on_wake_word_detected:
    - if:
        condition: media_player.is_playing
        then:
          - media_player.pause
    - media_player.play_media:
        id: onju_out
        media_url: "http://192.168.0.111:8000/beep.mp3"
    - delay: 300ms
    #- wait_until:
    #    not:
    #      media_player.is_playing: onju_out
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

The beep is a little sub-500ms clip and is currently hosted on some RPi in my network, but I am wondering if there would be a more elegant version of this... any ideas?

Greetings, and thanks for the amazing work, I will flash this out to all 5 onjus I am running as soon as I get to it =D

tetele commented 1 month ago

I'm having some issues with either MWW or the I2S audio - wake word recognition simply stops after a long enough time. It works after the MWW detection is turned off and on again.

So far I haven't been able to trace the root cause of the issue, but I will keep trying. Does anybody else get that?

s00500 commented 1 month ago

Hm... had it running since my comment yesterday, still works fine here (running ESPHome 2024.4 btw). I did have an issue like that on my RaspiAudio MuseProto boards that I could never fully fix...

jherby2k commented 1 month ago

Re: sound quality. From https://github.com/gnumpi/esphome_audio?tab=readme-ov-file#i2s-settings:

sample_rate (Optional, positive integer): I2S sample rate. Defaults to 16000.

Probably just need to set this to 48000 for music playback. blah blah blah...

edit: Never mind. I see you're way ahead of me, and the microphone fails to work with the resampler. So 16000 Hz it is for now.

cowboyrushforth commented 1 month ago

> I'm having some issues with either MWW or the I2S audio - wake word recognition simply stops after a long enough time. It works after the MWW detection is turned off and on again.
>
> So far I haven't been able to trace the root cause of the issue, but I will keep trying. Does anybody else get that?

So I have 3 onju devices set up. This happens in one location regardless of which device is in that location. This location is closer to another device. I am curious if this could have to do with 2 onju devices both hearing the wake word, and something gets weird and neither actually responds.

To expand though, I took one device that was experiencing this frequently to another location (downstairs, far away from any other device) and it has never once failed to hear the wake word.

So I started to look thru code as I read somewhere that HA supports only a single device actually responding, but I can't actually find any support for this, so the whole thing remains a mystery.

tetele commented 1 month ago

> So I started to look thru code as I read somewhere that HA supports only a single device actually responding

That is true, I hadn't thought to check this, but it would not explain 1. the fact that toggling the wake word makes it work immediately and 2. the fact that until the toggle it doesn't trigger at all, regardless of how well isolated it is from other devices (i.e. closed doors etc.).

s00500 commented 1 month ago

Seems like your issue is not related to that. I have, however, had similar issues to @cowboyrushforth.

Often when I trigger 2 devices at once one of them does not respond, which is useful sometimes... but mostly feels buggy.

Even worse, I had cases where the second device would repeat the question (!!!) and answer of the first again. This is especially strange as I do not understand how the question gets into the TTS pipeline again at all.

cowboyrushforth commented 1 month ago

> So I started to look thru code as I read somewhere that HA supports only a single device actually responding
>
> That is true, i haven't thought to check this, but it would not explain 1. the fact that toggling the wake word makes it work immediately and 2. the fact that until the toggle it doesn't trigger at all, regardless how well isolated it is from other devices (i.e. closed doors etc.).

Perhaps, but perhaps not. Would love to find the source code for whatever functionality HA has for only-one-device-detecting-wake-word at one time.

Regarding 1 - perhaps toggling the wake word resets something critical? I wouldn't say that rules this out at all.

Regarding 2 - for me, it only fails to trigger once it gets into this "bad state". If the device never gets into a bad state, which for me means keeping the door to this room closed, it always triggers, so long as I shut the door behind me when I enter this room.

cowboyrushforth commented 1 month ago

Got some info from discord, so for MWW, the code HA uses to stop concurrent requests is here: https://github.com/home-assistant/core/blob/dev/homeassistant%2Fcomponents%2Fassist_pipeline%2Fpipeline.py#L1364-L1381

After expanding my loglevel for assist_pipeline, I do see this is triggered for me on the trouble unit. Will continue debugging when I get more time.
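In case it helps anyone else debugging this, raising the log level for just that integration can be done in Home Assistant's `configuration.yaml`. This is a minimal sketch; the `default: warning` level is an assumption, so keep whatever default you already use.

```yaml
# configuration.yaml (Home Assistant, not the ESPHome device config)
logger:
  default: warning   # assumed default; keep your existing value
  logs:
    # Debug logging only for the Assist pipeline, where the linked check lives.
    homeassistant.components.assist_pipeline: debug
```

After restarting HA (or reloading the logger settings), the pipeline's debug messages should show up in the HA log when two devices detect the wake word at about the same time.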

My gut feeling is that there are maybe multiple ways that the system can get into a weird state.

cowboyrushforth commented 1 month ago

Small update here.

I have not been able to replicate this for 2 days, and nothing has changed in my house. The only things I have changed code-wise are:

a. upgraded to ESPHome 2024.4.1
b. am running this YAML change, with the intention of making it obvious if there is a Wi-Fi issue:

  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - script.execute: reset_led
          - micro_wake_word.start:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - light.turn_on:
              id: top_led
              blue: 0%
              red: 100%
              green: 0%
              effect: none
          - voice_assistant.stop:
          - micro_wake_word.stop:

The intention of this yaml is just to make the lights red if the connection to HA fails. When I restart HA I do see the lights go to red then back to normal.

rccoleman commented 1 month ago

I'm seeing a decryption failure with this PR when trying to play valid media on my Onju voice devices (tested with TTS, but also any responses). I recall that responses worked at some point, but have updated to the ESPHome betas since then.

[10:08:15][D][media_player:059]: 'Onju Voice Satellite Dining Room' - Setting
[10:08:15][D][media_player:066]:   Media URL: https://xxx/api/tts_proxy/b8b13f9279e4f60bbc005a4c6d66bf220dd2df68_en-us_5c97d21c48_cloud.mp3
[10:08:15][D][adf_media_player:030]: Got control call in state 1
[10:08:15][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[10:08:15][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[10:08:15][I][adf_media_player:135]: got new pipeline state: 1
[10:08:15][D][adf_i2s_out:127]: Set final i2s settings: 16000
[10:08:15][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2 
[10:08:16][D][esp-idf:000]: I (7767236) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[10:08:16][D][esp-idf:000]: I (7767239) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[10:08:16][D][esp_aud:000]: 
ERROR Fatal error: protocol.data_received() call failed.
protocol: <aioesphomeapi._frame_helper.noise.APINoiseFrameHelper object at 0x7f06254bf060>
transport: <_SelectorSocketTransport fd=6 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 1009, in _read_ready__data_received
    self._protocol.data_received(data)
  File "aioesphomeapi/_frame_helper/noise.py", line 136, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 163, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 319, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper._handle_frame
  File "/usr/local/lib/python3.11/dist-packages/noise/state.py", line 74, in decrypt_with_ad
    plaintext = self.cipher.decrypt(self.k, self.n, ad, ciphertext)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/noise/backends/default/ciphers.py", line 13, in decrypt
    return self.cipher.decrypt(nonce=self.format_nonce(n), data=ciphertext, associated_data=ad)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/chacha20poly1305_reuseable/__init__.py", line 127, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 147, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 263, in chacha20poly1305_reuseable._decrypt_with_fixed_nonce_len
  File "src/chacha20poly1305_reuseable/__init__.py", line 273, in chacha20poly1305_reuseable._decrypt_data
cryptography.exceptions.InvalidTag
WARNING onju-voice-dr @ 192.168.1.131: Connection error occurred: onju-voice-dr @ 192.168.1.131: Invalid encryption key: received_name=onju-voice-dr
INFO Processing unexpected disconnect from ESPHome API for onju-voice-dr @ 192.168.1.131
WARNING Disconnected from API
INFO Successfully connected to onju-voice-dr @ 192.168.1.131 in 0.010s
INFO Successful handshake with onju-voice-dr @ 192.168.1.131 in 0.109s

If I stick the audio URL in a browser, it plays the expected audio. I don't see anything in the HA logs when this happens.

tetele commented 1 month ago

@rccoleman can you double check that the encryption key in the config is the same as that stored in HA config entry?

rccoleman commented 1 month ago

> @rccoleman can you double check that the encryption key in the config is the same as that stored in HA config entry?

The noise_psk key in the config_entry is the same as the one specified in the ESPHome config:

api:
  encryption:
    key: "same_as_noise_psk_key"

If the API key was wrong, I would have expected much bigger issues like HA not being able to talk to the device at all. This is isolated to media playback, as far as I can tell. I believe that I even removed devices and re-added them to ensure they sync properly, but it didn't help.

Edit: Weird, it's just that one Onju voice device. The other four that I have running the same ESPHome config work fine. I changed the key (it was actually a dup of a key for another device), removed it from HA, rebuilt/reuploaded the firmware, re-added to HA with the new key, and now it's working. 🤷

VivantSenior commented 1 month ago

Is there anything preventing this from being merged into the main branch?

s00500 commented 1 month ago

Some minor comments here on testing this: it works nicely most of the time, but sometimes I needed to do a fully clean build and re-upload to get stuff working, not sure what is happening there.

Meanwhile I added an HTTP MP3 link served by Home Assistant to on_wake_word_detected. This works fine, but it is too slow when using HTTPS, so I downgraded to HTTP (now hosting it somewhere other than my HA instance, using a simple HTTP server).

micro_wake_word:
  model: okay_nabu
  probability_cutoff: 0.6
  on_wake_word_detected:
    - if:
        condition: media_player.is_playing
        then:
          - media_player.pause
    - media_player.play_media:
        id: onju_out
        media_url: "http://192.168.0.244:8000/beep.mp3"
    - delay: 300ms # tuned to length, works a bit better than waiting for state
    #- wait_until:
    #    not:
    #      media_player.is_playing: onju_out
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

tetele commented 1 month ago

I still have a lot of issues with mine, which leave very few traces in the debug log.

However, since we have the option of leaving the MWW version marked as experimental, I'm going to merge it into main.

tetele / onju-voice-satellite

migrate mww to esphome_audio, bring back volume control and media_player #39