hugobloem / esp-ha-speech

Local speech recognition on an ESP32 for Home Assistant

ESPHome and Home Assistant's integrated Voice Control (Assist) #16

Open ChristophCaina opened 1 year ago

ChristophCaina commented 1 year ago

Hi, I haven't found any discussion or similar where I could post this... therefore, this comes in as an "issue"...

So first of all, thanks for sharing the information here :) The first time I saw that device, I immediately thought that it could be a great addition to / replacement for my current solution.

Anyway - since the HA team is putting a lot of effort into their own voice control (Year of the Voice) - I wonder if it would be possible to integrate this device into their current implementation... I know that right now they don't really have any "hot word" detection or anything like that... but I am pretty sure that this will improve a lot over the year.

So are there any plans to also go into this direction?

hugobloem commented 1 year ago

Hi,

Thank you for your interest. Although I have been busy with my regular job, I am still working on this project as much as I can.

I am closely following what the HA team is doing with their Voice project. I am currently exploring how to sidestep Rhasspy altogether and integrate directly with the HA Assist pipeline. This would remove the requirement for both Rhasspy and MQTT, which would also make it easier to get started. I am close to a proof of concept, but I am not there just yet.

If I get the Assist pipeline working, I wonder whether it is worth continuing development for Rhasspy. I assume most people would opt for HA Assist pipelines, as they are much easier to set up. What is more, I am not sure what the added benefit of Rhasspy would be if HA offers different STT and TTS options (which I am sure will be coming).

ESPHome's implementation is one to watch as well. At the moment, their implementation simply streams the microphone input to HA for it to be processed there. The advantage of the C implementation here is that Espressif provides an Audio Front End (AFE) neural network which cleans up the audio and can combine the input from multiple microphones for clearer sound. What is more, ESPHome does not support wake word detection (yet), whereas this project does.

ChristophCaina commented 1 year ago

Hi, thanks for your reply :)

I just bought one of those devices to start some testing and "playing" on my end... but since I am not a developer, I am unsure HOW I could assist with this project... ok, I could probably do some testing... that would be nice ;)

The device should arrive in the next two weeks, if I remember the expected delivery date correctly... and sure - it is not urgent, as there are still a lot of things in HA's voice control that don't yet work as I would expect or wish...

If ESPHome is only streaming the microphone input to HA, that's most probably what we need... OK, having a wake word would be cool... but maybe that could be implemented in custom code... I think there was something about how to include custom libraries with ESPHome? (I don't remember, and I haven't done much with ESPHome yet)...