Closed (vekexasia closed 8 months ago)
Hello, great and well-thought-out project.
I see you used the ESP32-S3, which also has some AI capabilities. I was unable to dig deeper, but I know that the S3 can be used to run very small AI models (possibly for speech recognition).
My previous research showed that they only supported Mandarin and English, but I was unable to find a way to build custom models.
Maybe there is room to have the S3 do the voice recognition itself, instead of having a server running and receiving the audio stream? This would be especially helpful when planning to have several of these devices around the house.
I wouldn't put that much of the AI at the edge. It's going to rely on the entities in your HA being indexed into a management DB accessible to OpenAI's ChatGPT. There are ways of doing this online using MindsDB, Pinecone, and ChatGPT, and for some of them you might be able to use a Coral TPU or Jetson Nano to help build custom models. But the main issue I've found at the edge is having enough power to make and hold a solid socket with HA and then allow its own internal pipelines to use what is needed. Rarely, as far as I know, is the AI (which I don't think that chip even has) processed on the device rather than in HA or at the server level. These ESP32s just don't have the power, or primarily the memory, to handle much besides the wake word and the initial intents to start the HA Assist Pipeline.
If anyone knows better than I do, please chip in. But this is just from my testing with a Muse Luxe ESPHomed as a voice assistant, and it seems that in the process it loses its ability to act as a media player. Or at least I have been unsuccessful in having RaspiAudio's squeezeplay ability alongside a Voice Assistant build.