justLV / onju-voice

A hackable AI home assistant platform
MIT License

[REQUEST] New PCB with both an ESP32-S3 and an xCORE chip from XMOS for advanced audio processing as an ESPHome-compatible voice-kit? #59

Open Hedda opened 2 months ago

Hedda commented 2 months ago

Any chance someone has the skills, time + interest to re-design a new PCB for the Google Nest Mini and Google Home Mini speaker series that also includes an xCORE DSP chip from XMOS?

Not sure if all of you have followed the news about Nabu Casa's upcoming voice-assistant hardware project, but I just heard Paulus Schoutsen reveal during Home Assistant's ESPHome Summer Release Party on YouTube that Nabu Casa's ESPHome developers are working on a new open-source hardware platform for their voice-assistant products that will be based on an ESP32-S3 in combination with a very powerful XMOS xCORE DSP chip for advanced audio processing.

The XMOS xCORE DSP (Digital Signal Processor) chip acts like a sound-card co-processor that off-loads in-line audio noise removal (voice clean-up) for the microphone(s), such as Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Automatic Gain Control (AGC), and/or other audio post-processing algorithms that improve the solution's voice recognition capabilities. (Depending on which XMOS chip they use, their XCORE-VOICE framework could technically also allow up to 16 PDM microphones to be connected to a single xCORE device with a different PCB design.)
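To make the Acoustic Echo Cancellation idea a bit more concrete, here is a minimal sketch using a textbook NLMS (normalized least mean squares) adaptive filter. This is only an illustration of the general technique and not how the XMOS firmware implements it; the echo path (a 2-sample delay at half level), the filter length, and the step size are all made-up assumptions.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal."""
    w = [0.0] * taps    # adaptive weights, i.e. the current echo-path estimate
    buf = [0.0] * taps  # most recent reference (loudspeaker) samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]                         # shift in the newest reference sample
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo at the microphone
        e = d - y                                    # echo-cancelled output sample
        norm = eps + sum(b * b for b in buf)         # reference energy used for normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def power(samples):
    """Mean-square power of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # audio sent to the speaker
# Hypothetical echo path: the mic hears the speaker 2 samples late at half level.
mic = [0.5 * ref[n - 2] if n >= 2 else 0.0 for n in range(len(ref))]
cleaned = nlms_echo_cancel(mic, ref)
print("echo power before:", power(mic[-200:]))
print("echo power after: ", power(cleaned[-200:]))
```

In a real device the microphone would also pick up the person speaking, and that part would pass through while the speaker's own playback is subtracted; the sketch leaves the near-end voice out to keep it short.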

UPDATE: The news about that upcoming voice-kit hardware platform by Nabu Casa is also on the Home Assistant roadmap, and that ESPHome-based voice-kit hardware has since also been mentioned by Mike and Kevin during the Voice Chapter 7 livestream:

https://www.home-assistant.io/blog/2024/06/12/roadmap-2024h1/#current-priority-2-make-assist-easier-to-start-with

"Current priority 2: Make Assist easier to start with" ... "we’re exploring building our voice satellite hardware to create a more plug-and-play experience."

https://www.home-assistant.io/blog/2024/06/26/voice-chapter-7/

I'm paraphrasing, but one of the representatives working on that "voice-kit" project more or less wrote there that the ESPHome and Home Assistant voice developers from Nabu Casa are now focusing on voice assistant features and are still figuring out how to make use of external audio processors. While they are currently only testing the XMOS xCORE chip as a candidate for an ESPHome-based voice-kit reference hardware design for an official Home Assistant Voice Assistant development kit, they also plan to work on an "audio processor" component for ESPHome with a hardware-independent architecture. That component will not rely on specific hardware configurations or depend specifically on the XMOS xCORE DSP chip, but will instead allow others to add support for additional DSPs as audio processors (i.e. sound co-processors) in the future. In addition, all the I2S settings and pins will still be configurable in YAML, so it should at least be possible to add support for DSP types that work similarly to XMOS xCORE DSP chips, as well as for different board designs that use other I2S settings and pins. That representative also wrote: "we will add all the code to the base ESPHome project once things are stable and working well", and noted that the ESPHome and Home Assistant / Nabu Casa developers are right now moving very fast and breaking things as they go, so they are working on the code for the new voice-kit related components for ESPHome in a separate repository on GitHub here:

By the way, I think similar chips from XMOS, like their XU-316 AI (xCORE XU316), are used in Amazon Alexa Voice Service (AVS) Development Kit(s), and in some Amazon Echo products as well as other popular products:

As far as I can tell, the complete source code for XMOS's xcore-voice firmware is available on GitHub in the sln_voice repo:

https://github.com/xmos/sln_voice

More information about that is in the user guide for their XK-VOICE-L71 Evaluation board:

https://www.xmos.com/download/XVF3610-User-Guide(v5_7_3).pdf

Since Nabu Casa's design is said to be open-source hardware, and XMOS integration will probably be added to ESPHome's Media Player Components (and Microphone Components), I for one am hoping that it could and will be extended to different types of speakerless appliance solutions with an AUX-output/audio-output and AUX-input/audio-input port, and not only be used for voice assistants.

Personally I would also love to see inexpensive speakerless network-streamer player/receiver hardware without microphones and with only an AUX-out that can connect to any of your existing amplifiers or speakers with built-in amplifiers, in order to replace products like Chromecast Audio and Amazon Echo Input / Echo Link Amp (i.e. devices with no on-board speakers that must be connected to external speakers for audio output).

That is, I am sure that not everyone only wants "smart speakers" with a voice assistant, and that many would instead also be happy to have network streamers/players without a microphone whose only purpose is to receive and output the highest quality audio possible from Music Assistant to your "dumb" speakers.

I for one still have loads of Chromecast Audio audio-only receivers connected to various models and brands of speaker/receiver systems in each room, used to achieve multi-room music playback on a budget (because I could not afford Sonos speakers in all rooms).

So even though Nabu Casa's hardware will initially be designed primarily for "Home Assistant Satellite" (also known as "Wyoming Satellite") voice-assistant appliances, such open-source hardware, just like the ESPHome firmware, has a lot of potential for different use cases.

Also on my wishlist is network streamer receiver hardware with an AUX-input and ADC to get music from an analog audio source, as an easy way to achieve a remote AUX input into Music Assistant from an external analog audio source like a vinyl record player (LP turntable) or cassette player.

What I want to achieve is a solution that is easy to install, maintain, and use, and that allows my wife to stream music from a vinyl record player (LP turntable) to any speaker or group of speakers in our home. Her vinyl record player (turntable) setup has a pre-amp with phono (RCA) output ports for analog audio in stereo.

I would therefore prefer if we could buy some kind of network-enabled (Wi-Fi) appliance, like a music streamer with a stereo AUX input port, that performs on-the-fly analog-to-digital conversion (ADC) + encoding for streaming to a Music Provider inside Music Assistant.

I do however think that such a solution needs its own non-proprietary audio-only streaming protocol for high-quality music streams?

Hedda commented 1 month ago

@justLV and others who may be interested; FYI, Seeed Studio’s new Voice Assistant Kit hardware combines an ESP32-S3 with an XMOS XU-316 MCU chip for advanced audio processing in a voice-kit hardware board solution for ESPHome:

image

Features (which are mostly attributed to the XMOS XU-316 AI sound and audio chip)

CNX Software has a nice summary blog article with more details on the technical hardware specification:

Close-up picture of only the ReSpeaker Lite board with XMOS XU316 MCU chip and XIAO ESP32S3 for ESPHome:

image

In addition to that, with good timing, the ESPHome developers have now published a new experimental "voice-kit" GitHub repository where they are developing new or improved components for I2S audio support and a new native media player with support for FLAC, MP3, etc. for the upcoming "official" Home Assistant voice-kit hardware made by Nabu Casa and based on an XMOS xCORE chip + an ESP32-S3:

ESPHome developers have so far added many new features and functions or improvements/enhancements to ESPHome, such as:

They also have many inline TODO comments in the code there, if anyone is interested in helping them:

https://github.com/search?q=repo%3Aesphome%2Fvoice-kit%20todo&type=code

Note! Be aware that many comments there say that most of the new stuff is not yet stable.

PS: Even though XMOS is proprietary hardware, it is very popular and has open-source compatible libraries:

As far as I can tell, the complete source code for XMOS’s xcore-voice firmware is in the sln_voice repository on GitHub:

More information about that is in their user-guide for their XK-VOICE-L71 Evaluation Board which is based on the same chip:

Hedda commented 1 month ago

@justLV As far as I can tell the picture shows a 60-pin package, or at least I counted 15 pins on each visible side. I'm looking at the one on their wiki, which looks sharper than the others:

image

As far as I can tell the text printed on the XMOS chip in the picture reads:

XMOS
V16A0
G12342P2
TF1148.00

And if I do a search for "V16A0 AND XMOS" I only find the datasheet for the "XU316-1024-QF60A" SKU in a 60-pin package. From a cost-effectiveness perspective I guess it makes more sense to use "XU316-1024-QF60A-C24" (offering 2400 MIPS) over the faster "XU316-1024-QF60A-C32" (offering 3200 MIPS), even though from a developer and end-user perspective we would probably almost always want the faster variant if we had a choice 😄

https://www.xmos.com/file/xu316-1024-qf60a-xcore_ai-datasheet?version=latest

https://www.xmos.com/download/XU316-1024-QF60A-xcore_ai-Datasheet(26).pdf

And I believe that would also make sense from a hardware developer point-of-view to use either XU316-1024-QF60A-C24 (or XU316-1024-QF60A-C32) since XU316-1024-QF60A-C24 is what is used by XMOS's "XK-VOICE-L71 Voice Reference Design Evaluation Kit" so it is very well documented and tested:

https://www.xmos.com/xk-voice-l71

https://www.xmos.com/file/xu316-1024-qf60a-datasheet/?version=latest

https://www.xmos.com/file/xk-voice-l71-hardware-manual/?version=latest

https://www.xmos.com/file/xk-voice-l71-pcb-design-files/?version=latest

Hedda commented 1 month ago

@justLV more new projects are coming; FYI, @FutureProofHomes has now also announced a similar XMOS and ESP32-based two-board voice satellite hardware development kit for Home Assistant that he calls "Satellite1 PCB Dev Kit":

https://github.com/FutureProofHomes/Satellite1-Hardware

https://futureproofhomes.net/products/satellite1-pcb-dev-kit

image

image

image

image

Unclear what XMOS xCORE chip he is using but I assume he is also using XU316-1024-QF60A-C24 or XU316-1024-QF60A-C32?

Satellite1 PCB Dev Kit

The Satellite1 PCB Dev Kit contains the two PCBs necessary to build your own completely private voice assistant & multi-sensor with XMOS advanced audio processing & music playback. Add your own speaker and power supplies.

This Dev Kit focuses on controlling your smart home via the Home Assistant platform and their incredible Assist voice control pipeline.

Satellite1 HAT Board:

This board features 4 PDM microphones, 12 NeoPixel LEDs, humidity/temp/lux sensors, 4 buttons (volume up/down, action button & hardware mute), plus the XMOS audio processing chip and a power DAC for an amplified speaker-out connection or a 3.5mm headphone connection. All remaining GPIOs are also exposed.

The Satellite1 Hat connects easily to the Sat1 Core Board but can also be paired with a Raspberry Pi or a PC/Mac via USB! Perfect for all your voice assistant and audio projects!

Satellite1 Core Board:

The Satellite1 Core Board contains the ESP32-S3 n16r8, USB-C Power Delivery and 40-pin connection. This board attaches to the companion Sat1 HAT Board.

Looks like he also posted a future roadmap showing that he is working on a nice enclosure (as well as mentioning an optional recessed enclosure for in-ceiling / in-wall mounting of this smart speaker):

image

image

PS: Noticed that FutureProofHomes had a preview video on YouTube mentioning this project as "HomeX" 4 months ago (but at that time he had based the prototype on the wyoming-satellite platform running on a Raspberry Pi instead of on Nabu Casa's upcoming ESPHome-based voice-kit hardware platform, which runs on an ESP32-S3 and uses an XMOS xCORE chip for audio processing):

https://www.youtube.com/watch?v=dRTLjQHfjSM

alextrical commented 1 month ago

A good base for your reverse engineering would be to take a look at the XMOS XK-VOICE-L71 eval kit, which almost all of these projects are using as a reference design.

I would guess that Seeed Studio is using the XU316-1024-QF60B family of chips, as that chip has native 3.3 V I/O, as opposed to the XU316-1024-QF60A, which has 1.8 V I/O and would need logic level shifters to talk to the ESP32.

I'm currently working with the team at FutureProofHomes to build the Satellite1. We will be using the XU316-1024-QF60B. We have yet to decide between the C24 and C32 variants; if economy of scale allows, we will go for the latter.

Hedda commented 1 month ago

Update: @FutureProofHomes wrote that they specifically use the "XU316-1024-QF60B-C32" SKU of the XMOS xCORE AI chips, which not only has native 3.3 V I/O, but as a "C32" model also offers 3200 MIPS in performance compared to the 2400 MIPS of "C24" models:

https://community.home-assistant.io/t/voice-chapter-7-supercharged-wake-words-and-timers/743625/92

https://community.home-assistant.io/t/respeaker-lite-new-seeed-studio-voice-assistant-kit-hardware-combine-esp32-with-xmos-xu316-chip-for-advanced-audio-processing-as-esphome-based-voice-kit-for-ha/756944/13
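Since the part numbers are easy to mix up, here is a small sketch that splits an SKU into its fields and answers the two questions that came up in this thread: how many MIPS, and whether level shifters are needed next to an ESP32. The lookup tables only contain the values quoted above in this thread (not checked against the datasheet), the 3.3 V figure is the usual ESP32 I/O voltage, and the field names are my own labels rather than official XMOS terminology.

```python
import re
from dataclasses import dataclass

# Values as quoted in this thread, not verified against the datasheet.
SPEED_GRADE_MIPS = {"C24": 2400, "C32": 3200}
PACKAGE_IO_VOLTS = {"QF60A": 1.8, "QF60B": 3.3}
ESP32_IO_VOLTS = 3.3  # ESP32 GPIOs use 3.3 V logic

@dataclass
class XmosSku:
    family: str       # e.g. "XU316"
    variant: str      # second field, e.g. "1024" (meaning not interpreted here)
    package: str      # e.g. "QF60A" or "QF60B"
    speed_grade: str  # e.g. "C24" or "C32"

    @property
    def mips(self):
        return SPEED_GRADE_MIPS[self.speed_grade]

    @property
    def io_volts(self):
        return PACKAGE_IO_VOLTS[self.package]

    @property
    def needs_level_shifter(self):
        # A shifter is needed when the chip's I/O voltage differs from the ESP32's.
        return self.io_volts != ESP32_IO_VOLTS

def parse_sku(sku):
    """Split an SKU like 'XU316-1024-QF60B-C32' into its four fields."""
    match = re.fullmatch(r"(XU316)-(\d+)-(QF60[AB])-(C24|C32)", sku)
    if match is None:
        raise ValueError(f"unrecognized SKU: {sku!r}")
    return XmosSku(*match.groups())

for name in ("XU316-1024-QF60A-C24", "XU316-1024-QF60B-C32"):
    chip = parse_sku(name)
    print(name, chip.mips, "MIPS,", chip.io_volts, "V I/O,",
          "level shifter needed" if chip.needs_level_shifter else "direct connection")
```

By these numbers the C32 grade gives one third more MIPS than the C24, and the QF60B package is the one that can be wired straight to the ESP32.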

tbrasser commented 1 month ago

For current-hw onju voice owners, what do we really miss out on? Assuming we can do wakeword etc off-device instead of using mww?

Hedda commented 1 month ago

I updated the original post to mention the features you can use with the XMOS xCORE chip; these are the essential benefits for voice pick-up:

The XMOS xCORE DSP (Digital Signal Processor) chip acts like a sound-card co-processor that off-loads in-line audio noise removal (voice clean-up) for the microphone(s), such as Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Automatic Gain Control (AGC), and/or other audio post-processing algorithms that improve the solution's voice recognition capabilities. (Depending on which XMOS chip they use, their XCORE-VOICE framework could technically also allow up to 16 PDM microphones to be connected to a single xCORE device with a different PCB design.)

UPDATE: Note that the ESPHome and Home Assistant voice developers working on voice assistant features are still figuring out how to make use of external DSP audio processors. While they are currently only focusing on the XMOS xCORE chip for an official ESPHome-based voice-kit reference hardware design for Home Assistant, they also plan to work on an "audio processor component" with a hardware-independent architecture that will not rely on specific hardware configurations or depend specifically on the XMOS xCORE chip, but will instead allow others to add support for additional DSPs as audio processors (sound co-processors) in the future. In addition, all the I2S settings and pins are still configurable in YAML, so it should at least be relatively straightforward to add support for similar XMOS xCORE DSP chips and board designs. The developers wrote: "we will add all the code to the base ESPHome project once things are stable and working well". The ESPHome and Home Assistant / Nabu Casa developers are right now moving very fast and breaking things as they go, so they are working on the code for the new voice-kit related components for ESPHome in a separate repository on GitHub here:

You will still be using microWakeWord running in ESPHome on the ESP32 unless someone manages to run microWakeWord natively on the XMOS xCORE chip, which is not impossible.

You can think of the XMOS xCORE AI as an in-line audio post-processing chip that runs several different custom "noise reduction" AI models/algorithms (as custom firmware) and sits between the microphone and the ESP32 (which runs ESPHome).

As such, the XMOS xCORE will automatically clean up the audio coming from the microphone on-device, which makes it easier for microWakeWord to hear the wake word correctly in a noisy room.

It is important to understand that this audio clean-up does not only help microWakeWord; it also cleans up all other speech coming in via the microphone, so it also makes it easier for the speech-to-text engine used by the Assist pipeline to understand what is being said.
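As a rough illustration of what one such in-line clean-up stage does before the audio reaches the wake-word or speech-to-text engine, here is a minimal sketch of an Automatic Gain Control step. It is a generic running-RMS gain control with made-up parameters (target level, smoothing factor, gain limit) and a synthetic 440 Hz test tone; it is not the algorithm the XMOS firmware actually runs.

```python
import math

def agc(samples, target_rms=0.1, attack=0.01, max_gain=20.0):
    """Scale samples so that a running RMS estimate tracks target_rms."""
    level = target_rms ** 2  # running mean-square estimate, started at the target
    out = []
    for s in samples:
        level += attack * (s * s - level)  # exponential moving average of signal power
        gain = min(max_gain, target_rms / math.sqrt(level + 1e-12))
        out.append(s * gain)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One second of a very quiet 440 Hz tone at an assumed 16 kHz sample rate,
# standing in for someone speaking far away from the microphone.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
boosted = agc(quiet)
print("input RMS: ", rms(quiet[-4000:]))
print("output RMS:", rms(boosted[-4000:]))
```

The point is that everything downstream sees speech at a more consistent level regardless of how far away the speaker is, which is the same reason the other clean-up stages help both the wake word and the speech-to-text step.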

The XMOS xCORE AI chip is technically also not limited to audio input from the microphone, so it can also be used on the audio output to improve music playback, etc., using other custom AI models/algorithms that add EQ options and other features such as DRC (Digital Room Correction) to achieve improved sound fidelity. Many products use an XMOS chip just for music playback, for example network music streamers, to get great HiFi-quality audio at low cost. See e.g. https://www.hifistudio79.nl/launch-topping-d70-pro-octo-8x-cs43198-a-new-era/?lang=en

The XK-VOICE-L71 (XMOS Voice Reference Design Evaluation Kit) features a 3.5 mm line-out jack for audio output to external speakers.

Hedda commented 4 weeks ago

A good base for your reverse engineering would be to take a look at the XMOS XK-VOICE-L71 eval kit, which almost all of these projects are using as a reference design.

FYI, FutureProofHomes has posted a new video on their YouTube channel showing off the current design of the ESP32-based hardware prototype of their upcoming FutureProofHomes Satellite1 voice control development board, which now looks to be using such an XU316-1024-QF60A-C24 based XK-VOICE-L71 (XMOS Voice Reference Design Evaluation Kit) connected externally. Check it out:

Hedda commented 3 weeks ago

By the way, also check out the ESP32-Korvo V1.1 project, which does not contain a DSP but does feature ES7210 (high-performance four-channel audio ADC) + ES8311 (audio codec and DAC) chips, which could be interesting for high-quality music playback on a new PCB.

https://github.com/espressif/esp-skainet/blob/master/docs/en/hw-reference/esp32/user-guide-esp32-korvo-v1.1.md

Another similar board is the ESP32-LyraT Mini audio development board which features ES8311 audio codec and ES7243 ADC:

https://espressif-docs.readthedocs-hosted.com/projects/esp-adf/en/latest/design-guide/dev-boards/get-started-esp32-lyrat-mini.html

There are also ESPHome developers working on support for the ES8388, a low-cost audio codec chip featuring a DAC and ADC that is used in the Raspiaudio Muse Luxe:

Hedda commented 1 week ago

FYI, for all of these to eventually become useful to the average Home Assistant user, they not only need to be fully supported in the upstream ESPHome project; the devices also need to be standardized in both ESPHome and the Home Assistant core. ESPHome developers are now working on several new components related to this, including a new entity component, the assist_satellite platform, which will represent a standard VoIP-based voice satellite for Home Assistant Assist voice control. Check out this architecture discussion (which sounds like it has essentially been approved)

And the initial entity component for this new assist_satellite platform has been merged to Home Assistant core now:

Also follow related ongoing patches with many new related features submitted to both ESPHome and the Home Assistant core:

Bigger picture: