Closed: mineshaftgap closed this issue 1 year ago
Generally when I do speed comparisons between Alexa and Willow I set my Willow wake word to Alexa so variations in speech and, in this case, the start of the timer (finger reaction time) don't impact results. There's an additional issue: we've observed that the command time with Home Assistant can vary quite a bit between executions (HA has to receive the command from Willow/Alexa and issue it to the device). What version of HA are you running? If it's 2023.5 we use websockets, which are substantially faster.
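For reference, the websocket path here is Home Assistant's WebSocket API: after an auth handshake, each command is a plain JSON `call_service` message. A minimal sketch of the message construction (the entity ID is a made-up example):

```python
import json

def auth_message(access_token: str) -> str:
    # Sent in response to HA's initial {"type": "auth_required"} message
    return json.dumps({"type": "auth", "access_token": access_token})

def call_service_message(msg_id: int, domain: str, service: str, entity_id: str) -> str:
    # "id" must be unique and increasing for each message on the connection
    return json.dumps({
        "id": msg_id,
        "type": "call_service",
        "domain": domain,
        "service": service,
        "service_data": {"entity_id": entity_id},
    })

msg = call_service_message(1, "light", "turn_on", "light.living_room")
```

Because the connection and auth are done once up front, each subsequent command is a single small frame, which is where the speedup over per-command HTTP requests comes from.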
In our testing we consistently see the Willow device confirm before Alexa. We've also dug in the HA logs to confirm the command from Willow gets there first.
Depending on a variety of factors (your total commands, command lengths, etc) multinet is typically slower. We haven't spent as much time optimizing it but we'll get into it at some point.
I am running HA 2023.5.3.
I can see if the wake word Alexa works, but I was just about to open another ticket because the "Hi Lexin" wake word does not work. I was hoping to have a different wake word so I don't have multiple devices trying to do the same task. I ran my test at least 10 times and the results were relatively the same, so that does not appear to be related to the variable HA execution times.
We suggest people use "Hi ESP" especially if you already have Alexa devices (like I do) for obvious reasons. I only suggested using Alexa with Willow for your Alexa v Willow bake-off :).
Oh - almost forgot. We allow the user to configure the VAD timeout now. I personally think our default of 300ms is way too conservative. I use 100ms myself.
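To illustrate why this setting matters: a VAD only declares end-of-speech after it has seen a full timeout's worth of consecutive silence, so the timeout is added directly to response latency. A toy energy-based sketch (not Willow's actual implementation) showing a 300ms vs 100ms timeout:

```python
def decision_frame(energies, threshold=0.01, frame_ms=20, timeout_ms=300):
    """Index of the frame where end-of-speech is declared: the first
    frame that completes `timeout_ms` of consecutive silence."""
    needed = timeout_ms // frame_ms  # consecutive silent frames required
    silent = 0
    for i, e in enumerate(energies):
        silent = silent + 1 if e < threshold else 0
        if silent >= needed:
            return i
    return None  # still waiting for enough silence

# 200 ms of speech followed by silence, in 20 ms frames
frames = [0.5] * 10 + [0.0] * 20

late = decision_frame(frames, timeout_ms=300)   # decides at frame 24
early = decision_frame(frames, timeout_ms=100)  # decides at frame 14
```

With these numbers the 100ms timeout responds a full 200ms sooner on every command, at the cost of being more likely to cut off a speaker who pauses mid-sentence.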
The "Hi Lexin" wake word comes from the English pronunciation of the Chinese name of Espressif, from what I understand. The pronunciation is so strange it's almost useless and I myself am only able to trigger it maybe 20% of the time. We may end up just disabling it.
Ya, was only trying "Hi Lexin" as it was one syllable shorter, will stick to "Hi ESP" and hope for a "Willow" wake word in the future ;).
I will look to change the VAD to try and squeeze out a bit more performance.
Going to build with "Alexa"/multinet now to see if there is still same delay.
Until multinet is optimized it sounds like, for right now, people who want the quickest response should use wakenet, and people who want to keep things local should use multinet.
One of the reasons I would love multinet to be faster is I have been trying to make sure all my home automations work when power and even internet go out. So I will have to see if I can live with the multinet lag.
One clarification - you're always using Wakenet. Wakenet is for wake word, multinet is for local speech commands, and server/WIS mode is using a Willow Inference Server.
Speaking of which, we will be releasing WIS tomorrow. Our goal is for this to be all local; as of right now, if you are using the demo Tovera-hosted WIS, you're just sending your speech to us instead of Amazon. That's not the goal for this project.
For performance with accuracy WIS strongly prefers CUDA. If you have CUDA hardware on your WIS instance a locally hosted WIS is the most private, fastest, and most reliable of all options. WIS has been highly optimized by us for acceleration on Nvidia hardware going back to Pascal. I'll be testing, demoing, and documenting CUDA hardware configurations using $100 used GTX hardware later this week. WIS runs on CPU but the performance isn't any better than other CPU Whisper implementations which is to say it won't be competing with Alexa for speed and accuracy anytime soon.
You might have given me reason to dust off my hardly used old gaming computer... would just want to see how low I could get the energy consumption.
Here are some teaser benchmarks for WIS:
Device | Model | Beam Size | Speech Duration (ms) | Inference Time (ms) | Realtime Multiple |
---|---|---|---|---|---|
RTX 4090 | large-v2 | 5 | 3840 | 140 | 27x |
H100 | large-v2 | 5 | 3840 | 294 | 12x |
H100 | large-v2 | 5 | 10688 | 519 | 20x |
H100 | large-v2 | 5 | 29248 | 1223 | 23x |
GTX 1060 | large-v2 | 5 | 3840 | 1114 | 3x |
Tesla P4 | large-v2 | 5 | 3840 | 1099 | 3x |
RTX 4090 | medium | 1 | 3840 | 84 | 45x |
GTX 1060 | medium | 1 | 3840 | 588 | 6x |
Tesla P4 | medium | 1 | 3840 | 586 | 6x |
RTX 4090 | medium | 1 | 29248 | 377 | 77x |
GTX 1060 | medium | 1 | 29248 | 1612 | 18x |
Tesla P4 | medium | 1 | 29248 | 1730 | 16x |
RTX 4090 | base | 1 | 180000 | 277 | 648x (not a typo) |
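For clarity, the realtime multiple column is just speech duration divided by inference time; spot-checking a couple of rows from the table:

```python
def realtime_multiple(speech_ms: float, inference_ms: float) -> float:
    # How many seconds of audio are transcribed per second of compute
    return speech_ms / inference_ms

# RTX 4090 / large-v2: 3840 ms of speech in 140 ms -> ~27x
rtx = realtime_multiple(3840, 140)

# GTX 1060 / medium: 29248 ms of speech in 1612 ms -> ~18x
gtx = realtime_multiple(29248, 1612)
```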
Out of curiosity, which Nvidia GPU do you have? In terms of power consumption my GTX 1060 (6GB) idles at about 8w with max at 150w. As you can see from the table above a typical ~1-2s speech command results in a (roughly) 300ms power spike, with a good chunk of that not actually being on the GPU. I'm going to have my local machine connected to a Belkin Wemo Insight monitoring plug to observe total power usage across 1000 speech commands (or something like that). I'm pretty confident the power usage will be much lower than people would expect.
Additionally, I'll be looking at the power usage of Echo devices and HA running on a Raspberry Pi. If you're leaving a WIS instance up you should also run HA on it because the performance of just about any hardware of that class will decimate a Raspberry Pi and won't significantly increase power usage of the WIS+HA machine. Better hardware for HA = faster execution of commands from Willow (another variable in your speed test). Plus then you get an unobtanium Raspberry Pi back for any other projects :).
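Back-of-envelope math with the GTX 1060 figures above (8w idle, 150w max, ~300ms spike per command) suggests the marginal energy per command is tiny, even assuming the GPU sits at full power for the entire spike:

```python
idle_w, peak_w, spike_s = 8, 150, 0.3

# Worst case: GPU at full power for the whole 300 ms inference
extra_joules = (peak_w - idle_w) * spike_s      # ~42.6 J per command

# 1000 commands, converted from joules to kWh
kwh_per_1000 = extra_joules * 1000 / 3.6e6      # ~0.012 kWh
```

At that rate the idle draw dominates total consumption, which is why power-limit and throttling configurations are the interesting knob.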
I suspect that, in the end, the total power consumption of this approach will be more than acceptable for most users. Especially those (many) users who are already pretty seriously into homelab. For those who don't have Nvidia hardware there's also the Tesla P4 which can be purchased used for roughly $100 and uses a max of 60w (slot powered).
I would definitely like to see Willow users compete for who can come up with throttling, power limit, etc configurations that don't drastically impact performance but do reduce power consumption!
I can't remember exactly what the card is but I think it was a GTX 1080? Maybe a 1040?
Oh yes, I was already planning on making it dual boot headless Linux (if I didn't previously) and then putting Proxmox with a HA image. I would be putting it on a Zigbee power outlet that should be able to tell me energy used.
To my knowledge there is no such thing as a GTX 1040 so between those two options I'd say it's a 1080, which should work quite well with WIS.
We've tested WIS in WSL for users who want to use Windows gaming machines but given the option I wouldn't go that route.
Heh, I will look at it later and figure out what it is!
...you can tell I am not much of a gamer...
So it is a NVIDIA GeForce GTX 1070.
Perfect! I'm actually going to be doing benchmarks with my recently ordered GTX 1070 later this week. For new users it's probably the best bang for the buck right now.
Have Proxmox installed and ready to throw WIS at it. Will I be needing to do a VM or will you be releasing a container?
It is containerized but setup is a little more involved than `docker run` - much like Willow, today it is early.
WIS released - discuss on HN
While Wakenet is only trivially slower than Alexa, Multinet is perceptibly slower.
I timed a very unscientific video of all three turning on my living room lights and here is what I have found:
I have confirmed that the command
TURN ON LIVING ROOM LIGHTS
is in speech_commands/commands_en.txt. What else do I need to do in order to improve my multinet speeds?
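One quick sanity check, assuming the format seen in this thread (one uppercase phrase per line in commands_en.txt), is to verify the exact phrase is present, since Multinet matches whole phrases rather than free-form speech:

```python
def command_present(path: str, command: str) -> bool:
    # commands_en.txt holds one command phrase per line, uppercase
    wanted = command.strip().upper()
    with open(path) as f:
        return any(line.strip().upper() == wanted for line in f)
```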
EDIT: removed the videos from automatically showing since they were bringing the browser to a crawl; hoping the links above will still work.