Closed: mineshaftgap closed this issue 1 year ago
Generally when I do speed comparisons between Alexa and Willow I set my Willow wake word to Alexa so variations in speech and, in this case, the start of the timer (finger reaction time) don't impact results. There's an additional issue: we've observed that the command time with Home Assistant can vary quite a bit between executions (HA has to receive the command from Willow/Alexa and issue it to the device). What version of HA are you running? If it's 2023.5 we use websockets, which are substantially faster.
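For reference, the websocket path here is Home Assistant's WebSocket API: after an auth handshake, each command is a plain JSON `call_service` message. A minimal sketch of the message construction (the entity ID is a made-up example):

```python
import json

def auth_message(access_token: str) -> str:
    # Sent in response to HA's initial {"type": "auth_required"} message
    return json.dumps({"type": "auth", "access_token": access_token})

def call_service_message(msg_id: int, domain: str, service: str, entity_id: str) -> str:
    # "id" must be unique and increasing for each message on the connection
    return json.dumps({
        "id": msg_id,
        "type": "call_service",
        "domain": domain,
        "service": service,
        "service_data": {"entity_id": entity_id},
    })

msg = call_service_message(1, "light", "turn_on", "light.living_room")
```

Because the connection and auth are done once up front, each subsequent command is a single small frame, which is where the speedup over per-command HTTP requests comes from.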
In our testing we consistently see the Willow device confirm before Alexa. We've also dug in the HA logs to confirm the command from Willow gets there first.
Depending on a variety of factors (your total commands, command lengths, etc) multinet is typically slower. We haven't spent as much time optimizing it but we'll get into it at some point.
I am running HA 2023.5.3.
I can see if the wake word Alexa works, but I was just about to open another ticket because the "Hi Lexin" wake word does not work. I was hoping to have a different wake word so I don't have multiple devices trying to do the same task. I ran my test at least 10 times and the results were relatively the same, so that does not appear to be related to the variable HA execution times.
We suggest people use "Hi ESP" especially if you already have Alexa devices (like I do) for obvious reasons. I only suggested using Alexa with Willow for your Alexa v Willow bake-off :).
Oh - almost forgot. We allow the user to configure the VAD timeout now. I personally think our default of 300ms is way too conservative. I use 100ms myself.
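To illustrate why this setting matters: a VAD only declares end-of-speech after it has seen a full timeout's worth of consecutive silence, so the timeout is added directly to response latency. A toy energy-based sketch (not Willow's actual implementation) showing a 300ms vs 100ms timeout:

```python
def decision_frame(energies, threshold=0.01, frame_ms=20, timeout_ms=300):
    """Index of the frame where end-of-speech is declared: the first
    frame that completes `timeout_ms` of consecutive silence."""
    needed = timeout_ms // frame_ms  # consecutive silent frames required
    silent = 0
    for i, e in enumerate(energies):
        silent = silent + 1 if e < threshold else 0
        if silent >= needed:
            return i
    return None  # still waiting for enough silence

# 200 ms of speech followed by silence, in 20 ms frames
frames = [0.5] * 10 + [0.0] * 20

late = decision_frame(frames, timeout_ms=300)   # decides at frame 24
early = decision_frame(frames, timeout_ms=100)  # decides at frame 14
```

With these numbers the 100ms timeout responds a full 200ms sooner on every command, at the cost of being more likely to cut off a speaker who pauses mid-sentence.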
The "Hi Lexin" wake word comes from the English pronunciation of the Chinese name of Espressif, from what I understand. The pronunciation is so strange it's almost useless and I myself am only able to trigger it maybe 20% of the time. We may end up just disabling it.
Ya, was only trying "Hi Lexin" as it was one syllable shorter, will stick to "Hi ESP" and hope for a "Willow" wake word in the future ;).
I will look to change the VAD to try and squeeze out a bit more performance.
Going to build with "Alexa"/multinet now to see if there is still same delay.
Until multinet is optimized it sounds like, for right now, people who want the quickest response should use wakenet, and people who want to keep things local should use multinet.
One of the reasons I would love multinet to be faster is I have been trying to make sure all my home automations work when power and even internet go out. So I will have to see if I can live with the multinet lag.
One clarification - you're always using Wakenet. Wakenet is for wake word, multinet is for local speech commands, and server/WIS mode is using a Willow Inference Server.
Speaking of which, we will be releasing WIS tomorrow. Our goal is for this to be all local; as of right now, if you are using the demo Tovera-hosted WIS, you're just sending your speech to us instead of Amazon. That's not the goal for this project.
For performance with accuracy WIS strongly prefers CUDA. If you have CUDA hardware on your WIS instance a locally hosted WIS is the most private, fastest, and most reliable of all options. WIS has been highly optimized by us for acceleration on Nvidia hardware going back to Pascal. I'll be testing, demoing, and documenting CUDA hardware configurations using $100 used GTX hardware later this week. WIS runs on CPU but the performance isn't any better than other CPU Whisper implementations which is to say it won't be competing with Alexa for speed and accuracy anytime soon.
You might have given me reason to dust off my hardly used old gaming computer... would just want to see how low I could get the energy consumption.
Here are some teaser benchmarks for WIS:
Device | Model | Beam Size | Speech Duration (ms) | Inference Time (ms) | Realtime Multiple |
---|---|---|---|---|---|
RTX 4090 | large-v2 | 5 | 3840 | 140 | 27x |
H100 | large-v2 | 5 | 3840 | 294 | 12x |
H100 | large-v2 | 5 | 10688 | 519 | 20x |
H100 | large-v2 | 5 | 29248 | 1223 | 23x |
GTX 1060 | large-v2 | 5 | 3840 | 1114 | 3x |
Tesla P4 | large-v2 | 5 | 3840 | 1099 | 3x |
RTX 4090 | medium | 1 | 3840 | 84 | 45x |
GTX 1060 | medium | 1 | 3840 | 588 | 6x |
Tesla P4 | medium | 1 | 3840 | 586 | 6x |
RTX 4090 | medium | 1 | 29248 | 377 | 77x |
GTX 1060 | medium | 1 | 29248 | 1612 | 18x |
Tesla P4 | medium | 1 | 29248 | 1730 | 16x |
RTX 4090 | base | 1 | 180000 | 277 | 648x (not a typo) |
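For clarity, the realtime multiple column is just speech duration divided by inference time; spot-checking a couple of rows from the table:

```python
def realtime_multiple(speech_ms: float, inference_ms: float) -> float:
    # How many seconds of audio are transcribed per second of compute
    return speech_ms / inference_ms

# RTX 4090 / large-v2: 3840 ms of speech in 140 ms -> ~27x
rtx = realtime_multiple(3840, 140)

# GTX 1060 / medium: 29248 ms of speech in 1612 ms -> ~18x
gtx = realtime_multiple(29248, 1612)
```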
Out of curiosity, which Nvidia GPU do you have? In terms of power consumption my GTX 1060 (6GB) idles at about 8w with max at 150w. As you can see from the table above a typical ~1-2s speech command results in a (roughly) 300ms power spike, with a good chunk of that not actually being on the GPU. I'm going to have my local machine connected to a Belkin Wemo Insight monitoring plug to observe total power usage across 1000 speech commands (or something like that). I'm pretty confident the power usage will be much lower than people would expect.
Additionally, I'll be looking at the power usage of Echo devices and HA running on a Raspberry Pi. If you're leaving a WIS instance up you should also run HA on it because the performance of just about any hardware of that class will decimate a Raspberry Pi and won't significantly increase power usage of the WIS+HA machine. Better hardware for HA = faster execution of commands from Willow (another variable in your speed test). Plus then you get an unobtanium Raspberry Pi back for any other projects :).
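Back-of-envelope math with the GTX 1060 figures above (8w idle, 150w max, ~300ms spike per command) suggests the marginal energy per command is tiny, even assuming the GPU sits at full power for the entire spike:

```python
idle_w, peak_w, spike_s = 8, 150, 0.3

# Worst case: GPU at full power for the whole 300 ms inference
extra_joules = (peak_w - idle_w) * spike_s      # ~42.6 J per command

# 1000 commands, converted from joules to kWh
kwh_per_1000 = extra_joules * 1000 / 3.6e6      # ~0.012 kWh
```

At that rate the idle draw dominates total consumption, which is why power-limit and throttling configurations are the interesting knob.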
I suspect that, in the end, the total power consumption of this approach will be more than acceptable for most users. Especially those (many) users who are already pretty seriously into homelab. For those who don't have Nvidia hardware there's also the Tesla P4 which can be purchased used for roughly $100 and uses a max of 60w (slot powered).
I would definitely like to see Willow users compete for who can come up with throttling, power limit, etc configurations that don't drastically impact performance but do reduce power consumption!
I can't remember exactly what the card is but I think it was a GTX 1080? Maybe a 1040?
Oh yes, I was already planning on making it dual boot headless Linux (if I didn't previously) and then putting Proxmox with a HA image. I would be putting it on a Zigbee power outlet that should be able to tell me energy used.
To my knowledge there is no such thing as a GTX 1040 so between those two options I'd say it's a 1080, which should work quite well with WIS.
We've tested WIS in WSL for users who want to use Windows gaming machines but given the option I wouldn't go that route.
Heh, I will look at it later and figure out what it is!
...you can tell I am not much of a gamer...
So it is a NVIDIA GeForce GTX 1070.
Perfect! I'm actually going to be doing benchmarks with my recently ordered GTX 1070 later this week. For new users it's probably the best bang for the buck right now.
Have Proxmox installed and ready to throw WIS at it. Will I be needing to do a VM or will you be releasing a container?
It is containerized but setup is a little more involved than `docker run` - much like Willow, today it is early.
WIS released - discuss on HN
While Wakenet is only trivially slower than Alexa, Multinet is perceptibly slower.
I timed a very unscientific video of all three turning on my living room lights and here is what I have found:
I have confirmed that the command
TURN ON LIVING ROOM LIGHTS
is in speech_commands/commands_en.txt. What else do I need to do in order to improve my multinet speeds?
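One quick sanity check, assuming the format seen in this thread (one uppercase phrase per line in commands_en.txt), is to verify the exact phrase is present, since Multinet matches whole phrases rather than free-form speech:

```python
def command_present(path: str, command: str) -> bool:
    # commands_en.txt holds one command phrase per line, uppercase
    wanted = command.strip().upper()
    with open(path) as f:
        return any(line.strip().upper() == wanted for line in f)
```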
EDIT: removed the videos from automatically showing since they were bringing the browser to a crawl; hoping the links above will still work.