toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
https://heywillow.io/
Apache License 2.0

Certain words/phrases are detected inconsistently #200

Open mhilbush opened 1 year ago

mhilbush commented 1 year ago

I have two devices running Willow built from a repo I cloned on June 11. Each device is in a completely separate part of the house (1st floor kitchen and lower level family/rec room).

There are several words I use frequently in my home that Willow often detects inconsistently. This occurs with both devices.

Please see issue #199 for a description and photos of the environment where my devices are located.

adamast0r commented 1 year ago

I am also having the same type of issues when mentioning the names of HA entities, an example would be:

kristiankielhofner commented 1 year ago

As a first pass at debugging this, try updating your Willow Inference Server (WIS) URL to our new implementation with a tweak:

https://wisng.tovera.io/api/asr?model=large&beam_size=5

That URL selects our new WIS implementation and requests the highest possible quality settings available for Whisper. Otherwise, we default to the medium model with a beam size of 1.

mhilbush commented 1 year ago

Thanks.

I set the WIS URL to what you indicated, built the image, and flashed my device.

It detects when I say the wake word (i.e. Alexa), but it's not showing any of the text spoken after the wake word.

This is what I'm seeing in the monitor (HTTP error 422).

I (06:39:36.230) WILLOW: Using WIS URL 'https://wisng.tovera.io/api/asr?model=large&beam_size=5'
I (06:39:36.240) WILLOW: WIS HTTP client starting stream, waiting for end of speech
I (06:39:39.044) WILLOW: AUDIO_REC_VAD_END
I (06:39:39.045) WILLOW: AUDIO_REC_WAKEUP_END
I (06:39:39.087) WILLOW: WIS HTTP client HTTP_STREAM_POST_REQUEST, write end chunked marker
I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST
E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
I (06:39:49.071) WILLOW: Wake LCD timeout, turning off LCD

Edit: Note, I also changed the Wake Word Recognition Operating Mode to DET_MODE_2CH_95
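
For anyone collecting monitor output over a longer session, error lines like the one above can be pulled out with a small script. This is only a sketch: the line format is taken from the log in this comment, and the helper name `find_wis_errors` is hypothetical, not part of Willow.

```python
import re

# Matches lines such as: E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
ERROR_RE = re.compile(
    r"^E \((\d{2}:\d{2}:\d{2}\.\d{3})\) WILLOW: WIS returned HTTP error: (\d{3})"
)

def find_wis_errors(log_text):
    """Return (timestamp, status_code) pairs for each WIS HTTP error line."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            errors.append((match.group(1), int(match.group(2))))
    return errors

sample = """I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST
E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
I (06:39:49.071) WILLOW: Wake LCD timeout, turning off LCD"""

print(find_wis_errors(sample))
```

An HTTP 422 generally means the server understood the request but could not process its contents, so it is worth checking the URL path and parameters first.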

kristiankielhofner commented 1 year ago

I feel terrible...

I gave you the wrong URL! Sorry, brain fart on my part. The URL you should use is actually:

https://wisng.tovera.io/api/willow?model=large&beam_size=5

I'm really sorry about that, I promise I don't want to waste your time!
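
If you build the URL from a script rather than typing it, here is a minimal sketch. The `/api/willow` path and the `model` and `beam_size` query parameters come from the URL above; the helper name `build_wis_url` is hypothetical and not part of Willow or WIS.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def build_wis_url(base, model="large", beam_size=5):
    """Return base with model and beam_size set as query parameters."""
    parts = urlsplit(base)
    # Any query string already present on base is replaced, not merged.
    query = urlencode({"model": model, "beam_size": beam_size})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

url = build_wis_url("https://wisng.tovera.io/api/willow")
print(url)
```

Calling it with no extra arguments should reproduce the URL given above.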

mhilbush commented 1 year ago

No worries.

mhilbush commented 1 year ago

Ok, it's working now. Thanks.

I'll spend some time with it tomorrow and get back to you.

kristiankielhofner commented 1 year ago

With the wisng endpoint (it's in beta) we have debug logging turned on so I was watching your sessions.

You exposed a bug in our production implementation. Long story short, this server has multiple GPUs and WIS wasn't pinned to the right one, so you were occasionally seeing ASR times of ~3s (load balancing across GPUs). That is fixed now, and you should consistently see response times in the 200-300ms range.

I'm really off my game!

mhilbush commented 1 year ago

It was getting a bit late last night, but it did seem like it was taking longer than what was typical. I didn't think much of it at the time knowing it wasn't the full production implementation. Much quicker this morning.

stintel commented 1 year ago

I too experience some issues with certain phrases. My most problematic one seems to be "turn on desk light". Here are some wrong results:

My workaround is to use an alias that is less error-prone. I seem to have much better success with "workstation" as an alias for "desk light".
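
To illustrate the alias idea only: the sketch below maps an easier-to-recognize spoken name onto the real entity name. The `ALIASES` table and `resolve_entity` helper are hypothetical and do not reflect how Home Assistant or Willow store or resolve aliases.

```python
# Hypothetical alias table: spoken name -> canonical entity name.
ALIASES = {"workstation": "desk light"}

def resolve_entity(spoken_name):
    """Return the canonical entity name for a spoken name or alias."""
    key = spoken_name.strip().lower()
    # Names that are not aliases pass through unchanged (normalized).
    return ALIASES.get(key, key)

print(resolve_entity("Workstation"))
print(resolve_entity("desk light"))
```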