Certain words/phrases are detected inconsistently

toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative

https://heywillow.io/

Apache License 2.0

2.59k stars, 96 forks

Certain words/phrases are detected inconsistently #200

Open mhilbush opened 1 year ago

mhilbush commented 1 year ago

I have two devices running Willow built from a repo I cloned on June 11. Each device is in a completely separate part of the house (1st floor kitchen and lower level family/rec room).

There are several words I use frequently in my home that Willow often detects inconsistently. This occurs with both devices.

“pool table” is sometimes also detected as “pull table”
“rec room” is sometimes also detected as “wreck room”
“sun room” is sometimes also detected as “summer”, “sunroom” and “sub room”

Please see issue #199 for a description and photos of the environment where my devices are located.

adamast0r commented 1 year ago

I am also having the same type of issues when mentioning the names of HA entities. An example would be:

"smart plug" being detected as "smart blur" or "smart blue"

kristiankielhofner commented 1 year ago

As a first pass at debugging this, try updating your Willow Inference Server URL to our new implementation with a tweak:

https://wisng.tovera.io/api/asr?model=large&beam_size=5

In addition to pointing at our new WIS implementation, this URL uses the highest possible quality settings available for Whisper. We default to the medium model with a beam size of 1 otherwise.

mhilbush commented 1 year ago

Thanks.

I set the WIS URL to what you indicated, built the image, and flashed my device.

It detects when I say the wake word (i.e. Alexa), but it's not showing any of the text spoken after the wake word.

This is what I'm seeing in the monitor (HTTP error 422).

I (06:39:36.230) WILLOW: Using WIS URL 'https://wisng.tovera.io/api/asr?model=large&beam_size=5'
I (06:39:36.240) WILLOW: WIS HTTP client starting stream, waiting for end of speech
I (06:39:39.044) WILLOW: AUDIO_REC_VAD_END
I (06:39:39.045) WILLOW: AUDIO_REC_WAKEUP_END
I (06:39:39.087) WILLOW: WIS HTTP client HTTP_STREAM_POST_REQUEST, write end chunked marker
I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST
E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
I (06:39:49.071) WILLOW: Wake LCD timeout, turning off LCD

Edit: Note, I also changed the Wake Word Recognition Operating Mode to DET_MODE_2CH_95

kristiankielhofner commented 1 year ago

I feel terrible...

I gave you the wrong URL! Sorry, brain fart on my part. The URL you should use is actually:

https://wisng.tovera.io/api/willow?model=large&beam_size=5

I'm really sorry about that, I promise I don't want to waste your time!
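
For anyone composing this URL in a script rather than typing it by hand, here is a minimal Python sketch of how the two query parameters attach to the endpoint. The helper name `build_wis_url` is hypothetical and not part of Willow or WIS. `model` and `beam_size` are the only parameters assumed, because they are the ones shown in the URL above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Base endpoint taken from the corrected URL above.
WIS_BASE = "https://wisng.tovera.io/api/willow"


def build_wis_url(base=WIS_BASE, model="large", beam_size=5):
    """Hypothetical helper: attach model and beam_size as query parameters."""
    parts = urlsplit(base)
    query = urlencode({"model": model, "beam_size": beam_size})
    # Rebuild the URL with the new query string and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(build_wis_url())
```

With the defaults this prints the same URL as above. Passing `model="medium", beam_size=1` would spell out the settings described earlier as the default.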

mhilbush commented 1 year ago

No worries.

mhilbush commented 1 year ago

Ok, it's working now. Thanks.

I'll spend some time with it tomorrow and get back to you.

kristiankielhofner commented 1 year ago

With the wisng endpoint (it's in beta) we have debug logging turned on so I was watching your sessions.

You exposed a bug in our production implementation. Long story short, this server has multiple GPUs and WIS wasn't pinned to the right one, so you were occasionally seeing ASR times of ~3s (load balancing across GPUs). That is fixed now, and you should consistently see response times in the 200-300ms range.

I'm really off my game!

mhilbush commented 1 year ago

It was getting a bit late last night, but it did seem like it was taking longer than what was typical. I didn't think much of it at the time knowing it wasn't the full production implementation. Much quicker this morning.

stintel commented 1 year ago

I too experience some issues with certain phrases. My most problematic one seems to be "turn on desk light". Here are some wrong results:

Turn on this light.
Turn on the disc light.
Turn on the best glide.

This seems to happen on small, medium and large models.

My workaround is to use an alias that is less error-prone. I seem to have way better success with "workstation" as an alias for "desk light".
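
Another angle on the same workaround, purely as a sketch: instead of renaming the entity, snap a misheard name back to the closest known one after transcription. Nothing in this thread says Willow or WIS does this. The function `closest_name`, the `KNOWN_NAMES` list, and the 0.6 cutoff are all assumptions for illustration, and the names are just the phrases reported above. It uses Python's standard `difflib` and compares only the entity name, not a full command.

```python
from difflib import get_close_matches

# Assumed list of entity names; these are the phrases reported in this thread.
KNOWN_NAMES = ["pool table", "rec room", "sun room", "smart plug", "desk light"]


def closest_name(heard, names=KNOWN_NAMES, cutoff=0.6):
    """Return the known name most similar to the transcript, or None.

    cutoff is difflib's similarity threshold (0.0 to 1.0); 0.6 is an
    arbitrary starting point, not a tuned value.
    """
    matches = get_close_matches(heard.lower().strip(), names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for heard in ["pull table", "wreck room", "sub room", "smart blue"]:
        print(heard, "->", closest_name(heard))
```

This only helps when the wrong transcript is still spelled close to the right one. A result like "summer" for "sun room" may fall below the cutoff, so an alias is still the more dependable fix there.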