Open mhilbush opened 1 year ago
I am also having the same type of issues when mentioning the names of HA entities, an example would be:
As a first pass at debugging this try updating your Willow Inference Server URL to our new implementation with a tweak:
https://wisng.tovera.io/api/asr?model=large&beam_size=5
In addition to pointing at our new WIS implementation, this URL uses the highest possible quality settings available for Whisper. Otherwise we default to the medium model with a beam size of 1.
Thanks.
I set the WIS URL to what you indicated, built the image, and flashed my device.
It detects when I say the wake word (i.e. Alexa), but it's not showing any of the text spoken after the wake word.
This is what I'm seeing in the monitor (HTTP error 422).
I (06:39:36.230) WILLOW: Using WIS URL 'https://wisng.tovera.io/api/asr?model=large&beam_size=5'
I (06:39:36.240) WILLOW: WIS HTTP client starting stream, waiting for end of speech
I (06:39:39.044) WILLOW: AUDIO_REC_VAD_END
I (06:39:39.045) WILLOW: AUDIO_REC_WAKEUP_END
I (06:39:39.087) WILLOW: WIS HTTP client HTTP_STREAM_POST_REQUEST, write end chunked marker
I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST
E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
I (06:39:49.071) WILLOW: Wake LCD timeout, turning off LCD
Edit:
Note, I also changed the Wake Word Recognition Operating Mode to DET_MODE_2CH_95
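For anyone scanning a longer monitor capture for the same failure, here is a minimal Python sketch that pulls the HTTP error lines out of log output shaped like the lines above. The regular expressions and the function name `find_http_errors` are assumptions based only on the log format shown in this thread; they are not part of Willow.

```python
import re

# Assumed line shape, taken from the monitor output in this thread:
#   "<level> (<HH:MM:SS.mmm>) <TAG>: <message>"
LOG_LINE = re.compile(r"^(?P<level>[A-Z]) \((?P<time>[\d:.]+)\) (?P<tag>\w+): (?P<msg>.*)$")
HTTP_ERROR = re.compile(r"HTTP error: (?P<status>\d{3})")


def find_http_errors(lines):
    """Return (timestamp, status_code) pairs for error-level lines that report an HTTP error."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Skip lines that do not fit the assumed shape or are not error level ("E").
        if not match or match.group("level") != "E":
            continue
        err = HTTP_ERROR.search(match.group("msg"))
        if err:
            errors.append((match.group("time"), int(err.group("status"))))
    return errors


sample = [
    "I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST",
    "E (06:39:39.175) WILLOW: WIS returned HTTP error: 422",
]
print(find_http_errors(sample))
```

HTTP 422 generally means the server understood the request but could not process its contents, so the request URL and payload are the first things worth checking.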
I feel terrible...
I gave you the wrong URL! Sorry, brain fart on my part. The URL you should use is actually:
https://wisng.tovera.io/api/willow?model=large&beam_size=5
I'm really sorry about that, I promise I don't want to waste your time!
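To avoid this kind of typo when changing settings, the URL can be assembled from its parts. This is only a sketch: the helper name `build_wis_url` is hypothetical, and the only things taken from this thread are the `/api/willow` path and the `model` and `beam_size` query parameters.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_wis_url(base: str, model: str = "large", beam_size: int = 5) -> str:
    """Attach the model and beam_size query parameters to a WIS endpoint URL."""
    parts = urlsplit(base)
    # Only the two parameters seen in this thread are set here.
    query = urlencode({"model": model, "beam_size": beam_size})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


url = build_wis_url("https://wisng.tovera.io/api/willow")
print(url)
```

Checking `urlsplit(url).path` before building and flashing is a cheap way to confirm the endpoint is `/api/willow` and not `/api/asr`.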
No worries.
Ok, it's working now. Thanks.
I'll spend some time with it tomorrow and get back to you.
With the wisng endpoint (it's in beta) we have debug logging turned on so I was watching your sessions.
You exposed a bug in our production implementation. Long story short, this server has multiple GPUs and WIS wasn't pinned to the right one, so you were occasionally seeing ASR times of ~3s (load balancing across GPUs). That is fixed now, and you should consistently see response times in the 200-300ms range.
I'm really off my game!
It was getting a bit late last night, but it did seem like it was taking longer than what was typical. I didn't think much of it at the time knowing it wasn't the full production implementation. Much quicker this morning.
I too experience some issues with certain phrases. My most problematic one seems to be "turn on desk light". Here are some wrong results:
My workaround is to use an alias that is less error-prone. I have much better success with "workstation" as an alias for "desk light".
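The idea behind the alias workaround can be sketched in a few lines of Python. This is purely illustrative and is not how Home Assistant or Willow implements aliases; the `ALIASES` table and the `resolve_entity` function are hypothetical names, and the only real data is the "workstation" / "desk light" pair mentioned above.

```python
# Hypothetical alias table: an easier-to-recognize spoken name mapped to the entity name.
ALIASES = {"workstation": "desk light"}


def resolve_entity(spoken_name: str, aliases: dict = ALIASES) -> str:
    """Map a spoken name to its entity name, falling back to the normalized spoken name."""
    key = spoken_name.strip().lower()
    return aliases.get(key, key)


print(resolve_entity("Workstation"))
print(resolve_entity("desk light"))
```

The point of the design is that the alias only has to be recognized reliably by the speech-to-text step; what it maps to can stay unchanged.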
I have two devices running Willow built from a repo I cloned on June 11. Each device is in a completely separate part of the house (1st floor kitchen and lower level family/rec room).
There are several words I use frequently in my home that Willow often detects inconsistently. This occurs with both devices.
Please see issue #199 for a description and photos of the environment where my devices are located.