toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
https://heywillow.io/
Apache License 2.0

Wake word timing issues #112

Open dslugPX opened 1 year ago

dslugPX commented 1 year ago

This may better be described as command completion timing issues - not sure.

From time to time Hi ESP will activate after the wake word and one of four possible outcomes can happen.

  1. everything is fine - this is the majority of the time by the way.
  2. The command does not complete before Willow sends it. E.g., "Hi ESP, Skip Drums and Space" is sent as "Skip" or "Skip Drums". This is the most common of the error conditions I see, and it is fairly repeatable with that particular command. I presume the cadence of the command is at issue, since "Two, give me two", which is another alias for the same command, completes fairly regularly.
  3. The command never completes or takes a long time to complete. There are times when, after I speak the wake word and a command, Willow will sit and act like I'm still speaking, sometimes completing the command later (could this simply be a delay from the best effort inference server?). I don't think it's waiting on HA, as the screen is still on with the cancel button visible, which makes me think it's either still "listening" or processing something on the box. And that leads us to...
  4. Sometimes when 3 is occurring, instead of eventually completing, Willow will crash on the esp32 box and the server will reload. I think I've seen this maybe 4 times.

Note: Probably some of this is related to the background noise in our house - one more issue coming in on that next.

kristiankielhofner commented 1 year ago

In your other issue #111 I referenced us exposing more configuration options. We've already added two:

1) VAD timeout. Once wake is activated and you start speaking, this is the amount of silence (in milliseconds) before VAD assumes you're done speaking and transcribes the captured audio. The default is 300ms but you can try extending this. Do note there is a trade-off: because it will wait longer for you to potentially finish your command, the perceived latency of the command will increase by however many ms you add.

2) Maximum stream duration. This is a value (measured in seconds) that will trigger a final timeout after wake. With this value set it will only capture audio after wake for max $NUM seconds before cutting off VAD and sending the audio for processing. It's generally a good idea to have this set to ensure the mic doesn't get stuck open, but it should also help with the endless/very long VAD issue you are having. We assume the default value of five seconds should be long enough for someone to "spit out" a given command but we could certainly be wrong about that. However, like the VAD timeout, you may need to kind of "dial in" this value to ensure it's long enough for your commands and your rate of speech while also not being annoyingly long in the occasional instances when your background noise confuses VAD. One or more of the configuration options I mentioned previously may also help with the fundamental "background noise VAD confusion" issue.
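
To make the interaction between these two settings concrete, here is a rough Python simulation of the endpointing logic as described above. It is only an illustrative sketch and not Willow's actual code: the function name, the parameter names, the 20 ms frame size, and the list of per-frame speech flags are all assumptions made for the example. Only the 300 ms and five second defaults come from the description above.

```python
from typing import Iterable, Tuple


def capture_until_endpoint(
    speech_flags: Iterable[bool],
    frame_ms: int = 20,           # assumed frame size, not taken from Willow
    vad_timeout_ms: int = 300,    # default VAD timeout mentioned above
    max_stream_s: float = 5.0,    # default maximum stream duration mentioned above
) -> Tuple[int, str]:
    """Return (captured_ms, reason) for a stream of per-frame VAD decisions."""
    elapsed_ms = 0
    silence_ms = 0
    heard_speech = False
    for is_speech in speech_flags:
        elapsed_ms += frame_ms
        if is_speech:
            heard_speech = True
            silence_ms = 0            # any speech resets the silence counter
        elif heard_speech:
            silence_ms += frame_ms    # only count silence once speech has started
        if heard_speech and silence_ms >= vad_timeout_ms:
            return elapsed_ms, "vad_timeout"
        if elapsed_ms >= max_stream_s * 1000:
            return elapsed_ms, "max_stream_duration"
    return elapsed_ms, "stream_ended"


# Hypothetical timing for "Skip <pause> Drums and Space": 400 ms of speech,
# a 400 ms pause, 800 ms of speech, then trailing silence.
command = [True] * 20 + [False] * 20 + [True] * 40 + [False] * 50

short = capture_until_endpoint(command, vad_timeout_ms=300)
longer = capture_until_endpoint(command, vad_timeout_ms=600)

# Background noise that VAD keeps classifying as speech for 8 seconds.
noisy = capture_until_endpoint([True] * 400)
```

In this simulation the 300 ms timeout ends capture during the pause after "Skip", while 600 ms rides through the pause at the cost of 300 ms of extra perceived latency at the end. The constant-noise case is cut off only by the maximum stream duration.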

I would suggest you check on #111 and this issue later in the day once we complete these changes so you can test with them.

In terms of the crash, that could be any number of things and we'll try to reproduce it. Essentially what is happening here is that, because Willow is intended to be a "no touch, no prompt" device, if it ends up in some state of unknown confusion it will just reset itself to make sure it's good to try again for another command. It's one of the many "back stops" we have to ensure that even in its early form Willow attempts to provide the best overall experience possible. We don't ever want to have to tell people to do things like "unplug it and plug it back in" - no matter how early we are.

Thanks again for reporting these!