toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
https://heywillow.io/
Apache License 2.0
2.61k stars, 96 forks

Poor VAD end with background noise #113

Open dslugPX opened 1 year ago

dslugPX commented 1 year ago

As mentioned in issue 112 we have a relatively noisy environment as we have music running 24/7.

We have noticed that Willow sometimes seems to be listening to the music in addition to our voices. I believe this may be causing some of the issues mentioned in 112, in particular scenario 3.

I have also seen it (only once) pick up what I think was a drum beat as the command "No no no".

Happy to help by providing whatever I can for you.

Also - should mention I have two ESP32s in flight now, and one more in a box still so I can certainly try some different settings and the like as well.

Cheers!

kristiankielhofner commented 1 year ago

Thank you for filing separate issues, we'll be addressing them in commits for you to test with later today.

As I've noted previously, of all of the reports we are getting you seem to be having the most usability issues. It's reassuring to us that even with these initial and very early problems your experience is still positive enough to order more devices!

dslugPX commented 1 year ago

Oh yeah, this is great. I'll buy one more once stock is high again for these too. Gonna put one outside too! Be REALLY nice to control stuff with voice while in the pool (we live in AZ so summer time is spent in water if we are in the yard, though that will be an interesting thing to see how they do in the heat here, eeek)

I'm presuming I'm seeing more issues for a few reasons:

  1. My network is a complete mess. It's cobbled together with a half dozen unmanaged switches in addition to a few different mesh network endpoints and some vlan chaos that's probably not helping, at all.
  2. I'm probably the worst (or best, dunno) kind of tester for you right now. Someone willing to kind of muddle through, but not 100% certain what they are doing. I mean, I'm well versed in parts of this, but others I just followed a shit ton of tutorials to get something running and didn't even really try to retain much. I think, so far, I've done things the way you would expect though.
  3. I have tinnitus, and bad. As such and as you know by now we keep music on constantly because it helps drown out some of the constant ringing. I have to presume this is a factor, but you seem confident it can be dialed in, and I have no reason whatsoever to doubt you. We're happy to be a help in knocking out these things as once you get the easy install script stuff done, you are going to have a LOT more users like me around. And my wife is simply over the moon with how easy it is for me to make her little things she can do now. I mean, over the moon.

kristiankielhofner commented 1 year ago

Wow, yeah... Now that I'm hearing about the network situation I suppose I'm even happier Willow works as well as it does, especially for hardware that is 2.4 GHz only. Do you have any plans to address some of that? I wouldn't ask you to do it for Willow - after all, we aim to be the best speech solution in the world and it's good to know it's being used in environments that are... Let's just go with "suboptimal" for a wireless network connected speech recognition device. Don't take this the wrong way but from the sounds of it someone couldn't purposely design more of an environmental nightmare for a solution like Willow ;). I'm almost surprised it works at all.

I'm very sorry to hear you have severe tinnitus. I don't have it myself but from what I understand it's dramatically life-impacting.

Yes, background noise is always a challenge. The ESP BOX and the various libraries do (IMO) a very good job with it but at the end of the day you can start to run out of magic. That said we have plenty of knobs to tweak and we'll get the full set to you later today.

dslugPX commented 1 year ago

Well, I went with a little hyperbole there. I have wired backhaul to each access point, and the 2.4 GHz network has its own VLAN. But it's definitely in the ballpark of "messy but good enough most of the time" :) I always have plans to make everything better, but who knows when that will happen on the network side. It was a nearly six-month journey to get whole-home audio and video working perfectly. Audio was sort of easy, but once you add in keeping things in sync with video too, it gets dicey fast, and it took a while. That's when a lot of "I'll just run Ethernet through this closet and add a switch" kind of crap came into play. Sonos would have done the trick, but then I'd have Sonos-quality sound (subpar) with easy controls. Now I have good sound and easy controls. In fact, the very first commands Willow ever used were "Switch to Music" and "Switch to TV". Anyway, if I put a server in to run WIS, I'll move everything else over to it, and that will take a ton of the weird routes out of the network, instead of Plex here, HA there, and so forth. (PopOS looks perfect, by the way.)

Anyway - our house is for sure a good example of the kind of "real world" folks you're ultimately targeting. Perhaps an extreme one even :) Just hope it's not too early for you to begin dealing with this stuff. And if it is, please don't put a ton of effort into helping me specifically, you have much more important stuff to work on, but I sure don't mind giving you feedback so you have it. OK... Day job time!

stintel commented 1 year ago

My apartment is also relatively noisy, and I too run into AUDIO_REC_VAD_END not triggering. With 7da6d7358816fb91279c7ef0961c83fad80fbb66 we force the stream to end after CONFIG_WILLOW_STREAM_TIMEOUT seconds, which avoids an endless stream. Setting it to 5 works around the problem somewhat.
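To make the timeout behaviour concrete, here is a minimal, self-contained simulation of the idea. It is only a sketch, not the code from the commit: `run_stream`, `stream_end_t`, and `STREAM_TIMEOUT_S` are hypothetical names, and the per-frame speech flags stand in for whatever the real VAD reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_WILLOW_STREAM_TIMEOUT (seconds). */
#define STREAM_TIMEOUT_S 5

typedef enum {
    STREAM_END_VAD,     /* enough consecutive silent frames: normal VAD end */
    STREAM_END_TIMEOUT, /* VAD end never came: stream forced to stop */
    STREAM_END_INPUT    /* ran out of frames (only possible in this simulation) */
} stream_end_t;

/*
 * Walk per-frame VAD decisions (true = frame classified as speech) and
 * report why the stream ends. frame_ms is the duration of one frame;
 * silence_frames is how many consecutive non-speech frames count as VAD end.
 * The elapsed stream time is written to *elapsed_ms.
 */
stream_end_t run_stream(const bool *speech, size_t n_frames, unsigned frame_ms,
                        unsigned silence_frames, unsigned *elapsed_ms)
{
    assert(frame_ms > 0 && silence_frames > 0);
    unsigned quiet = 0;
    *elapsed_ms = 0;
    for (size_t i = 0; i < n_frames; i++) {
        *elapsed_ms += frame_ms;
        quiet = speech[i] ? 0 : quiet + 1;
        if (quiet >= silence_frames)
            return STREAM_END_VAD;
        if (*elapsed_ms >= STREAM_TIMEOUT_S * 1000u)
            return STREAM_END_TIMEOUT;
    }
    return STREAM_END_INPUT;
}
```

With constant music, every frame may look like speech, so the silent-frame counter never reaches its limit and only the timeout branch ends the stream. In a quiet room the silence counter ends it well before the timeout.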

Last night I wondered if reducing the mic gain would help in noisy environments, so I added a Kconfig option to set mic gain in 00f0d1b6f57769872fb0ac42d6200248e95d4d53. Could you please test if reducing the mic gain helps in noisy environments? I'm currently travelling so can't test myself.
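The intuition behind lowering the gain can be sketched in plain C. This is an assumption-laden illustration, not how the Kconfig option is implemented: the real setting presumably acts on the microphone hardware, whereas `apply_gain_q8` and `frame_level` below are hypothetical helpers that scale PCM samples in software and measure a crude loudness value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Scale 16-bit PCM in place by gain_q8 / 256 (256 = unity gain), clamping
 * to the int16_t range. Hypothetical software stand-in for a mic gain knob.
 */
void apply_gain_q8(int16_t *pcm, size_t n, int32_t gain_q8)
{
    assert(gain_q8 >= 0);
    for (size_t i = 0; i < n; i++) {
        int64_t v = ((int64_t)pcm[i] * gain_q8) / 256;
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        pcm[i] = (int16_t)v;
    }
}

/* Mean absolute amplitude of a frame: a crude loudness measure. */
int32_t frame_level(const int16_t *pcm, size_t n)
{
    assert(n > 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs((int)pcm[i]);
    return (int32_t)(sum / (int64_t)n);
}
```

If a detector treats anything above some fixed level as speech, halving the gain halves both the music and the voice levels. Music that sat just above that level drops below it, while a voice close to the microphone can stay above. Whether that holds on real hardware is exactly what the test request above is meant to find out.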

kristiankielhofner commented 1 year ago

@dslugPX As shown in the commit reference I also just added a parameter exposed under "Advanced Configuration" to configure the "aggressiveness" of VAD - higher values mean it will be more selective in considering what constitutes speech. In my initial testing VAD_MODE_4 (most aggressive) helps with this issue, but you may want to play with the various levels in your environment.
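As a rough mental model of what "more aggressive" means, the toy classifier below maps each mode to a higher bar that a frame must clear to count as speech. Everything in it is an assumption for illustration: the thresholds are made up, `frame_is_speech` is a hypothetical name, and the actual VAD in the underlying libraries is not a simple level comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative thresholds only, one per mode 0..4, rising with
 * aggressiveness. These numbers are invented for the sketch and are
 * not taken from Willow or its libraries.
 */
static const int32_t k_mode_threshold[5] = { 300, 600, 1200, 2400, 4800 };

/* Treat a frame as speech only if its level exceeds the mode's threshold. */
bool frame_is_speech(int32_t level, int mode)
{
    assert(mode >= 0 && mode <= 4);
    return level > k_mode_threshold[mode];
}
```

In this model a frame of background music at level 1000 counts as speech in mode 0 but not in mode 4, which is why a more selective mode can let the end of speech be detected sooner. The trade-off is that quiet speech may also be rejected, so the best level depends on the room.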

dslugPX commented 1 year ago

Nice. I'll have a little time coming up in the next few days to do some updates and try a few more things. We are still using this daily and trying to note things we are finding. Drums are definitely a source of trouble, but I only did the one update since we put them online so most of your more interesting changes aren't in use yet. Will follow up again soon!

btw - a bunch of ESP32 boxes hit Adafruit this afternoon, so I'm guessing you will have a new run of users coming at you soon!

kristiankielhofner commented 1 year ago

Thanks, appreciate it!

Yep, we saw a bunch come into Mouser too!