adonahue opened this issue 4 years ago
Jordan & I dug a bit deeper into what we are doing right now.
It looks like our code to construct a bandpass filter for human voices doesn't actually do anything. It was supposed to make this filter:
[plot: intended bandpass response]
Luckily, the default parameters for a bandpass filter make this filter:
[plot: default bandpass response]
These are almost the same. The default one is right-shifted by about 5 Hz. So we might be missing the very deepest voices because of this.
We should probably fix it anyway, but it looks like we lucked out.
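To make the comparison concrete, here is a minimal sketch of computing bandpass biquad coefficients with explicit parameters, using the standard Audio EQ Cookbook formulas (the same biquad design the Web Audio API uses). The 300 Hz center, Q of 1, and function names here are illustrative assumptions, not Sibilant's actual settings:

```javascript
// Sketch: bandpass biquad coefficients (Audio EQ Cookbook,
// "constant 0 dB peak gain" variant). The parameter values are
// illustrative, not Sibilant's actual configuration.
function bandpassCoeffs(centerHz, q, sampleRate) {
  const w0 = (2 * Math.PI * centerHz) / sampleRate;
  const alpha = Math.sin(w0) / (2 * q);
  const a0 = 1 + alpha;
  // Normalize all coefficients so a0 === 1.
  return {
    b0: alpha / a0,
    b1: 0,
    b2: -alpha / a0,
    a1: (-2 * Math.cos(w0)) / a0,
    a2: (1 - alpha) / a0,
  };
}

// Magnitude response |H(e^{jw})| at a given frequency, for checking
// which band the filter actually passes.
function magnitudeAt(c, freqHz, sampleRate) {
  const w = (2 * Math.PI * freqHz) / sampleRate;
  const numRe = c.b0 + c.b1 * Math.cos(w) + c.b2 * Math.cos(2 * w);
  const numIm = -(c.b1 * Math.sin(w) + c.b2 * Math.sin(2 * w));
  const denRe = 1 + c.a1 * Math.cos(w) + c.a2 * Math.cos(2 * w);
  const denIm = -(c.a1 * Math.sin(w) + c.a2 * Math.sin(2 * w));
  return Math.hypot(numRe, numIm) / Math.hypot(denRe, denIm);
}

// A voice-band filter centered at 300 Hz (illustrative values):
const coeffs = bandpassCoeffs(300, 1.0, 44100);
console.log(magnitudeAt(coeffs, 300, 44100));  // unity gain at center
console.log(magnitudeAt(coeffs, 5000, 44100)); // strongly attenuated
```

Plotting `magnitudeAt` over a range of frequencies is an easy way to verify whether the explicitly-configured filter and the default one really do differ by only a few Hz.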
Another issue is that we are logging some 0 ms-long utterances. These will probably pick up a lot of background noise. This is caused by lines 66/67 of https://github.com/rifflearning/sibilant/blob/master/sibilant.js#L68
The fix is either to log only events that contain at least two speaking-time samples, or to use the start of the quiet period as the end of the utterance, rather than the timestamp of the last high-volume sample.
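A sketch of what that fix could look like, assuming (hypothetically) we have arrays of timestamped speaking and quiet samples; the function and variable names are illustrative and do not match sibilant.js:

```javascript
// Sketch of the proposed fix. `speakingTimes` and `quietTimes` are
// hypothetical arrays of epoch-ms timestamps for high-volume and
// quiet samples respectively; names do not match sibilant.js.
function utteranceFromTimes(speakingTimes, quietTimes) {
  // Option 1: drop events with fewer than two speaking samples,
  // which is what produces the 0 ms utterances.
  if (speakingTimes.length < 2) {
    return null;
  }
  const start = speakingTimes[0];
  // Option 2: if we observed the onset of quiet, use it as the end
  // time; otherwise fall back to the last high-volume sample.
  const end = quietTimes.length > 0
    ? quietTimes[0]
    : speakingTimes[speakingTimes.length - 1];
  return { start, end, durationMs: end - start };
}
```

For example, `utteranceFromTimes([1000], [])` returns `null` instead of a zero-length utterance, while `utteranceFromTimes([1000, 1400], [1650])` ends the utterance at 1650 (the onset of quiet) rather than at 1400.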
Possible Node libraries we could use instead:
https://www.npmjs.com/package/node-vad
https://www.npmjs.com/package/voice-activity-detection
node-vad looks super simple. voice-activity-detection has more features we could customize, whereas node-vad picks sensible defaults and hides its parameters.
"Voice Activity Detection is based on the method used in the upcoming WebRTC HTML5 standard. Extracted from Chromium for stand-alone use as a library."
@adonahue
We are marking this as ready for review. Our findings are summarized in this document:
https://docs.google.com/document/d/1H17j_gpVpagIeVfVeWZ1XX4sTxlSDEDbQRDr7Rqb_cA/edit?usp=sharing
@jaedoucette - I am realizing that it's not clear to me which of the recommended work should be done, and when. My understanding was that what we had in place was good enough to move forward with new metrics, and wasn't urgent to fix. But from chatting with @jordanreedie, I gather he doesn't see it that way. I may have misunderstood which work is high priority, how it impacts our ability to build new metrics, and whether it's part of the spike or a follow-on effort.
@jordanreedie
I'm still not completely clear on what's outstanding for this card. I think we talked about it, but I've forgotten what we concluded. I believe the conclusion was that VAD may or may not be suitable as a replacement, but that the spike is done because the system works "well enough" for now, even if we want to replace it eventually?
Weigh in here, and then we'll have a note for next time.
I think our conclusion was that, yeah, it seems to work well enough for now, but we should fix the bug that causes us to record zero-length utterances. In the future it would be nice to move to a better-designed, cleaner library, but at the moment that would take too much effort.
@jaedoucette @jordanreedie - so it sounds like there is one actionable story right now, which is the zero-length utterance bug?
@adonahue Yes. I'll make a card for that, and then close out this spike.
Awesome, thank you @jaedoucette .
As a Riff Developer, I am not confident that our speech detection is working correctly, based on the code that I've seen. Specifically, I'm concerned that we are not properly detecting actual speech versus other ambient noise (such as doors slamming, dogs barking, etc), and this results in lower quality data that could impact the accuracy of our analytics.
I would like to look at other speech-detection services (for JavaScript) and determine, either by direct evidence (testing them and reviewing the results) or by evidence provided by others, whether there is a better speech-detection service out there for us to use.
**Story Acceptance Criteria**
A meeting with the rest of the team to report on the following:
- If changes are needed