Replace rifflearning/sibilant for voice detection

Describe the ideal solution or feature request

As a scientist, I want to have confidence that the utterance data we record in Riff-server is an accurate reflection of when participants actually spoke.

Currently, our voice detection uses a specialized fork of the Node sibilant library, stored in rifflearning/sibilant. The library purports to detect whether a user is speaking and to log an utterance to riff-server whenever the speaker finishes, but its behavior is suspect for several reasons.

Voice-Activity-Detection (https://www.npmjs.com/package/voice-activity-detection) is a Node library that appears to provide the functionality to detect the start and end of speech. This library, or another third-party library that we have confidence in, should be used in place of sibilant.
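For reviewers less familiar with voice activity detection, here is a minimal sketch of the behavior being replaced: find where speech starts, decide when the speaker has finished, and only then report an utterance. It uses a simple RMS-energy threshold with a "hangover" of tolerated silent frames. This is an illustrative technique only, not how sibilant or voice-activity-detection work internally; the names `detectUtterances`, `VadOptions`, `Utterance` and all parameter values are hypothetical.

```typescript
// Hypothetical types for illustration; NOT the API of sibilant or voice-activity-detection.
interface Utterance {
  startMs: number; // start of the first voiced frame
  endMs: number;   // end of the last voiced frame before the speaker finished
}

interface VadOptions {
  frameMs: number;         // duration of each analysis frame
  threshold: number;       // RMS energy above which a frame counts as voiced
  hangoverFrames: number;  // silent frames tolerated before an utterance ends
  minSpeechFrames: number; // voiced frames required to report an utterance
}

// Root-mean-square energy of one frame of samples (assumed in [-1, 1]).
function frameRms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

function detectUtterances(frames: number[][], opts: VadOptions): Utterance[] {
  const utterances: Utterance[] = [];
  let startFrame = -1; // -1 means "not currently inside an utterance"
  let lastVoiced = -1;
  let voicedCount = 0;

  // Close the open utterance; keep it only if it had enough voiced frames
  // (this drops short clicks and bumps).
  const close = (): void => {
    if (voicedCount >= opts.minSpeechFrames) {
      utterances.push({
        startMs: startFrame * opts.frameMs,
        endMs: (lastVoiced + 1) * opts.frameMs,
      });
    }
    startFrame = -1;
    voicedCount = 0;
  };

  for (let i = 0; i < frames.length; i++) {
    const voiced = frameRms(frames[i]) > opts.threshold;
    if (voiced) {
      if (startFrame < 0) startFrame = i; // speech starts
      lastVoiced = i;
      voicedCount++;
    } else if (startFrame >= 0 && i - lastVoiced > opts.hangoverFrames) {
      close(); // silence outlasted the hangover: the speaker has finished
    }
  }
  if (startFrame >= 0) close(); // recording ended mid-utterance
  return utterances;
}

// Usage: S = silent frame, V = voiced frame, 20 ms per frame.
const silence = [0, 0, 0, 0];
const speech = [0.5, -0.5, 0.5, -0.5];
const frames = "SSVVVSVSSSVSSS".split("").map((c) => (c === "V" ? speech : silence));
const result = detectUtterances(frames, {
  frameMs: 20,
  threshold: 0.1,
  hangoverFrames: 2,
  minSpeechFrames: 2,
});
console.log(result); // one utterance from 40 ms to 140 ms; the lone V at frame 10 is dropped
```

The hangover and minimum-length parameters are where most of the judgment lies: the hangover keeps short pauses inside one utterance, and the minimum length filters out noise. Any replacement library makes equivalent choices, which is why they are worth documenting when behavior changes.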

[ ] The sibilant library is no longer used in Riff products.
[ ] A third-party library is used to detect the start and end of human voices speaking in Riff products.
[ ] Riff products appear to behave the same way as they did before this replacement, and any changes are documented and explained.
[ ] Tests measure the accuracy of voice processing, so that we know it works as we expect and how much this work improves on our current implementation.
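One way to make the accuracy-testing item concrete is to compare detected utterances against a hand-labelled reference for the same recording and score how much speaking time agrees. The sketch below computes time-weighted precision, recall and F1. The `Segment` type, the function names and the choice of metric are assumptions for illustration, not an agreed test plan.

```typescript
// Hypothetical segment type: a stretch of speech in milliseconds.
interface Segment {
  startMs: number;
  endMs: number;
}

// Sort and merge overlapping segments so no time is counted twice.
function mergeSegments(segs: Segment[]): Segment[] {
  const sorted = segs.slice().sort((a, b) => a.startMs - b.startMs);
  const merged: Segment[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && s.startMs <= last.endMs) {
      last.endMs = Math.max(last.endMs, s.endMs);
    } else {
      merged.push({ startMs: s.startMs, endMs: s.endMs });
    }
  }
  return merged;
}

function totalMs(segs: Segment[]): number {
  return segs.reduce((sum, s) => sum + (s.endMs - s.startMs), 0);
}

// Total time where both segment lists mark speech (two-pointer sweep).
function overlapMs(a: Segment[], b: Segment[]): number {
  const ma = mergeSegments(a);
  const mb = mergeSegments(b);
  let i = 0;
  let j = 0;
  let total = 0;
  while (i < ma.length && j < mb.length) {
    const lo = Math.max(ma[i].startMs, mb[j].startMs);
    const hi = Math.min(ma[i].endMs, mb[j].endMs);
    if (hi > lo) total += hi - lo;
    // Advance whichever segment ends first.
    if (ma[i].endMs < mb[j].endMs) i++;
    else j++;
  }
  return total;
}

// precision: share of detected speech time that is real speech
// recall:    share of real speech time that was detected
function speechTimeScores(detected: Segment[], reference: Segment[]) {
  const hit = overlapMs(detected, reference);
  const det = totalMs(mergeSegments(detected));
  const ref = totalMs(mergeSegments(reference));
  const precision = det === 0 ? 0 : hit / det;
  const recall = ref === 0 ? 0 : hit / ref;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Usage: the detector started 200 ms late on the first utterance and
// ended 400 ms early on the second while starting 200 ms early.
const reference: Segment[] = [
  { startMs: 0, endMs: 1000 },
  { startMs: 2000, endMs: 3000 },
];
const detected: Segment[] = [
  { startMs: 200, endMs: 1000 },
  { startMs: 1800, endMs: 2600 },
];
console.log(speechTimeScores(detected, reference)); // precision 0.875, recall 0.7, f1 ≈ 0.778
```

Scoring the same labelled recordings with both sibilant and the replacement could give comparable numbers for how much the change improves on the current implementation.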

Background

The spike summary doc that led to this story is here: https://docs.google.com/document/d/1H17j_gpVpagIeVfVeWZ1XX4sTxlSDEDbQRDr7Rqb_cA/edit#

rifflearning/zenhub #208