Voice activity detection (VAD) determines whether someone is speaking in an audio stream. Because calls to the audio transcription API are expensive, VAD lets us avoid unneeded calls by only sending audio to the API when someone is speaking.
I implemented the VAD using the `webrtcvad` library. I had to install a fork of the library because there seemed to be issues installing the original with poetry/pip. Since I didn't know whether this feature was strictly necessary, I kept it behind a feature toggle so that it can easily be removed if needed.
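For illustration, here is a minimal sketch of how gating transcription on VAD behind a toggle could look. It is not the actual implementation: the `ENABLE_VAD` environment variable and the helper names `vad_enabled`, `split_frames`, `make_vad`, and `should_transcribe` are hypothetical. It assumes 16-bit mono PCM input at a sample rate `webrtcvad` supports (8000, 16000, 32000, or 48000 Hz) and uses only `webrtcvad.Vad(mode)` and `Vad.is_speech(frame, sample_rate)`.

```python
import os
from typing import Iterator

FRAME_MS = 30          # webrtcvad accepts 10, 20, or 30 ms frames
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM


def vad_enabled() -> bool:
    # Hypothetical feature toggle, read from an environment variable.
    return os.environ.get("ENABLE_VAD", "1") == "1"


def split_frames(pcm: bytes, sample_rate: int,
                 frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    # Yield fixed-size frames; a trailing partial frame is dropped
    # because webrtcvad rejects frames of any other length.
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def make_vad(aggressiveness: int = 2):
    # Imported lazily so the dependency is only needed when the toggle is on.
    import webrtcvad
    return webrtcvad.Vad(aggressiveness)  # mode 0 (least) to 3 (most aggressive)


def should_transcribe(pcm: bytes, sample_rate: int,
                      vad=None, enabled=None) -> bool:
    # With the toggle off, behave as if VAD did not exist: always transcribe.
    if enabled is None:
        enabled = vad_enabled()
    if not enabled:
        return True
    if vad is None:
        vad = make_vad()
    # Transcribe only if at least one frame is classified as speech.
    return any(vad.is_speech(frame, sample_rate)
               for frame in split_frames(pcm, sample_rate))
```

The caller would check `should_transcribe(chunk, 16000)` before sending a chunk to the transcription API. Requiring only one speech frame errs on the side of transcribing; a stricter threshold (for example, a minimum fraction of speech frames) would skip more calls at the risk of dropping short utterances.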