robotsasfurniture / passive-sound-localization

Other
1 stars 2 forks source link

feat: Voice activity detection to avoid unneeded API calls for audio transcription or chat completions #5

Closed nicolasperez19 closed 1 month ago

nicolasperez19 commented 1 month ago

Voice Activity detection (VAD) helps recognize whether someone is speaking in an audio. Since using the audio transcription API is expensive, a way to prevent unneeded calls is to only do API calls when someone is speaking. By using VAD, this helps prevent the unneeded calls.

I implemented the VAD using the webrtcvad library. I had to install a fork of the library because it seemed like there were issues installing it with poetry/pip.

Since I didn't know if this feature was completely necessary, I kept it under a feature toggle. That way it can easily be removed if necessary.