ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API
https://www.vad.ricky0123.com

[QUESTION] Different pause durations #156

Closed iak-a-dev closed 4 days ago

iak-a-dev commented 1 week ago

I need to understand if this package can be used for voice activity detection in my project, where I want to trigger different actions based on varying pause lengths. Is it possible to achieve this functionality with this package, and if so, how can I set up different actions for different pause durations?

My goal is to use the VAD to identify short pauses, cut the audio stream into pieces at those pauses, and start transcribing each piece before the speech is finished.

ricky0123 commented 1 week ago

Hi @iak-a-dev, it might be possible if you are creative, although it wouldn't be very straightforward. You could set `redemptionFrames` to the shortest pause length you're interested in and have the `onSpeechEnd` callback start a timer that fires at the next pause duration you're interested in (and is cancelled by the `onSpeechStart` callback). But if it gets too complex or is buggy, you may be better off creating your own solution. You may also want to consider streaming audio to your server via WebSocket or WebRTC.
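A minimal sketch of the timer idea described above, in TypeScript. Nothing here is part of the package: `PauseTierTracker`, `PauseTier`, `Scheduler`, and the delay values are hypothetical names and numbers chosen for illustration. It assumes the VAD reports the shortest pause by calling `onSpeechEnd` (after `redemptionFrames` of silence) and that each `afterMs` is measured from that moment, so the real pause length is roughly the shortest pause plus `afterMs`.

```typescript
// Hypothetical helper, not part of the VAD package.
// Each tier fires its action if silence continues for `afterMs`
// milliseconds AFTER the VAD has already reported the end of speech.
type PauseTier = { afterMs: number; action: () => void };

// Timer functions are injectable so the logic can be tested without real delays.
type Scheduler = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const defaultScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class PauseTierTracker {
  private tiers: PauseTier[];
  private scheduler: Scheduler;
  private pending: unknown[] = [];

  constructor(tiers: PauseTier[], scheduler: Scheduler = defaultScheduler) {
    this.tiers = tiers;
    this.scheduler = scheduler;
  }

  // Call from the VAD's onSpeechEnd callback: the shortest pause was detected,
  // so start one timer per longer pause tier.
  onSpeechEnd(): void {
    this.cancelAll();
    this.pending = this.tiers.map((t) => this.scheduler.set(t.action, t.afterMs));
  }

  // Call from the VAD's onSpeechStart callback: speech resumed,
  // so any longer pause that has not fired yet did not happen.
  onSpeechStart(): void {
    this.cancelAll();
  }

  private cancelAll(): void {
    for (const handle of this.pending) this.scheduler.clear(handle);
    this.pending = [];
  }
}

// Example tiers (values are illustrative assumptions).
const exampleTracker = new PauseTierTracker([
  { afterMs: 500, action: () => console.log("medium pause: flush chunk to transcription") },
  { afterMs: 2000, action: () => console.log("long pause: treat utterance as finished") },
]);
```

To wire this up, the two public methods would be called from the `onSpeechEnd` and `onSpeechStart` callbacks mentioned above. The action for the shortest pause needs no timer, since `onSpeechEnd` itself already signals it.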

iak-a-dev commented 4 days ago

@ricky0123 I had the same idea myself but decided to ask for better solutions. Unfortunately, the standard Whisper does not support volume streaming, so I have to experiment. Thanks for your answer!