CerbonXD closed this issue 3 months ago
Hi,
Generating annotations with a 20 ms window is very hard, and 30 ms is most likely good enough for the majority of applications.
You can try windowing the VAD, i.e. applying it with a 20 ms hop in an overlapping pattern.
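As a rough illustration of the overlapping-window idea, here is a minimal Java sketch. It keeps the most recent 512 samples (32 ms at 16 kHz) and calls the detector every time a new 320-sample (20 ms) frame arrives. The class name `SlidingVadWindow` and the `ToDoubleFunction<float[]>` callback are hypothetical stand-ins for whatever VAD call you actually use; they are not part of any library API.

```java
import java.util.function.ToDoubleFunction;

/** Buffers short incoming frames and feeds fixed-size windows to a VAD callback. */
public class SlidingVadWindow {
    private final float[] window;                  // most recent windowSize samples
    private int filled = 0;                        // number of valid samples in the window
    private final ToDoubleFunction<float[]> vad;   // hypothetical VAD: window -> speech probability

    public SlidingVadWindow(int windowSize, ToDoubleFunction<float[]> vad) {
        this.window = new float[windowSize];
        this.vad = vad;
    }

    /** Appends one frame; returns the VAD output, or NaN until the window is full. */
    public double push(float[] frame) {
        int n = frame.length;
        if (n >= window.length) {
            // Frame is at least as long as the window: keep only its tail.
            System.arraycopy(frame, n - window.length, window, 0, window.length);
            filled = window.length;
        } else {
            // Shift older samples left by n, then append the new frame at the end.
            System.arraycopy(window, n, window, 0, window.length - n);
            System.arraycopy(frame, 0, window, window.length - n, n);
            filled = Math.min(window.length, filled + n);
        }
        if (filled < window.length) {
            return Double.NaN;
        }
        // Pass a copy so the callback cannot modify the internal buffer.
        return vad.applyAsDouble(window.clone());
    }
}
```

With 320-sample frames and a 512-sample window, the first call returns NaN and every later call yields one decision per 20 ms, with consecutive windows overlapping by 192 samples. If the detector keeps internal state between calls, feeding it overlapping audio may change its behavior, so treat this as something to test rather than a guaranteed fix.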
🚀 Feature
I would like to request support for voice detection on chunks shorter than 32 ms.
Motivation
In my current project, I want to use a Voice Activity Detector (VAD) to identify when a person is speaking. However, the audio I receive arrives in chunks of 320 samples at 16 kHz, which equates to 20 ms of audio. Because that is shorter than 32 ms, the VAD does not work.
Pitch
Make it compatible with audio chunks shorter than 32 ms, if possible.
Alternatives
No alternatives I can think of.
Additional context
I'm using Java.