snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

How should I choose window_size_samples? #322

Closed wl-junlin closed 1 year ago

wl-junlin commented 1 year ago

The code comment says "Silero VAD models were trained using 512, 1024, 1536 samples for 16000 sample rate". So, for better accuracy, should I choose 1536 as my window_size_samples? And for lower latency, should I choose 512?

snakers4 commented 1 year ago

The bigger the window size, the higher the quality, with an obvious latency trade-off: a larger window means more audio has to be buffered before each prediction.
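
To put rough numbers on the latency side, here is a minimal sketch that converts each of the window sizes quoted in the question into the amount of audio that must be buffered before one window is complete. The helper name `window_duration_ms` is made up for illustration and is not part of silero-vad. The figures cover buffering delay only, not model inference time.

```python
SAMPLE_RATE = 16000  # Hz, the sample rate quoted in the question


def window_duration_ms(window_size_samples, sample_rate=SAMPLE_RATE):
    """Return how many milliseconds of audio one window holds.

    Hypothetical helper for illustration; not a silero-vad function.
    """
    return 1000.0 * window_size_samples / sample_rate


# Window sizes mentioned in the quoted code comment
for n in (512, 1024, 1536):
    print(f"{n} samples -> {window_duration_ms(n):.0f} ms of audio per window")
```

At 16000 Hz this works out to 32 ms, 64 ms and 96 ms per window, so moving from 512 to 1536 samples triples the minimum buffering delay in exchange for the quality gain described above.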