snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

❓ Can window_size_samples be selected as 160 (10ms)? #442

Closed (jifashen closed 3 months ago)

jifashen commented 3 months ago

❓ Questions and Help

For 16000 Hz audio, window_size_samples can be set to 512 (32 ms), 1024 (64 ms), or 1536 (96 ms). Can it also be set to 160 (10 ms)? In addition, what do the parameters threshold, min_silence_samples_at_max_speech, min_speech_samples, max_speech_samples, and speech_pad_samples mean, and how do they affect the VAD results?
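
For reference, the durations quoted in the question follow from dividing the window length in samples by the sample rate. Below is a minimal sketch of that arithmetic only. The helper names `window_ms` and `samples_for_ms` are hypothetical and not part of silero-vad, and the snippet says nothing about whether the model accepts a 160-sample window.

```python
# Hypothetical helpers (not part of silero-vad): convert between a window
# length in samples and its duration in milliseconds.

SAMPLE_RATE_HZ = 16000  # sample rate assumed in the question


def window_ms(window_size_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration in milliseconds of a window of `window_size_samples` samples."""
    return 1000.0 * window_size_samples / sample_rate_hz


def samples_for_ms(duration_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples that span `duration_ms` milliseconds."""
    return round(duration_ms * sample_rate_hz / 1000)


# Window sizes mentioned in the question, mapped to their durations.
WINDOW_DURATIONS_MS = {n: window_ms(n) for n in (160, 512, 1024, 1536)}

if __name__ == "__main__":
    for n, ms in WINDOW_DURATIONS_MS.items():
        print(f"{n} samples at {SAMPLE_RATE_HZ} Hz = {ms:g} ms")
```

The same conversion applies to the other `*_samples` parameters named above, since each is expressed as a sample count at the chosen sample rate.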