Can window_size_samples be 160 (10 ms) for 16000 Hz audio, and what do the VAD parameters mean?
❓ Questions and Help
For 16000 Hz audio, window_size_samples can be set to 512 (32 ms), 1024 (64 ms), or 1536 (96 ms). Can window_size_samples also be set to 160 (10 ms)? In addition, what do the parameters threshold, min_silence_samples_at_max_speech, min_speech_samples, max_speech_samples, and speech_pad_samples mean, and how does each one affect the VAD results?
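
For reference, the durations quoted above are just samples divided by the sample rate. Here is a minimal sketch of that arithmetic; the helper names `samples_to_ms` and `ms_to_samples` are my own and hypothetical, not part of any VAD library, and the sketch says nothing about which window sizes the model actually accepts:

```python
SAMPLE_RATE_HZ = 16000  # sample rate from the question


def samples_to_ms(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration in milliseconds of a window of num_samples samples."""
    return 1000.0 * num_samples / sample_rate_hz


def ms_to_samples(duration_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples (rounded) covering duration_ms milliseconds."""
    return round(duration_ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    # The three window sizes mentioned above, plus the 160-sample window asked about.
    for n in (512, 1024, 1536, 160):
        print(f"{n} samples = {samples_to_ms(n)} ms")
```

The same conversion applies to the other *_samples parameters, so a value I think of in milliseconds can be turned into a sample count with `ms_to_samples`.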