snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License
3.38k stars 353 forks

How to use the older (V4) version? #479

Closed helloWorld199 closed 1 week ago

helloWorld199 commented 1 week ago

Hi, I've been using your model to detect vocals in audio tracks. It was working quite well, and the choice of the window was a cool feature that allowed me to optimize the detection. Now, with the newer version, I see that a lot of the activity is not detected, and I would like to revert to the older model. Is there a way to do this? I tried loading the older model from the hub, but it seems it's not available anymore.

By loading it in this way:

    USE_ONNX = False  # change this to True if you want to test onnx model
    if USE_ONNX:
        !pip install -q onnxruntime

    model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad:v4.0',
                                  model='silero_vad',
                                  force_reload=True,
                                  onnx=USE_ONNX)

    (get_speech_timestamps,
     save_audio,
     read_audio,
     VADIterator,
     collect_chunks) = utils

I get this error:

    ~/.cache/torch/hub/helloWorld199_silero-vadv4_master/hubconf.py in
          2 import torch
          3 import json
    ----> 4 from utils_vad import (init_jit_model,
          5                        get_speech_timestamps,
          6                        get_number_ts,

ImportError: cannot import name 'get_number_ts' from 'utils_vad' (/root/.cache/torch/hub/snakers4_silero-vad_master/utils_vad.py)

I don't really know how torch hub works, but I would like to know how to resolve this version conflict and use the v4 model.

Thank you!
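One possible reading of the traceback above: hubconf.py is loaded from one cache directory while utils_vad is resolved from a different one, which is what Python does when a module with the same name is already cached in sys.modules from an earlier import in the same session. This is an assumption, not something confirmed in the thread. The sketch below reproduces the mechanism with a hypothetical module name (utils_demo) and two hypothetical repo directories; it does not touch torch.hub or silero-vad at all.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(directory: Path, body: str) -> None:
    """Create a hypothetical utils_demo.py inside the given directory."""
    (directory / "utils_demo.py").write_text(body)


def import_from(directory: Path):
    """Import utils_demo with `directory` temporarily first on sys.path."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.import_module("utils_demo")
    finally:
        sys.path.remove(str(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Two stand-ins for cached checkouts of different versions (names are made up).
    new_repo = Path(tmp) / "repo_master"
    old_repo = Path(tmp) / "repo_v4"
    new_repo.mkdir()
    old_repo.mkdir()
    write_module(new_repo, "def get_speech_timestamps():\n    return 'master'\n")
    write_module(
        old_repo,
        "def get_speech_timestamps():\n    return 'v4'\n\n"
        "def get_number_ts():\n    return 'v4'\n",
    )

    # First import caches the newer module under the name 'utils_demo'.
    first = import_from(new_repo)

    # Second import asks for the older directory, but Python returns the cached
    # module; `from utils_demo import get_number_ts` would raise ImportError here.
    stale = import_from(old_repo)
    stale_has_helper = hasattr(stale, "get_number_ts")

    # Dropping the cached entry lets the older file actually be loaded.
    sys.modules.pop("utils_demo", None)
    fresh = import_from(old_repo)
    fresh_has_helper = hasattr(fresh, "get_number_ts")

    # Clean up so the demo leaves no cached module behind.
    sys.modules.pop("utils_demo", None)
```

If this is indeed the cause, restarting the notebook runtime before loading the older version, or removing the stale utils_vad entry from sys.modules, might avoid the clash. That is untested against this repository.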

snakers4 commented 1 week ago

https://github.com/snakers4/silero-vad/issues/474

This will be fixed there; we know about the problem.

"the choice of the window was a cool feature that allowed me to optimize the detection"

It turns out some people were passing the whole audio instead of a chunk. In any case, with the new version we changed some internals in the model, and the choice of audio chunk size became pointless, since the results now hardly depend on it.
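To illustrate the difference between passing a whole recording and passing chunks, here is a minimal sketch of splitting a sample sequence into fixed-size windows. The helper name iter_chunks and the chunk size of 512 are illustrative assumptions, not values taken from this thread, and the call into the model itself is deliberately left out.

```python
from typing import Iterator, List, Sequence


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks, zero-padding the last one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        # Pad the final chunk so every chunk has the same length.
        chunk.extend([0.0] * (chunk_size - len(chunk)))
        yield chunk


# Stand-in for decoded mono audio samples (hypothetical data).
audio = [0.0] * 1200

# 512 is an assumed, illustrative chunk size.
chunks = list(iter_chunks(audio, chunk_size=512))
```

Each element of chunks would then be handed to the detector one at a time, rather than the full audio list in a single call.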