Closed khusainovaidar closed 2 years ago
For my example, the model returns a 3-dimensional tensor, and the call to item() fails on it.
Also, maybe it's not quite the same issue, but we found a (subjective) quality degradation in the new version of the VAD (ONNX version). We tested it on clean samples and it now skips a lot of voiced segments, while the previous version works excellently.
Please create a separate ticket with the audio files and the hyper-parameters you are using, and please plot the probability charts.
'any_8k_audio_file.wav',
Please provide your audio file.
It really doesn't matter. It fails with any tensor I tried, e.g. wav = torch.Tensor(1, 100000). It also fails with the first file I found on the Internet:
wav, sr = torchaudio.load('http://mauvecloud.net/sounds/pcm1608m.wav')
speech_chunks = get_speech_timestamps(wav, model, sampling_rate=8000)
@khusainovaidar
Hotfixed. Thanks for reporting!
The wrong models were uploaded accidentally. The latest models are now in the repo; please check the quality on them.
P.S. For the 8k model it's better to use the kwarg window_size_samples=256.
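To make the window_size_samples hint more concrete, here is a minimal pure-Python sketch of how a window size in samples maps to a duration, and how a signal might be cut into fixed-size windows. The helper names window_duration_ms and split_into_windows are hypothetical and are not part of silero-vad. The zero-padding of the last window and the 512-sample / 16 kHz comparison point are assumptions made only for illustration.

```python
def window_duration_ms(window_size_samples, sampling_rate):
    """Return the length of one analysis window in milliseconds."""
    return 1000.0 * window_size_samples / sampling_rate


def split_into_windows(samples, window_size_samples):
    """Split a flat sequence of samples into fixed-size windows.

    The last window is zero-padded to full length. This padding is an
    assumption of the sketch, not a statement about the library.
    """
    windows = []
    for start in range(0, len(samples), window_size_samples):
        chunk = list(samples[start:start + window_size_samples])
        # Pad a short final chunk so every window has the same length.
        chunk.extend([0.0] * (window_size_samples - len(chunk)))
        windows.append(chunk)
    return windows


if __name__ == "__main__":
    # 256 samples at 8 kHz span the same time as 512 samples at 16 kHz.
    print(window_duration_ms(256, 8000))
    print(window_duration_ms(512, 16000))
    # 1000 samples of silence cut into windows of 256 samples.
    print(len(split_into_windows([0.0] * 1000, 256)))
```

Under these assumptions, a smaller window at 8 kHz keeps the time span per window the same as a larger window at a higher sampling rate. Whether that is the actual reason for the recommendation is not stated in this thread.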
The new version of the non-ONNX VAD model doesn't work with 8 kHz audio. Code from the example:
import torch

SAMPLING_RATE = 8000
USE_ONNX = False
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=True,
                              onnx=USE_ONNX)
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils
wav = read_audio('any_8k_audio_file.wav', sampling_rate=SAMPLING_RATE)
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
This raises "ValueError: only one element tensors can be converted to Python scalars" on line 252 of utils_vad:
speech_prob = model(chunk, sampling_rate).item()