snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

Properly loading v3.1 and v4 on a non-clean installation #474

Closed rvryan67 closed 1 week ago

rvryan67 commented 1 week ago

❓ Questions and Help

I'm having issues with the latest version, v5.0.

Until I have time to investigate and fix the issue, I want to use the previous version:

vad, utils = torch.hub.load( repo_or_dir="snakers4/silero-vad:v4.0", model="silero_vad", onnx=False )

This results in the following error:

The provided filename /root/.cache/torch/hub/snakers4_silero-vad_master/files/silero_vad.jit does not exist

snakers4 commented 1 week ago

Hi, which issue are you having with v5? Can you post some reproducible code which causes an error?

rvryan67 commented 1 week ago

import logging
import os
import urllib.request
import uuid

import ffmpeg
import torch
import torchaudio

logger = logging.getLogger(__name__)

vad, utils = torch.hub.load( repo_or_dir="snakers4/silero-vad", model="silero_vad", onnx=False )

def speechonly(wavfile, utils, vad):

    (get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

    VAD_SR = 16000
    vad_threshold = 0.4

    tmpAudioFile = "/tmp/" + str(uuid.uuid4()) + ".wav" # create wav file from audio_string

    wav = read_audio(wavfile, sampling_rate=VAD_SR)

    t = get_speech_timestamps(wav, vad, sampling_rate=VAD_SR, threshold=vad_threshold, min_speech_duration_ms=250) # Returns list with segments of audio timestamps (start and end)

    print(t)

    chunks = []
    chunk_probs = []

    for i in range(len(t)):
        t[i]["start"] = max(0, t[i]["start"] - 3200) # 0.2s head
        t[i]["end"] = min(wav.shape[0] - 16, t[i]["end"] + 20800) # 1.3s tail
        if i > 0 and t[i]["start"] < t[i - 1]["end"]:
            t[i]["start"] = t[i - 1]["end"] # Remove overlap

        chunk_duration = t[i]["end"]-t[i]["start"]

        if chunk_duration >= 512: # 512 is minimum size to pass through model at 16000 Hz sample rate
            speech_probability = vad(wav[t[i]["start"]:t[i]["end"]], VAD_SR).item()

            chunk = wav[t[i]["start"]:t[i]["end"]]
            chunks.append(chunk)
            chunk_probs.append(speech_probability)

    max_prob = max(chunk_probs, default=0.0) # default avoids a ValueError when no chunk was kept
    logger.info("speechonly len(chunks): " + str(len(chunks)) + ", max(chunk_probs): " + str(max_prob) + ", vad_threshold: " + str(vad_threshold))

    if len(chunks) == 0 or max_prob < vad_threshold: # No speech segments detected or maximum segment probability is below threshold
        return wavfile, t
    else:
        combined_chunks = torch.cat(chunks) # Combine audio segments into one tensor
        save_audio(tmpAudioFile, combined_chunks, sampling_rate=VAD_SR) # Save combined tensor to audio file with non-speech removed
        return tmpAudioFile, t


def urlToWav(inputUrl, outputfile):
    # Temporary location for the downloaded source file
    downloadfile = '/tmp/' + os.path.basename(inputUrl)
    try:
        if os.path.isfile(outputfile):
            os.remove(outputfile)

        urllib.request.urlretrieve(inputUrl, downloadfile)

        # Convert to 16 kHz mono 16-bit PCM WAV
        (
            ffmpeg.input(downloadfile)
            .output(outputfile, acodec='pcm_s16le', ac=1, ar=16000)
            .run(capture_stdout=True, capture_stderr=True)
        )
    except Exception as e:
        print("failed to convert to WAV - ERROR: " + str(e))
        return ""
    finally:
        if os.path.exists(downloadfile):
            os.remove(downloadfile)

    return outputfile

audioUrl = 'https://file-examples.com/storage/fe0ebbce85667e496a17872/2017/11/file_example_MP3_2MG.mp3'
tmpAudioFile = "/tmp/" + str(uuid.uuid4()) + ".wav" # create wav file from s3 bucket
urlToWav(audioUrl, tmpAudioFile)
speechOnlyFile = tmpAudioFile
speechOnlyFile, voicetimestamps =  speechonly(tmpAudioFile, utils, vad)

ERROR: Provided number of samples is 27936 (Supported values: 256 for 8000 sample rate, 512 for 16000)

snakers4 commented 1 week ago

Hi, this is correct behavior. The VAD has always had limitations on the chunk size, and now the chunk size is fixed, as noted in the error message.

Also, a more proper way to get at the probabilities would probably be to extend the get_speech_timestamps function.
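
A minimal sketch of feeding the model fixed-size windows instead of a whole segment. It assumes 16000 Hz audio held as a plain sequence of floats; `window_probabilities` and `score_fn` are hypothetical names, and `score_fn` stands in for a call such as `vad(window, 16000).item()` on a 512-sample tensor.

```python
from typing import Callable, List, Sequence


def window_probabilities(
    samples: Sequence[float],
    score_fn: Callable[[List[float]], float],
    window_size: int = 512,
) -> List[float]:
    """Score audio in fixed-size windows (hypothetical helper, not part of silero-vad)."""
    probs = []
    for start in range(0, len(samples), window_size):
        window = list(samples[start:start + window_size])
        # Zero-pad the final window so every call sees exactly window_size samples
        window += [0.0] * (window_size - len(window))
        probs.append(score_fn(window))
    return probs
```

For the 27936-sample segment from the error message, this produces 55 windows, the last one zero-padded.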

rvryan67 commented 1 week ago

The code worked until recently; it has been broken since v5 was released yesterday.

Is there a way I can load the previous version to quickly work around the problem until I have time to fix it properly?

snakers4 commented 1 week ago

Your code ran, but it produced incorrect results since vad never worked with such large chunks.

In your case, v4.0 does not load because PyTorch appears to cache the hubconf file or the full repo.

From a fresh environment, any version loads.

We removed old unused utils in 5.0, so after clearing the cache everything should work.
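
A minimal sketch of clearing the cached checkouts, assuming they live in the torch hub directory under names starting with `snakers4_silero-vad_` (as in the error path above). `clear_silero_cache` is a hypothetical helper, not part of silero-vad or PyTorch.

```python
import shutil
from pathlib import Path


def clear_silero_cache(hub_dir: str, prefix: str = "snakers4_silero-vad_") -> list[str]:
    """Delete cached silero-vad checkouts under hub_dir and return their names."""
    removed = []
    for entry in sorted(Path(hub_dir).glob(prefix + "*")):
        if entry.is_dir():
            shutil.rmtree(entry)  # removes the cached repo, including its hubconf.py
            removed.append(entry.name)
    return removed
```

With PyTorch installed, the hub directory can be obtained from `torch.hub.get_dir()`, e.g. `clear_silero_cache(torch.hub.get_dir())`.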

ggoedde commented 1 week ago

This issue (The provided filename /root/.cache/torch/hub/snakers4_silero-vad_master/files/silero_vad.jit does not exist) is likely caused by this line: https://github.com/snakers4/silero-vad/blob/v4.0/hubconf.py#L38

On that line, the code attempts to load the model from snakers4_silero-vad_master. However, running torch.hub.load(repo_or_dir="snakers4/silero-vad:v4.0", model="silero_vad", onnx=False ), i.e. specifying a version number, will put the model + code in /root/.cache/torch/hub/snakers4_silero-vad_v4.0 instead.

I see in the latest release (v5.0), this snakers4_silero-vad_master isn't hard coded in the model loading step. https://github.com/snakers4/silero-vad/blob/v5.0/hubconf.py#L43

@snakers4 could the JIT model loading code in v4.0 be updated to match what is in v5.0? Otherwise I think trying to download v4.0 will continue to have this issue.

i.e. update hubconf.py for v4.0 to this:

[image]

instead of this:

[image]

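
A minimal sketch of the difference described above. It assumes torch.hub names a cached checkout by joining owner, repo and ref with underscores (consistent with the two paths quoted in this comment), and that the model sits in a `files` subdirectory next to hubconf.py, as in the error message. Both function names are hypothetical and this is not the actual hubconf.py code.

```python
import os


def hub_cache_dirname(owner: str, repo: str, ref: str) -> str:
    """Directory name torch.hub is assumed to use for a cached checkout."""
    return "_".join([owner, repo, ref.replace("/", "_")])


def model_path_next_to(hubconf_file: str, filename: str = "silero_vad.jit") -> str:
    """Resolve the model relative to the hubconf.py being executed,
    instead of hard-coding the '..._master' directory."""
    base = os.path.dirname(os.path.abspath(hubconf_file))
    return os.path.join(base, "files", filename)
```

Inside hubconf.py, `model_path_next_to(__file__)` would then point into whichever checkout was actually loaded, whether that is the master one or a tagged one.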
snakers4 commented 1 week ago

> This issue (The provided filename /root/.cache/torch/hub/snakers4_silero-vad_master/files/silero_vad.jit does not exist) is likely caused by this line:

Many thanks, we arrived at the same conclusion. Hence the issue with "non-clean" initialization, when the init is "tainted" by loading several versions at once.

We are now thinking about how to fix the git history properly. There are 3 versions now that people remember: v5.0, v4.0 and v3.1.

Ideally, of course, we would deprecate the old ones, but being able to load an earlier model easily in a non-clean environment is a nice feature, e.g. for benchmarking.

dgoryeo commented 1 week ago

@snakers4, if possible, please don't deprecate the old versions yet. I use version 3.1 for transcribing long-form anime movies and so far it works best. Thanks.

snakers4 commented 1 week ago

A possible solution would be to create historic branches for v4 and v3.1 and try to re-point the tags at these branches' commits. If this works, it will be an easy fix.

adamnsandle commented 1 week ago

Fixed the v3.1 and v4.0 tags; they should work properly now.

snakers4 commented 1 week ago

The tags for v3.1 and v4 have been updated so that the older versions load predictably on non-clean installations. The only downside is that this solution may not work properly on Windows.

If so, a PR would be appreciated for this line: https://github.com/snakers4/silero-vad/blob/master/hubconf.py#L39

snakers4 commented 1 week ago

Please can someone verify that this now works.

GaetanBaert commented 1 week ago

Hello

I just tried, I got ImportError: cannot import name 'get_number_ts' from 'utils_vad' (C:\Users\gaeta/.cache\torch\hub\snakers4_silero-vad_master\utils_vad.py)

When using model, _ = torch.hub.load( repo_or_dir='snakers4/silero-vad:v4.0', model='silero_vad', force_reload=True )

I'm on Windows

adamnsandle commented 1 week ago

> Hello
>
> I just tried, I got ImportError: cannot import name 'get_number_ts' from 'utils_vad' (/root/.cache/torch/hub/snakers4_silero-vad_master/utils_vad.py)
>
> When using model, _ = torch.hub.load( repo_or_dir='snakers4/silero-vad:v4.0', model='silero_vad', force_reload=True )

Hi. Try running this code before loading the VAD model to work around the module collision:

import sys

# Drop any cached utils_vad module so the requested version's copy is imported
sys.modules.pop('utils_vad', None)

GaetanBaert commented 1 week ago

It worked! I tried to load both v4 and v5 in the same Jupyter notebook (I wanted to benchmark the two versions against each other), which is why I got this conflict.
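
The collision can be reproduced without torch. This is a minimal sketch, assuming the cause is Python's `sys.modules` cache returning the first `utils_vad` it imported. The two temporary directories stand in for two cached checkouts, and `import_utils_from` is a hypothetical helper.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_utils_from(directory: str):
    """Import a module named utils_vad with `directory` first on sys.path."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        return importlib.import_module("utils_vad")
    finally:
        sys.path.remove(directory)


sys.modules.pop("utils_vad", None)  # start from a clean state

with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
    Path(old_dir, "utils_vad.py").write_text("VERSION = 'old'\n")
    Path(new_dir, "utils_vad.py").write_text("VERSION = 'new'\n")

    first = import_utils_from(old_dir)   # loads the 'old' copy
    stale = import_utils_from(new_dir)   # still served from sys.modules
    sys.modules.pop("utils_vad", None)   # the workaround from this thread
    fresh = import_utils_from(new_dir)   # now loads the 'new' copy

    results = (first.VERSION, stale.VERSION, fresh.VERSION)

sys.modules.pop("utils_vad", None)  # leave no trace behind
print(results)  # ('old', 'old', 'new')
```

The second import returns the stale module even though a different directory is first on the path, which matches the ImportError seen when two versions are loaded in one session.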

snakers4 commented 1 week ago

@dgoryeo @rvryan67 @ggoedde @helloWorld199 @hungiito

please verify that these fixes work for you

dgoryeo commented 1 week ago

I can report that the fixes work. I tried v4 in a Colab environment and v3.1 in a Windows environment (after cleaning the local cache). Thanks @snakers4!

ggoedde commented 1 week ago

Confirmed that v4.0 works in Databricks environment. Thanks!

snakers4 commented 1 week ago

Looks like we have 3 confirmations. If the issue persists for someone, please open a new ticket.