snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

Why shouldn't ONNX be set to True? #478

Closed. yongmin96 closed this issue 3 months ago

yongmin96 commented 3 months ago
import os
import wave
import torch
import pyaudio
torch.set_num_threads(1)
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

def record_audio_with_vad(file_path, vad_model, vad_utils, sample_rate=16000, chunk_size=512):
    p = pyaudio.PyAudio()

    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=sample_rate,
                    input=True,
                    frames_per_buffer=chunk_size)

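    # Read a single chunk from the microphone, then release the audio device
    data = stream.read(chunk_size)
    stream.stop_stream()
    stream.close()
    frames = [data]

    with wave.open(file_path, 'wb') as wf:
        wf.setnchannels(1)
        # Reuse the existing PyAudio instance instead of creating a new one
        wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))

    p.terminate()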

    (get_speech_timestamps,
     save_audio,
     read_audio,
     VADIterator,
     collect_chunks) = vad_utils

    wav = read_audio(file_path, sampling_rate=sample_rate)
    test = vad_model.audio_forward(wav, sample_rate)

if __name__ == "__main__":
    USE_ONNX = True
    vad_model, vad_utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                  model='silero_vad',
                                  force_reload=True,
                                  onnx=USE_ONNX)

    output_file = "C:/Users/ISPL/PycharmProjects/ASR/sample/test.wav"  # Change the filename as needed
    # Keep recording until interrupted with Ctrl+C
    while True:
        try:
            record_audio_with_vad(output_file, vad_model, vad_utils)
        except KeyboardInterrupt:
            break

When I set ONNX to False, it works fine. However, when I set it to True, the following error appears.

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int32)) , expected: (tensor(int64))

adamnsandle commented 3 months ago

Thanks for reporting this bug! The dtype mismatch is fixed in the latest commit. Please check whether it works properly now.
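
For readers who hit the same error: ONNX Runtime checks input dtypes strictly, and older NumPy versions on Windows use a 32-bit default integer, which likely explains the `int32` in the message above. Below is a minimal sketch of the kind of fix involved, assuming the sampling rate is passed to the session as a NumPy array. `make_sr_input` is a hypothetical helper for illustration, not a function from the repository, and this is not necessarily the exact change that was committed.

```python
import numpy as np

def make_sr_input(sampling_rate: int) -> np.ndarray:
    """Return the sampling rate as a 0-d int64 array (hypothetical helper)."""
    # An explicit dtype avoids the platform-dependent default integer type.
    return np.array(sampling_rate, dtype=np.int64)

sr = make_sr_input(16000)
print(sr.dtype)  # int64
```

Setting the dtype explicitly makes the input independent of the platform the script runs on.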

yongmin96 commented 3 months ago

I verified that the code is working fine. Thanks for your help. I have another question. When I change 'repo_or_dir' to 'snakers4/silero-vad:v4.0', I get the following error. Is this a bug?

ValueError: Required inputs (['state']) are missing from input feed (['input', 'h', 'c', 'sr']).

adamnsandle commented 3 months ago

This is related to the following issue: https://github.com/snakers4/silero-vad/issues/474. We are working on a fix.
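
The error message itself suggests that the loaded ONNX graph expects a single `state` input while the calling code still feeds separate `h` and `c` tensors, so the model file and the wrapper code appear to come from different versions. Below is a minimal sketch for diagnosing such a mismatch. `missing_inputs` is a hypothetical helper, and the `required` list is an assumption based on the error text. With a real session, the required names could be read with `[i.name for i in session.get_inputs()]`.

```python
def missing_inputs(required, feed):
    """Return the required input names that are absent from the feed."""
    return [name for name in required if name not in feed]

# Assumed: what the loaded graph expects ('state' is from the error message).
required = ['input', 'state', 'sr']
# Input names the wrapper supplied, taken from the error message.
feed = {'input': None, 'h': None, 'c': None, 'sr': None}

print(missing_inputs(required, feed))  # ['state']
```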

yongmin96 commented 3 months ago

Ok. I'm asking because I'm getting an error while doing several tests. I'm currently working on Windows 10 and when I use GPU and ONNX at the same time, I get the following error:

ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

adamnsandle commented 3 months ago

The ONNX model has some restrictions on Windows 10: https://github.com/snakers4/silero-vad/issues/355

snakers4 commented 3 months ago

> Ok. I'm asking because I'm getting an error while doing several tests. I'm currently working on Windows 10 and when I use GPU and ONNX at the same time, I get the following error:

https://github.com/snakers4/silero-vad/blob/a395853982ded7dcae53de6772984060861c0243/utils_vad.py#L20-L23

I believe we added these lines because the latest VAD versions were not compatible with ONNX on GPU, right?

@adamnsandle

snakers4 commented 3 months ago

In any case, the VAD is not supposed to be run on GPU. If you need to, you can hack the session options to make it work on GPU.

snakers4 commented 3 months ago

Let this issue remain as a reminder of how to set different executors for ONNX. I still believe the VAD, by design, should not be run on GPU. In any case, if running on GPU is imperative for some reason, just fork the lines above.
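
As a reminder of what that looks like in practice, here is a minimal sketch of choosing ONNX Runtime execution providers explicitly, which is what the `ValueError` quoted earlier asks for. `pick_providers` is a hypothetical helper, not the code in `utils_vad.py`, and the model path in the trailing comment is a placeholder.

```python
def pick_providers(available, use_gpu=False):
    """Choose an explicit provider list from those the ORT build offers."""
    providers = []
    if use_gpu and 'CUDAExecutionProvider' in available:
        providers.append('CUDAExecutionProvider')
    # CPU goes last so it acts as a fallback.
    if 'CPUExecutionProvider' in available:
        providers.append('CPUExecutionProvider')
    return providers

# Provider names as listed in the error message above.
available = ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
print(pick_providers(available))                # ['CPUExecutionProvider']
print(pick_providers(available, use_gpu=True))  # ['CUDAExecutionProvider', 'CPUExecutionProvider']

# With onnxruntime installed, the list would be passed like this:
#   import onnxruntime
#   session = onnxruntime.InferenceSession(
#       'model.onnx',
#       providers=pick_providers(onnxruntime.get_available_providers()))
```

Passing only `CPUExecutionProvider` is in line with the advice above that the VAD is meant to run on CPU.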