snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

Bug report - [can not use vad] #368

Closed jingyuhhh closed 1 year ago

jingyuhhh commented 1 year ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. use vad:
    
    (venv) PS D:\Project\autocut> autocut -t --force .\video\1.mp4 --whisper-mode=openai --openai-rpm=50        
    [autocut:transcribe.py:L32] INFO   Done Init model in 1.0 sec
    [autocut:transcribe.py:L36] INFO   Transcribing .\video\1.mp4
    [autocut:utils.py:L115] INFO   .\video\1.md exists. Will overwrite it
    Using cache found in C:\Users\myname/.cache\torch\hub\snakers4_silero-vad_master
    Traceback (most recent call last):
    File "<frozen runpy>", line 198, in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "D:\Project\autocut\venv\Scripts\autocut.exe\__main__.py", line 7, in <module>
    File "D:\Project\autocut\venv\Lib\site-packages\autocut\main.py", line 170, in main
    Transcribe(args).run()
    File "D:\Project\autocut\venv\Lib\site-packages\autocut\transcribe.py", line 42, in run
    speech_array_indices = self._detect_voice_activity(audio)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "D:\Project\autocut\venv\Lib\site-packages\autocut\transcribe.py", line 60, in _detect_voice_activity
    self.vad_model, funcs = torch.hub.load(
                            ^^^^^^^^^^^^^^^
    File "D:\Project\autocut\venv\Lib\site-packages\torch\hub.py", line 558, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "D:\Project\autocut\venv\Lib\site-packages\torch\hub.py", line 587, in _load_local
    model = entry(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\myname/.cache\torch\hub\snakers4_silero-vad_master\hubconf.py", line 46, in silero_vad
    model = init_jit_model(os.path.join(model_dir, 'silero_vad.jit'))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\myname/.cache\torch\hub\snakers4_silero-vad_master\utils_vad.py", line 149, in init_jit_model
    model = torch.jit.load(model_path, map_location=device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "D:\Project\autocut\venv\Lib\site-packages\torch\jit\_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files, _restore_shapes)  # type: ignore[call-arg]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: open file failed because of errno 42 on fopen: Illegal byte sequence, file path: C:\Users\myname/.cache\torch\hub\snakers4_silero-vad_master\files\silero_vad.jit

  2. do not use vad:

    autocut -t --force .\video\1.mp4 --whisper-mode=openai --openai-rpm=50 --vad=0


In this case, autocut runs without errors.

[autocut](https://github.com/mli/autocut)

## Environment

StatusCode        : 200
StatusDescription : OK
Content           : 
                    # Unlike the rest of the PyTorch this file must be python2 compliant.
                    # This script outputs relevant system environment info
                    # Run it with `python collect_env.py`.
                    import datetime
                    import locale
                    impor...
RawContent        : HTTP/1.1 200 OK
                    Connection: keep-alive
                    Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
                    Strict-Transport-Security: max-age=31536000
                    X-Content-Type-Options: nosniff
                    ...
Forms             : {}
Headers           : {[Connection, keep-alive], [Content-Security-Policy, default-src 'none'; style-src 'unsafe-inline'; sandbox], [Strict-Transport-Security, max-age=31536000], [X-Content-Type-Options, nosniff]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : mshtml.HTMLDocumentClass
RawContentLength  : 21653

I guess this error might be specific to Windows 11?

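
One thing that may be worth ruling out: the `Illegal byte sequence` error is raised while opening `silero_vad.jit` from the torch hub cache, and that kind of failure is often tied to the characters in the file path rather than to the Windows version. This is only an assumption; `myname` in the traceback looks like a placeholder, so the real path is unknown. The sketch below lists any path components with non-ASCII characters. `non_ascii_parts` and `hub_cache_dir` are hypothetical helper names, and `hub_cache_dir` only mirrors the documented default location (`TORCH_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`) without importing torch.

```python
import os
from pathlib import Path


def non_ascii_parts(path):
    """Return the components of `path` that contain non-ASCII characters."""
    return [part for part in Path(path).parts if not part.isascii()]


def hub_cache_dir():
    """Best guess at the torch hub cache directory, without importing torch.

    Assumption: follows the documented default, $TORCH_HOME/hub, where
    TORCH_HOME falls back to $XDG_CACHE_HOME/torch and then ~/.cache/torch.
    """
    cache_home = os.environ.get(
        "XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache")
    )
    torch_home = os.environ.get("TORCH_HOME", os.path.join(cache_home, "torch"))
    return os.path.join(torch_home, "hub")


if __name__ == "__main__":
    cache = hub_cache_dir()
    suspicious = non_ascii_parts(cache)
    print("hub cache:", cache)
    if suspicious:
        print("non-ASCII path components:", suspicious)
    else:
        print("path is ASCII-only; the cause is probably elsewhere")
```

If a non-ASCII component does show up, setting the `TORCH_HOME` environment variable to an ASCII-only directory before running autocut might be worth a try. That is an untested suggestion, not something confirmed in this thread.
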
snakers4 commented 1 year ago

autocut

We usually do not support third-party tools.