snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License
4.44k stars, 435 forks

Bug report - Device cuda does not work #113

Closed pchampio closed 3 years ago

pchampio commented 3 years ago

🐛 Bug

I would like to use CUDA to compute the VAD. Your toolkit has an argument for it: https://github.com/snakers4/silero-vad/blob/a345715b8fc2d24b2991ec5d54c7588c64b9f9c7/utils_vad.py#L174

But it crashes when I set device to 'cuda' (the input wav is also moved with .to("cuda")). Does your toolkit support it? BTW, thanks for your awesome work on this toolkit! :+1:

Traceback (most recent call last):
  File "/lium/raid01_b/pchampi/lab/venv/bin/extract_xvectors.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/lium/raid01_b/pchampi/lab/sidekit/bin/extract_xvectors.py", line 157, in <module>
    main(xtractor, args.wav_scp, args.out_scp, args.device, args.vad, args.vad_num_samples_per_window, args.vad_min_silence_samples)
  File "/lium/raid01_b/pchampi/labvenv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/lium/raid01_b/pchampi/lab/sidekit/bin/extract_xvectors.py", line 123, in main
    speech_timestamps = get_speech_ts_adaptive(signal.to("cuda"), model,
  File "/lium/home/pchampi/.cache/torch/hub/snakers4_silero-vad_a345715/utils_vad.py", line 227, in get_speech_ts_adaptive
    chunks = torch.Tensor(torch.cat(to_concat, dim=0)).to(device)
TypeError: expected CPU (got CUDA)

Environment

Collecting environment information...
PyTorch version: 1.8.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.5 (default, Sep  4 2020, 07:30:14)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.19.0-8-amd64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.3
[pip3] torch==1.8.2+cu102
[pip3] torchaudio==0.8.2
[pip3] torchvision==0.9.2+cu102
[conda] numpy                     1.21.3                   pypi_0    pypi
[conda] torch                     1.8.2+cu102              pypi_0    pypi
[conda] torchaudio                0.8.2                    pypi_0    pypi
[conda] torchvision               0.9.2+cu102              pypi_0    pypi

snakers4 commented 3 years ago

Hi,

Many thanks for your report. This is not a bug but a design choice, see https://github.com/snakers4/silero-vad/discussions/74. For simplicity and other reasons, we decided to publish only the quantized models, which predictably do not work on GPU. Using a GPU for VAD seems counterintuitive, since the model works fine on a single CPU thread.
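
For anyone hitting the same traceback while the rest of the pipeline runs on CUDA, here is a minimal sketch of keeping the VAD call on CPU. `run_vad_on_cpu` is a hypothetical helper, not part of silero-vad; it only assumes that `signal` behaves like a `torch.Tensor` (it provides `.detach()` and `.cpu()`) and that the VAD function takes the signal and the model as its first two arguments, as `get_speech_ts_adaptive` does in the traceback above.

```python
def run_vad_on_cpu(signal, vad_fn, model, **vad_kwargs):
    """Hypothetical helper (not part of silero-vad).

    Runs `vad_fn` on a CPU copy of `signal`, leaving the original tensor
    wherever it lives (for example on CUDA) for the rest of the pipeline.
    """
    # .cpu() is a no-op for tensors that are already on CPU.
    cpu_signal = signal.detach().cpu()
    return vad_fn(cpu_signal, model, **vad_kwargs)


# Assumed usage, mirroring the call in the traceback:
#   speech_timestamps = run_vad_on_cpu(signal, get_speech_ts_adaptive, model)
#   # `signal.to("cuda")` can still be passed to the x-vector extractor afterwards.
```

The VAD's `device` argument is left at its default here, so the quantized model and its input both stay on CPU.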