snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Bug report - Support for sound backend on Linux #59

Closed trackrx closed 3 years ago

trackrx commented 3 years ago

🐛 Bug

Running the samples on Linux (Ubuntu Focal / Linux Mint Ulyssa) causes a crash because the "soundfile" backend is missing.

To Reproduce

On Ubuntu (focal):

  1. python3 -m pip install pytorch torch omegaconf torchaudio
  2. Run minimal example from the README:
    
    import torch

    language = 'ru'
    speaker = 'kseniya_16khz'
    device = torch.device('cpu')
    model, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                          model='silero_tts',
                                                                          language=language,
                                                                          speaker=speaker)
    model = model.to(device)  # gpu or cpu
    audio = apply_tts(texts=[example_text],
                      model=model,
                      sample_rate=sample_rate,
                      symbols=symbols,
                      device=device)


Error message received:

    Traceback (most recent call last):
      ...
        from utils import (init_jit_model,
      File "/home/user/.cache/torch/hub/snakers4_silero-models_master/utils.py", line 16, in <module>
        torchaudio.set_audio_backend(audio_backend_name)  # switch backend
      File "/home/user/.local/lib/python3.8/site-packages/torchaudio/backend/utils.py", line 52, in set_audio_backend
        raise RuntimeError(
    RuntimeError: Backend "soundfile" is not one of available backends: ['sox', 'sox_io'].

It seems that on Linux the default sound backend is "sox_io" (the "sox" backend is deprecated). The "soundfile" backend appears to be the default only on Windows, and it is not among the backends available here.

Expected behavior

The example code should work on Linux.

Environment

PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Linux Mint 20.1 (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.1
[conda] Could not collect

Additional context

Suggested solution: fix utils.py to add support for platforms other than Windows.
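One way to picture the suggestion is sketched below. It is only an illustration, not the code in the repository's utils.py: `choose_audio_backend` and `configure_torchaudio` are hypothetical helper names, the preference order is an assumption, and the torchaudio calls assume the 0.8-era backend API (`list_audio_backends` / `set_audio_backend`) that appears in the traceback above.

```python
from typing import Optional, Sequence


def choose_audio_backend(
    available: Sequence[str],
    preferred: Sequence[str] = ("soundfile", "sox_io"),  # assumed preference order
) -> Optional[str]:
    """Return the first preferred backend that is actually available, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


def configure_torchaudio() -> Optional[str]:
    """Apply the choice to torchaudio (hypothetical wrapper, 0.8-era API assumed)."""
    import torchaudio  # imported lazily so the pure helper above has no hard dependency

    name = choose_audio_backend(torchaudio.list_audio_backends())
    if name is not None:
        torchaudio.set_audio_backend(name)
    return name


# With the backends reported in the error message above:
print(choose_audio_backend(["sox", "sox_io"]))  # prints "sox_io"
```

Keeping the selection logic separate from the torchaudio call means it can be checked on any platform without installing an audio backend.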

snakers4 commented 3 years ago

Looks like this happens due to a newer version of torchaudio.

snakers4 commented 3 years ago

I guess a better option would be to just fall back to soundfile defaults instead of switching backends.
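A rough sketch of what "not insisting on a backend switch" could look like, for readers following along. This is an assumption-laden illustration rather than the change that was actually made: `try_set_backend` is a hypothetical helper, and the only behavior taken from the report is that `set_audio_backend` raises `RuntimeError` for an unavailable backend.

```python
from typing import Callable


def try_set_backend(set_backend: Callable[[str], None], name: str) -> bool:
    """Attempt to switch backends; on failure keep whatever default is active.

    Returns True if the switch succeeded, False if the backend was rejected.
    """
    try:
        set_backend(name)
    except RuntimeError:
        # Backend not available on this platform: leave the default untouched.
        return False
    return True


# Stand-in for torchaudio.set_audio_backend on a machine without "soundfile".
def fake_set_audio_backend(name: str) -> None:
    if name not in ("sox", "sox_io"):
        raise RuntimeError(f'Backend "{name}" is not one of available backends')


print(try_set_backend(fake_set_audio_backend, "soundfile"))  # prints "False"
print(try_set_backend(fake_set_audio_backend, "sox_io"))     # prints "True"
```

In real use the first argument would be `torchaudio.set_audio_backend` (on torchaudio versions that still provide it), so an unavailable backend no longer aborts the import.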

snakers4 commented 3 years ago

https://github.com/snakers4/silero-models/commit/b6829eec1cdbc64c2db851e2a25a477bd37e68be
https://github.com/snakers4/silero-models/commit/1752012f083d488228932a03f0736063de5fc5a9

Cleaned up the examples, removed the soundfile switch and the dependency installation.

snakers4 commented 3 years ago

@trackrx Please verify that everything works for you now

trackrx commented 3 years ago

Reviewed, re-tested. Confirm that the fix works. Closing the bug. Thank you!

snakers4 commented 3 years ago

Thanks for catching a bug and reporting it properly