snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Bug report - [RunTime Error with latest Jit Spanish Model] #265

Open basillicus opened 9 months ago

basillicus commented 9 months ago

## 🐛 Bug

When following the colab_examples notebook, in the PyTorch Example / More Examples section, using the latest Spanish JIT model raises the following runtime error:

RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

## To Reproduce

Steps to reproduce the behavior:

  1. Open Colab_examples.ipynb.
  2. In the PyTorch Example / More Examples section, in the cell that loads the model and the decoder, change the English model to the Spanish one:
    # model, decoder = init_jit_model(models.stt_models.en.latest.jit, device=device)
    model, decoder = init_jit_model(models.stt_models.es.latest.jit, device=device)
  3. Keep running the notebook for two more cells, until the loop where the model is called. That is where the error shows up:
    
    RuntimeError                              Traceback (most recent call last)
    [<ipython-input-29-2c955b63e0bf>](https://localhost:8080/#) in <cell line: 4>()
      2 input = prepare_model_input(read_batch(random.sample(batches, k=1)[0]),
      3                             device=device)
    ----> 4 output = model(input)
      5 for example in output:
      6     print(decoder(example.cpu()))

    1 frames
    /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
       1525                 or _global_backward_pre_hooks or _global_backward_hooks
       1526                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1527             return forward_call(*args, **kwargs)
       1528
       1529         try:

    RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
      File "code/__torch__/stt_pretrained/models/model.py", line 42, in forward
        _4 = self.win_length
        _5 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None)
        x0 = __torch__.torch.functional.stft(x, _2, _3, _4, _5, True, "reflect", False, True, )
    _6 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
    _7 = torch.slice(_6, 1, 0, 9223372036854775807, 1)
  File "code/__torch__/torch/functional.py", line 20, in stft
  else:
    input0 = input
  _2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided)
       ~~~~~~~~~~ <--- HERE
  return _2

Traceback of TorchScript, original code (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
        input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
        input = input.view(input.shape[-signal_dim:])
    return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
           ~~~~~~~~ <--- HERE
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.


## Expected behavior

The audio file should be transcribed to text

## Environment

The environment of the colab_example notebook itself:

Collecting environment information...
PyTorch version: 2.1.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.27.7
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.120+-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          2
On-line CPU(s) list:             0,1
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family:                      6
Model:                           79
Thread(s) per core:              2
Core(s) per socket:              1
Socket(s):                       1
Stepping:                        0
BogoMIPS:                        4399.99
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       32 KiB (1 instance)
L1i cache:                       32 KiB (1 instance)
L2 cache:                        256 KiB (1 instance)
L3 cache:                        55 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Vulnerable; SMT Host state unknown
Vulnerability Meltdown:          Vulnerable
Vulnerability Mmio stale data:   Vulnerable
Vulnerability Retbleed:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Vulnerable

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.1.0+cu118
[pip3] torchaudio==2.1.0+cu118
[pip3] torchdata==0.7.0
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.16.0
[pip3] torchvision==0.16.0+cu118
[pip3] triton==2.1.0
[conda] Could not collect

snakers4 commented 9 months ago

Looks like the Spanish model is too old.
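For readers wondering what "too old" means here: the traceback shows the serialized model calling `torch.stft` with seven positional arguments and no `return_complex`, from an export made under Python 3.7, while the error message says current PyTorch requires that parameter for real inputs. The sketch below is only an illustration of how the same transform is written today. The helper name `stft_real_layout` and its parameters are hypothetical and are not part of silero-models, and it does not patch the published JIT archive, where the old call is baked in.

```python
def stft_real_layout(x, n_fft, hop_length, win_length):
    """Run torch.stft with return_complex=True, then convert the result
    to a real tensor whose last dimension holds (real, imaginary) pairs.

    Hypothetical helper for illustration; not part of silero-models.
    """
    # Imported lazily so that defining this helper does not require PyTorch.
    import torch

    # The window length must match win_length.
    window = torch.hann_window(win_length, dtype=x.dtype, device=x.device)

    # Same settings as the call in the traceback (center=True, "reflect",
    # normalized=False, onesided=True), plus an explicit return_complex.
    spec = torch.stft(
        x,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        center=True,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=True,
    )

    # view_as_real exposes the complex tensor as a real one with a
    # trailing dimension of size 2.
    return torch.view_as_real(spec)
```

Calling it on a batch of raw audio, for example `stft_real_layout(torch.zeros(1, 1600), 400, 160, 400)`, returns a tensor whose trailing dimension of size 2 holds the real and imaginary parts. Whether the rest of the Spanish model would accept that layout after a re-export is an assumption that only the maintainers can confirm.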

amda-phd commented 7 months ago

Greetings. I'm facing the same problem here. I've managed to make the onnx Spanish model work, but I'd like to know if there's any way to use the jit model as it is now. Is there any previous version of torch that might be able to do the trick to actually run it? Thanks in advance for any response you might be able to provide.
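Until the question about an older torch version is answered, one option suggested by the comment above is to keep the JIT path and switch to the ONNX model only when this specific error appears. The following is a minimal sketch under that assumption. The names `is_legacy_stft_error` and `transcribe_with_fallback` are hypothetical, and `jit_infer` and `onnx_infer` stand for whatever callables wrap the two models in your own code.

```python
# Start of the error message reported in this issue.
STFT_ERROR_MARKER = "stft requires the return_complex parameter"


def is_legacy_stft_error(exc):
    """Return True if exc looks like the stft/return_complex failure."""
    return isinstance(exc, RuntimeError) and STFT_ERROR_MARKER in str(exc)


def transcribe_with_fallback(jit_infer, onnx_infer, batch):
    """Try the JIT model first; fall back to ONNX on the stft error.

    jit_infer and onnx_infer are caller-supplied callables (hypothetical
    wrappers around the two models). Returns (result, backend_name).
    """
    try:
        return jit_infer(batch), "jit"
    except RuntimeError as exc:
        # Only swallow the known incompatibility; re-raise anything else.
        if not is_legacy_stft_error(exc):
            raise
        return onnx_infer(batch), "onnx"
```

The second element of the returned tuple records which backend produced the result, which makes it easy to log how often the fallback is taken.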