modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

【Error】AssertionError: choose a window size 400 that is [2, 0] #1924

Closed by bestlee666 1 month ago

bestlee666 commented 1 month ago

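For the test audio, see the attachment. The clip is 25 s long and plays back normally, but recognition fails with the following error: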

  File "/home/roots/data/SenseVoice/webui.py", line 167, in model_inference
    text = model.generate(input=input_wav,
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 263, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 410, in inference_with_vad
    results = self.inference(
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 300, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/roots/data/SenseVoice/model.py", line 817, in inference
    speech, speech_lengths = extract_fbank(
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/funasr/utils/load_utils.py", line 173, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/funasr/frontends/wav_frontend.py", line 134, in forward
    mat = kaldi.fbank(
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 591, in fbank
    waveform, channel, sample_frequency, frame_shift, frame_length, round_to_power_of_two, preemphasis_coefficient
  File "/home/roots/anaconda3/envs/coqui-xtts/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    window_size, len(waveform)
AssertionError: choose a window size 400 that is [2, 0]

8_output.wav.zip

LauraGPT commented 1 month ago

Yes, this is a bug. We will fix it soon.

LauraGPT commented 1 month ago

Bugfix: https://github.com/modelscope/FunASR/pull/1940
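
For readers hitting the same assertion: the message "choose a window size 400 that is [2, 0]" means the window size (400 samples, which is 25 ms at 16 kHz) must lie between 2 and the number of samples in the waveform, and the waveform that reached kaldi.fbank had 0 samples. The sketch below mirrors that length check in plain Python so that empty or too-short segments can be skipped before feature extraction. It is not the change made in the pull request above; the helper names (fbank_window_size, can_extract_fbank, drop_short_segments) and the 16 kHz / 25 ms defaults are assumptions chosen to match the numbers in the traceback.

```python
from typing import List, Sequence


def fbank_window_size(sample_rate: int = 16000, frame_length_ms: float = 25.0) -> int:
    """Window size in samples, computed the way the assertion message implies."""
    return int(sample_rate * frame_length_ms * 0.001)


def can_extract_fbank(num_samples: int,
                      sample_rate: int = 16000,
                      frame_length_ms: float = 25.0) -> bool:
    """True if a waveform of num_samples samples would pass the window-size check."""
    window_size = fbank_window_size(sample_rate, frame_length_ms)
    return 2 <= window_size <= num_samples


def drop_short_segments(segments: Sequence[Sequence[float]],
                        sample_rate: int = 16000,
                        frame_length_ms: float = 25.0) -> List[Sequence[float]]:
    """Keep only segments long enough to hold at least one analysis window."""
    return [
        seg for seg in segments
        if can_extract_fbank(len(seg), sample_rate, frame_length_ms)
    ]


if __name__ == "__main__":
    # Hypothetical segments: empty, one sample short of a window, exactly one window.
    segments = [[], [0.0] * 399, [0.0] * 400]
    kept = drop_short_segments(segments)
    print(len(kept))
```

With the hypothetical segments above, only the 400-sample segment passes the check. Whether to skip such segments, pad them, or merge them with a neighbouring segment is a design choice that depends on where the empty segment comes from in the pipeline.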