```python
res = model.generate(
    task="transcribe",
    batch_size_s=0,
    input="/root/桌面/音频/output_30.mp3",
    # input="/root/桌面/音频/asr_example_zh.wav",  # alternative test file; only one input= may be active
    # sentence_timestamp=True,
    is_final=True,
)
print(res)
```
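One note on the call above: Python rejects a repeated keyword argument at compile time, so a `generate(...)` call with two active `input=` lines raises a `SyntaxError` before FunASR runs at all; only one `input=` should be left uncommented. A minimal, self-contained check (plain Python, no FunASR needed):

```python
# Compiling a call with a repeated keyword argument fails immediately:
# Python forbids passing the same keyword (here, input=) twice.
src = "f(input='a.mp3', input='b.wav')"
try:
    compile(src, "<repro>", "eval")
    duplicated_kwarg_ok = True
    error_message = ""
except SyntaxError as exc:
    duplicated_kwarg_ok = False
    error_message = str(exc)

print(duplicated_kwarg_ok, error_message)
```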
3. See error
```
File "/data/llvm/whisper_t.py", line 16, in <module>
    res = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 224, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 358, in inference_with_vad
    spk_res = self.inference(speech_b, input_len=None, model=self.spk_model, kwargs=kwargs, **cfg)
  File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 257, in inference
    res = model.inference(**batch, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/funasr/models/bicif_paraformer/model.py", line 253, in inference
    speech, speech_lengths = extract_fbank(audio_sample_list, data_type=kwargs.get("data_type", "sound"),
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 131, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
```
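For what it's worth, the frames above show `inference_with_vad` handing each speech segment to `self.spk_model` (`spk_res = self.inference(speech_b, ..., model=self.spk_model, ...)`), a path taken because the `AutoModel` below is built with `return_spk_res=True` but no `spk_model`. A minimal sketch of that dispatch pattern (my own simplification, not FunASR's actual code):

```python
# Simplified sketch of the VAD -> speaker-model dispatch suggested by the
# trace: when return_spk_res is requested, every segment is also sent to the
# speaker model, so a missing (None) spk_model breaks the pipeline.
def inference_with_vad(segments, spk_model=None, return_spk_res=False):
    results = []
    for seg in segments:
        res = {"text": f"<asr output for {seg}>"}
        if return_spk_res:
            if spk_model is None:
                # In this sketch we fail loudly; the real pipeline crashes
                # deeper down while preparing features for the absent model.
                raise ValueError("return_spk_res=True requires a spk_model")
            res["spk"] = spk_model(seg)
        results.append(res)
    return results
```

Under this reading, the things to try would be either dropping `return_spk_res=True` or also passing a `spk_model` (e.g. a cam++ speaker model from the ModelScope hub; the exact model id is an assumption, not something this report confirms).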
#### Code sample
<!-- Ideally attach a minimal code sample to reproduce the decried issue.
Minimal means having the shortest code but still preserving the bug. -->
### Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
### Environment
- OS (e.g., Linux):Linux
- FunASR Version (e.g., 1.0.0):1.0.19
- ModelScope Version (e.g., 1.11.0):1.13.2
- PyTorch Version (e.g., 2.0.0):2.2.1
- How you installed funasr (`pip`, source):pip
- Python version:3.10.12
- GPU (e.g., V100M32)
- CUDA/cuDNN version (e.g., cuda11.7):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
- Any other relevant information:
### Additional context
<!-- Add any other context about the problem here. -->
Notice: In order to resolve issues more efficiently, please raise issues following the template.
### 🐛 Bug

### To Reproduce

Steps to reproduce the behavior (always include the command you ran):
```python
from funasr import AutoModel

model = AutoModel(
    model="Whisper-large-v3",
    kwargs={"model_path": "/data/llvm/whisper/"},
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_kwargs={"max_single_segment_time": 30000},
    punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    return_spk_res=True,
)
```
```python
res = model.generate(
    task="transcribe",
    batch_size_s=0,
    input="/root/桌面/音频/output_30.mp3",
    # input="/root/桌面/音频/asr_example_zh.wav",  # alternative test file; only one input= may be active
)
print(res)
```