modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Cannot run Whisper with a spk model: frontend is None. #1565

Open zhengxingmao opened 3 months ago

zhengxingmao commented 3 months ago

Notice: To help resolve issues more efficiently, please follow the template when raising an issue and fill in the details.

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd '....'
    
    from funasr import AutoModel

    model = AutoModel(
        model="Whisper-large-v3",
        kwargs={"model_path": "/data/llvm/whisper/"},
        vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
        vad_kwargs={"max_single_segment_time": 30000},
        punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
        return_spk_res=True,
        word_timestamps=True,
        spk_model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        # spk_model="cam++",
        hub="openai",
    )

    res = model.generate(
        task="transcribe",
        batch_size_s=0,
        input="/root/桌面/音频/output_30.mp3",
        # Alternative input; only one `input=` keyword can be active at a time.
        # input="/root/桌面/音频/asr_example_zh.wav",
        # sentence_timestamp=True,
        is_final=True,
    )

    print(res)

3. See error

    File "/data/llvm/whisper_t.py", line 16, in <module>
      res = model.generate(
    File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 224, in generate
      return self.inference_with_vad(input, input_len=input_len, **cfg)
    File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 358, in inference_with_vad
      spk_res = self.inference(speech_b, input_len=None, model=self.spk_model, kwargs=kwargs, **cfg)
    File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 257, in inference
      res = model.inference(**batch, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/funasr/models/bicif_paraformer/model.py", line 253, in inference
      speech, speech_lengths = extract_fbank(audio_sample_list, data_type=kwargs.get("data_type", "sound"),
    File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 131, in extract_fbank
      data, data_len = frontend(data, data_len, **kwargs)
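The traceback excerpt stops at the `frontend(...)` call and does not include the final exception line. Assuming, as the title suggests, that `frontend` is `None` at that point, the failure can be sketched in plain Python. The function `extract_fbank_sketch`, the `dummy_frontend` placeholder, and the guard are hypothetical stand-ins for illustration only, not FunASR's actual source:

```python
# Simplified stand-in for the last traceback frame; NOT FunASR's real code.
def extract_fbank_sketch(data, data_len=None, frontend=None, **kwargs):
    """Apply `frontend` to the audio, mimicking the failing call."""
    if frontend is None:
        # Hypothetical guard: report the missing frontend explicitly instead of
        # letting Python raise "'NoneType' object is not callable".
        raise ValueError("frontend is None: the model was built without a feature frontend")
    return frontend(data, data_len, **kwargs)


def dummy_frontend(data, data_len, **kwargs):
    """Placeholder frontend that returns its inputs unchanged."""
    return data, data_len


# Without a guard, calling a missing frontend raises a TypeError.
frontend = None
try:
    frontend([0.0, 0.1], [2])
except TypeError as exc:
    unguarded_error = str(exc)

# With the guard, the error message names the missing frontend.
try:
    extract_fbank_sketch([0.0, 0.1], [2], frontend=None)
except ValueError as exc:
    guarded_error = str(exc)

# With a frontend present, the call goes through.
features, lengths = extract_fbank_sketch([0.0, 0.1], [2], frontend=dummy_frontend)
```

In the traceback above, the frame before `extract_fbank` is in `bicif_paraformer/model.py`, so the model passed as `spk_model` is the one reaching this call.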




#### Code sample

### Expected behavior


### Environment

 - OS (e.g., Linux): Linux
 - FunASR Version (e.g., 1.0.0): 1.0.19
 - ModelScope Version (e.g., 1.11.0): 1.13.2
 - PyTorch Version (e.g., 2.0.0): 2.2.1
 - How you installed funasr (`pip`, source): pip
 - Python version: 3.10.12
 - GPU (e.g., V100M32)
 - CUDA/cuDNN version (e.g., cuda11.7):
 - Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
 - Any other relevant information:

### Additional context

zhengxingmao commented 3 months ago

@LauraGPT Could you give some advice on how to solve this problem?