modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Inference script fails when run in a Docker container #1453

Closed ZhinanWu closed 4 months ago

ZhinanWu commented 7 months ago

🐛 Bug

Running the Python inference script inside a Docker container raises an error.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. In the root directory, run `python main.py`
  2. See error

Script code:

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020',
    model_revision="v2.0.4",
    vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model_revision="v2.0.4",
    outputs='./outputs'
)

inference_pipeline(input='asr_example.wav', batch_size_s=300, param_dict={'use_timestamp': True})
```

Error message:

```
python main.py

2024-03-08 17:32:46,567 - modelscope - INFO - PyTorch version 2.2.1+cpu Found.
2024-03-08 17:32:46,567 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-03-08 17:32:46,599 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 267e259a1c49e41b34f4d79970cf07c7 and a total number of 964 components indexed
2024-03-08 17:32:47,677 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-03-08 17:32:47,920 - modelscope - INFO - initiate model from /root/.cache/modelscope/hub/iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020
2024-03-08 17:32:47,920 - modelscope - INFO - initiate model from location /root/.cache/modelscope/hub/iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020.
2024-03-08 17:32:47,921 - modelscope - INFO - initialize model from /root/.cache/modelscope/hub/iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020
If you want to use hugging, please pip install -U transformers
If you want to use hugging, please pip install -U transformers
ckpt: /root/.cache/modelscope/hub/iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/model.pt
2024-03-08 17:32:52,570 - modelscope - INFO - Use user-specified model revision: v2.0.4
ckpt: /root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
2024-03-08 17:32:53,093 - modelscope - INFO - Use user-specified model revision: v2.0.4
ckpt: /root/.cache/modelscope/hub/iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/model.pt
2024-03-08 17:32:54,360 - modelscope - WARNING - Model revision not specified, use revision: v2.0.2
ckpt: /root/.cache/modelscope/hub/iic/speech_campplus_sv_zh-cn_16k-common/campplus_cn_common.bin
2024-03-08 17:32:54,972 - modelscope - WARNING - No preprocessor field found in cfg.
2024-03-08 17:32:54,972 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-03-08 17:32:54,972 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/root/.cache/modelscope/hub/iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020'}. trying to build by task and model information.
2024-03-08 17:32:54,972 - modelscope - WARNING - No preprocessor key ('funasr', 'auto-speech-recognition') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-03-08 17:32:54,973 - modelscope - INFO - cuda is not available, using cpu instead.
rtf_avg: 0.014: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 32.52it/s]
rtf_avg: 0.079: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.99it/s]
rtf_avg: 0.030: 50%|██████████████████████████████████████████████████████████████████████████ | 1/2 [00:00<00:00, 11.02it/s]
rtf_avg: -0.005: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 180.28it/s]
ERROR:root:Only 'iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch' and 'iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch' can predict timestamp, and speaker diarization relies on timestamps.
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    inference_pipeline(input='asr_example.wav', batch_size_s=300, param_dict={'use_timestamp': True})
  File "/usr/local/lib/python3.8/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 215, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 437, in inference_with_vad
    result['timestamp'],
KeyError: 'timestamp'
  0%| | 0/1 [00:00<?, ?it/s]
```
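
For readers following the traceback: the final frame indexes `result['timestamp']`, and the preceding `ERROR:root` line says that only two zh-cn models can predict timestamps, so the result for this English model apparently has no such key. The snippet below is only an illustration of that failure mode in plain Python. The dictionaries and the `extract_timestamps` helper are hypothetical and are not FunASR's actual data structures or code.

```python
# Hypothetical stand-ins for ASR results; the real FunASR result layout may differ.
result_without_ts = {"key": "asr_example", "text": "hello world"}
result_with_ts = {
    "key": "asr_example",
    "text": "hello world",
    "timestamp": [[0, 480], [480, 960]],
}


def extract_timestamps(result):
    """Return the 'timestamp' entry if the result has one, otherwise None."""
    # dict.get avoids the KeyError that plain indexing raises on a missing key.
    return result.get("timestamp")


def index_directly(result):
    """Mimic the failing access pattern: plain indexing on a possibly missing key."""
    return result["timestamp"]


if __name__ == "__main__":
    print(extract_timestamps(result_with_ts))
    print(extract_timestamps(result_without_ts))
    try:
        index_directly(result_without_ts)
    except KeyError as exc:
        print(f"KeyError: {exc}")
```

Plain indexing fails in the same way as the last frame of the traceback, while the `.get` variant returns `None` for a result that carries no timestamps.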


lyblsgo commented 4 months ago

Use the latest Docker image.
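
Since the reply attributes the failure to an outdated Docker setup, and the Environment section of the report was left empty, it may help to record which package versions the container actually has before and after updating. The following is a minimal sketch using only the Python standard library (`importlib.metadata`, available from Python 3.8). The `installed_versions` helper and the default package list are assumptions for illustration, not part of FunASR or ModelScope.

```python
from importlib import metadata


def installed_versions(packages=("funasr", "modelscope", "torch")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the container prints one line per package, which can be pasted into the issue's Environment section.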