A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
2024-03-25 17:45:42,026 - modelscope - INFO - PyTorch version 2.1.2 Found.
2024-03-25 17:45:42,027 - modelscope - INFO - Loading ast index from /home/fresh/.cache/modelscope/ast_indexer
2024-03-25 17:45:42,082 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 9271928ad57a76e3f712e4e1331c1640 and a total number of 956 components indexed
2024-03-25 17:45:44,582 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-03-25 17:45:44,877 - modelscope - INFO - initiate model from /home/fresh/.cache/modelscope/hub/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online
2024-03-25 17:45:44,878 - modelscope - INFO - initiate model from location /home/fresh/.cache/modelscope/hub/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online.
2024-03-25 17:45:44,879 - modelscope - INFO - initialize model from /home/fresh/.cache/modelscope/hub/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online
Notice: If you want to use whisper, please pip install -U openai-whisper
ckpt: /home/fresh/.cache/modelscope/hub/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/model.pt
2024-03-25 17:45:55,775 - modelscope - INFO - Use user-specified model revision: v2.0.4
ckpt: /home/fresh/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
2024-03-25 17:45:56,654 - modelscope - INFO - Use user-specified model revision: v2.0.4
ckpt: /home/fresh/.cache/modelscope/hub/iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/model.pt
2024-03-25 17:45:59,121 - modelscope - WARNING - No preprocessor field found in cfg.
2024-03-25 17:45:59,122 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-03-25 17:45:59,122 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/fresh/.cache/modelscope/hub/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online'}. trying to build by task and model information.
2024-03-25 17:45:59,122 - modelscope - WARNING - No preprocessor key ('funasr', 'auto-speech-recognition') found in PREPROCESSOR_MAP, skip building preprocessor.
rtf_avg: 2.026: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.31s/it]
Traceback (most recent call last):
  File "infer_asr.py", line 12, in <module>
    rec_result = inference_pipeline(input='./0325.wav')
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 225, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 349, in inference_with_vad
    results = self.inference(speech_j, input_len=None, model=model, kwargs=kwargs, **cfg)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 258, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/models/uniasr/model.py", line 916, in inference
    nbest_hyps = self.beam_search(
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/models/uniasr/beam_search.py", line 402, in forward
    best = self.search(running_hyps, x, x_mask=mask_enc, pre_acoustic_embeds=pre_acoustic_embeds_cur)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/models/uniasr/beam_search.py", line 306, in search
    scores, states = self.score_full(hyp, x, x_mask=x_mask, pre_acoustic_embeds=pre_acoustic_embeds)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/models/uniasr/beam_search.py", line 176, in score_full
    scores[k], states[k] = d.score(hyp.yseq, hyp.states[k], x, x_mask=x_mask, pre_acoustic_embeds=pre_acoustic_embeds)
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/models/scama/decoder.py", line 400, in score
    logp, state = self.forward_one_step(
  File "/home/fresh/miniconda3/envs/modelscope/lib/python3.8/site-packages/funasr/models/scama/decoder.py", line 434, in forward_one_step
    x = torch.cat((x, pre_acoustic_embeds), dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 52 for tensor number 1 in the list.
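For context, the final `RuntimeError` comes from `torch.cat`, which requires all dimensions except the concatenation dimension to agree. A minimal sketch of the same failure, with hypothetical shapes chosen only for illustration (a per-step decoder tensor of length 1 versus an acoustic-embedding tensor spanning the 52 frames mentioned in the error):

```python
import torch

# torch.cat along dim=-1 demands matching sizes in every other dimension.
x = torch.zeros(1, 1, 4)     # e.g. one decoding step (hypothetical shape)
pre = torch.zeros(1, 52, 4)  # e.g. embeddings for a 52-frame segment

try:
    torch.cat((x, pre), dim=-1)  # dimension 1 mismatches: 1 vs 52
except RuntimeError as e:
    print("cat failed:", e)

# Concatenation only succeeds once the time dimensions agree:
ok = torch.cat((x, pre[:, :1, :]), dim=-1)
print(ok.shape)  # torch.Size([1, 1, 8])
```

This does not pinpoint the bug in FunASR itself; it only shows the shape rule that the mismatched `x` and `pre_acoustic_embeds` violate inside `forward_one_step`.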
System: Ubuntu 22.04. Versions: funasr==1.0.18, modelscope==1.11.1
Inference code:

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online',
    model_revision='v2.0.4',
    vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    vad_kwargs={"max_single_segment_time": 60000},
    punc_model='iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model_revision="v2.0.4",
)

rec_result = inference_pipeline(input='./0325.wav')
print(rec_result[0])
Problem: 0325.wav is about 4 minutes long and inference fails on it; if only the first 10 s of the audio are used, inference runs correctly.
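To reproduce the working 10 s case without an audio editor, a small stdlib helper can copy the first few seconds of the WAV file. This is a hypothetical convenience function, not part of FunASR or ModelScope:

```python
import wave

def trim_wav(src, dst, seconds=10.0):
    """Copy the first `seconds` of a WAV file to a new file (hypothetical helper)."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        # Read only the frames covering the requested duration.
        frames = r.readframes(int(r.getframerate() * seconds))
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # nframes is corrected automatically on close
        w.writeframes(frames)

# Usage against the file from this report:
# trim_wav("./0325.wav", "./0325_10s.wav", 10.0)
```

Feeding the trimmed file to the same `inference_pipeline` call should reproduce the successful short-clip behavior described above.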
Error message: the full log and traceback are shown above.