modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Bug triggered when the punc model in auto_model receives empty input #1660

Open clb-123 opened 4 months ago

clb-123 commented 4 months ago

🐛 Bug

When the text passed to the punc model is empty, the following error is raised:

Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.DoubleTensor instead (while checking arguments for embedding)

To Reproduce

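Steps to reproduce the behavior (always include the command you ran):

1. Use the pipeline to run ASR inference with the speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch model. When the ASR result is empty (no speech is detected), the punc model in the combination runs inference on the empty ASR text, which causes the error.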

pipeline(
    task=Tasks.auto_speech_recognition,
    model='iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    model_revision="v2.0.4",
    vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model_revision="v2.0.4",
    spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
    spk_model_revision="v2.0.2",
    spk_mode='punc_segment',
)

2. See error

Code sample

  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 205, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 386, in inference_with_vad
    punc_res = self.inference(result["text"], model=self.punc_model, kwargs=self.punc_kwargs, disable_pbar=True, **cfg)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 237, in inference
    results, meta_data = model.inference(**batch, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/ct_transformer/model.py", line 272, in inference
    y, _ = self.punc_forward(**data)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/ct_transformer/model.py", line 83, in punc_forward
    x = self.embed(text)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.DoubleTensor instead (while checking arguments for embedding)

Expected behavior

Please fix this bug. If it has already been fixed, please tell me which funasr version contains the fix.
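As a caller-side stopgap for the failure described above, the punctuation step can simply be skipped when the ASR text is empty. The following is a minimal sketch, not FunASR code: `punctuate_if_nonempty` and `fake_punc` are hypothetical names introduced only for illustration, and `fake_punc` stands in for whatever callable actually runs the punc model.

```python
from typing import Callable


def punctuate_if_nonempty(text: str, punc_fn: Callable[[str], str]) -> str:
    """Call punc_fn only when text has non-whitespace content.

    Hypothetical helper: an empty or whitespace-only ASR result is
    returned as an empty string without ever reaching the punc model.
    """
    if not text or not text.strip():
        return ""
    return punc_fn(text)


def fake_punc(text: str) -> str:
    # Stand-in for a real punctuation model (assumption for this sketch):
    # it just appends a full stop.
    return text + "。"


if __name__ == "__main__":
    print(repr(punctuate_if_nonempty("", fake_punc)))
    print(repr(punctuate_if_nonempty("你好", fake_punc)))
```

The guard treats whitespace-only text the same as empty text, on the assumption that neither carries anything worth punctuating.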

Environment

LauraGPT commented 4 months ago

pip install -U funasr modelscope

clb-123 commented 4 months ago

pip install -U funasr modelscope

I have updated both packages. The current versions are funasr==1.0.25 and modelscope==1.14.0.

There is a new problem now. When the input audio does not contain any active speech, the following error occurs:

Traceback (most recent call last):
  File "D:\work_program\call-center-asr\engine_frame\web\service\webrtc_asr.py", line 176, in dbfs_check
    result.asr_content = double_channel_wav_asr(wavfile_path, file_right_path, file_left_path, is_dbfs=True,
  File "D:\work_program\call-center-asr\engine_frame\web\service\webrtc_asr.py", line 190, in double_channel_wav_asr
    return asr_pipeline_by_file(file_left_path, file_right_path, ascii_flag, right_bfs_cal)
  File "D:\work_program\call-center-asr\engine_frame\web\service\webrtc_asr.py", line 254, in asr_pipeline_by_file
    result.left_content = model.generate(input=file_left_path)
  File "D:\work_install\miniconda3\envs\funasr\lib\site-packages\funasr\auto\auto_model.py", line 232, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "D:\work_install\miniconda3\envs\funasr\lib\site-packages\funasr\auto\auto_model.py", line 434, in inference_with_vad
    if raw_text is None:
UnboundLocalError: local variable 'raw_text' referenced before assignment
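The `UnboundLocalError` in the traceback above is the usual symptom of a local variable that is assigned on only one branch and then read unconditionally. The sketch below reproduces that pattern in isolation. It is an illustration, not the actual `inference_with_vad` code: the functions `buggy` and `fixed` are hypothetical, and only the variable name `raw_text` is borrowed from the traceback.

```python
from typing import Optional


def buggy(text: str) -> str:
    # raw_text is bound only when text is non-empty ...
    if text.strip():
        raw_text = text
    # ... so this read raises UnboundLocalError for empty text.
    if raw_text is None:
        return ""
    return raw_text


def fixed(text: str) -> str:
    # Binding the name up front makes the later None check safe.
    raw_text: Optional[str] = None
    if text.strip():
        raw_text = text
    if raw_text is None:
        return ""
    return raw_text


if __name__ == "__main__":
    try:
        buggy("")
    except UnboundLocalError as exc:
        print("buggy raised:", type(exc).__name__)
    print("fixed returned:", repr(fixed("")))
```

Initializing the variable before the branch is the smallest change that removes the error. Whether the library handles this case the same way is not stated in this thread.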

LauraGPT commented 4 months ago

Please provide details to reproduce the issue, using only the demo code.

clb-123 commented 4 months ago

Please provide details to reproduce the issue, using only the demo code.

demo:

from funasr import AutoModel

model = AutoModel(
    model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v2.0.4",
    vad_model="iic/speech_fsmn_vad_zh-cn-8k-common",
    vad_model_revision="v2.0.4",
    punc_model="ct-punc-c",
    punc_model_revision="v2.0.4",
    spk_model="cam++",
    spk_model_revision="v2.0.2",
    spk_mode='punc_segment',
)
# wav_file is the path to the input audio, e.g. the attached test audio
res = model.generate(input=wav_file)

This is a test audio file that triggers the bug: test_audio.zip