Bug triggered in auto_model when the punc model input is empty

clb-123 commented 4 months ago

🐛 Bug

When the text passed to the punc model is empty, the following error is raised:

Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.DoubleTensor instead (while checking arguments for embedding)

To Reproduce

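Steps to reproduce the behavior (always include the command you ran):

1. Run ASR inference through the pipeline with the speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch model. When the ASR result is empty (no speech is detected), the punc model in the combined pipeline still runs inference on the empty ASR text, which raises the error.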

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

pipeline(
    task=Tasks.auto_speech_recognition,
    model='iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    model_revision="v2.0.4",
    vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model_revision="v2.0.4",
    spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
    spk_model_revision="v2.0.2",
    spk_mode='punc_segment',
)

See error

Code sample

  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 205, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 386, in inference_with_vad
    punc_res = self.inference(result["text"], model=self.punc_model, kwargs=self.punc_kwargs, disable_pbar=True, **cfg)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 237, in inference
    results, meta_data = model.inference(**batch, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/ct_transformer/model.py", line 272, in inference
    y, _ = self.punc_forward(**data)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/ct_transformer/model.py", line 83, in punc_forward
    x = self.embed(text)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.DoubleTensor instead (while checking arguments for embedding)
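Until a release with a fix is confirmed, one possible caller-side workaround is to skip punctuation restoration whenever the ASR text is empty. The sketch below is only an illustration and is not FunASR code: `restore_punctuation_safely` and the `punc_fn` callable are hypothetical names standing in for whatever invokes the punc model.

```python
from typing import Callable, Optional


def restore_punctuation_safely(text: Optional[str], punc_fn: Callable[[str], str]) -> str:
    """Run punc_fn only when there is text to punctuate.

    punc_fn is a hypothetical stand-in for the call into the punc model.
    """
    # Empty, whitespace-only, or missing ASR output: nothing to punctuate,
    # so return an empty string without touching the model.
    if text is None or not text.strip():
        return ""
    return punc_fn(text)


if __name__ == "__main__":
    def fake_punc(s: str) -> str:
        # Toy stand-in that appends a full stop; not a real model.
        return s + "。"

    print(repr(restore_punctuation_safely("", fake_punc)))
    print(repr(restore_punctuation_safely("你好", fake_punc)))
```

The guard lives entirely on the caller's side, so it does not depend on any particular funasr version.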

Expected behavior

Please fix this bug. If it has already been fixed, please tell me which funasr version contains the fix.

Environment

- OS: Alibaba Cloud Linux 3.2104 LTS 64-bit
- FunASR version: 1.0.8
- ModelScope version: 1.11.1
- PyTorch version: 2.1.2
- Installed with: pip install funasr
- Python version: 3.8
- GPU: NVIDIA T4
- CUDA/cuDNN version: NVIDIA-SMI 530.30.02, Driver Version: 530.30.02, CUDA Version: 12.1

LauraGPT commented 4 months ago

pip install -U funasr modelscope

clb-123 commented 4 months ago

> pip install -U funasr modelscope

I have updated both packages. The current versions are funasr==1.0.25 and modelscope==1.14.0.
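For anyone reporting versions in a thread like this, the installed versions can be read programmatically instead of copied by hand. This is a small sketch using the standard-library `importlib.metadata` (available from Python 3.8); the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["funasr", "modelscope", "torch"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```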

There is a new problem now. When the input audio contains no active speech, the following error occurs:

Traceback (most recent call last):
  File "D:\work_program\call-center-asr\engine_frame\web\service\webrtc_asr.py", line 176, in dbfs_check
    result.asr_content = double_channel_wav_asr(wavfile_path, file_right_path, file_left_path, is_dbfs=True,
  File "D:\work_program\call-center-asr\engine_frame\web\service\webrtc_asr.py", line 190, in double_channel_wav_asr
    return asr_pipeline_by_file(file_left_path, file_right_path, ascii_flag, right_bfs_cal)
  File "D:\work_program\call-center-asr\engine_frame\web\service\webrtc_asr.py", line 254, in asr_pipeline_by_file
    result.left_content = model.generate(input=file_left_path)
  File "D:\work_install\miniconda3\envs\funasr\lib\site-packages\funasr\auto\auto_model.py", line 232, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "D:\work_install\miniconda3\envs\funasr\lib\site-packages\funasr\auto\auto_model.py", line 434, in inference_with_vad
    if raw_text is None:
UnboundLocalError: local variable 'raw_text' referenced before assignment
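An `UnboundLocalError` of this kind usually means a local variable is assigned only on a code path that never ran, for example inside a loop over speech segments that is empty for silent audio. The sketch below reproduces that general pattern in isolation. The function names and the segment loop are assumptions for illustration and are not taken from `auto_model.py`.

```python
def last_segment_text_buggy(segments):
    # raw_text is bound only inside the loop, so an empty list never binds it.
    for seg in segments:
        raw_text = seg
    if raw_text is None:  # raises UnboundLocalError when segments is empty
        return ""
    return raw_text


def last_segment_text_fixed(segments):
    raw_text = None  # bind the name before any conditional path
    for seg in segments:
        raw_text = seg
    if raw_text is None:
        return ""
    return raw_text


if __name__ == "__main__":
    try:
        last_segment_text_buggy([])
    except UnboundLocalError as exc:
        print("buggy version failed:", type(exc).__name__)
    print("fixed version returned:", repr(last_segment_text_fixed([])))
```

Initializing the variable before the loop makes the later `is None` check meaningful even when no segment is processed.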

LauraGPT commented 4 months ago

Please provide details to reproduce the issue, using only the demo code.

clb-123 commented 4 months ago

> Please provide details to reproduce the issue, using only the demo code.

Demo:

from funasr import AutoModel

model = AutoModel(
    model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v2.0.4",
    vad_model="iic/speech_fsmn_vad_zh-cn-8k-common",
    vad_model_revision="v2.0.4",
    punc_model="ct-punc-c",
    punc_model_revision="v2.0.4",
    spk_model="cam++",
    spk_model_revision="v2.0.2",
    spk_mode='punc_segment'
)
# wav_file is the path to the input audio file
res = model.generate(input=wav_file)

This is a test audio file that triggers the bug: test_audio.zip
