modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Running AutoModel as a backend service: errors on repeated calls to model.generate #1326

Closed spritelw closed 8 months ago

spritelw commented 8 months ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

from funasr import AutoModel

model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v2.0.4",
    vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_model_revision="v2.0.4",
    punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    punc_model_revision="v2.0.4",
    spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
    # spk_model_revision="v2.0.2",
)

res = model.generate(
    input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    hotword='达摩院 魔搭',
    sentence_timestamp=True,
)

# Second call on the same AutoModel instance
res = model.generate(
    input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    hotword='达摩院 魔搭',
    sentence_timestamp=True,
)

Call path:

generate -> inference_with_vad -> inference -> model.inference(batch, kwargs)

Here is the relevant part of the interface code:

def inference(
    self,
    data_in,
    data_lengths=None,
    key: list = None,
    tokenizer=None,
    frontend=None,
    cache: dict = {},  # <-- expanded from kwargs and kept around; it ends up stored in self.vad_kwargs
    **kwargs,
):
    if len(cache) == 0:  # <-- on the second call len(cache) > 0, so init_cache is skipped
        self.init_cache(cache, **kwargs)
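The pattern above is what makes repeated calls fragile: whether the dict comes from the mutable default argument or from kwargs saved on the instance (self.vad_kwargs), the same cache object survives across calls, so init_cache only runs the first time. Below is a minimal, self-contained illustration of the pitfall and the usual fix; bad_inference and good_inference are made-up names, not FunASR code.

# Illustration only: state that outlives a single call versus an idempotent handler.

def bad_inference(data, cache: dict = {}):        # one dict object shared by every call
    if len(cache) == 0:                           # initialised only on the first call
        cache["stats"] = {"frames_seen": 0}
    cache["stats"]["frames_seen"] += len(data)
    return cache["stats"]["frames_seen"]

def good_inference(data, cache: dict = None):     # fresh state per call
    if cache is None:
        cache = {}
    if len(cache) == 0:
        cache["stats"] = {"frames_seen": 0}
    cache["stats"]["frames_seen"] += len(data)
    return cache["stats"]["frames_seen"]

print(bad_inference([0] * 10))   # 10
print(bad_inference([0] * 10))   # 20, stale state leaks into the second call
print(good_inference([0] * 10))  # 10
print(good_inference([0] * 10))  # 10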

Two kinds of errors occur:

1. AssertionError from torchaudio's fbank:

   mat = kaldi.fbank(waveform,
   File "/home/sd/miniconda3/envs/cuda121_funasr1.0/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 591, in fbank
     waveform, window_shift, window_size, padded_window_size = _get_waveform_and_window_properties(
   File "/home/sd/miniconda3/envs/cuda121_funasr1.0/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
     assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
   AssertionError: choose a window size 400 that is [2, 0]

2. IndexError from the FSMN VAD streaming model:

   File "/home/sd/transformer/FunASR-git/funasr/models/fsmn_vad_streaming/model.py", line 443, in GetFrameState
     cur_decibel = cache["stats"].decibel[t]
   IndexError: list index out of range

Expected behavior

The model should not keep intermediate results internally; as a backend service, each call should be idempotent.
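Until the cache handling changes upstream, a service can at least defend itself at the call site. The sketch below is only one possible workaround under the assumption that the failures come from stale internal state: it serializes access to the shared AutoModel and rebuilds it when a call raises one of the two errors reported above. build_model and transcribe are placeholder names, and rebuilding the model is expensive, so this is a stop-gap rather than a fix.

import threading

from funasr import AutoModel

_lock = threading.Lock()

def build_model():
    # Same constructor arguments as in the reproduction above (abridged).
    return AutoModel(
        model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        model_revision="v2.0.4",
        vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
        vad_model_revision="v2.0.4",
        punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
        punc_model_revision="v2.0.4",
    )

_model = build_model()

def transcribe(audio):
    global _model
    with _lock:  # one request at a time touches the shared model
        try:
            return _model.generate(input=audio, sentence_timestamp=True)
        except (AssertionError, IndexError):
            _model = build_model()  # blunt recovery: drop any stale internal state
            return _model.generate(input=audio, sentence_timestamp=True)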

Environment

Additional context

LauraGPT commented 8 months ago

I have tested it without any errors. Please list your environment. The cache is reset at https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/models/fsmn_vad_streaming/model.py#L623

spritelw commented 8 months ago

The wav file works fine; what I am passing in is PCM data, so there is no final chunk.

LauraGPT commented 8 months ago

You should set is_final=True, as in the demo.
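For reference, the streaming usage that resets the cache looks roughly like the sketch below, adapted from the streaming VAD example in the FunASR documentation: PCM samples are fed chunk by chunk, and the last chunk is marked with is_final=True so the internal VAD cache is cleared. The model alias "fsmn-vad", the revision, and the exact kwargs may differ between FunASR versions, and the silent PCM buffer is a placeholder.

import numpy as np

from funasr import AutoModel

vad_model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")

sample_rate = 16000
chunk_size_ms = 200
chunk_stride = chunk_size_ms * sample_rate // 1000

# Placeholder PCM: one second of silence as float32 samples.
pcm = np.zeros(sample_rate, dtype=np.float32)

cache = {}
total_chunks = int(np.ceil(len(pcm) / chunk_stride))
for i in range(total_chunks):
    chunk = pcm[i * chunk_stride:(i + 1) * chunk_stride]
    res = vad_model.generate(
        input=chunk,
        cache=cache,
        is_final=(i == total_chunks - 1),  # last chunk resets the streaming cache
        chunk_size=chunk_size_ms,
    )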

ZhanRao commented 1 month ago

Setting is_final=True as in the demo does not seem to solve the problem; after running for a while, the error comes back.