modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Can't run on Colab, error: TypeError: SeacoParaformer.forward() missing 3 required positional arguments: 'speech_lengths', 'text', and 'text_lengths' #1268

Closed · eleven-monkey closed this 9 months ago

eleven-monkey commented 9 months ago

This code:

```python
from funasr import AutoModel

model = AutoModel(
    model="damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v2.0.0",
    vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_model_revision="v2.0.2",
    punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    punc_model_revision="v2.0.1",
    # spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
    # spk_model_revision="v2.0.0"
)

res = model(
    input="/content/combined_audio_trim.mp3",
    hotword='磨搭')
if __name__ == '__main__':
    print(res)
print(res,'\n',res[0]['text_with_punc'])
```

This used to run fine on Colab with good recognition results, but a couple of days ago it stopped working with this error:

```
2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528
   1529         try:

TypeError: SeacoParaformer.forward() missing 3 required positional arguments: 'speech_lengths', 'text', and 'text_lengths'
```

Then I switched to the code from the project homepage:

```python
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2", \
                  vad_model="fsmn-vad", vad_model_revision="v2.0.2", \
                  punc_model="ct-punc-c", punc_model_revision="v2.0.2", \
                  spk_model="cam++", spk_model_revision="v2.0.2")
res = model.generate(input=f"/content/combined_audio_trim (9).mp3", 
                     batch_size=64, 
                     hotword='魔搭')
print(res)
```

It still fails, now with this error:

```
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in __array__(self, dtype)
   1028             return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
   1029         if dtype is None:
-> 1030             return self.numpy()
   1031         else:
   1032             return self.numpy().astype(dtype, copy=False)

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
```

Any guidance would be appreciated.

LauraGPT commented 9 months ago


Please note the usage model.generate(input=)

```python
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2", \
                  vad_model="fsmn-vad", vad_model_revision="v2.0.2", \
                  punc_model="ct-punc-c", punc_model_revision="v2.0.2", \
                  spk_model="cam++", spk_model_revision="v2.0.2")
res = model.generate(input=f"{model.model_path}/example/asr_example.wav", 
                     batch_size=64, 
                     hotword='魔搭')
print(res)
```
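To illustrate why the call style matters, here is a toy sketch (a hypothetical `Model` class, not FunASR's actual implementation): in PyTorch, `model(...)` dispatches through `__call__` to `forward()`, whose training-time signature requires the batch arguments named in the error, while `generate()` is the inference entry point that handles decoding itself.

```python
# Toy sketch of the call-routing behaviour; classes and strings here are
# illustrative assumptions, not FunASR code.
class Model:
    def forward(self, speech, speech_lengths, text, text_lengths):
        # training path: expects padded batches and their lengths
        return "loss"

    def __call__(self, *args, **kwargs):
        # mimics torch.nn.Module.__call__ routing straight to forward()
        return self.forward(*args, **kwargs)

    def generate(self, input, **kwargs):
        # inference path: loads the audio and decodes internally
        return [{"text": f"decoded {input}"}]


m = Model()
print(m.generate(input="asr_example.wav")[0]["text"])  # decoded asr_example.wav
try:
    m(input="asr_example.wav")  # routed to forward(), which lacks these args
except TypeError:
    print("TypeError")
```

So `model(input=...)` fails for the same structural reason as in the traceback above, while `model.generate(input=...)` succeeds.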
eleven-monkey commented 9 months ago

> Please note the usage model.generate(input=)

Well, I don't see any problem with the input parameter in the code I provided; it just passes the path of the audio file to be recognized. Can you be more specific? Thanks!

LauraGPT commented 9 months ago


Please update funasr and try it again.

```shell
git pull
pip install -e .
```
dsyrock commented 9 months ago

I'm running into the same problem this user mentioned. In my case, Chinese works perfectly:

```python
from funasr import AutoModel
model = AutoModel(model="paraformer-zh")
res = model.generate(input="k:\\cn1.wav")
```

But with the English model paraformer-en:

```python
from funasr import AutoModel
model = AutoModel(model="paraformer-en")
res = model.generate(input="k:\\en1.wav")
```

it raises:

```
raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigKeyError: Missing key tokenizer_conf
```
LauraGPT commented 9 months ago


paraformer-en has been supported. Please update funasr and try it again.

```python
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-en", model_revision="v2.0.3",
                  vad_model="fsmn-vad", vad_model_revision="v2.0.2",
                  punc_model="ct-punc", punc_model_revision="v2.0.3",
                  )
res = model.generate(input=f"{model.model_path}/example/asr_example.wav", 
                     batch_size_s=300,)
print(res)
```
dsyrock commented 9 months ago


I just re-downloaded everything and tested again. It seems specifying the model revision is necessary? If I don't specify a version:

```python
model = AutoModel(model="paraformer-en")
res = model.generate(input="e:\\e1.wav")
```

running this gives the same error message as before.

If I specify it:

```python
model = AutoModel(model="paraformer-en", model_revision="v2.0.3")
res = model.generate(input="e:\\e1.wav")
```

then recognition works fine.

Does this count as a bug? If not, which model is the program actually using when no revision is specified?

Also, I noticed that when calling this model:

```python
model = AutoModel(model="paraformer-en", model_revision="v2.0.3")
```

the returned result contains no timestamps, only key and text.

eleven-monkey commented 9 months ago

With the code from the project homepage:

```python
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2", \
                  vad_model="fsmn-vad", vad_model_revision="v2.0.2", \
                  punc_model="ct-punc-c", punc_model_revision="v2.0.2", \
                  spk_model="cam++", spk_model_revision="v2.0.2")
res = model.generate(input=f"/content/combined_audio_trim (9).mp3", 
                     batch_size=64, 
                     hotword='魔搭')
print(res)
```

if I split the audio file into 40-second segments and pass them to input one at a time, recognition works, so the problem seems related to audio duration. With 1-minute segments, some succeed and some fail. Is there a better way to recognize long audio?
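The splitting step described above can be sketched like this. This is a hypothetical helper, not part of FunASR; the 16 kHz sample rate and 40-second chunk length are assumptions taken from this thread, and note that passing a `vad_model` to `AutoModel` is intended to do this segmentation for you, at speech pauses rather than at arbitrary cut points.

```python
# Hypothetical chunking helper: split a long waveform into fixed-length
# segments so each one can be passed to model.generate() separately.
SAMPLE_RATE = 16000   # Hz; the rate these models expect (assumption)
CHUNK_SECONDS = 40    # segment length reported to work reliably above

def chunk_audio(samples, sr=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Split a 1-D sequence of samples into consecutive fixed-size chunks."""
    step = sr * seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]

# 90 s of dummy audio -> two full 40 s chunks plus a 10 s remainder
chunks = chunk_audio([0.0] * (SAMPLE_RATE * 90))
print([len(c) for c in chunks])  # [640000, 640000, 160000]
```

In real use you would decode the mp3 to samples first (e.g. with soundfile or torchaudio) and run `model.generate` on each chunk; fixed-length cuts can land mid-word, which is why VAD-based segmentation is generally preferable.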

LauraGPT commented 9 months ago


Please raise a new issue.