Closed · eleven-monkey closed this 9 months ago
This code:
from funasr import AutoModel

model = AutoModel(
    model="damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v2.0.0",
    vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_model_revision="v2.0.2",
    punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    punc_model_revision="v2.0.1",
    # spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
    # spk_model_revision="v2.0.0"
)
res = model(input="/content/combined_audio_trim.mp3", hotword='磨搭')

if __name__ == '__main__':
    print(res)
    print(res, '\n', res[0]['text_with_punc'])
used to run on Colab with good recognition results, but a couple of days ago it stopped working with this error:

2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1525             or _global_backward_pre_hooks or _global_backward_hooks
   1526             or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528
   1529         try:
TypeError: SeacoParaformer.forward() missing 3 required positional arguments: 'speech_lengths', 'text', and 'text_lengths'
I then switched to the code from the project homepage:
from funasr import AutoModel

# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2",
                  vad_model="fsmn-vad", vad_model_revision="v2.0.2",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.2",
                  spk_model="cam++", spk_model_revision="v2.0.2")
res = model.generate(input="/content/combined_audio_trim (9).mp3",
                     batch_size=64,
                     hotword='魔搭')
print(res)
It still fails, with this error:

/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in __array__(self, dtype)
   1028             return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
   1029         if dtype is None:
-> 1030             return self.numpy()
   1031         else:
   1032             return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
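As far as I can tell, this error means a tensor that still lives on the GPU is being handed to numpy, which can only read host memory. The generic fix in user code looks like the sketch below (the variable names are made up; where exactly this happens inside funasr I don't know):

import torch

# numpy cannot read GPU memory: detach the tensor from autograd,
# copy it to the host with .cpu(), and only then convert it.
t = torch.randn(3, device="cuda" if torch.cuda.is_available() else "cpu")
arr = t.detach().cpu().numpy()
print(arr)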
Any advice would be much appreciated.
Please note the usage: model.generate(input=...)
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2", \
vad_model="fsmn-vad", vad_model_revision="v2.0.2", \
punc_model="ct-punc-c", punc_model_revision="v2.0.2", \
spk_model="cam++", spk_model_revision="v2.0.2")
res = model.generate(input=f"{model.model_path}/example/asr_example.wav",
batch_size=64,
hotword='魔搭')
print(res)
Please note the usage: model.generate(input=...)
Well, I don't see any problem with the input parameter in the code I provided; it just gives the path of the audio file to be recognized. Could you be more specific? Thanks!
Please update funasr and try it again.
git pull
pip install -e .
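Or, if funasr was installed from PyPI rather than from a source checkout, upgrading the package in place should be equivalent (assuming a pip installation):

pip install -U funasr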
I'm running into the same problem this user described. In my case, the Chinese model works perfectly:
from funasr import AutoModel
model = AutoModel(model="paraformer-zh")
res = model.generate(input="k:\\cn1.wav")
but with the English model paraformer-en:
from funasr import AutoModel
model = AutoModel(model="paraformer-en")
res = model.generate(input="k:\\en1.wav")
it fails with:
raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigKeyError: Missing key tokenizer_conf
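One workaround I considered (pure speculation): the locally cached copy of the model's config.yaml may simply predate the tokenizer_conf key, so forcing a re-download could help. The cache path below is an assumption about the default ModelScope download location; print and verify it before deleting anything.

import pathlib, shutil

# Speculative cleanup: remove the cached model so an updated config.yaml
# (one that actually contains tokenizer_conf) is fetched on the next run.
# ~/.cache/modelscope is an assumed default; adjust to your setup.
cache = pathlib.Path.home() / ".cache" / "modelscope"
print("cached models under:", cache)
# shutil.rmtree(cache)  # only after confirming the path is right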
paraformer-en is now supported. Please update funasr and try it again.
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-en", model_revision="v2.0.3",
vad_model="fsmn-vad", vad_model_revision="v2.0.2",
punc_model="ct-punc", punc_model_revision="v2.0.3",
)
res = model.generate(input=f"{model.model_path}/example/asr_example.wav",
batch_size_s=300,)
print(res)
I just re-downloaded everything and tested again; it seems specifying the model revision is really necessary? If I don't specify a revision:
model = AutoModel(model="paraformer-en")
res = model.generate(input="e:\\e1.wav")
I get the same error as before. If I specify it:
model = AutoModel(model="paraformer-en", model_revision="v2.0.3")
res = model.generate(input="e:\\e1.wav")
recognition works normally.
So is this a bug? If not, which model is the program actually using when no revision is specified?
I also noticed that when calling this model:
model = AutoModel(model="paraformer-en", model_revision="v2.0.3")
the returned result contains no timestamps, only key and text.
The code from the project homepage:
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2", \
vad_model="fsmn-vad", vad_model_revision="v2.0.2", \
punc_model="ct-punc-c", punc_model_revision="v2.0.2", \
spk_model="cam++", spk_model_revision="v2.0.2")
res = model.generate(input=f"/content/combined_audio_trim (9).mp3",
batch_size=64,
hotword='魔搭')
print(res)
If I split the audio file into 40-second segments and feed them to input one at a time, recognition works, so the problem seems related to audio duration. If I split into 1-minute segments, some segments are recognized and others raise errors. Is there a better way to recognize long audio?
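For reference, here is what I plan to try next, assuming the fsmn-vad front end segments long files into utterances and that batch_size_s (which appears in the maintainer's paraformer-en example above) caps each batch by seconds of audio rather than by clip count:

from funasr import AutoModel

# Long-audio sketch: let the VAD model segment the file and cap each
# batch by audio seconds instead of clip count. Using batch_size_s here
# with paraformer-zh is my assumption, carried over from the example above.
model = AutoModel(model="paraformer-zh", model_revision="v2.0.2",
                  vad_model="fsmn-vad", vad_model_revision="v2.0.2",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.2")
res = model.generate(input="/content/combined_audio_trim (9).mp3",
                     batch_size_s=300,
                     hotword='魔搭')
print(res)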
Please raise a new issue.