lanyuer opened this issue 7 months ago
The speaker-recognition step in this pipeline currently has several limitations. It may perform poorly when the audio is too short (under 60 seconds) or when there are too many speakers (more than 10), and it cannot handle overlapping speech. We recommend trying longer audio. @lanyuer
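As a rough pre-flight check against the limits quoted above, the sketch below reads a WAV file's duration with Python's standard `wave` module and flags clips under 60 seconds or an expected speaker count above 10. `wav_duration_seconds` and `diarization_warnings` are hypothetical helpers, not part of FunASR or ModelScope; the thresholds are only the figures from this comment, not limits the pipeline enforces, and the sketch assumes an uncompressed PCM WAV file. Overlapping speech cannot be detected this way.

```python
import wave
from typing import List, Optional

# Rough limits quoted in the comment above; the pipeline does not enforce them.
MIN_DURATION_S = 60.0
MAX_SPEAKERS = 10


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def diarization_warnings(path: str, expected_speakers: Optional[int] = None) -> List[str]:
    """Hypothetical pre-flight check: reasons the speaker labels may be unreliable."""
    warnings: List[str] = []
    duration = wav_duration_seconds(path)
    if duration < MIN_DURATION_S:
        warnings.append(f"audio is {duration:.1f}s, shorter than {MIN_DURATION_S:.0f}s")
    if expected_speakers is not None and expected_speakers > MAX_SPEAKERS:
        warnings.append(f"{expected_speakers} speakers is more than {MAX_SPEAKERS}")
    return warnings
```

For the clip in this thread, the printed timestamps end at about 25.5 s, so if that output is complete, `diarization_warnings('wangfang.wav', expected_speakers=4)` would most likely report it as shorter than 60 s.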
Just try pyannote.audio.
Same problem here: the results are poor, and every speaker label is 0.
Reference documentation: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary
Version: funasr 0.8.6
Code:
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

audio_in = 'wangfang.wav'
output_dir = "./results"

# ASR pipeline with VAD, punctuation and speaker labels
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
    model_revision='v0.0.2',
    vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
    output_dir=output_dir,
)
rec_result = inference_pipeline(
    audio_in=audio_in,
    batch_size_token=5000,
    batch_size_token_threshold_s=40,
    max_single_segment_time=10000,
)
print(rec_result)

# Print (speaker label, text, "start-end") for each recognized sentence
for s in rec_result['sentences']:
    print((s['spk'], s['text'], f'{s["start"]}-{s["end"]}'))
```
Result:
```
(0, '来来来介绍一下啊,', '900-2120')
(0, '这是我大姨这个请问您贵庚了,', '2120-5440')
(0, '贵庚八十。', '5440-6580')
(0, '哇,', '6580-7180')
(0, '这是我老妈,', '7180-8600')
(0, '请问您贵庚啊,', '8600-10410')
(0, '七十二,', '10410-11450')
(0, '哎呀,', '11450-12150')
(0, '这是我老爸啊,', '12150-13150')
(0, '六哦哥啊,', '13150-15650')
(0, '快八十了,', '15650-17075')
(0, '这我们家焖面,', '17075-19260')
(0, '我妈说吃太简单了,', '19260-20980')
(0, '这多好啊。', '20980-22180')
(0, '然后注意啊,', '22180-23460')
(0, '一定要就蒜,', '23460-24320')
(0, '一定要就蒜啊,', '24320-25500')
```
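To confirm the symptom programmatically rather than by eye, the sketch below counts how many sentences received each `spk` label. It assumes each entry of `rec_result['sentences']` is a dict with the `spk`, `text`, `start` and `end` keys that the loop in the code above reads; `speaker_summary` is a hypothetical helper, and `sample` rebuilds only the first five sentences of the printed output.

```python
from collections import Counter
from typing import Any, Dict, List


def speaker_summary(sentences: List[Dict[str, Any]]) -> Counter:
    """Count how many sentences were assigned to each speaker label ('spk')."""
    return Counter(s["spk"] for s in sentences)


# First five sentences of the printed output, rebuilt as dicts with the keys
# the issue's loop reads. Timestamps are copied as printed.
sample = [
    {"spk": 0, "text": "来来来介绍一下啊,", "start": 900, "end": 2120},
    {"spk": 0, "text": "这是我大姨这个请问您贵庚了,", "start": 2120, "end": 5440},
    {"spk": 0, "text": "贵庚八十。", "start": 5440, "end": 6580},
    {"spk": 0, "text": "哇,", "start": 6580, "end": 7180},
    {"spk": 0, "text": "这是我老妈,", "start": 7180, "end": 8600},
]

counts = speaker_summary(sample)
print(counts)  # Counter({0: 5})
if len(counts) == 1:
    print("only one speaker label was assigned")
```

Passing the full `rec_result['sentences']` instead of `sample` should give a single key for the run above, since all 17 printed sentences carry label 0.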
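The recording actually contains 4 speakers (3 women and 1 man). Sentences 3, 7, and 11 are not spoken by the default speaker, yet none of them were recognized as a different speaker.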
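The reporter's ground truth can be checked in the same spirit. The sketch below treats the speaker of sentence 1 as the default speaker (an assumption) and lists which of sentences 3, 7, and 11 (1-based, as in the comment) still share that label. `missed_speaker_changes` is a hypothetical helper, and `labels` is the all-zero output printed above.

```python
from typing import Iterable, List


def missed_speaker_changes(
    labels: List[int], expected_other: Iterable[int], default_sentence: int = 1
) -> List[int]:
    """Return the 1-based sentence numbers that should belong to a speaker other
    than the one in `default_sentence` but were given the same label."""
    default_label = labels[default_sentence - 1]
    return [n for n in expected_other if labels[n - 1] == default_label]


# Speaker labels of the 17 printed sentences: all 0.
labels = [0] * 17
print(missed_speaker_changes(labels, [3, 7, 11]))  # [3, 7, 11]
```

This only detects missing speaker changes; it does not check that the three other speakers receive labels distinct from one another.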
The audio is attached:
wf.zip