modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com
Other
5.88k stars 636 forks

Speaker identification no longer distinguishes speakers: every sentence is labeled spk 0 #1986

Open viviliuwqhduhnwqihwqwudceygysjiwuwnn opened 1 month ago

viviliuwqhduhnwqihwqwudceygysjiwuwnn commented 1 month ago

from funasr import AutoModel

# Load the ASR, VAD, punctuation, and speaker models in one pipeline
model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
)

# Raw string so the backslashes in the Windows path are taken literally
res = model.generate(
    input=r"D:\FunASR-main\output.wav",
)
print(res)

'sentence_info': [{'text': '中国的国家主席是谁?', 'start': 1430, 'end': 3575, 'timestamp': [[1430, 1630], [1630, 1850], [1850, 2090], [2110, 2270], [2270, 2510], [2530, 2730], [2730, 2970], [3090, 3250], [3250, 3575]], 'spk': 0}, {'text': '欢迎大家来体验达摩院推出的语音识别模型。', 'start': 3575, 'end': 17185, 'timestamp': [[12960, 13140], [13140, 13380], [13400, 13560], [13560, 13800], [13800, 14040], [14040, 14260], [14260, 14500], [14520, 14680], [14680, 14860], [14860, 15060], [15060, 15260], [15260, 15500], [15500, 15740], [15760, 15980], [15980, 16220], [16240, 16440], [16440, 16680], [16700, 16860], [16860, 17185]], 'spk': 0}]}]

The two sentences were spoken by different people, but both were assigned to the same speaker, spk 0. What is going on?
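To see quickly how many distinct speakers the pipeline assigned, the sentences can be grouped by their `spk` label. The following is only a minimal sketch: `summarize_speakers` is a hypothetical helper, not part of FunASR, and it assumes each entry of `sentence_info` is a dict with `text` and `spk` keys as in the output above. The sample data is an abbreviated copy of that output with the timestamps omitted.

```python
from collections import defaultdict


def summarize_speakers(sentence_info):
    """Group sentence texts by their 'spk' label (hypothetical helper)."""
    by_speaker = defaultdict(list)
    for sentence in sentence_info:
        by_speaker[sentence["spk"]].append(sentence["text"])
    return dict(by_speaker)


# Abbreviated copy of the reported output; 'timestamp' lists are omitted.
sentence_info = [
    {"text": "中国的国家主席是谁?", "start": 1430, "end": 3575, "spk": 0},
    {"text": "欢迎大家来体验达摩院推出的语音识别模型。", "start": 3575, "end": 17185, "spk": 0},
]

summary = summarize_speakers(sentence_info)
for spk, texts in summary.items():
    print(f"spk {spk}: {len(texts)} sentence(s)")
```

With the data reported here, both sentences fall under the single label 0, which matches the problem described.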

yangtianyu92 commented 1 month ago

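The model was trained on a small dataset, and the model itself is small. For cases like this, you will have to work out a solution yourself.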

HTWMedia commented 11 hours ago

Would an online API with speaker diarization be of interest?