modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

The Paraformer-large-Spk model has no speaker information #1025

Closed chiliuliu closed 11 months ago

chiliuliu commented 11 months ago

From the official demo:

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks
    import time

    if __name__ == '__main__':
        audio_in = '/root/pythonFile/speech2text/test_data/VOB06128.WAV'
        output_dir = "/root/pythonFile/speech2text/test_data/test_data/txt"
        inference_pipeline = pipeline(
            task=Tasks.auto_speech_recognition,
            model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
            model_revision='v0.0.2',
            vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
            punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
            output_dir=output_dir,
        )
        rec_result = inference_pipeline(
            audio_in=audio_in,
            batch_size_token=5000,
            batch_size_token_threshold_s=40,
            max_single_segment_time=6000,
        )
        print(rec_result)

For example, in the rec_result returned by inference, the entries under sentences contain no speaker information.

rec_result["sentences"]:
006: {'text': '前段时间已经在那个呃申请到部里面去了。', 'start': 11300, 'end': 14370, 'text_seg': '前 段 时 间 已 经 在 那 个 呃 ...部 里 面 去 了 ', 'ts_list': [[...], [...], [...], [...], [...], [...], [...], [...], [...], ...]}
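A quick way to confirm this across the whole result is to scan every sentence entry for a speaker field. The sketch below is only an illustration: the field name `spk` is an assumption (the output above does not show what the speaker key would be called), and `sentences_missing_key` is a hypothetical helper, not part of funasr or modelscope.

```python
from typing import Any, Dict, List


def sentences_missing_key(sentences: List[Dict[str, Any]], key: str = "spk") -> List[int]:
    """Return the indices of sentence dicts that do not contain `key`.

    `key` defaults to "spk", which is an assumed name for the speaker field.
    """
    return [i for i, sentence in enumerate(sentences) if key not in sentence]


# One entry shaped like the output above (text_seg and ts_list omitted).
sample = [
    {"text": "前段时间已经在那个呃申请到部里面去了。", "start": 11300, "end": 14370},
]

# Indices of entries that carry no speaker field.
print(sentences_missing_key(sample))
```

With a real result, pass `rec_result["sentences"]` instead of `sample`; an empty list would mean every sentence carries the assumed key.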

LauraGPT commented 11 months ago

Please update funasr, and try it again. docs

chiliuliu commented 11 months ago

> Please update funasr, and try it again. docs

Nice work! Where can I find details on how the two models, Paraformer and CAM++, are combined? I could not find this in the code. If possible, please point me to some information. Thanks.
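The thread does not say how FunASR combines the two models, so the following is not its implementation. It is a minimal sketch of one generic approach: give each recognized sentence the speaker whose diarization segments overlap it the longest. The `(start_ms, end_ms, speaker_id)` segment format, the `spk` output key, and the function names are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical diarization output: (start_ms, end_ms, speaker_id).
Segment = Tuple[int, int, int]


def overlap_ms(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length in milliseconds of the overlap between two intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    sentences: List[Dict[str, Any]], segments: List[Segment]
) -> List[Dict[str, Any]]:
    """Copy each sentence and add an assumed "spk" key with the best-overlapping speaker."""
    labeled = []
    for sentence in sentences:
        totals: Dict[int, int] = {}
        for seg_start, seg_end, speaker in segments:
            shared = overlap_ms(sentence["start"], sentence["end"], seg_start, seg_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0) + shared
        # None when no diarization segment overlaps the sentence at all.
        best: Optional[int] = max(totals, key=totals.get) if totals else None
        labeled.append({**sentence, "spk": best})
    return labeled


# Timestamps taken from the sentence above; the segments are made up.
sentences = [{"text": "前段时间已经在那个呃申请到部里面去了。", "start": 11300, "end": 14370}]
segments = [(10000, 12000, 0), (12000, 15000, 1)]
labeled = assign_speakers(sentences, segments)
print(labeled[0]["spk"])
```

Here the sentence overlaps speaker 0 for 700 ms and speaker 1 for 2370 ms, so it is labeled with speaker 1. A real pipeline may work at a finer granularity, for example per word timestamp rather than per sentence.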