A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Nice work!
But where can I find the details of how the two models, Paraformer and CAM++, are merged?
I could not find this in the code. If possible, could you point me to the relevant information? Thanks.
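While waiting for an authoritative answer, here is a minimal sketch of one plausible merge strategy, assuming the pipeline assigns each Paraformer sentence the CAM++ speaker turn that overlaps it most in time. This is an assumption about the approach, not the actual FunASR implementation; the function names (overlap, assign_speakers) and the 'spk' field are illustrative only.

# Sketch: attach speaker labels to ASR sentences by maximal time overlap.
# Assumes Paraformer gives sentences with 'start'/'end' in ms and a
# CAM++-based diarizer gives speaker turns with 'spk'/'start'/'end'.

def overlap(a_start, a_end, b_start, b_end):
    # Length of the overlap between two time intervals in milliseconds.
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(sentences, speaker_turns):
    # For each sentence, pick the speaker turn with the largest overlap.
    for sent in sentences:
        best = max(
            speaker_turns,
            key=lambda turn: overlap(sent['start'], sent['end'],
                                     turn['start'], turn['end']),
            default=None,
        )
        sent['spk'] = best['spk'] if best else None
    return sentences

if __name__ == '__main__':
    sentences = [{'text': '前段时间已经在那个呃申请到部里面去了。',
                  'start': 11300, 'end': 14370}]
    turns = [{'spk': 'spk0', 'start': 10000, 'end': 15000},
             {'spk': 'spk1', 'start': 15000, 'end': 20000}]
    print(assign_speakers(sentences, turns))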
In the official demo:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import time

if __name__ == '__main__':
    audio_in = '/root/pythonFile/speech2text/test_data/VOB06128.WAV'
    output_dir = "/root/pythonFile/speech2text/test_data/test_data/txt"
    inference_pipeline = pipeline(
        task=Tasks.auto_speech_recognition,
        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
        model_revision='v0.0.2',
        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
        output_dir=output_dir,
    )
    rec_result = inference_pipeline(
        audio_in=audio_in,
        batch_size_token=5000,
        batch_size_token_threshold_s=40,
        max_single_segment_time=6000,
    )
    print(rec_result)
For example, in the rec_result returned by inference, the entries under "sentences" contain no speaker information.
rec_result["sentences"]:
006: {'text': '前段时间已经在那个呃申请到部里面去了。', 'start': 11300, 'end': 14370, 'text_seg': '前 段 时 间 已 经 在 那 个 呃 ...部 里 面 去 了 ', 'ts_list': [[...], [...], [...], [...], [...], [...], [...], [...], [...], ...]}
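If the model revision you run does attach per-sentence speaker labels, a small helper like the one below makes that easy to check. The 'spk' key name is a guess and may differ from, or be absent in, your actual output.

def print_sentences_with_speakers(rec_result):
    # Print each ASR sentence with its time span and speaker label, if any.
    # 'spk' is an assumed key name; adjust it to the field your model
    # revision actually emits.
    for i, sent in enumerate(rec_result.get('sentences', [])):
        spk = sent.get('spk', 'unknown')
        print(f"{i:03d} [{sent['start']}-{sent['end']} ms] speaker={spk}: {sent['text']}")

if __name__ == '__main__':
    demo = {'sentences': [{'text': '前段时间已经在那个呃申请到部里面去了。',
                           'start': 11300, 'end': 14370}]}
    print_sentences_with_speakers(demo)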