modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

How do I use speaker identification with the real-time transcription Docker service? #2048

Open purerosefallen opened 3 months ago

purerosefallen commented 3 months ago

Dockerfile:

FROM registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10
WORKDIR /workspace/FunASR/runtime
RUN chmod +x ./run_server_2pass.sh && \
    sed -i 's/&$//g' ./run_server_2pass.sh
CMD bash ./run_server_2pass.sh --download-model-dir /workspace/models --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx --model-dir iic/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst --itn-dir thuduj12/fst_itn_zh --certfile 0

Each 2pass-offline result looks roughly like this:

{"is_final":false,"mode":"2pass-offline","stamp_sents":[{"end":1644710,"punc":"","start":1644440,"text_seg":"你","ts_list":[[1644440,1644710]]}],"text":"你","timestamp":"[[1644440,1644710]]","wav_name":"h5"}

However, the result does not appear to contain an spk field labeling the speaker. How can this be enabled?
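For anyone who wants to verify this on their own client, here is a minimal sketch that parses one 2pass-offline message and reports whether any speaker label is present. The sample message is the one quoted above. The key name `spk` is an assumption taken from the question, not a confirmed field of the server's output, and `summarize` is a hypothetical helper name.

```python
import json

# Sample 2pass-offline message, copied from the result quoted above.
RAW = (
    '{"is_final":false,"mode":"2pass-offline",'
    '"stamp_sents":[{"end":1644710,"punc":"","start":1644440,'
    '"text_seg":"你","ts_list":[[1644440,1644710]]}],'
    '"text":"你","timestamp":"[[1644440,1644710]]","wav_name":"h5"}'
)


def summarize(message: str, speaker_key: str = "spk") -> dict:
    """Parse one result message and check for a speaker label.

    speaker_key defaults to "spk", which is an assumed name; the sample
    message contains no such key at either level.
    """
    msg = json.loads(message)
    sentences = msg.get("stamp_sents", [])
    # In the sample, "timestamp" is a JSON-encoded string, so decode it again.
    raw_ts = msg.get("timestamp")
    timestamps = json.loads(raw_ts) if raw_ts else []
    # Look for the speaker key at the top level and in each sentence entry.
    has_speaker = speaker_key in msg or any(speaker_key in s for s in sentences)
    return {
        "mode": msg.get("mode"),
        "text": msg.get("text"),
        "timestamps": timestamps,
        "has_speaker": has_speaker,
    }


if __name__ == "__main__":
    print(summarize(RAW))
```

Run against the message above, `has_speaker` comes back `False`, which matches what I am seeing: the timestamps are there, but nothing identifies the speaker.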

ruifengma commented 2 months ago

I have the same problem.

HTWMedia commented 2 months ago

Would you be interested in an online API that supports speaker diarization?

frankqianghe commented 2 months ago

I have the same problem.