Full-sentence results return slowly in real-time audio/video speech-to-text dictation

L-Jim commented 5 months ago

What is your question?

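I am using the FunASR real-time speech dictation service to transcribe live audio/video to text. The service runs in 2pass mode and is started with the following command: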

 nohup bash run_server_2pass.sh \
   --download-model-dir /workspace/models \
   --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
   --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
   --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
   --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
   --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
   --itn-dir thuduj12/fst_itn_zh \
   --certfile 0 \
   --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

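The client sends the following request parameters:

 {"mode":"2pass","is_speaking":true,"chunk_size":[5,10,5],"wav_format":"pcm","chunk_interval":10,"wav_name":"456"}

aa1 aa2

However, the full sentence with timestamps takes quite a long time to come back, which hurts the real-time feel of the subtitles. The main cause is that the sentences are long: the stamp_sents field of the returned JSON contains several short sentences that together make up one long sentence. My question: is it possible to make each timestamped result shorter, for example by returning each short sentence in stamp_sents separately, so that the subtitles are closer to real time? If so, how should I configure the parameters? Thanks.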
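For reference, here is a minimal client-side sketch of what "each short sentence in stamp_sents" could look like once split into separate subtitle cues. It assumes each stamp_sents entry carries text_seg, punc, start and end keys (times in milliseconds); that layout and the sample message are assumptions for illustration, and split_stamp_sents is a hypothetical helper, not part of FunASR. Note that this only changes how an already received result is displayed. It does not make the server return the result any sooner.

```python
import json


def split_stamp_sents(message: str) -> list:
    """Split one final (offline-pass) result into per-clause subtitle cues.

    Assumption: each stamp_sents entry looks like
    {"text_seg": "...", "punc": "...", "start": <ms>, "end": <ms>}.
    """
    result = json.loads(message)
    cues = []
    for sent in result.get("stamp_sents", []):
        # text_seg is assumed to hold space-separated Chinese tokens,
        # so the spaces are dropped before display.
        text = sent.get("text_seg", "").replace(" ", "") + sent.get("punc", "")
        cues.append({
            "start_ms": sent.get("start"),
            "end_ms": sent.get("end"),
            "text": text,
        })
    return cues


# Hypothetical server message, made up for this example.
sample = json.dumps({
    "mode": "2pass-offline",
    "text": "你好，世界。",
    "stamp_sents": [
        {"text_seg": "你 好", "punc": "，", "start": 100, "end": 500},
        {"text_seg": "世 界", "punc": "。", "start": 600, "end": 1000},
    ],
})

for cue in split_stamp_sents(sample):
    print(cue["start_ms"], cue["end_ms"], cue["text"])
```

Each cue can then be shown as its own subtitle line, but all cues still arrive together with the long sentence.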

What's your environment?

Docker version (funasr-runtime-sdk-online-cpu-0.1.7)

Ye83 commented 4 months ago

Just wondering, roughly how long does it take? What is the fastest it can get?

MooWeii commented 2 months ago

Have you found a solution?

Ignalxy commented 1 month ago

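From reading the source code, the offline pass only runs once the VAD detects a break point. The default VAD does not work very well and often fails to split the audio. Maybe try a different model?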

modelscope / FunASR

Issue #1360