modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

In 2pass mode, English recognition results from a deployed FunASR server have no space between the last two words #2071

Open chenpaopao opened 2 months ago

chenpaopao commented 2 months ago

Notice: In order to resolve issues more efficiently, please raise issues following the template and fill in the details.

🐛 Bug

Bug1

Deploy FunASR:

    cd FunASR/runtime
    nohup bash run_server_2pass.sh \
      --download-model-dir /workspace/models \
      --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
      --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx \
      --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
      --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
      --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
      --itn-dir thuduj12/fst_itn_zh \
      --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

In the English recognition results, there is no space between the last two words. For example:

    i can you hearme。
    i just want to turn to a few ofthem。

I suspect the problem is in the following code in runtime/onnxruntime/src/ct-transformer-online.cpp:

    vector<string> WordWithPunc;
    for (int i = 0; i < sentence_words_list.size(); i++) // for i in range(0, len(sentence_words_list)):
    {
        if (!(sentence_words_list[i][0] & 0x80) && (i + 1) < sentence_words_list.size() && !(sentence_words_list[i + 1][0] & 0x80))
        {
            sentence_words_list[i] = " " + sentence_words_list[i];
        }
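To illustrate why the quoted condition could drop the space before the final word, here is a minimal standalone sketch. It reproduces only the `if` condition quoted in this report; the rest of the loop in ct-transformer-online.cpp is not shown here, so the real output may differ (for example in how the leading space is handled). The function names `StartsWithAscii`, `JoinAsQuoted`, and `JoinWithLookBehind` are hypothetical, and the look-behind variant is just one possible way to space ASCII words, not the project's actual fix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True when the token begins with a single-byte (ASCII) character,
// i.e. the high bit of its first byte is clear (the `& 0x80` test).
static bool StartsWithAscii(const std::string& token) {
    return !token.empty() && !(static_cast<unsigned char>(token[0]) & 0x80);
}

// Mirrors the quoted condition: token i gets a leading space only when
// token i+1 exists and is also ASCII, so the last token never gets one.
std::string JoinAsQuoted(std::vector<std::string> words) {
    for (std::size_t i = 0; i < words.size(); i++) {
        if (StartsWithAscii(words[i]) && (i + 1) < words.size() &&
            StartsWithAscii(words[i + 1])) {
            words[i] = " " + words[i];
        }
    }
    std::string out;
    for (const auto& w : words) out += w;
    return out;
}

// Hypothetical alternative: decide by looking at the previous token instead,
// so every ASCII token that follows an ASCII token is separated by a space.
std::string JoinWithLookBehind(const std::vector<std::string>& words) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); i++) {
        if (i > 0 && StartsWithAscii(words[i]) && StartsWithAscii(words[i - 1])) {
            out += " ";
        }
        out += words[i];
    }
    return out;
}
```

With the tokens `i`, `can`, `you`, `hear`, `me`, the first function joins the last two words as `hearme`, matching the symptom reported above, while the look-behind variant keeps them apart. Tokens whose first byte has the high bit set (for example UTF-8 Chinese characters) are never spaced by either function.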

Bug2

Can the punc_ct-transformer_cn-en-common-vocab471067-large-onnx model be used in run_server_2pass.sh? After I swapped it in, the service failed to start with this error: websocket-server-2pass.cpp:586 index out of range

80boys commented 2 months ago

Has this been resolved? I'm running into the same problem.

LauraGPT commented 2 months ago

It is a bug.

AlvinAi96 commented 1 month ago

I came across the same bug. Any solution now?

lin-xiaosheng commented 1 month ago

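This issue seriously affects real-world use. Is there any progress on it? By connecting to wss://www.funasr.com:10095/ I found that the problem has already been fixed in the official demo, but it still exists in the latest version, 0.1.11.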