hertz-pj closed this issue 7 months ago.
Verified: after updating the Docker image to the latest version, the result is the same.
Which acoustic model does Docker load, and is it the same as the Python one? Does Docker load the Ngram LM? If so, disable it with --lm-dir "" to match the Python model.
nohup bash run_server.sh \
--download-model-dir /workspace/models \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
--lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
--itn-dir thuduj12/fst_itn_zh \
--hotword /workspace/models/hotwords.txt > log.txt 2>&1 &
This is the Docker startup command.
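Per the suggestion above, a sketch of the same launch command with the Ngram LM disabled (this assumes run_server.sh treats an empty --lm-dir as "do not load the LM", as suggested; not verified against every runtime version):

```shell
nohup bash run_server.sh \
--download-model-dir /workspace/models \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
--lm-dir "" \
--itn-dir thuduj12/fst_itn_zh \
--hotword /workspace/models/hotwords.txt > log.txt 2>&1 &
```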
Tested with the Ngram disabled; the result is still: 从侧面跨越变形深入五官基地。
I did some testing and finally found that the root cause lies in the vad_model.
Change your code to the following and check the result again. I also ran into the problem of ONNX and Python inference results differing:

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model=asr_model_path,
    # punc_model=punc_model_path,
)
But I found that VAD cannot be turned off on the server side.
nohup bash run_server_2pass.sh \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
--vad-dir "" \
--punc-dir "" \
--lm-dir "" \
--itn-dir thuduj12/fst_itn_zh \
--certfile 0 \
--decoder-thread-num 120 \
--io-thread-num 10 \
--hotword ../../hotwords.txt > log.txt 2>&1 &
Startup succeeds with this command, but as soon as a client connects, it reports an error: E20240306 02:39:05.681002 1117 tpass-online-stream.cpp:9] vad_handle is null
Our preliminary conclusion is that the cause is mono vs. multi-channel audio: after converting everything to mono, the results are consistent.
How did you convert everything to mono?
ffmpeg -i wav_path -ac 1 out_path, then test. The maintainers say the Docker image supports multi-channel processing; waiting for an official example from them. @lyblsgo
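The ffmpeg one-liner above is the simplest route. As a minimal alternative sketch (assumes 16-bit PCM WAV input; `stereo_to_mono` is a hypothetical helper, not part of FunASR or ffmpeg), the same downmix can be done with Python's standard library by averaging the two channels:

```python
import array
import wave

def stereo_to_mono(in_path: str, out_path: str) -> None:
    """Downmix a 16-bit PCM stereo WAV to mono by averaging the channels."""
    with wave.open(in_path, "rb") as src:
        assert src.getsampwidth() == 2, "expects 16-bit PCM"
        n_ch = src.getnchannels()
        frames = array.array("h", src.readframes(src.getnframes()))
        if n_ch == 2:
            # Samples are interleaved L, R, L, R, ...; average each pair.
            mono = array.array(
                "h",
                ((frames[i] + frames[i + 1]) // 2 for i in range(0, len(frames), 2)),
            )
        else:
            mono = frames
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(src.getframerate())
            dst.writeframes(mono.tobytes())
```

Note this simple average differs from ffmpeg's default downmix weighting, so results may not be bit-identical to the `-ac 1` output.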
Docker offline file transcription supports multi-channel audio. The presence of recognition results in the examples above indicates multi-channel support. The inconsistencies between the recognition results in Docker and Python are due to differences in how multi-channel data is processed between Docker and Python. This area can be looked into when time permits, and experts familiar with FFmpeg are also welcome to join the investigation.
Question 1: Testing confirms that the issue described above does exist. If we keep using the Docker container in production, is there a recommended workaround, or do we have to preprocess the audio ourselves?
Question 2: Does damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch also cover dialect recognition? I find it sometimes performs better than the community dialect models.
Is there a plan to align the two versions later?
Pipeline versions:
funasr: 0.8.7, modelscope: 1.10.0
Execution code:
Docker version:
funasr-runtime-sdk-cpu-0.3.0
Inference code:
Pipeline result: 从侧面跨越地形,深入敌方基地。
Docker result: 从侧面跨越变形深入五官基地。