modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

GPU inference on an A10 card is no faster than CPU, and it is unclear where the problem lies #2042

Open lanyuer opened 2 months ago

lanyuer commented 2 months ago

Notice: To help resolve issues more efficiently, please follow the issue template and include the relevant details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Following https://github.com/modelscope/FunASR/blob/e8f535f53320780cd8ed6f3b8588b187935d3ae5/runtime/onnxruntime/readme.md, I built the onnxruntime binaries with GPU=ON enabled as well.

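With quantization enabled, the overall speedup ratio peaks at only about 300, which is very close to the CPU build. GPU utilization does reach roughly 70%, so the GPU is being used. Why is the GPU build no faster?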
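For readers unfamiliar with the metric, the sketch below shows how a speedup ratio relates to the real-time factor (RTF), assuming the figure reported here is audio duration divided by processing time, that is, 1 / RTF. The function names and the numbers are illustrative assumptions, not measurements from this issue.

```python
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(processing_seconds: float, audio_seconds: float) -> float:
    """Speedup ratio: audio duration divided by processing time (1 / RTF)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Hypothetical numbers: one hour of audio decoded in 12 s of wall-clock time.
example_rtf = rtf(12.0, 3600.0)
example_speedup = speedup(12.0, 3600.0)
print(f"RTF = {example_rtf:.6f}, speedup = {example_speedup:.1f}")
```

Under this definition, a speedup of about 300 means one hour of audio is processed in roughly 12 seconds, whether the work runs on the CPU or the GPU.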

Code

Build command:

cmake -DCMAKE_BUILD_TYPE=release .. -DONNXRUNTIME_DIR=/home/ubuntu/github/FunASR/onnxruntime-linux-x64-1.14.0 -DFFMPEG_DIR=/home/ubuntu/github/FunASR/ffmpeg-master-latest-linux64-gpl-shared -DGPU=on

Model export command:

funasr-export ++model=damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch ++quantize=true ++device=cuda ++type=torchscript

Inference command:

funasr-onnx-offline-rtf --model-dir /home/ubuntu/.cache/modelscope/hub/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --vad-dir /home/ubuntu/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch --punc-dir /home/ubuntu/.cache/modelscope/hub/damo/punc_ct-transformer_cn-en-common-vocab471067-large --gpu --thread-num 20 --batch-size 48 --quantize true --wav-path ./test100.scp
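When comparing the CPU and GPU runs, it can help to check the total audio duration of the test list independently of the tool's own report. The sketch below is a hypothetical helper, not part of FunASR. It assumes the .scp file uses Kaldi-style lines of the form `utt_id wav_path` and that every entry is an uncompressed PCM WAV file readable by Python's standard `wave` module.

```python
import wave
from pathlib import Path


def read_scp(scp_path):
    """Parse 'utt_id wav_path' lines (assumed format), skipping blank lines."""
    entries = []
    for line in Path(scp_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        utt_id, wav_path = line.split(maxsplit=1)
        entries.append((utt_id, wav_path))
    return entries


def wav_seconds(wav_path):
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_audio_seconds(scp_path):
    """Sum of the durations of all files listed in the .scp file."""
    return sum(wav_seconds(path) for _, path in read_scp(scp_path))


# Example usage (path taken from the command above):
# print(total_audio_seconds("./test100.scp"))
```

Dividing this total by the measured wall-clock time of each run gives a speedup figure computed the same way for both builds.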

What have you tried?

What's your environment?

lyblsgo commented 1 month ago

For GPU deployment, please refer to https://github.com/modelscope/FunASR/blob/main/runtime/docs/SDK_advanced_guide_offline_gpu_zh.md