modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

paraformer onnx-gpu to tensorrt conversion fails (Could not find any implementation for node) #1955

Status: Open · willnufe opened this issue 3 months ago

willnufe commented 3 months ago

1. environment

1.2 onnx to tensorrt:

2. problem

Converting the paraformer onnx-gpu model with the following command fails with an error:

```shell
trtexec \
  --onnx=/raid/t3cv/wangch/WORK_SAPCE/ASR/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model_sim.onnx \
  --saveEngine=/raid/t3cv/wangch/WORK_SAPCE/TEMP/work_space/onnx2tensorrt/models/model.engine \
  --minShapes=speech:1x1000x560,speech_lengths:1 \
  --optShapes=speech:16x1000x560,speech_lengths:16 \
  --maxShapes=speech:16x1000x560,speech_lengths:16 \
  --workspace=24576 \
  --verbose --fp16 --device=7
```

The main error is:

```
Error[10]: Could not find any implementation for node
{ForeignNode[(Unnamed Layer* 6555) [Constant] + (Unnamed Layer* 6556) [Shuffle].../decoder/decoders/decoders.0/self_attn/Transpose + (Unnamed Layer* 7213) [Shuffle]]}.
```


yuekaizhang commented 3 months ago

@willnufe Some modifications to the code are needed to support this successfully. I don't have time recently, but if you are willing to work on it, I can give you some suggestions offline.

willnufe commented 3 months ago

> @willnufe Need to make some modifications to the code in order to support it successfully. I don't have time recently, but if you are willing to do it, I can give you some suggestions offline.

Thank you very much. I'd like to give it a try. Please share your suggestions.

yuekaizhang commented 3 months ago

@willnufe To get the maximum throughput, I think we first need to make the onnx fp16 paraformer work.

https://github.com/modelscope/FunASR/commit/9a9b474e7de7cc90d2ee124dc8d6c2cfa887c059. This commit used several registered hooks to rescale the torchscript fp32 model into a torchscript fp16 model. The first step is to follow it and calibrate the onnx fp32 model the same way.
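The linked commit is the authoritative reference; as a rough illustration of the hook-based rescaling idea only (a minimal sketch with made-up layer names, not FunASR's actual code), forward hooks can record each layer's peak activation magnitude during a calibration pass, so that any layer whose activations would overflow fp16 (max finite value ≈ 65504) can be rescaled:

```python
import torch
import torch.nn as nn

FP16_MAX = 65504.0  # largest finite float16 value

# Hypothetical toy model standing in for the paraformer layers.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyModel().eval()

# 1) Register forward hooks that track the max |activation| per layer.
peaks = {}
def make_hook(name):
    def hook(module, inputs, output):
        peaks[name] = max(peaks.get(name, 0.0), output.abs().max().item())
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if isinstance(m, nn.Linear)]

# 2) Run calibration data through the fp32 model to populate `peaks`.
with torch.no_grad():
    model(torch.randn(32, 8) * 1e3)

for h in handles:
    h.remove()

# 3) Rescale any layer whose recorded peak would overflow fp16.
with torch.no_grad():
    for name, m in model.named_modules():
        if isinstance(m, nn.Linear) and peaks[name] > FP16_MAX:
            scale = FP16_MAX / peaks[name]
            m.weight.mul_(scale)
            m.bias.mul_(scale)
```

Note that rescaling one layer changes the input statistics of every layer downstream of it, so a real calibration pass (as in the linked commit) has to compensate for that rather than scaling each layer in isolation.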

With onnx fp16, you could expect roughly a 50% throughput improvement compared with the onnx fp32 pipeline. Then let's work on the tensorrt export.

Would you mind adding my wechat ykzhang2020?