open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Running onnx2tensorrt on jetson xavier nx reports an error, prompting Could not find any implementation for node GlobalAveragePool_29 #2376

Closed da13132 closed 1 year ago

da13132 commented 1 year ago

Describe the bug

I tried to deploy RTMPose on a Jetson device. Running onnx2tensorrt fails with `Could not find any implementation for node GlobalAveragePool_29`, and implementations for other operators, including softmax, also seem to be missing. I tried increasing the workspace size, but the error persists. My TensorRT version is 8.0.1.6. Hope to get your help!

Reproduction

python tools/onnx2tensorrt.py /home/zhanglei/Desktop/zhanglei/mmdeploy/configs/mmpose/pose-detection_simcc_tensorrt_dynamic-256x192.py /home/zhanglei/Desktop/zhanglei/mmdeploy/rtmpose-t-9d41b3.onnx /home/zhanglei/Desktop/zhanglei/mmdeploy/rtmpose_tiny_tensorrt
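For context on the "increase the working space" attempt: in MMDeploy, the TensorRT builder workspace is set in the deploy config's `backend_config`. Below is a sketch of the relevant fragment (field names follow MMDeploy's TensorRT configs; the 1 GiB value is illustrative, not from this issue):

```python
# Sketch of the TensorRT backend section of an MMDeploy deploy config,
# e.g. pose-detection_simcc_tensorrt_dynamic-256x192.py.
# max_workspace_size is in bytes; 1 << 30 is 1 GiB. A larger workspace
# gives the builder more room to pick tactics, though in this issue it
# did not resolve the failure on TensorRT 8.0.1.6.
backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=False,
        max_workspace_size=1 << 30,  # illustrative: 1 GiB
    ))
```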

Environment

08/25 20:32:03 - mmengine - INFO - sys.platform: linux
08/25 20:32:03 - mmengine - INFO - Python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 07:33:48) [GCC 7.3.0]
08/25 20:32:03 - mmengine - INFO - CUDA available: True
08/25 20:32:03 - mmengine - INFO - numpy_random_seed: 2147483648
08/25 20:32:03 - mmengine - INFO - GPU 0: Xavier
08/25 20:32:03 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
08/25 20:32:03 - mmengine - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.300
08/25 20:32:03 - mmengine - INFO - GCC: x86_64-conda_cos6-linux-gnu-gcc (conda-forge gcc 12.2.0-19) 12.2.0
08/25 20:32:03 - mmengine - INFO - PyTorch: 1.10.0
08/25 20:32:03 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
  - CuDNN 8.2.1
    - Built with CuDNN 8.0
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=8.0.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON,

08/25 20:32:03 - mmengine - INFO - TorchVision: 0.11.1
08/25 20:32:03 - mmengine - INFO - OpenCV: 4.8.0
08/25 20:32:03 - mmengine - INFO - MMEngine: 0.8.2
08/25 20:32:03 - mmengine - INFO - MMCV: 2.0.0
08/25 20:32:03 - mmengine - INFO - MMCV Compiler: GCC 7.5
08/25 20:32:03 - mmengine - INFO - MMCV CUDA Compiler: 10.2
08/25 20:32:03 - mmengine - INFO - MMDeploy: 1.2.0+553f9b8
08/25 20:32:03 - mmengine - INFO -

08/25 20:32:03 - mmengine - INFO - **********Backend information**********
08/25 20:32:03 - mmengine - INFO - tensorrt:    8.0.1.6
08/25 20:32:03 - mmengine - INFO - tensorrt custom ops: Available
08/25 20:32:03 - mmengine - INFO - ONNXRuntime: None
08/25 20:32:03 - mmengine - INFO - ONNXRuntime-gpu:     1.10.0
08/25 20:32:03 - mmengine - INFO - ONNXRuntime custom ops:      NotAvailable
08/25 20:32:03 - mmengine - INFO - pplnn:       None
08/25 20:32:03 - mmengine - INFO - ncnn:        None
08/25 20:32:03 - mmengine - INFO - snpe:        None
08/25 20:32:03 - mmengine - INFO - openvino:    None
08/25 20:32:03 - mmengine - INFO - torchscript: 1.10.0
08/25 20:32:03 - mmengine - INFO - torchscript custom ops:      NotAvailable
08/25 20:32:04 - mmengine - INFO - rknn-toolkit:        None
08/25 20:32:04 - mmengine - INFO - rknn-toolkit2:       None
08/25 20:32:04 - mmengine - INFO - ascend:      None
08/25 20:32:04 - mmengine - INFO - coreml:      None
08/25 20:32:04 - mmengine - INFO - tvm: None
08/25 20:32:04 - mmengine - INFO - vacc:        None
08/25 20:32:04 - mmengine - INFO -

08/25 20:32:04 - mmengine - INFO - **********Codebase information**********
08/25 20:32:04 - mmengine - INFO - mmdet:       3.0.0
08/25 20:32:04 - mmengine - INFO - mmseg:       None
08/25 20:32:04 - mmengine - INFO - mmpretrain:  None
08/25 20:32:04 - mmengine - INFO - mmocr:       None
08/25 20:32:04 - mmengine - INFO - mmagic:      None
08/25 20:32:04 - mmengine - INFO - mmdet3d:     None
08/25 20:32:04 - mmengine - INFO - mmpose:      1.1.0
08/25 20:32:04 - mmengine - INFO - mmrotate:    None
08/25 20:32:04 - mmengine - INFO - mmaction:    None
08/25 20:32:04 - mmengine - INFO - mmrazor:     None
08/25 20:32:04 - mmengine - INFO - mmyolo:      None

Error traceback

08/25 20:25:11 - mmengine - INFO - Successfully loaded tensorrt plugins from /home/zhanglei/Desktop/zhanglei/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 434, GPU 4551 (MiB)
[TensorRT] INFO: ----------------------------------------------------------------
[TensorRT] INFO: Input filename:   /home/zhanglei/Desktop/zhanglei/mmdeploy/rtmpose-t-9d41b3.onnx
[TensorRT] INFO: ONNX IR version:  0.0.6
[TensorRT] INFO: Opset version:    11
[TensorRT] INFO: Producer name:    pytorch
[TensorRT] INFO: Producer version: 1.11.0
[TensorRT] INFO: Domain:
[TensorRT] INFO: Model version:    0
[TensorRT] INFO: Doc string:
[TensorRT] INFO: ----------------------------------------------------------------
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 448 MiB, GPU 4592 MiB
[TensorRT] INFO: ---------- Layers Running on DLA ----------
[TensorRT] INFO: ---------- Layers Running on GPU ----------
[TensorRT] INFO: [GpuLayer] onnx::MatMul_546 + (Unnamed Layer* 144) [Shuffle]
[TensorRT] INFO: [GpuLayer] onnx::MatMul_549 + (Unnamed Layer* 158) [Shuffle]
[TensorRT] INFO: [GpuLayer] onnx::Mul_551
[TensorRT] INFO: [GpuLayer] head.gau.beta + (Unnamed Layer* 185) [Shuffle]
[TensorRT] INFO: [GpuLayer] Conv_0
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_1), Mul_2)
[TensorRT] INFO: [GpuLayer] Conv_3
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_4), Mul_5)
[TensorRT] INFO: [GpuLayer] Conv_6
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_7), Mul_8)
[TensorRT] INFO: [GpuLayer] Conv_9
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_10), Mul_11)
[TensorRT] INFO: [GpuLayer] Conv_12 || Conv_15
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_13), Mul_14)
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_16), Mul_17)
[TensorRT] INFO: [GpuLayer] Conv_18
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_19), Mul_20)
[TensorRT] INFO: [GpuLayer] Conv_21
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_22), Mul_23)
[TensorRT] INFO: [GpuLayer] Conv_24
[TensorRT] INFO: [GpuLayer] PWN(PWN(PWN(Sigmoid_25), Mul_26), Add_27)
[TensorRT] INFO: [GpuLayer] onnx::Concat_255 copy
[TensorRT] INFO: [GpuLayer] GlobalAveragePool_29
[TensorRT] INFO: [GpuLayer] Conv_30
[TensorRT] INFO: [GpuLayer] PWN(HardSigmoid_31, Mul_32)
[TensorRT] INFO: [GpuLayer] Conv_33
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_34), Mul_35)
[TensorRT] INFO: [GpuLayer] Conv_36
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_37), Mul_38)
[TensorRT] INFO: [GpuLayer] Conv_39 || Conv_42
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_40), Mul_41)
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_43), Mul_44)
[TensorRT] INFO: [GpuLayer] Conv_45
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_46), Mul_47)
[TensorRT] INFO: [GpuLayer] Conv_48
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_49), Mul_50)
[TensorRT] INFO: [GpuLayer] Conv_51
[TensorRT] INFO: [GpuLayer] PWN(PWN(PWN(Sigmoid_52), Mul_53), Add_54)
[TensorRT] INFO: [GpuLayer] onnx::Concat_289 copy
[TensorRT] INFO: [GpuLayer] GlobalAveragePool_56
[TensorRT] INFO: [GpuLayer] Conv_57
[TensorRT] INFO: [GpuLayer] PWN(HardSigmoid_58, Mul_59)
[TensorRT] INFO: [GpuLayer] Conv_60
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_61), Mul_62)
[TensorRT] INFO: [GpuLayer] Conv_63
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_64), Mul_65)
[TensorRT] INFO: [GpuLayer] Conv_66 || Conv_69
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_67), Mul_68)
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_70), Mul_71)
[TensorRT] INFO: [GpuLayer] Conv_72
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_73), Mul_74)
[TensorRT] INFO: [GpuLayer] Conv_75
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_76), Mul_77)
[TensorRT] INFO: [GpuLayer] Conv_78
[TensorRT] INFO: [GpuLayer] PWN(PWN(PWN(Sigmoid_79), Mul_80), Add_81)
[TensorRT] INFO: [GpuLayer] GlobalAveragePool_83
[TensorRT] INFO: [GpuLayer] Conv_84
[TensorRT] INFO: [GpuLayer] PWN(HardSigmoid_85, Mul_86)
[TensorRT] INFO: [GpuLayer] Conv_87
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_88), Mul_89)
[TensorRT] INFO: [GpuLayer] Conv_90
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_91), Mul_92)
[TensorRT] INFO: [GpuLayer] Conv_93
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_94), Mul_95)
[TensorRT] INFO: [GpuLayer] MaxPool_96
[TensorRT] INFO: [GpuLayer] MaxPool_97
[TensorRT] INFO: [GpuLayer] MaxPool_98
[TensorRT] INFO: [GpuLayer] onnx::MaxPool_340 copy
[TensorRT] INFO: [GpuLayer] Conv_100
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_101), Mul_102)
[TensorRT] INFO: [GpuLayer] Conv_103 || Conv_106
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_104), Mul_105)
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_107), Mul_108)
[TensorRT] INFO: [GpuLayer] Conv_109
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_110), Mul_111)
[TensorRT] INFO: [GpuLayer] Conv_112
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_113), Mul_114)
[TensorRT] INFO: [GpuLayer] Conv_115
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_116), Mul_117)
[TensorRT] INFO: [GpuLayer] GlobalAveragePool_119
[TensorRT] INFO: [GpuLayer] Conv_120
[TensorRT] INFO: [GpuLayer] PWN(HardSigmoid_121, Mul_122)
[TensorRT] INFO: [GpuLayer] Conv_123
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_124), Mul_125)
[TensorRT] INFO: [GpuLayer] Conv_126
[TensorRT] INFO: [GpuLayer] Reshape_133
[TensorRT] INFO: [GpuLayer] Mul_134 + ReduceSum_135 + Sqrt_136
[TensorRT] INFO: [GpuLayer] PWN(PWN(PWN(PWN(onnx::Mul_390 + (Unnamed Layer* 136) [Shuffle], Mul_138), Clip_139), Div_140), PWN(head.mlp.0.g + (Unnamed Layer* 141) [Shuffle], Mul_141))
[TensorRT] INFO: [GpuLayer] MatMul_142
[TensorRT] INFO: [GpuLayer] Mul_143 + ReduceSum_144 + Sqrt_145
[TensorRT] INFO: [GpuLayer] PWN(PWN(PWN(PWN(onnx::Mul_404 + (Unnamed Layer* 150) [Shuffle], Mul_147), Clip_148), Div_149), PWN(head.gau.ln.g + (Unnamed Layer* 155) [Shuffle], Mul_150))
[TensorRT] INFO: [GpuLayer] MatMul_151
[TensorRT] INFO: [GpuLayer] PWN(PWN(Sigmoid_152), Mul_153)
[TensorRT] INFO: [GpuLayer] Split_154
[TensorRT] INFO: [GpuLayer] Split_154_0
[TensorRT] INFO: [GpuLayer] Split_154_1
[TensorRT] INFO: [GpuLayer] Unsqueeze_155
[TensorRT] INFO: [GpuLayer] PWN(Mul_156, Add_157)
[TensorRT] INFO: [GpuLayer] Split_158
[TensorRT] INFO: [GpuLayer] Split_158_2
[TensorRT] INFO: [GpuLayer] Squeeze_159
[TensorRT] INFO: [GpuLayer] Squeeze_160 + Transpose_161
[TensorRT] INFO: [GpuLayer] MatMul_162
[TensorRT] INFO: [GpuLayer] PWN(PWN(onnx::Div_431 + (Unnamed Layer* 208) [Shuffle], Div_164 + Relu_165), Mul_166)
[TensorRT] INFO: [GpuLayer] MatMul_167
[TensorRT] INFO: [GpuLayer] Mul_168
[TensorRT] INFO: [GpuLayer] onnx::MatMul_552 + (Unnamed Layer* 215) [Shuffle]
[TensorRT] INFO: [GpuLayer] MatMul_169
[TensorRT] INFO: [GpuLayer] head.gau.res_scale.scale + (Unnamed Layer* 218) [Shuffle]
[TensorRT] INFO: [GpuLayer] PWN(Mul_170, Add_171)
[TensorRT] INFO: [GpuLayer] onnx::MatMul_553 + (Unnamed Layer* 222) [Shuffle]
[TensorRT] INFO: [GpuLayer] MatMul_172
[TensorRT] INFO: [GpuLayer] onnx::MatMul_554 + (Unnamed Layer* 225) [Shuffle]
[TensorRT] INFO: [GpuLayer] MatMul_173
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +202, now: CPU 675, GPU 4792 (MiB)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 681, GPU 4816 (MiB)
[TensorRT] ERROR: 10: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node GlobalAveragePool_29.)
[TensorRT] ERROR: 2: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Traceback (most recent call last):
  File "tools/onnx2tensorrt.py", line 73, in <module>
    main()
  File "tools/onnx2tensorrt.py", line 67, in main
    device_id=device_id)
  File "/home/zhanglei/Desktop/zhanglei/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 248, in from_onnx
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine
da13132 commented 1 year ago

After re-flashing the device, updating TensorRT to the latest version solved this problem.
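Since the fix is a TensorRT upgrade, a quick sanity check is to compare the installed version against the one that worked. A minimal sketch (the helper names and the "8.2" threshold are illustrative; on Jetson the actual version comes from `tensorrt.__version__` after installing a newer JetPack):

```python
def parse_version(v: str) -> tuple:
    """Parse a dotted version string like '8.0.1.6' into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, required: str) -> bool:
    """Return True if the installed version is at least the required one.

    Tuple comparison handles versions of different lengths correctly:
    (8, 0, 1, 6) < (8, 2).
    """
    return parse_version(installed) >= parse_version(required)

# The environment in this issue had TensorRT 8.0.1.6; newer JetPack
# releases bundle a later TensorRT that builds this model successfully.
print(meets_minimum("8.0.1.6", "8.2"))  # False: needs an upgrade
print(meets_minimum("8.2.1", "8.2"))    # True
```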