open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Problem when converting RTMPose to ONNX #2519

Closed Billccx closed 9 months ago

Billccx commented 10 months ago

Checklist

Describe the bug

Hello, when I tried to convert an RTMPose model to ONNX format, the process failed with "visualize onnxruntime model failed.", which stopped the conversion.

Reproduction

Hello, I ran the following command to convert the RTMPose model to ONNX format:

python tools/deploy.py \
    configs/mmpose/pose-detection_simcc_onnxruntime_dynamic.py \
    /home/ccx/code/trt/mm/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-256x192.py \
    https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmpose-l_simcc-aic-coco_pt-aic-coco_420e-256x192-f016ffe0_20230126.pth \
    demo/resources/human-pose.jpg \
    --work-dir rtmpose-onnx/rtmpose-l \
    --device cuda \
    --log-level INFO \
    --show \
    --dump-info

The following errors were printed:

10/27 16:05:06 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/27 16:05:06 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/27 16:05:07 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
10/27 16:05:09 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/27 16:05:09 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmpose-l_simcc-aic-coco_pt-aic-coco_420e-256x192-f016ffe0_20230126.pth
Downloading: "https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmpose-l_simcc-aic-coco_pt-aic-coco_420e-256x192-f016ffe0_20230126.pth" to /home/ccx/.cache/torch/hub/checkpoints/rtmpose-l_simcc-aic-coco_pt-aic-coco_420e-256x192-f016ffe0_20230126.pth
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 106M/106M [00:09<00:00, 11.6MB/s]
/home/ccx/code/trt/mm/mmpose/mmpose/datasets/datasets/utils.py:102: UserWarning: The metainfo config file "configs/_base_/datasets/coco.py" does not exist. A matched config file "/home/ccx/code/trt/mm/mmpose/mmpose/.mim/configs/_base_/datasets/coco.py" will be used instead.
  warnings.warn(
10/27 16:05:21 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
10/27 16:05:21 - mmengine - INFO - Export PyTorch model to ONNX: rtmpose-onnx/rtmpose-l/end2end.onnx.
10/27 16:05:21 - mmengine - WARNING - Can not find torch.nn.functional.scaled_dot_product_attention, function rewrite will not be applied
10/27 16:05:21 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_autograd_function_process, function rewrite will not be applied
10/27 16:05:21 - mmengine - WARNING - Can not find models.yolox_pose_head.YOLOXPoseHead.predict, function rewrite will not be applied
10/27 16:05:21 - mmengine - WARNING - Can not find models.yolox_pose_head.YOLOXPoseHead.predict_by_feat, function rewrite will not be applied
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
10/27 16:05:26 - mmengine - INFO - Execute onnx optimize passes.
10/27 16:05:26 - mmengine - WARNING - Can not optimize model, please build torchscipt extension.
More details: https://github.com/open-mmlab/mmdeploy/tree/main/docs/en/experimental/onnx_optimizer.md
10/27 16:05:27 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
10/27 16:05:27 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
10/27 16:05:28 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
10/27 16:05:28 - mmengine - INFO - visualize onnxruntime model start.
10/27 16:05:31 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/27 16:05:31 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/27 16:05:31 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "backend_segmentors" registry tree. As a workaround, the current "backend_segmentors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/27 16:05:31 - mmengine - INFO - Successfully loaded onnxruntime custom ops from /home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so
2023-10-27 16:05:32.469626319 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-10-27 16:05:32.469650607 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/ccx/code/trt/mm/mmpose/mmpose/datasets/datasets/utils.py:102: UserWarning: The metainfo config file "configs/_base_/datasets/coco.py" does not exist. A matched config file "/home/ccx/code/trt/mm/mmpose/mmpose/.mim/configs/_base_/datasets/coco.py" will be used instead.
  warnings.warn(
2023-10-27 16:05:34.601097874 [E:onnxruntime:Default, cuda_call.cc:116 CudaCall] CUBLAS failure 7: CUBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=3d6eaf0e0c1c ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/math/matmul.cc ; line=154 ; expr=cublasGemmHelper( GetCublasHandle(ctx), transB, transA, static_cast<int>(helper.N()), static_cast<int>(helper.M()), static_cast<int>(helper.K()), &alpha, reinterpret_cast<const CudaT*>(right_X->Data<T>()), ldb, reinterpret_cast<const CudaT*>(left_X->Data<T>()), lda, &zero, reinterpret_cast<CudaT*>(Y->MutableData<T>()), ldc, device_prop); 
2023-10-27 16:05:34.601152628 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running MatMul node. Name:'MatMul_282' Status Message: CUBLAS failure 7: CUBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=3d6eaf0e0c1c ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/math/matmul.cc ; line=154 ; expr=cublasGemmHelper( GetCublasHandle(ctx), transB, transA, static_cast<int>(helper.N()), static_cast<int>(helper.M()), static_cast<int>(helper.K()), &alpha, reinterpret_cast<const CudaT*>(right_X->Data<T>()), ldb, reinterpret_cast<const CudaT*>(left_X->Data<T>()), lda, &zero, reinterpret_cast<CudaT*>(Y->MutableData<T>()), ldc, device_prop); 
2023-10-27:16:05:34 - root - ERROR - Error in execution: Non-zero status code returned while running MatMul node. Name:'MatMul_282' Status Message: CUBLAS failure 7: CUBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=3d6eaf0e0c1c ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/math/matmul.cc ; line=154 ; expr=cublasGemmHelper( GetCublasHandle(ctx), transB, transA, static_cast<int>(helper.N()), static_cast<int>(helper.M()), static_cast<int>(helper.K()), &alpha, reinterpret_cast<const CudaT*>(right_X->Data<T>()), ldb, reinterpret_cast<const CudaT*>(left_X->Data<T>()), lda, &zero, reinterpret_cast<CudaT*>(Y->MutableData<T>()), ldc, device_prop); 
Traceback (most recent call last):
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/apis/visualize.py", line 72, in visualize_model
    result = model.test_step(model_inputs)[0]
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/codebase/mmpose/deploy/pose_detection_model.py", line 99, in forward
    batch_outputs = self.wrapper({self.input_name: inputs})
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/wrapper.py", line 108, in forward
    self.__ort_execute(self.io_binding)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/utils/timer.py", line 67, in fun
    result = func(*args, **kwargs)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/wrapper.py", line 126, in __ort_execute
    self.sess.run_with_iobinding(io_binding)
  File "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 331, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running MatMul node. Name:'MatMul_282' Status Message: CUBLAS failure 7: CUBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=3d6eaf0e0c1c ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/math/matmul.cc ; line=154 ; expr=cublasGemmHelper( GetCublasHandle(ctx), transB, transA, static_cast<int>(helper.N()), static_cast<int>(helper.M()), static_cast<int>(helper.K()), &alpha, reinterpret_cast<const CudaT*>(right_X->Data<T>()), ldb, reinterpret_cast<const CudaT*>(left_X->Data<T>()), lda, &zero, reinterpret_cast<CudaT*>(Y->MutableData<T>()), ldc, device_prop); 
10/27 16:05:35 - mmengine - ERROR - tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

What could be causing this? I hope to hear back from your team, thank you!
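As a side check, the exported model can be run on the CPU execution provider directly, to see whether the failure is specific to the CUDA provider. This is only a minimal sketch, assuming the export landed at rtmpose-onnx/rtmpose-l/end2end.onnx (as the log above shows) and a 1x3x256x192 float32 input:

    import numpy as np
    import onnxruntime as ort

    # Register the mmdeploy custom-op library in case the exported graph needs it
    # (path taken from the log above); a plain SessionOptions is enough otherwise.
    so = ort.SessionOptions()
    so.register_custom_ops_library(
        "/home/ccx/anaconda3/envs/tensorrt/lib/python3.8/site-packages/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so")

    # Run on the CPU execution provider only, to rule out the CUDA/cuBLAS path.
    sess = ort.InferenceSession(
        "rtmpose-onnx/rtmpose-l/end2end.onnx",
        sess_options=so,
        providers=["CPUExecutionProvider"])

    # RTMPose-l 256x192 takes an NCHW float32 input; random data is enough for a smoke test.
    dummy = np.random.rand(1, 3, 256, 192).astype(np.float32)
    outputs = sess.run(None, {sess.get_inputs()[0].name: dummy})
    for meta, out in zip(sess.get_outputs(), outputs):
        print(meta.name, out.shape)

If this runs cleanly, the exported ONNX graph itself is fine and the problem is confined to the CUDA execution provider setup.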

Environment

10/27 16:14:36 - mmengine - INFO - 

10/27 16:14:36 - mmengine - INFO - **********Environmental information**********
10/27 16:14:37 - mmengine - INFO - sys.platform: linux
10/27 16:14:37 - mmengine - INFO - Python: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00) [GCC 11.4.0]
10/27 16:14:37 - mmengine - INFO - CUDA available: True
10/27 16:14:37 - mmengine - INFO - numpy_random_seed: 2147483648
10/27 16:14:37 - mmengine - INFO - GPU 0: NVIDIA TITAN Xp
10/27 16:14:37 - mmengine - INFO - CUDA_HOME: /usr/local/cuda-11.3
10/27 16:14:37 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.58
10/27 16:14:37 - mmengine - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
10/27 16:14:37 - mmengine - INFO - PyTorch: 1.12.0
10/27 16:14:37 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

10/27 16:14:37 - mmengine - INFO - TorchVision: 0.13.0
10/27 16:14:37 - mmengine - INFO - OpenCV: 3.4.16
10/27 16:14:37 - mmengine - INFO - MMEngine: 0.9.0
10/27 16:14:37 - mmengine - INFO - MMCV: 2.1.0
10/27 16:14:37 - mmengine - INFO - MMCV Compiler: GCC 9.3
10/27 16:14:37 - mmengine - INFO - MMCV CUDA Compiler: 11.3
10/27 16:14:37 - mmengine - INFO - MMDeploy: 1.3.0+e31f5a6
10/27 16:14:37 - mmengine - INFO - 

10/27 16:14:37 - mmengine - INFO - **********Backend information**********
10/27 16:14:37 - mmengine - INFO - tensorrt:    8.6.1
10/27 16:14:37 - mmengine - INFO - tensorrt custom ops: Available
10/27 16:14:37 - mmengine - INFO - ONNXRuntime: None
10/27 16:14:37 - mmengine - INFO - ONNXRuntime-gpu:     1.16.1
10/27 16:14:37 - mmengine - INFO - ONNXRuntime custom ops:      Available
10/27 16:14:37 - mmengine - INFO - pplnn:       None
10/27 16:14:37 - mmengine - INFO - ncnn:        None
10/27 16:14:37 - mmengine - INFO - snpe:        None
10/27 16:14:37 - mmengine - INFO - openvino:    None
10/27 16:14:37 - mmengine - INFO - torchscript: 1.12.0
10/27 16:14:37 - mmengine - INFO - torchscript custom ops:      NotAvailable
10/27 16:14:37 - mmengine - INFO - rknn-toolkit:        None
10/27 16:14:37 - mmengine - INFO - rknn-toolkit2:       None
10/27 16:14:37 - mmengine - INFO - ascend:      None
10/27 16:14:37 - mmengine - INFO - coreml:      None
10/27 16:14:37 - mmengine - INFO - tvm: None
10/27 16:14:37 - mmengine - INFO - vacc:        None
10/27 16:14:37 - mmengine - INFO - 

10/27 16:14:37 - mmengine - INFO - **********Codebase information**********
10/27 16:14:37 - mmengine - INFO - mmdet:       3.2.0
10/27 16:14:37 - mmengine - INFO - mmseg:       None
10/27 16:14:37 - mmengine - INFO - mmpretrain:  None
10/27 16:14:37 - mmengine - INFO - mmocr:       None
10/27 16:14:37 - mmengine - INFO - mmagic:      None
10/27 16:14:37 - mmengine - INFO - mmdet3d:     None
10/27 16:14:37 - mmengine - INFO - mmpose:      1.2.0
10/27 16:14:37 - mmengine - INFO - mmrotate:    None
10/27 16:14:37 - mmengine - INFO - mmaction:    None
10/27 16:14:37 - mmengine - INFO - mmrazor:     None
10/27 16:14:37 - mmengine - INFO - mmyolo:      None

Error traceback

No response

RunningLeon commented 10 months ago

Hi, could you try with the official Docker image? It works fine with the official model config and checkpoint. You may want to check what you've changed.

python3 ./tools/deploy.py configs/mmpose/pose-detection_simcc_onnxruntime_dynamic.py \
../mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-256x192.py \
https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmpose-l_simcc-aic-coco_pt-aic-coco_420e-256x192-f016ffe0_20230126.pth \
"../mmpose/tests/data/coco/000000000785.jpg" \
--work-dir ../test_rtmpose/ \
--device cuda:0 \
--log-level INFO \
--test-img ./demo/resources/human-pose.jpg
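
If the Docker image is not an option, it may also be worth confirming that the local onnxruntime-gpu build can actually create a CUDA session before re-running deploy.py, since the CUBLAS error above is raised by the CUDA execution provider. A minimal sketch; the model path is an assumption, use whatever --work-dir you exported to:

    import onnxruntime as ort

    # Which execution providers this onnxruntime build exposes.
    print(ort.__version__, ort.get_available_providers())

    # Ask for the CUDA provider first; onnxruntime silently falls back to CPU
    # if the CUDA/cuDNN libraries are missing or incompatible with this build.
    # (Register the mmdeploy custom-op library as in the earlier snippet if
    # session creation complains about unknown ops.)
    sess = ort.InferenceSession(
        "rtmpose-onnx/rtmpose-l/end2end.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
    print(sess.get_providers())  # shows which providers were actually enabled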

github-actions[bot] commented 10 months ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 9 months ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

1334233852 commented 1 month ago

Did you run into this problem when converting?

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
(the same warning is printed six times)
2024-07-17:20:49:10 - root - ERROR - 'InstanceData' object has no attribute 'bbox_centers'