open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] RTMDet : Error when deploying onnx/engine model using tensorRT backend configs #2788

Open SushmaDG opened 4 months ago

SushmaDG commented 4 months ago

Describe the bug

I am trying to convert a PyTorch model to a TensorRT backend model using tools/deploy.py, as below:

python mmdeploy/tools/deploy.py \
    mmdeploy\configs\mmdet\detection\detection_tensorrt_static-640x640.py \
    mmdetection\rtmdet_s_8xb32-300e_coco.py \
    mmdetection\rtmdet_s_8xb32-300e_coco_20220905_161602-387a891e.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model\rtmdet\trt \
    --device cuda \
    --dump-info

The checkpoint was downloaded with:

wget -P checkpoint https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_s_8xb32-300e_coco/rtmdet_s_8xb32-300e_coco_20220905_161602-387a891e.pth

Error Traceback

07/04 17:43:16 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
07/04 17:43:18 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
07/04 17:43:18 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: mmdetection\rtmdet_s_8xb32-300e_coco_20220905_161602-387a891e.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: data_preprocessor.mean, data_preprocessor.std

07/04 17:43:19 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
07/04 17:43:19 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy\mmdeploy_models\mmdet\trt\end2end.onnx.
07/04 17:43:23 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
07/04 17:43:23 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\codebase\mmdet\models\detectors\single_stage.py:80: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  img_shape = [int(val) for val in img_shape]
C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\codebase\mmdet\models\detectors\single_stage.py:80: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  img_shape = [int(val) for val in img_shape]
C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\core\optimizers\function_marker.py:161: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in shape_)
C:\Users\.conda\envs\openmmlab2\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3527.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\mmcv\ops\nms.py:477: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(scores.numpy().shape[-1]),
C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\mmcv\ops\nms.py:148: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  num_boxes = num_boxes.numpy()
07/04 17:43:29 - mmengine - ERROR - C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\apis\core\pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
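
The final ERROR line only reports that the `torch2onnx` step failed in a subprocess; the underlying exception is not shown. A minimal sketch for surfacing it, assuming that calling `mmdeploy.apis.torch2onnx` directly runs the export in the current process so that any exception propagates with its full traceback. The wrapper name `export_in_process` and the `ISSUE_ARGS` dict are made up for illustration, and the paths are the ones from the command above, written with forward slashes:

```python
def export_in_process(img, work_dir, deploy_cfg, model_cfg, checkpoint,
                      save_file="end2end.onnx", device="cuda"):
    """Run only the PyTorch -> ONNX step, without the deploy.py subprocess wrapper."""
    # Imported lazily so this file can be loaded on a machine without mmdeploy.
    from mmdeploy.apis import torch2onnx

    torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg,
               model_checkpoint=checkpoint, device=device)


# Arguments taken from the deploy.py command in this report (hypothetical helper dict).
ISSUE_ARGS = {
    "img": "mmdetection/demo/demo.jpg",
    "work_dir": "mmdeploy_model/rtmdet/trt",
    "deploy_cfg": "mmdeploy/configs/mmdet/detection/detection_tensorrt_static-640x640.py",
    "model_cfg": "mmdetection/rtmdet_s_8xb32-300e_coco.py",
    "checkpoint": "mmdetection/rtmdet_s_8xb32-300e_coco_20220905_161602-387a891e.pth",
}
```

Calling `export_in_process(**ISSUE_ARGS)` from the same directory as the original command should either write `end2end.onnx` or raise the exception that the subprocess run hides.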

Environment

Running `python tools/check_env.py` gives the following:

07/04 17:29:10 - mmengine - INFO -

07/04 17:29:10 - mmengine - INFO - **********Environmental information**********
07/04 17:29:33 - mmengine - INFO - sys.platform: win32
07/04 17:29:33 - mmengine - INFO - Python: 3.9.19 (main, May  6 2024, 20:12:36) [MSC v.1916 64 bit (AMD64)]
07/04 17:29:33 - mmengine - INFO - CUDA available: True
07/04 17:29:33 - mmengine - INFO - MUSA available: False
07/04 17:29:33 - mmengine - INFO - numpy_random_seed: 2147483648
07/04 17:29:33 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3050 Ti Laptop GPU
07/04 17:29:33 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
07/04 17:29:33 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.8, V11.8.89
07/04 17:29:33 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30147 for x64
07/04 17:29:33 - mmengine - INFO - GCC: n/a
07/04 17:29:33 - mmengine - INFO - PyTorch: 2.1.2+cu118
07/04 17:29:33 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192930151
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /utf-8 /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

07/04 17:29:33 - mmengine - INFO - TorchVision: 0.16.2+cu118
07/04 17:29:33 - mmengine - INFO - OpenCV: 4.10.0
07/04 17:29:33 - mmengine - INFO - MMEngine: 0.10.4
07/04 17:29:33 - mmengine - INFO - MMCV: 2.1.0
07/04 17:29:33 - mmengine - INFO - MMCV Compiler: MSVC 192930148
07/04 17:29:33 - mmengine - INFO - MMCV CUDA Compiler: 11.8
07/04 17:29:33 - mmengine - INFO - MMDeploy: 1.3.1+5a3be94
07/04 17:29:33 - mmengine - INFO -

07/04 17:29:33 - mmengine - INFO - **********Backend information**********
07/04 17:29:34 - mmengine - INFO - tensorrt:    8.6.1
07/04 17:29:34 - mmengine - INFO - tensorrt custom ops: Available
07/04 17:29:36 - mmengine - INFO - ONNXRuntime: 1.8.1
07/04 17:29:36 - mmengine - INFO - ONNXRuntime-gpu:     1.18.1
07/04 17:29:36 - mmengine - INFO - ONNXRuntime custom ops:      Available
07/04 17:29:36 - mmengine - INFO - pplnn:       None
07/04 17:29:36 - mmengine - INFO - ncnn:        None
07/04 17:29:36 - mmengine - INFO - snpe:        None
07/04 17:29:36 - mmengine - INFO - openvino:    None
07/04 17:29:36 - mmengine - INFO - torchscript: 2.1.2+cu118
07/04 17:29:36 - mmengine - INFO - torchscript custom ops:      NotAvailable
07/04 17:29:36 - mmengine - INFO - rknn-toolkit:        None
07/04 17:29:36 - mmengine - INFO - rknn-toolkit2:       None
07/04 17:29:36 - mmengine - INFO - ascend:      None
07/04 17:29:36 - mmengine - INFO - coreml:      None
07/04 17:29:36 - mmengine - INFO - tvm: None
07/04 17:29:36 - mmengine - INFO - vacc:        None
07/04 17:29:36 - mmengine - INFO -

07/04 17:29:36 - mmengine - INFO - **********Codebase information**********
07/04 17:29:36 - mmengine - INFO - mmdet:       3.3.0
07/04 17:29:36 - mmengine - INFO - mmseg:       None
07/04 17:29:36 - mmengine - INFO - mmpretrain:  None
07/04 17:29:36 - mmengine - INFO - mmocr:       None
07/04 17:29:36 - mmengine - INFO - mmagic:      None
07/04 17:29:36 - mmengine - INFO - mmdet3d:     None
07/04 17:29:36 - mmengine - INFO - mmpose:      None
07/04 17:29:36 - mmengine - INFO - mmrotate:    None
07/04 17:29:36 - mmengine - INFO - mmaction:    None
07/04 17:29:36 - mmengine - INFO - mmrazor:     None
07/04 17:29:36 - mmengine - INFO - mmyolo:      None
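
The backend listing above reports two ONNX Runtime entries at different versions (1.8.1 and 1.18.1). Nothing in this thread establishes that this is related to the export failure, but it may be worth confirming what is actually installed. A minimal sketch using only the standard library (`importlib.metadata`); the helper names `installed_versions` and `report` are made up for illustration:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(names=("onnxruntime", "onnxruntime-gpu")):
    """Return one printable line per distribution name."""
    found = installed_versions(names)
    return [f"{name}: {version or 'not installed'}" for name, version in found.items()]
```

Running `print("\n".join(report()))` inside the `openmmlab2` environment lists the versions pip has recorded for both packages.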

Additional information:

Any help with this bug is appreciated. Thank you.

ljl02521 commented 4 months ago

Try replacing detection_tensorrt_static-640x640.py with the dynamic320x320-1344x1344 one.

SushmaDG commented 4 months ago

Try replacing detection_tensorrt_static-640x640.py with the dynamic320x320-1344x1344 one.

I get the same error:

mmengine - ERROR - C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\apis\core\pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.

ljl02521 commented 4 months ago

Try replacing detection_tensorrt_static-640x640.py with the dynamic320x320-1344x1344 one.

I get the same error:

mmengine - ERROR - C:\Users\.conda\envs\openmmlab2\lib\site-packages\mmdeploy\apis\core\pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.

I was answering your question.