open-mmlab / mmrotate

OpenMMLab Rotated Object Detection Toolbox and Benchmark
https://mmrotate.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Error conversion FCOS to TRT with mmdeploy #736

Status: Open. stopmosk opened this issue 1 year ago.

stopmosk commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

I use the official mmdeploy Docker image, but updated TensorRT to 8.5 (the issue also occurs in the official image with TRT 8.4).

root@da34cc94b7e7:/mmr# python mmrotate/mmrotate/utils/collect_env.py
/opt/conda/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sys.platform: linux
Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.0+cu111
OpenCV: 4.7.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMRotate: 0.3.4+
root@da34cc94b7e7:/mmr#

mmdeploy env info:

root@da34cc94b7e7:/mmr# python  mmdeploy/tools/check_env.py
/opt/conda/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
2023-02-20 12:46:07,311 - mmdeploy - INFO -

2023-02-20 12:46:07,311 - mmdeploy - INFO - **********Environmental information**********
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2023-02-20 12:46:07,454 - mmdeploy - INFO - sys.platform: linux
2023-02-20 12:46:07,455 - mmdeploy - INFO - Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
2023-02-20 12:46:07,455 - mmdeploy - INFO - CUDA available: True
2023-02-20 12:46:07,455 - mmdeploy - INFO - GPU 0: NVIDIA GeForce RTX 3080
2023-02-20 12:46:07,455 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2023-02-20 12:46:07,455 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.8, V11.8.89
2023-02-20 12:46:07,455 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
2023-02-20 12:46:07,455 - mmdeploy - INFO - PyTorch: 1.10.0+cu111
2023-02-20 12:46:07,455 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

2023-02-20 12:46:07,455 - mmdeploy - INFO - TorchVision: 0.11.0+cu111
2023-02-20 12:46:07,455 - mmdeploy - INFO - OpenCV: 4.7.0
2023-02-20 12:46:07,455 - mmdeploy - INFO - MMCV: 1.7.0
2023-02-20 12:46:07,455 - mmdeploy - INFO - MMCV Compiler: GCC 9.3
2023-02-20 12:46:07,455 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2023-02-20 12:46:07,455 - mmdeploy - INFO - MMDeploy: 0.13.0+
2023-02-20 12:46:07,455 - mmdeploy - INFO -

2023-02-20 12:46:07,455 - mmdeploy - INFO - **********Backend information**********
2023-02-20 12:46:07,490 - mmdeploy - INFO - tensorrt:   8.5.1.7
2023-02-20 12:46:07,490 - mmdeploy - INFO - tensorrt custom ops:        Available
2023-02-20 12:46:07,514 - mmdeploy - INFO - ONNXRuntime:        None
2023-02-20 12:46:07,514 - mmdeploy - INFO - ONNXRuntime-gpu:    1.8.1
2023-02-20 12:46:07,514 - mmdeploy - INFO - ONNXRuntime custom ops:     Available
2023-02-20 12:46:07,515 - mmdeploy - INFO - pplnn:      None
2023-02-20 12:46:07,519 - mmdeploy - INFO - ncnn:       None
2023-02-20 12:46:07,521 - mmdeploy - INFO - snpe:       None
2023-02-20 12:46:07,522 - mmdeploy - INFO - openvino:   None
2023-02-20 12:46:07,523 - mmdeploy - INFO - torchscript:        1.10.0+cu111
2023-02-20 12:46:07,524 - mmdeploy - INFO - torchscript custom ops:     NotAvailable
2023-02-20 12:46:07,545 - mmdeploy - INFO - rknn-toolkit:       None
2023-02-20 12:46:07,545 - mmdeploy - INFO - rknn2-toolkit:      None
2023-02-20 12:46:07,546 - mmdeploy - INFO - ascend:     None
2023-02-20 12:46:07,547 - mmdeploy - INFO - coreml:     None
2023-02-20 12:46:07,548 - mmdeploy - INFO - tvm:        None
2023-02-20 12:46:07,548 - mmdeploy - INFO -

2023-02-20 12:46:07,548 - mmdeploy - INFO - **********Codebase information**********
2023-02-20 12:46:08,301 - mmdeploy - INFO - mmdet:      2.28.1
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmseg:      None
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmcls:      None
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmocr:      None
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmedit:     None
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmdet3d:    None
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmpose:     None
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmrotate:   0.3.4
2023-02-20 12:46:08,302 - mmdeploy - INFO - mmaction:   None
root@da34cc94b7e7:/mmr#

Reproduces the problem - code sample

python mmdeploy/tools/deploy.py \
mmdeploy/configs/mmrotate/rotated-detection_tensorrt_dynamic-320x320-1024x1024.py \
mmrotate/rotated_fcos_r50_fpn_1x_dota_le90.py \
mmrotate/rotated_fcos_r50_fpn_1x_dota_le90-d87568ed.pth \
mmrotate/demo/demo.jpg \
--work-dir mmdeploy_model/rotated_fcos_base \
--device cuda \
--dump-info

Reproduces the problem - error message

[02/20/2023-12:50:24] [TRT] [E] parsers/onnx/ModelImporter.cpp:726: While parsing node number 965 [TopK -> "1923"]:
[02/20/2023-12:50:24] [TRT] [E] parsers/onnx/ModelImporter.cpp:727: --- Begin node ---
[02/20/2023-12:50:24] [TRT] [E] parsers/onnx/ModelImporter.cpp:728: input: "1918"
input: "1922"
output: "1923"
output: "1924"
name: "TopK_965"
op_type: "TopK"
attribute {
  name: "axis"
  i: 0
  type: INT
}
attribute {
  name: "largest"
  i: 1
  type: INT
}

[02/20/2023-12:50:24] [TRT] [E] parsers/onnx/ModelImporter.cpp:729: --- End node ---
[02/20/2023-12:50:24] [TRT] [E] parsers/onnx/ModelImporter.cpp:731: ERROR: parsers/onnx/ModelImporter.cpp:168 In function parseGraph:
[6] Invalid Node - TopK_965
This version of TensorRT only supports input K as an initializer. Try applying constant folding on the model using Polygraphy: https://github.com/NVIDIA/TensorRT/tree/master/tools/Polygraphy/examples/cli/surgeon/02_folding_constants
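The parser rejects TopK_965 because its K input ("1922") is computed inside the graph rather than stored as a constant initializer. The sketch below shows one plausible way that can happen: if the exported graph clamps K to the number of candidate points, K = min(nms_pre, num_points), then K depends on the input size. The strides (8, 16, 32, 64, 128), the value nms_pre = 2000 and the clamping rule itself are assumptions for illustration, not values read from this model or from mmdeploy's rewriters. With a fixed input size the same arithmetic gives fixed numbers, which is what constant folding relies on.

```python
from typing import List, Sequence

# Assumed, illustrative values: typical FCOS FPN strides and a pre-NMS
# candidate limit. Check the real model config before relying on them.
STRIDES = (8, 16, 32, 64, 128)
NMS_PRE = 2000


def points_per_level(img_size: int, strides: Sequence[int] = STRIDES) -> List[int]:
    """Approximate candidate points per square feature map (ceil division)."""
    return [((img_size + s - 1) // s) ** 2 for s in strides]


def clamped_k(img_size: int, nms_pre: int = NMS_PRE) -> List[int]:
    """K per level under the assumed rule K = min(nms_pre, num_points)."""
    return [min(nms_pre, n) for n in points_per_level(img_size)]


if __name__ == "__main__":
    # 320 and 1024 are the min/max sizes named in the dynamic deploy config.
    for size in (320, 1024):
        print(size, points_per_level(size), clamped_k(size))
```

Under these assumptions the first level's K is 1600 for a 320x320 input but 2000 for 1024x1024, so a dynamic-shape engine cannot bake K into the model as a single constant.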

Additional information

I also tried the following, without success:

  1. A TensorRT static configuration with fixed image shapes
  2. TensorRT 8.4 and 8.5
  3. ONNX opsets 9, 10 and 11

zytx121 commented 1 year ago

Hi @stopmosk, at present we do not support converting the FCOS model to ONNX. You may need to rewrite the TopK op for rotated boxes.
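One possible rewrite, offered as an assumption and not something the maintainers have confirmed, is to keep K a plain constant by padding the scores with -inf so there are always at least K candidates. The padded picks are then masked out. The function name `topk_fixed_k` is hypothetical, and the sketch uses plain Python lists to show the logic only.

```python
import math
from typing import List, Tuple


def topk_fixed_k(scores: List[float], k: int) -> Tuple[List[int], List[bool]]:
    """Hypothetical helper: always return exactly k indices plus a validity mask."""
    n = len(scores)
    # Pad with -inf so there are always >= k candidates; K never needs
    # clamping to min(k, n), so it could stay a constant in an exported graph.
    padded = scores + [-math.inf] * k
    # Sort indices by descending score (Python's sort stays stable with reverse=True).
    order = sorted(range(len(padded)), key=lambda i: padded[i], reverse=True)
    top = order[:k]
    # Indices >= n point at padding and must be discarded downstream.
    valid = [i < n for i in top]
    return top, valid


if __name__ == "__main__":
    print(topk_fixed_k([0.2, 0.9, 0.5], 5))
```

In a PyTorch rewrite, the same idea would presumably concatenate a filler tensor of length `k` (for example, built with `torch.full`) and call `.topk(k)` with a plain Python `int`. Whether the TensorRT parser then accepts the node for this model is untested here.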