open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] YOLOX exported model doesn't predict well #2201

Closed smrlehdgus closed 1 year ago

smrlehdgus commented 1 year ago

Describe the bug

I trained a yolox_m model on a custom dataset. The PyTorch model predicts well, but the exported models (ONNX, TensorRT engine) do not.

Prediction of Pytorch model frame_0146_torch

Prediction of ONNX model frame_0146_onnx

The ONNX model also predicts non-existent objects. Adjusting the threshold values did not help, because the confidence of a false prediction was sometimes above 0.6.

Reproduction

1. TRAIN

python tools/train.py \
    configs/yolox/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom.py
2. TEST

python tools/test.py \
    configs/yolox/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom.py \
    work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/best_coco_bbox_mAP_epoch_50.pth \
    --show-dir show_results

3. EXPORT ONNX ENGINE

python export.py \
    configs/yolox/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom.py \
    work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/best_coco_bbox_mAP_epoch_50.pth \
    --work-dir work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/onnx \
    --img-size 640 640 \
    --batch 1 \
    --device cuda:0 \
    --opset 11 \
    --backend 1 \
    --pre-topk 1000 \
    --keep-topk 100 \
    --iou-threshold 0.5 \
    --score-threshold 0.3 \
    --simplify

4. TEST ONNX ENGINE

python image-demo.py \
    /home/data/ObjectDetection/LGUP/20221017/images/train/ \
    configs/yolox/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom.py \
    work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/onnx/end2end.onnx \
    --device cuda:0 \
    --out-dir work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/outputs/onnx/
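For context on step 3: the --score-threshold, --pre-topk, --iou-threshold, and --keep-topk flags configure the NMS post-processing baked into the exported graph. A minimal pure-Python sketch of how these four parameters interact (this is the standard greedy-NMS algorithm for reference, not the exporter's actual implementation):

```python
# Sketch of the NMS post-processing that the export flags configure.
# NOT the exporter's code, just the standard algorithm for reference.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, score_thr=0.3, iou_thr=0.5, pre_topk=1000, keep_topk=100):
    # 1) drop low-confidence boxes (--score-threshold)
    cand = [(s, b) for s, b in zip(scores, boxes) if s >= score_thr]
    # 2) keep only the highest-scoring pre_topk candidates (--pre-topk)
    cand.sort(key=lambda t: t[0], reverse=True)
    cand = cand[:pre_topk]
    # 3) greedy NMS: suppress boxes overlapping an already-kept box (--iou-threshold)
    kept = []
    for s, b in cand:
        if all(iou(b, kb) <= iou_thr for _, kb in kept):
            kept.append((s, b))
        if len(kept) == keep_topk:  # cap the final detections (--keep-topk)
            break
    return kept
```

Note that if false positives come out with confidence above 0.6, raising --score-threshold alone cannot fix them, which matches the observation above.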


Environment

```Shell
06/20 16:10:05 - mmengine - INFO - 

06/20 16:10:05 - mmengine - INFO - **********Environmental information**********
06/20 16:10:05 - mmengine - INFO - sys.platform: linux
06/20 16:10:05 - mmengine - INFO - Python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
06/20 16:10:05 - mmengine - INFO - CUDA available: True
06/20 16:10:05 - mmengine - INFO - numpy_random_seed: 2147483648
06/20 16:10:05 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 4080
06/20 16:10:05 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
06/20 16:10:05 - mmengine - INFO - NVCC: Cuda compilation tools, release 12.0, V12.0.140
06/20 16:10:05 - mmengine - INFO - GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
06/20 16:10:05 - mmengine - INFO - PyTorch: 1.14.0a0+44dac51
06/20 16:10:05 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.0 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 12.0
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
  - CuDNN 8.7  (built against CUDA 11.8)
  - Magma 2.6.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.0, CUDNN_VERSION=8.7.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=1.14.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

06/20 16:10:05 - mmengine - INFO - TorchVision: 0.15.0a0
06/20 16:10:05 - mmengine - INFO - OpenCV: 4.6.0
06/20 16:10:05 - mmengine - INFO - MMEngine: 0.7.4
06/20 16:10:05 - mmengine - INFO - MMCV: 2.0.1
06/20 16:10:05 - mmengine - INFO - MMCV Compiler: GCC 9.4
06/20 16:10:05 - mmengine - INFO - MMCV CUDA Compiler: 12.0
06/20 16:10:05 - mmengine - INFO - MMDeploy: 1.1.0+90f17f2
06/20 16:10:05 - mmengine - INFO - 

06/20 16:10:05 - mmengine - INFO - **********Backend information**********
06/20 16:10:05 - mmengine - INFO - tensorrt:    8.5.3.1
06/20 16:10:05 - mmengine - INFO - tensorrt custom ops: NotAvailable
06/20 16:10:05 - mmengine - INFO - ONNXRuntime: 1.15.1
06/20 16:10:05 - mmengine - INFO - ONNXRuntime-gpu:     1.15.1
06/20 16:10:05 - mmengine - INFO - ONNXRuntime custom ops:      NotAvailable
06/20 16:10:05 - mmengine - INFO - pplnn:       None
06/20 16:10:05 - mmengine - INFO - ncnn:        None
06/20 16:10:05 - mmengine - INFO - snpe:        None
06/20 16:10:05 - mmengine - INFO - openvino:    None
06/20 16:10:05 - mmengine - INFO - torchscript: 1.14.0a0+44dac51
06/20 16:10:05 - mmengine - INFO - torchscript custom ops:      NotAvailable
06/20 16:10:05 - mmengine - INFO - rknn-toolkit:        None
06/20 16:10:05 - mmengine - INFO - rknn-toolkit2:       None
06/20 16:10:05 - mmengine - INFO - ascend:      None
06/20 16:10:05 - mmengine - INFO - coreml:      None
06/20 16:10:05 - mmengine - INFO - tvm: None
06/20 16:10:05 - mmengine - INFO - vacc:        None
06/20 16:10:05 - mmengine - INFO - 

06/20 16:10:05 - mmengine - INFO - **********Codebase information**********
06/20 16:10:05 - mmengine - INFO - mmdet:       3.0.0
06/20 16:10:05 - mmengine - INFO - mmseg:       None
06/20 16:10:05 - mmengine - INFO - mmpretrain:  None
06/20 16:10:05 - mmengine - INFO - mmocr:       1.0.0
06/20 16:10:05 - mmengine - INFO - mmagic:      None
06/20 16:10:05 - mmengine - INFO - mmdet3d:     None
06/20 16:10:05 - mmengine - INFO - mmpose:      None
06/20 16:10:05 - mmengine - INFO - mmrotate:    None
06/20 16:10:05 - mmengine - INFO - mmaction:    None
06/20 16:10:05 - mmengine - INFO - mmrazor:     None
```

Error traceback

Also, when I changed --backend to 2 or 3, an error occurred:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/onnx/end2end.onnx failed:Fatal error: TRT:EfficientNMS_TRT(-1) is not a registered function/op

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from work_dirs/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom/onnx/end2end.onnx failed:Fatal error: TRT:BatchedNMSDynamic_TRT(-1) is not a registered function/op
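For what it's worth, both failing ops (EfficientNMS_TRT, BatchedNMSDynamic_TRT) live in the "TRT" custom domain: they are TensorRT NMS plugins, not standard ONNX operators, so ONNXRuntime (or any backend without those plugins registered) refuses to load the graph. A small sketch of that compatibility check; the (domain, op_type) pairs stand in for what `onnx.load(...).graph.node` would report, and the helper itself is hypothetical:

```python
# Sketch: why ONNXRuntime rejects the exported graph. Ops from custom
# domains such as "TRT" (TensorRT plugins) are unknown to a runtime that
# has not registered them, so model loading fails up front.

STANDARD_DOMAINS = {"", "ai.onnx"}  # domains every ONNX runtime understands

def unsupported_ops(nodes):
    """Return ops from custom domains, e.g. TensorRT-only NMS plugins.

    `nodes` is a list of (domain, op_type) pairs, mimicking the fields
    on `onnx.load(path).graph.node`.
    """
    return [f"{dom}:{op}" for dom, op in nodes if dom not in STANDARD_DOMAINS]
```

For example, a graph containing ("", "Conv") and ("TRT", "EfficientNMS_TRT") would be flagged only for the TRT op, which is exactly the op named in the error above.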
RunningLeon commented 1 year ago

@smrlehdgus hi,

  1. If you are using easydeploy, please refer to this doc. If you have any questions, create the issue in mmyolo.
  2. If you want to use mmdeploy, please refer to this doc. BTW, could you first try with the official YOLOX trained on COCO?
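In case it helps, the mmdeploy path mentioned in point 2 usually goes through tools/deploy.py with an ONNXRuntime deploy config, so that the exported end2end.onnx contains only standard ONNX ops rather than TensorRT NMS plugins. A command sketch (the checkpoint, model config, and image paths are illustrative and must be adapted to your layout):

```shell
python tools/deploy.py \
    configs/mmdet/detection/detection_onnxruntime_dynamic.py \
    path/to/yolox_m_fast_8xb8-300e-rtmdet-hyp_custom.py \
    path/to/best_coco_bbox_mAP_epoch_50.pth \
    demo/resources/det.jpg \
    --work-dir work_dirs/yolox_ort \
    --device cuda:0
```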
github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.