[Bug] Some detection results are missing after deploying with mmdeploy

kydyahah commented 8 months ago

Checklist

[X] 1. I have searched related issues but cannot get the expected help.
[X] 2. I have read the FAQ documentation but cannot get the expected help.
[X] 3. The bug has not been fixed in the latest version.

Describe the bug

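Some objects that are detected when running inference with the pth checkpoint go missing once the model is exported to ONNX with dynamic axes. The problem occurs for both object detection and instance segmentation, and with both the onnxruntime and TensorRT backends.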

Reproduction

The code used for pth inference is as follows:

import os
from mmdet.apis import init_detector, inference_detector
from tqdm import tqdm
from mmdet.registry import VISUALIZERS
import mmcv

config = r'C:\Users\AI\Desktop\split2\pth_model\best.py'
checkpoint = r'C:\Users\AI\Desktop\split2\pth_model\best.pth'
input_path = r'C:\Users\AI\Desktop\split2\imgs\val'
output_path=r"C:\Users\AI\Desktop\split2\pth_out\val"

model = init_detector(config, checkpoint, device='cuda:0')
visualizer = VISUALIZERS.build(model.cfg.visualizer)
visualizer.dataset_meta = model.dataset_meta

images=os.listdir(input_path)
bar=tqdm(enumerate(images),total=len(images))
for i,image in bar:
    img = mmcv.imread(os.path.join(input_path, image), channel_order='bgr')
    result = inference_detector(model, img)
    visualizer.add_datasample(
        'result',
        img[:,:,::-1],
        data_sample=result,
        draw_gt=False,
        wait_time=0,
        out_file=f"{output_path}/{os.path.basename(image)}",
        pred_score_thr=0.5
    )

The code used for mmdeploy inference is as follows:

import os
os.environ["path"] = r"C:\Users\AI\build_mmdeploy\deps\cudnn-windows-x86_64-8.9.3.28_cuda11-archive\bin;" + os.environ["path"]
os.environ["path"] = r"C:\Users\AI\build_mmdeploy\deps\TensorRT-8.6.1.6.Windows10.x86_64.cuda-11.8\TensorRT-8.6.1.6\lib;" + os.environ["path"]

from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch

# device = 'cpu'
# image_dir = r'C:\Users\AI\Desktop\split2\imgs\val'
# backend_model = [r'C:\Users\AI\Desktop\split2\onnxruntime_model\best.onnx']
# deploy_cfg = r'C:\Users\AI\Desktop\split2\onnxruntime_model\deploy_cfg.py'
# model_cfg = r'C:\Users\AI\Desktop\split2\onnxruntime_model\best.py'
# output_dir = r"C:\Users\AI\Desktop\split2\mmdeploy_python_api_out_onnxruntime\val"

device = "cuda"
image_dir = r'C:\Users\AI\Desktop\split2\imgs\val'
backend_model = [r'C:\Users\AI\Desktop\split2\tensorrt_fp32_model\best.engine']
deploy_cfg = r'C:\Users\AI\Desktop\split2\tensorrt_fp32_model\deploy_cfg.py'
model_cfg = r'C:\Users\AI\Desktop\split2\tensorrt_fp32_model\best.py'
output_dir = r"C:\Users\AI\Desktop\split2\mmdeploy_python_api_out\val"

# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)

# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)

# process input image
input_shape = get_input_shape(deploy_cfg)
for img in os.listdir(image_dir):
    model_inputs, _ = task_processor.create_input(os.path.join(image_dir,img), input_shape)

    # do model inference
    with torch.no_grad():
        result = model.test_step(model_inputs)

    # visualize results
    task_processor.visualize(
        image=os.path.join(image_dir,img),
        model=model,
        result=result[0],
        window_name='visualize',
        output_file=os.path.join(output_dir, img))
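
To quantify the missing detections instead of comparing the visualized images by eye, the boxes from the two runs could be matched by IoU. The following is only a minimal sketch, not part of the original report: it assumes both runs yield boxes as (N, 4) arrays in xyxy pixel coordinates on the same image scale and already filtered with the same score threshold, and the names `iou_matrix`, `find_missing` and the 0.5 IoU threshold are hypothetical.

```python
import numpy as np


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in xyxy format."""
    a = np.asarray(a, dtype=np.float64).reshape(-1, 4)
    b = np.asarray(b, dtype=np.float64).reshape(-1, 4)
    # Corners of every pairwise intersection, broadcast to (N, M, 2).
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    # Non-overlapping pairs get zero width or height.
    wh = np.clip(bottom_right - top_left, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Degenerate (zero-area) pairs are reported as IoU 0.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def find_missing(ref_boxes, test_boxes, iou_thr=0.5):
    """Indices of reference boxes that no test box overlaps at IoU >= iou_thr."""
    ious = iou_matrix(ref_boxes, test_boxes)
    if ious.shape[1] == 0:
        # No test boxes at all: every reference box is missing.
        return list(range(ious.shape[0]))
    return np.flatnonzero(ious.max(axis=1) < iou_thr).tolist()


if __name__ == "__main__":
    pth_boxes = [[0, 0, 10, 10], [20, 20, 30, 30]]   # boxes from the pth run
    deployed_boxes = [[1, 1, 10, 10]]                # boxes from the deployed run
    print(find_missing(pth_boxes, deployed_boxes))
```

In the scripts above, the inputs would presumably come from the `pred_instances.bboxes` field of each result (converted to NumPy), with the pth run as the reference; this is an assumption about the returned data samples and should be checked against the actual outputs.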

Environment

12/27 15:46:53 - mmengine - INFO -

12/27 15:46:53 - mmengine - INFO - **********Environmental information**********
12/27 15:46:56 - mmengine - INFO - sys.platform: win32
12/27 15:46:56 - mmengine - INFO - Python: 3.10.10 | packaged by Anaconda, Inc. | (main, Mar 21 2023, 18:39:17) [MSC v.1916 64 bit (AMD64)]
12/27 15:46:56 - mmengine - INFO - CUDA available: True
12/27 15:46:56 - mmengine - INFO - numpy_random_seed: 2147483648
12/27 15:46:56 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3090
12/27 15:46:56 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
12/27 15:46:56 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.8, V11.8.89
12/27 15:46:56 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.38.33133 for x64
12/27 15:46:56 - mmengine - INFO - GCC: n/a
12/27 15:46:56 - mmengine - INFO - PyTorch: 2.0.1+cu118
12/27 15:46:56 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 193431937
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

12/27 15:46:56 - mmengine - INFO - TorchVision: 0.15.2+cu118
12/27 15:46:56 - mmengine - INFO - OpenCV: 4.8.1
12/27 15:46:56 - mmengine - INFO - MMEngine: 0.8.4
12/27 15:46:56 - mmengine - INFO - MMCV: 2.0.1
12/27 15:46:56 - mmengine - INFO - MMCV Compiler: MSVC 192930148
12/27 15:46:56 - mmengine - INFO - MMCV CUDA Compiler: 11.8
12/27 15:46:56 - mmengine - INFO - MMDeploy: 1.3.0+1132e82
12/27 15:46:56 - mmengine - INFO -

12/27 15:46:56 - mmengine - INFO - **********Backend information**********
12/27 15:46:56 - mmengine - INFO - tensorrt:    8.6.1
12/27 15:46:56 - mmengine - INFO - tensorrt custom ops: Available
12/27 15:46:56 - mmengine - INFO - ONNXRuntime: 1.12.0
12/27 15:46:56 - mmengine - INFO - ONNXRuntime-gpu:     None
12/27 15:46:56 - mmengine - INFO - ONNXRuntime custom ops:      Available
12/27 15:46:56 - mmengine - INFO - pplnn:       None
12/27 15:46:56 - mmengine - INFO - ncnn:        None
12/27 15:46:56 - mmengine - INFO - snpe:        None
12/27 15:46:56 - mmengine - INFO - openvino:    None
12/27 15:46:56 - mmengine - INFO - torchscript: 2.0.1+cu118
12/27 15:46:56 - mmengine - INFO - torchscript custom ops:      NotAvailable
12/27 15:46:56 - mmengine - INFO - rknn-toolkit:        None
12/27 15:46:56 - mmengine - INFO - rknn-toolkit2:       None
12/27 15:46:56 - mmengine - INFO - ascend:      None
12/27 15:46:56 - mmengine - INFO - coreml:      None
12/27 15:46:56 - mmengine - INFO - tvm: None
12/27 15:46:56 - mmengine - INFO - vacc:        None
12/27 15:46:56 - mmengine - INFO -

12/27 15:46:56 - mmengine - INFO - **********Codebase information**********
12/27 15:46:56 - mmengine - INFO - mmdet:       3.1.0
12/27 15:46:56 - mmengine - INFO - mmseg:       1.1.1
12/27 15:46:56 - mmengine - INFO - mmpretrain:  1.0.2
12/27 15:46:56 - mmengine - INFO - mmocr:       None
12/27 15:46:56 - mmengine - INFO - mmagic:      None
12/27 15:46:56 - mmengine - INFO - mmdet3d:     None
12/27 15:46:56 - mmengine - INFO - mmpose:      None
12/27 15:46:56 - mmengine - INFO - mmrotate:    None
12/27 15:46:56 - mmengine - INFO - mmaction:    None
12/27 15:46:56 - mmengine - INFO - mmrazor:     None
12/27 15:46:56 - mmengine - INFO - mmyolo:      None

Error traceback

No response

kydyahah commented 8 months ago

The problem does not occur when the exported ONNX model has no dynamic axes.

kydyahah commented 8 months ago

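I verified this again: detections are also missed when the exported ONNX model has no dynamic axes, but fewer are missed than with dynamic axes.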

duyanfang123 commented 7 months ago

Have you solved the problem yet? I have it too.

ao-zz commented 5 months ago

A similar problem is discussed in detail in #2072; it may be helpful to anyone who finds this issue.

open-mmlab / mmdeploy

[Bug] Some detection results are missing after deploying with mmdeploy #2617
