open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Detection TensorRT has zero output binding shape #2737

Open vision-gtc opened 6 months ago

vision-gtc commented 6 months ago


Describe the bug

I want to convert a YOLOv8 PyTorch model to a TensorRT model and run inference. The PyTorch model was converted to ONNX format, and the ONNX model was then converted to a TensorRT engine.

There were no errors during deployment or inference. However, unlike the PyTorch and ONNX models, the TensorRT model does not produce any bboxes. This appears to be because the output is bound with a zero-size shape such as (1, 0, 5). Both the dynamic and static variants fail at inference because their output shapes become zero.

What do I need to change so that the output's get_binding_shape returns a non-zero shape?
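For reference, here is a minimal sketch (assuming TensorRT 8.x and that the converted engine is saved as end2end.engine; the filename is a placeholder for my setup) that dumps what the engine itself reports for each binding. A -1 marks a dynamic dimension, while a literal 0 would mean the zero size is baked into the engine rather than appearing only at execution-context level:

~~~python
import tensorrt as trt

# Deserialize the engine and print the shape of every binding.
logger = trt.Logger(trt.Logger.WARNING)
with open('end2end.engine', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = 'input' if engine.binding_is_input(i) else 'output'
    print(kind, engine.get_binding_name(i), tuple(engine.get_binding_shape(i)))
~~~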

Reproduction

Configs used for deployment

torch config

~~~python
...
model = dict(
    type='YOLODetector',
    init_cfg=dict(
        type='Pretrained',
        checkpoint='./pretrained/detection/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101-5aa5f0f1.pth'),
    data_preprocessor=dict(
        type='mmdet.DetDataPreprocessor',
        mean=[0.0, 0.0, 0.0],
        std=[255.0, 255.0, 255.0],
        bgr_to_rgb=True,
        pad_size_divisor=32),
    backbone=dict(
        type='YOLOv8CSPDarknet',
        arch='P5',
        last_stage_out_channels=1024,
        deepen_factor=0.33,
        widen_factor=0.5,
        norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
        act_cfg=dict(type='SiLU', inplace=True)),
    neck=dict(
        type='YOLOv8PAFPN',
        deepen_factor=0.33,
        widen_factor=0.5,
        in_channels=[256, 512, 1024],
        out_channels=[256, 512, 1024],
        num_csp_blocks=3,
        norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='YOLOv8Head',
        head_module=dict(
            type='YOLOv8HeadModule',
            num_classes=1,
            in_channels=[256, 512, 1024],
            widen_factor=0.5,
            reg_max=16,
            norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
            act_cfg=dict(type='SiLU', inplace=True),
            featmap_strides=[8, 16, 32]),
        prior_generator=dict(
            type='mmdet.MlvlPointGenerator', offset=0.5, strides=[8, 16, 32]),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        loss_cls=dict(
            type='mmdet.CrossEntropyLoss',
            use_sigmoid=True,
            reduction='none',
            loss_weight=0.5),
        loss_bbox=dict(
            type='IoULoss',
            iou_mode='ciou',
            bbox_format='xyxy',
            reduction='sum',
            loss_weight=7.5,
            return_iou=False),
        loss_dfl=dict(
            type='mmdet.DistributionFocalLoss',
            reduction='mean',
            loss_weight=0.375)),
    train_cfg=dict(
        assigner=dict(
            type='BatchTaskAlignedAssigner',
            num_classes=1,
            use_ciou=True,
            topk=10,
            alpha=0.5,
            beta=6.0,
            eps=1e-09)),
    test_cfg=dict(
        multi_label=True,
        nms_pre=30000,
        score_thr=0.001,
        nms=dict(type='nms', iou_threshold=0.65),
        max_per_img=300))
...
~~~
onnx deploy config (dynamic ver.)

~~~python
backend_config = dict(type='onnxruntime')
codebase_config = dict(
    model_type='end2end',
    module=['mmyolo.deploy'],
    post_processing=dict(
        background_label_id=-1,
        confidence_threshold=0.005,
        iou_threshold=0.5,
        keep_top_k=100,
        max_output_boxes_per_class=200,
        pre_top_k=2000,
        score_threshold=0.05),
    task='ObjectDetection',
    type='mmyolo')
onnx_config = dict(
    dynamic_axes=dict(
        dets={0: 'batch', 1: 'num_dets'},
        input={0: 'batch', 2: 'height', 3: 'width'},
        labels={0: 'batch', 1: 'num_dets'}),
    export_params=True,
    input_names=['input'],
    input_shape=None,
    keep_initializers_as_inputs=False,
    opset_version=11,
    optimize=True,
    output_names=['dets', 'labels'],
    save_file='end2end.onnx',
    type='onnx')
~~~
TensorRT deploy config (dynamic ver.)

~~~python
backend_config = dict(
    common_config=dict(fp16_mode=True, max_workspace_size=4294967296),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    max_shape=[4, 3, 4096, 4096],
                    min_shape=[1, 3, 2816, 2816],
                    opt_shape=[1, 3, 2816, 4096])))
    ],
    type='tensorrt')
codebase_config = dict(
    model_type='end2end',
    module=['mmyolo.deploy'],
    post_processing=dict(
        background_label_id=-1,
        confidence_threshold=0.005,
        iou_threshold=0.5,
        keep_top_k=100,
        max_output_boxes_per_class=200,
        pre_top_k=2000,
        score_threshold=0.05),
    task='ObjectDetection',
    type='mmyolo')
onnx_config = dict(
    dynamic_axes=dict(
        dets={0: 'batch', 1: 'num_dets'},
        input={0: 'batch', 2: 'height', 3: 'width'},
        labels={0: 'batch', 1: 'num_dets'}),
    export_params=True,
    input_names=['input'],
    input_shape=(4096, 2816),
    keep_initializers_as_inputs=False,
    opset_version=11,
    optimize=True,
    output_names=['dets', 'labels'],
    save_file='end2end.onnx',
    type='onnx')
use_efficientnms = False
~~~
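As a cross-check on the exported graph (a minimal sketch, with end2end.onnx taken from the save_file above): if the dets output still carries a symbolic num_dets dimension here, the zero shape is introduced on the TensorRT side rather than during ONNX export:

~~~python
import onnx

# Print each graph output's dims; symbolic dims show their name
# (e.g. 'num_dets'), fixed dims show an integer.
model = onnx.load('end2end.onnx')
for out in model.graph.output:
    dims = [d.dim_param or d.dim_value
            for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)
~~~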

inference code

~~~python
import os

import cv2
import numpy as np
import torch

from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt

...

onnx_ckpt = os.path.join(out, onnx_filename)
trt_ckpt = os.path.join(out, trt_filename)

# torch -> onnx
torch2onnx(img, out, onnx_filename, onnx_deploy_cfg, torch_cfg, torch_ckpt, 'cuda')

# onnx -> tensorrt
model_id = 0
onnx2tensorrt(out, trt_filename, model_id, trt_deploy_cfg, onnx_ckpt, device='cuda')

# inference with the onnx/tensorrt model
class DeployAPI:
    def __init__(self, model_cfg, deploy_cfg, backend_files, device='cuda'):
        from mmdeploy.utils import get_input_shape, load_config
        self.deploy_cfg, self.model_cfg = load_config(deploy_cfg, model_cfg)

        from mmdeploy.apis.utils import build_task_processor
        self.task_processor = build_task_processor(self.model_cfg, self.deploy_cfg, device)

        self.model = self.task_processor.build_backend_model(
            backend_files, self.task_processor.update_data_preprocessor)

        self.input_shape = get_input_shape(self.deploy_cfg)

    def inference_model(self, img: np.ndarray):
        model_inputs, _ = self.task_processor.create_input(img, self.input_shape)
        with torch.no_grad():
            result = self.model.test_step(model_inputs)

        return result

onnx_api = DeployAPI(torch_cfg, onnx_cfg, [onnx_ckpt], device)
trt_api = DeployAPI(torch_cfg, trt_cfg, [trt_ckpt])

img = cv2.imread(img)
onnx_result = onnx_api.inference_model(img)  # 5 bboxes
trt_result = trt_api.inference_model(img)    # 0 bboxes (empty output data sample)
~~~
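The bbox counts in the comments above were read from the returned data samples, roughly as in this sketch (assuming the backend model returns mmdet 3.x DetDataSample objects, which is my reading of the end2end detection task):

~~~python
def count_bboxes(result, score_thr=0.3):
    # Count predicted instances above a score threshold in the first
    # data sample (a DetDataSample with a pred_instances field).
    instances = result[0].pred_instances
    return int((instances.scores > score_thr).sum())

print('onnx:', count_bboxes(onnx_result))  # e.g. 5
print('trt:', count_bboxes(trt_result))    # 0 with the zero-size binding
~~~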

When inference_model is called, the TensorRT model's outputs are bound with a zero shape in mmdeploy/backend/tensorrt/wrapper.py. The zero-size torch.empty tensor has a null data_ptr(), which is what trips TensorRT's null-binding check in the traceback below:

~~~python
# wrapper.py, around line 163
outputs = {}
for output_name in self._output_names:
    idx = self.engine.get_binding_index(output_name)  # output_name == 'dets', idx == 1
    dtype = torch_dtype_from_trt(self.engine.get_binding_dtype(idx))  # dtype == torch.float32
    shape = tuple(self.context.get_binding_shape(idx))  # shape == (1, 0, 5)
    device = torch_device_from_trt(self.engine.get_location(idx))
    output = torch.empty(size=shape, dtype=dtype, device=device)  # tensor([], device='cuda:0', size=(1, 0, 5))
    outputs[output_name] = output
    bindings[idx] = output.data_ptr()  # bindings[1] == 0 -> null binding pointer
~~~
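A possible stopgap (an untested sketch, not an official fix) is to pad any zero-size output dimension before allocating, so that data_ptr() is non-null and the null-binding check passes. The max_dets bound here is my own assumption, taken from keep_top_k in the post_processing config above; this only makes the buffer allocatable and does not explain why the context reports 0 in the first place:

~~~python
import torch

def alloc_output(shape, dtype, device, max_dets=100):
    # Pad zero-size dims with an assumed upper bound (keep_top_k=100
    # from the deploy config) so torch.empty() returns a buffer with a
    # real, non-null data_ptr(). Workaround sketch only.
    padded = tuple(dim if dim > 0 else max_dets for dim in shape)
    return torch.empty(size=padded, dtype=dtype, device=device)

# e.g. in the wrapper loop above:
# output = alloc_output(shape, dtype, device)
~~~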

Environment

04/16 11:21:46 - mmengine - INFO - **********Environmental information**********
04/16 11:22:27 - mmengine - INFO - sys.platform: win32
04/16 11:22:27 - mmengine - INFO - Python: 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
04/16 11:22:27 - mmengine - INFO - CUDA available: True
04/16 11:22:27 - mmengine - INFO - MUSA available: False
04/16 11:22:27 - mmengine - INFO - numpy_random_seed: 2147483648
04/16 11:22:27 - mmengine - INFO - GPU 0: NVIDIA RTX A4000
04/16 11:22:27 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7
04/16 11:22:27 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.7, V11.7.64
04/16 11:22:27 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30147 (x64)
04/16 11:22:27 - mmengine - INFO - GCC: n/a
04/16 11:22:27 - mmengine - INFO - PyTorch: 1.13.1+cu117
04/16 11:22:27 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

04/16 11:22:27 - mmengine - INFO - TorchVision: 0.14.1+cu117
04/16 11:22:27 - mmengine - INFO - OpenCV: 4.7.0
04/16 11:22:27 - mmengine - INFO - MMEngine: 0.10.3
04/16 11:22:27 - mmengine - INFO - MMCV: 2.0.1
04/16 11:22:27 - mmengine - INFO - MMCV Compiler: MSVC 192930148
04/16 11:22:27 - mmengine - INFO - MMCV CUDA Compiler: 11.7
04/16 11:22:27 - mmengine - INFO - MMDeploy: 1.3.1+bc75c9d
04/16 11:22:27 - mmengine - INFO -

04/16 11:22:27 - mmengine - INFO - **********Backend information**********
04/16 11:22:27 - mmengine - INFO - tensorrt:    8.6.1
04/16 11:22:27 - mmengine - INFO - tensorrt custom ops: Available
04/16 11:22:27 - mmengine - INFO - ONNXRuntime: 1.16.0
04/16 11:22:27 - mmengine - INFO - ONNXRuntime-gpu:     1.16.0
04/16 11:22:27 - mmengine - INFO - ONNXRuntime custom ops:      Available
04/16 11:22:27 - mmengine - INFO - pplnn:       None
04/16 11:22:27 - mmengine - INFO - ncnn:        None
04/16 11:22:27 - mmengine - INFO - snpe:        None
04/16 11:22:27 - mmengine - INFO - openvino:    None
04/16 11:22:27 - mmengine - INFO - torchscript: 1.13.1+cu117
04/16 11:22:27 - mmengine - INFO - torchscript custom ops:      NotAvailable
04/16 11:22:28 - mmengine - INFO - rknn-toolkit:        None
04/16 11:22:28 - mmengine - INFO - rknn-toolkit2:       None
04/16 11:22:28 - mmengine - INFO - ascend:      None
04/16 11:22:28 - mmengine - INFO - coreml:      None
04/16 11:22:28 - mmengine - INFO - tvm: None
04/16 11:22:28 - mmengine - INFO - vacc:        None
04/16 11:22:28 - mmengine - INFO -

04/16 11:22:28 - mmengine - INFO - **********Codebase information**********
04/16 11:22:28 - mmengine - INFO - mmdet:       3.3.0
04/16 11:22:28 - mmengine - INFO - mmseg:       1.2.2
04/16 11:22:28 - mmengine - INFO - mmpretrain:  1.2.0
04/16 11:22:28 - mmengine - INFO - mmocr:       None
04/16 11:22:28 - mmengine - INFO - mmagic:      None
04/16 11:22:28 - mmengine - INFO - mmdet3d:     None
04/16 11:22:28 - mmengine - INFO - mmpose:      None
04/16 11:22:28 - mmengine - INFO - mmrotate:    None
04/16 11:22:28 - mmengine - INFO - mmaction:    None
04/16 11:22:28 - mmengine - INFO - mmrazor:     None

Error traceback

[04/16/2024-11:38:10] [TRT] [E] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::enqueueInternal::795] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::enqueueInternal::795, condition: bindings[x] || nullBindingOK
)
[04/16/2024-11:38:13] [TRT] [E] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::enqueueInternal::795] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::enqueueInternal::795, condition: bindings[x] || nullBindingOK
)
kingstarcraft commented 4 months ago

Same problem, but without TensorRT: it occurs when using detector.cxx to run inference on a SOLOv2 model.