[Bug] Inconsistent inference results between centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.pth and the end2end.onnx that you provide (see docs/zh_cn/04-supported-codebases/mmdet3d.md), tested on mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin

sylivahf commented 1 year ago

Checklist

[X] 1. I have searched related issues but cannot get the expected help.
[X] 2. I have read the FAQ documentation but cannot get the expected help.
[X] 3. The bug has not been fixed in the latest version.

Describe the bug

Q1: When loading the .pth file, the following is printed:

The model and loaded state dict do not match exactly
size mismatch for pts_voxel_encoder.pfn_layers.0.linear.weight: copying a param with shape torch.Size([64, 10]) from checkpoint, the shape in current model is torch.Size([64, 11]).

Q2: The results are not the same, as follows:

======== torch model inference ============
bboxes: LiDARInstance3DBoxes(
    tensor([[-2.2048e+01, 4.5879e-01, -3.4730e+00, 5.5220e-01, 1.7534e+00, 1.0230e+00, -2.9996e+00, 6.4907e-09, -2.2300e-10],
        [-1.9456e+01, 2.7721e-01, -2.9449e+00, 4.9635e-01, 1.8013e+00, 1.0433e+00, 3.0218e+00, 6.4907e-09, -2.2300e-10],
        [-3.5420e+00, -2.5560e-01, -1.7169e+00, 1.5123e+00, 4.7465e-01, 1.1190e+00, 2.7349e+00, -1.2182e-09, -1.8843e-09],
        [-1.9399e+01, 3.0180e-01, -2.6115e+00, 2.0330e+00, 6.6581e-01, 1.3884e+00, 1.4075e+00, 1.4518e-02, 8.6695e-04],
        [-2.2165e+01, 5.3310e-01, -3.9107e+00, 2.0958e+00, 7.8873e-01, 1.6001e+00, -2.8884e-01, -1.6687e-05, 9.6859e-03],
        [-2.2057e+01, 1.1213e+00, -2.9663e+00, 6.6428e-01, 7.2668e-01, 1.8508e+00, 7.0642e-01, 4.0863e-02, 8.3566e-03],
        [-3.5352e+00, -1.4072e-01, -1.8142e+00, 6.2023e-01, 5.8055e-01, 1.7103e+00, -2.1606e+00, -1.1876e-03, 9.9198e-03],
        [-1.4823e+01, 4.4471e-01, -3.0109e+00, 6.4587e-01, 6.9740e-01, 1.8230e+00, 2.6232e+00, -1.4208e-09, -1.0301e-08]], device='cuda:0'))
scores: tensor([0.1688, 0.1259, 0.1620, 0.1152, 0.1008, 0.1408, 0.1107, 0.1081], device='cuda:0')
labels: tensor([5, 5, 7, 7, 6, 8, 8, 8], device='cuda:0', dtype=torch.int32)

======== onnx model inference ============
bboxes: LiDARInstance3DBoxes(
    tensor([[-1.3144e+01, -5.9227e-01, -2.2817e+00, 3.6645e-01, 2.5220e+00, 8.2899e-01, -1.1088e-01, -1.9604e-09, -4.6045e-10],
        [-1.3051e+01, 9.6707e-01, -2.3747e+00, 4.3470e-01, 3.1165e+00, 8.6941e-01, -1.1153e-01, -1.9604e-09, -4.6045e-10],
        [-1.7018e-01, -4.0455e-01, -9.8223e-01, 4.6753e-01, 4.9864e-01, 1.2611e+00, 3.2107e-01, -1.5343e-04, -7.2500e-05]]))
scores: tensor([0.2222, 0.2105, 0.3128])
labels: tensor([5, 5, 9], dtype=torch.int32)

Q3: When the .pth inference is executed again, the results are inconsistent with the first run:

======== torch model inference ============
bboxes: LiDARInstance3DBoxes(
    tensor([[-7.6455e+00, 1.2476e+00, -1.3899e+00, 8.5401e+00, 2.6876e+00, 3.4241e+00, 3.1323e+00, -4.9060e-10, 2.5482e-10],
        [-3.6816e+00, 2.8384e-01, -7.7362e-01, 2.0953e+00, 2.0531e+00, 3.2199e+00, 1.4697e+00, -1.6465e-09, 1.5545e-09],
        [-2.1928e+01, 4.8591e-01, -3.4773e+00, 3.9633e-01, 2.0722e+00, 8.9335e-01, 3.2418e-01, 6.4907e-09, -2.2300e-10],
        [-2.1966e+01, 6.0843e-01, -4.0183e+00, 2.0318e+00, 7.3325e-01, 1.4105e+00, 1.5185e+00, -1.4774e-02, -6.5357e-01],
        [-1.9332e+01, 2.8846e-01, -2.1425e+00, 1.9006e+00, 6.6670e-01, 1.2766e+00, 1.7748e+00, 2.3108e-03, 3.3878e-02],
        [-3.6679e+00, -2.1373e-01, -2.7283e-01, 2.0697e+00, 8.8521e-01, 1.5501e+00, 2.9671e+00, -1.2182e-09, -1.8843e-09],
        [-9.2660e+00, 2.3157e-01, -1.6942e+00, 6.6807e-01, 6.2215e-01, 1.7510e+00, -1.7736e+00, -4.5018e-03, -3.0482e-04],
        [-8.3196e+00, 2.0250e-01, -1.6046e+00, 6.5102e-01, 6.0909e-01, 1.7418e+00, -1.7251e+00, 3.9005e-04, 1.0049e-04],
        [-1.1603e+01, 5.7692e-01, -1.6174e+00, 6.4621e-01, 6.0476e-01, 1.7634e+00, -1.1972e+00, 2.0978e-02, -2.3189e-03]], device='cuda:0'))
scores: tensor([0.1025, 0.5338, 0.1403, 0.2731, 0.1128, 0.1032, 0.1861, 0.1627, 0.1003], device='cuda:0')
labels: tensor([2, 4, 5, 6, 6, 6, 8, 8, 8], device='cuda:0', dtype=torch.int32)
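The size mismatch in Q1 can be located before running inference by comparing the parameter shapes in the checkpoint with those of the model built from the config. The sketch below is a minimal, dependency-free illustration: `find_shape_mismatches` is a hypothetical helper (not part of mmdet3d or mmdeploy), and the two shape dictionaries stand in for what one would collect from the real state dicts. If a mismatched parameter is skipped during loading and keeps its random initialization, that would plausibly also explain the run-to-run differences in Q3, but this is an assumption rather than something confirmed here.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    ckpt_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter
    present in both dicts whose shapes differ.

    Hypothetical helper for illustration only.
    """
    mismatches = []
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Shapes taken from the warning quoted in Q1.
ckpt = {"pts_voxel_encoder.pfn_layers.0.linear.weight": (64, 10)}
model = {"pts_voxel_encoder.pfn_layers.0.linear.weight": (64, 11)}

for name, ckpt_shape, model_shape in find_shape_mismatches(ckpt, model):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With real PyTorch objects, the shape dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.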

Reproduction

Demo code:

import sys
from mmdeploy.apis import inference_model
from mmdet3d.apis import init_model, inference_detector
from mmdet3d.utils import register_all_modules
from mmdeploy.utils import get_input_shape, load_config
import torch
import numpy as np
from copy import deepcopy
from mmdeploy.apis import build_task_processor
from mmdeploy.codebase.mmdet3d.deploy.voxel_detection_model import VoxelDetectionModel
from mmdeploy.utils import Backend, Codebase
from mmdet3d.models.data_preprocessors.data_preprocessor import Det3DDataPreprocessor
from mmengine.dataset import Compose, pseudo_collate
from mmdet3d.registry import MODELS
from mmdet3d.structures import Box3DMode, Det3DDataSample, get_box_type

if __name__=='__main__':
    model_cfg = ('./mmdetection3d/configs/centerpoint/centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py')
    checkpoint = './pretrain/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.pth'
    deploy_cfg = ('./pretrain/voxel-detection_tensorrt_dynamic-nus-64x4.py')                 
    backend_files = ['pretrain/end2end.engine']
    img = './mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin'
    device = 'cuda:0'
    # test the torch model
    register_all_modules()
    print('======== torch model inference ============')
    model = init_model(model_cfg, checkpoint, device)
    result, data = inference_detector(model, img) # returns a Det3DDataSample
    preds = result.pred_instances_3d
    print('bboxes: ', result.pred_instances_3d.bboxes_3d) 
    print('scores: ', result.pred_instances_3d.scores_3d) 
    print('labels: ', result.pred_instances_3d.labels_3d) 

    # data preprocessing
    cfg = model.cfg
    # build the data pipeline
    test_pipeline = deepcopy(cfg.test_dataloader.dataset.pipeline)
    test_pipeline = Compose(test_pipeline)
    box_type_3d, box_mode_3d = get_box_type(cfg.test_dataloader.dataset.box_type_3d)

    data = []
    # load from point cloud file
    data_ = dict(
        lidar_points=dict(lidar_path=img),
        timestamp=1,
        # for ScanNet demo we need axis_align_matrix
        axis_align_matrix=np.eye(4),
        box_type_3d=box_type_3d,
        box_mode_3d=box_mode_3d)

    data_ = test_pipeline(data_)
    data.append(data_)

    collate_data = pseudo_collate(data) # roughly converts [dict] into {key: []}

    ## onnx test
    print('\n======== onnx model inference ============')

    deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)

    # point cloud voxelization
    DP = Det3DDataPreprocessor(voxel=True, voxel_layer=dict(
            max_num_points=20,voxel_size=[0.2, 0.2, 8],
            point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],
            max_voxels=(30000, 40000))) # voxel_type selects static or dynamic pillars
    voxel_dicts = DP.voxelize(collate_data['inputs']['points'])

    import pickle
    # save the voxelized input so it can be reused for the ONNX model
    with open('onnx_input.pkl', 'wb') as f_save:
        pickle.dump(voxel_dicts, f_save)

    input_dict = {
                'voxels': voxel_dicts['voxels'], #torch.rand((3945, 32, 4))
                'num_points': voxel_dicts['num_points'], #torch.ones((3945), dtype=torch.int32)
                'coors': voxel_dicts['coors'] #torch.ones((3945, 4), dtype=torch.int32)
            }
    import onnxruntime
    ort_session = onnxruntime.InferenceSession("./pretrain/end2end.onnx")
    input_dict['voxels'] = input_dict['voxels'].cpu().numpy()
    input_dict['num_points'] = input_dict['num_points'].cpu().numpy()
    input_dict['coors'] = input_dict['coors'].cpu().numpy()
    ort_output = ort_session.run(['cls_score', 'bbox_pred', 'dir_cls_pred'], input_dict)

    # post-processing
    outputs_onnx = {}
    outputs_onnx['cls_score'] = torch.tensor(ort_output[0])
    outputs_onnx['bbox_pred'] = torch.tensor(ort_output[1])
    outputs_onnx['dir_cls_pred'] = torch.tensor(ort_output[2])

    from mmdeploy.codebase.mmdet3d.deploy.voxel_detection_model import VoxelDetectionModel
    prediction = VoxelDetectionModel.postprocess(
            model_cfg=model_cfg,
            deploy_cfg=deploy_cfg,
            outs=outputs_onnx,
            metas=collate_data['data_samples'])
    onnx_result = prediction[0].pred_instances_3d
    print('bboxes: ', onnx_result.bboxes_3d) 
    print('scores: ', onnx_result.scores_3d) 
    print('labels: ', onnx_result.labels_3d)

Environment

06/15 20:35:37 - mmengine - INFO - 

06/15 20:35:37 - mmengine - INFO - **********Environmental information**********
06/15 20:35:38 - mmengine - INFO - sys.platform: linux
06/15 20:35:38 - mmengine - INFO - Python: 3.7.13 (default, Oct 18 2022, 18:57:03) [GCC 11.2.0]
06/15 20:35:38 - mmengine - INFO - CUDA available: True
06/15 20:35:38 - mmengine - INFO - numpy_random_seed: 2147483648
06/15 20:35:38 - mmengine - INFO - GPU 0,1,2,3: NVIDIA A100-SXM4-40GB
06/15 20:35:38 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
06/15 20:35:38 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.58
06/15 20:35:38 - mmengine - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
06/15 20:35:38 - mmengine - INFO - PyTorch: 1.10.2+cu113
06/15 20:35:38 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

06/15 20:35:38 - mmengine - INFO - TorchVision: 0.11.3+cu113
06/15 20:35:38 - mmengine - INFO - OpenCV: 4.5.1
06/15 20:35:38 - mmengine - INFO - MMEngine: 0.7.4
06/15 20:35:38 - mmengine - INFO - MMCV: 2.0.0rc1
06/15 20:35:38 - mmengine - INFO - MMCV Compiler: GCC 9.3
06/15 20:35:38 - mmengine - INFO - MMCV CUDA Compiler: 11.3
06/15 20:35:38 - mmengine - INFO - MMDeploy: 1.1.0+439f88b
06/15 20:35:38 - mmengine - INFO - 

06/15 20:35:38 - mmengine - INFO - **********Backend information**********
06/15 20:35:38 - mmengine - INFO - tensorrt:    8.4.3.1
06/15 20:35:38 - mmengine - INFO - tensorrt custom ops: Available
06/15 20:35:38 - mmengine - INFO - ONNXRuntime: None
06/15 20:35:38 - mmengine - INFO - ONNXRuntime-gpu:     1.8.1
06/15 20:35:38 - mmengine - INFO - ONNXRuntime custom ops:      Available
06/15 20:35:38 - mmengine - INFO - pplnn:       None
06/15 20:35:38 - mmengine - INFO - ncnn:        None
06/15 20:35:38 - mmengine - INFO - snpe:        None
06/15 20:35:38 - mmengine - INFO - openvino:    None
06/15 20:35:38 - mmengine - INFO - torchscript: 1.10.2+cu113
06/15 20:35:38 - mmengine - INFO - torchscript custom ops:      NotAvailable
06/15 20:35:38 - mmengine - INFO - rknn-toolkit:        None
06/15 20:35:38 - mmengine - INFO - rknn-toolkit2:       None
06/15 20:35:38 - mmengine - INFO - ascend:      None
06/15 20:35:38 - mmengine - INFO - coreml:      None
06/15 20:35:38 - mmengine - INFO - tvm: None
06/15 20:35:38 - mmengine - INFO - vacc:        None
06/15 20:35:38 - mmengine - INFO - 

06/15 20:35:38 - mmengine - INFO - **********Codebase information**********
06/15 20:35:38 - mmengine - INFO - mmdet:       3.0.0rc1
06/15 20:35:38 - mmengine - INFO - mmseg:       1.0.0rc0
06/15 20:35:38 - mmengine - INFO - mmpretrain:  None
06/15 20:35:38 - mmengine - INFO - mmocr:       None
06/15 20:35:38 - mmengine - INFO - mmagic:      None
06/15 20:35:38 - mmengine - INFO - mmdet3d:     1.1.0rc1
06/15 20:35:38 - mmengine - INFO - mmpose:      None
06/15 20:35:38 - mmengine - INFO - mmrotate:    None
06/15 20:35:38 - mmengine - INFO - mmaction:    None
06/15 20:35:38 - mmengine - INFO - mmrazor:     None

Error traceback

No response

RunningLeon commented 1 year ago

@sylivahf hi

A1: Your model config and checkpoint do not match.

A2: The input shape in the deploy cfg should be aligned with input_img and test_img.

Tested OK with this script:

python3 ./tools/deploy.py \
configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py \
../mmdetection3d/configs/centerpoint/centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py \
"../mmdeploy_checkpoints/mmdet3d/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth" \
"tests/data/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151612397179.pcd.bin" \
--work-dir "../workdir/test_mmdet3d/mmdet3d/centerpoint/tensorrt/static/fp32/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822" \
--device cuda:0 \
--log-level INFO \
--test-img tests/data/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151612397179.pcd.bin

sylivahf commented 1 year ago

@RunningLeon Thanks for your answer. My main problem was that the downloaded .pth model did not match the ONNX model.

Now I have converted ONNX and TRT models from the downloaded centerpoint_pillar_nuscenes checkpoint, but TRT inference returns an empty result. Tracing it back, I found that the empty result is returned directly by the TRT inference step. (question)

The pth, onnx and trt results are as follows (./mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin):

======== torch model inference ============
bboxes: LiDARInstance3DBoxes(
    tensor([[-1.3145e+01, -5.9344e-01, -2.2842e+00, 3.6708e-01, 2.5209e+00, 8.3522e-01, -1.1192e-01, -1.9604e-09, -4.6045e-10],
        [-1.3050e+01, 9.7002e-01, -2.3780e+00, 4.3426e-01, 3.1145e+00, 8.7034e-01, -1.1072e-01, -1.9604e-09, -4.6045e-10],
        [-1.7004e-01, -4.0453e-01, -9.8386e-01, 4.6877e-01, 4.9995e-01, 1.2595e+00, 3.0697e-01, -1.5224e-04, -7.4921e-05]], device='cuda:0'))
scores: tensor([0.2247, 0.2155, 0.3161], device='cuda:0')
labels: tensor([5, 5, 9], device='cuda:0', dtype=torch.int32)

======== onnx model inference ============
bboxes: LiDARInstance3DBoxes(
    tensor([[-1.3144e+01, -5.9194e-01, -2.2820e+00, 3.6647e-01, 2.5214e+00, 8.2906e-01, -1.1093e-01, -1.9604e-09, -4.6045e-10],
        [-1.3051e+01, 9.6715e-01, -2.3750e+00, 4.3467e-01, 3.1164e+00, 8.6940e-01, -1.1152e-01, -1.9604e-09, -4.6045e-10],
        [-1.7019e-01, -4.0463e-01, -9.8242e-01, 4.6757e-01, 4.9867e-01, 1.2611e+00, 3.2327e-01, -1.5310e-04, -7.2168e-05]]))
scores: tensor([0.2223, 0.2106, 0.3128])
labels: tensor([5, 5, 9], dtype=torch.int32)

======== trt model inference ============
06/25 16:15:27 - mmengine - INFO - Successfully loaded tensorrt plugins from /home/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
06/25 16:15:27 - mmengine - INFO - Successfully loaded tensorrt plugins from /home/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
<InstanceData(

META INFORMATION

DATA FIELDS
bboxes_3d: LiDARInstance3DBoxes(
        tensor([], device='cuda:0', size=(0, 9)))
labels_3d: tensor([], device='cuda:0', dtype=torch.int32)
scores_3d: tensor([], device='cuda:0')

) at 0x7fd1408fbc90>

There are no errors during torch2onnx or onnx2tensorrt.
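For reference, the torch and ONNX results in this comment agree closely, and only the TRT result is empty. The sketch below quantifies the agreement using the scores copied from the output, with only the standard library. `max_abs_diff` is a hypothetical helper, and the 1e-2 tolerance is an arbitrary assumption, not a threshold used by mmdeploy.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two
    equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Scores copied from the torch and onnx outputs above.
torch_scores = [0.2247, 0.2155, 0.3161]
onnx_scores = [0.2223, 0.2106, 0.3128]

diff = max_abs_diff(torch_scores, onnx_scores)
print(f"max |torch - onnx| over scores: {diff:.4f}")
print("within assumed tolerance:", diff < 1e-2)
```

The same comparison could be applied row by row to the box tensors once they are converted to lists.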

RunningLeon commented 1 year ago

Something is wrong with the inference of the TensorRT engine. Could you test the trt engine with trtexec provided by TensorRT? https://github.com/NVIDIA/TensorRT/tree/master/samples/trtexec#using-trtexec

sylivahf commented 1 year ago

@RunningLeon Thank you for your prompt reply.

I followed the link you gave to test the trt engine, and it passed. The main information:

[06/27/2023-09:19:58] [I] === Performance summary ===
[06/27/2023-09:19:58] [I] Throughput: 270.575 qps
[06/27/2023-09:19:58] [I] Latency: min = 3.90543 ms, max = 5.56116 ms, mean = 4.17728 ms, median = 4.16284 ms, percentile(99%) = 4.27029 ms
[06/27/2023-09:19:58] [I] Enqueue Time: min = 0.33667 ms, max = 0.988037 ms, mean = 0.51903 ms, median = 0.549469 ms, percentile(99%) = 0.697998 ms
[06/27/2023-09:19:58] [I] H2D Latency: min = 0.03125 ms, max = 0.143127 ms, mean = 0.0362435 ms, median = 0.0357666 ms, percentile(99%) = 0.0424194 ms
[06/27/2023-09:19:58] [I] GPU Compute Time: min = 3.40891 ms, max = 4.17584 ms, mean = 3.68487 ms, median = 3.67407 ms, percentile(99%) = 3.7796 ms
[06/27/2023-09:19:58] [I] D2H Latency: min = 0.436798 ms, max = 1.79608 ms, mean = 0.456166 ms, median = 0.452148 ms, percentile(99%) = 0.468994 ms
[06/27/2023-09:19:58] [I] Total Host Walltime: 3.0121 s
[06/27/2023-09:19:58] [I] Total GPU Compute Time: 3.00317 s
[06/27/2023-09:19:58] [I] Explanations of the performance metrics are printed in the verbose logs.
[06/27/2023-09:19:58] [I] &&&& PASSED TensorRT.trtexec [TensorRT v8403] # trtexec --loadEngine=/home/mmdeploy/result/end2end.engine --shapes=voxels:1000x20x5,num_points:1000,coors:1000x4

Are there any possible reasons? I'm trying to build a tensorrt 8.4.3 + mmdet3d 1.1.0rc1 environment based on the ubuntu20.04-cuda11.3-mmdeploy image pulled with docker.

RunningLeon commented 1 year ago

hi, if trtexec inference with dummy input works, then trt inference in mmdeploy should give a non-empty result. Could you double check the outputs from the trt engine here? https://github.com/open-mmlab/mmdeploy/blob/a5de11947891d6ef25a308785a55c700f88c8d74/mmdeploy/codebase/mmdet3d/deploy/voxel_detection_model.py#L92

sylivahf commented 1 year ago

@RunningLeon hi, at 'outputs = self.wrapper(input_dict)' some values of outputs are NaN. I checked carefully in both the conda and docker environments, and input_dict is the same as the ONNX model's inputs.

I'm going to try cuda10.2 and hope that fixes the issue.
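One way to narrow down where the NaN values appear is to count them per output right after the backend call. The sketch below uses plain nested lists and the standard library. `count_nans` and `nan_report` are hypothetical helpers, and the output names and values are made up for illustration. With real tensors, `torch.isnan(t).sum()` gives the same count.

```python
import math
from typing import Dict, Union

Nested = Union[float, list]


def count_nans(value: Nested) -> int:
    """Recursively count NaN entries in a (possibly nested) list of floats."""
    if isinstance(value, list):
        return sum(count_nans(v) for v in value)
    return 1 if math.isnan(value) else 0


def nan_report(outputs: Dict[str, Nested]) -> Dict[str, int]:
    """Map each output name to the number of NaN values it contains."""
    return {name: count_nans(data) for name, data in outputs.items()}


# Made-up stand-in for the dict returned by the backend wrapper.
outputs = {
    "cls_score": [[0.1, float("nan")], [0.3, 0.4]],
    "bbox_pred": [[1.0, 2.0], [3.0, 4.0]],
}
print(nan_report(outputs))
```

Running the same report on the ONNX Runtime outputs for identical inputs would show whether the NaN values are specific to the TensorRT engine.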

RunningLeon commented 1 year ago

@sylivahf hi, maybe you could test with different pytorch(1.8-2.0) and tensorrt(8.2-8.6) and see if there's a good combination.

sylivahf commented 1 year ago

@RunningLeon Okay, maybe I'll try it later.

Now I can deploy and run inference with pointpillars on cuda10.2 + tensorrt8.4.3.1 + onnxruntime1.8.1.

But I have a new question: I trained the model with mmdet3d 1.0.0rc4, and my mmdeploy version is v0.9. If I want to convert and run CenterPoint (pillar & dynamic_pillar) on the existing mmdeploy v0.9, can I refer to mmdeploy v0.11.0 to modify the local mmdeploy v0.9?

RunningLeon commented 1 year ago

suggest upgrading to v0.11.0.

sylivahf commented 1 year ago

@RunningLeon Thank you for your help.

I rewrote the mmdeploy v0.9 post-processing with reference to the post-processing of mmdet3d 1.0.0rc4 and have successfully run inference with centerpoint_pillar. Next, I will try centerpoint_dynamic_pillar.

Have you successfully run inference with an mmdet3d engine model on CUDA 11.x? If so, what are the specific versions in that environment?

JunLFang commented 1 year ago

@RunningLeon Dear, I see that you ran pointpillars inference successfully. Which version of mmdeploy did you use with the codebase versions below?

06/15 20:35:38 - mmengine - INFO - **Codebase information**
06/15 20:35:38 - mmengine - INFO - mmdet: 3.0.0rc1
06/15 20:35:38 - mmengine - INFO - mmseg: 1.0.0rc0
06/15 20:35:38 - mmengine - INFO - mmpretrain: None
06/15 20:35:38 - mmengine - INFO - mmocr: None
06/15 20:35:38 - mmengine - INFO - mmagic: None
06/15 20:35:38 - mmengine - INFO - mmdet3d: 1.1.0rc1
06/15 20:35:38 - mmengine - INFO - mmpose: None
06/15 20:35:38 - mmengine - INFO - mmrotate: None
06/15 20:35:38 - mmengine - INFO - mmaction: None
06/15 20:35:38 - mmengine - INFO - mmrazor: None

Thank you very much

MarvinKlemp commented 1 year ago

Okay, so after a few workdays here is what I can confirm:

For all tests I used mim install "mmengine>=0.7.1" "mmcv>=2.0.0" "mmdet>=3.0.0" and mmdet3d==1.2.0 + mmdeploy==1.2.0

Cuda 12

Pointpillars returns an empty result -> I didn't further investigate as I am mainly interested in Centerpoint

I could get Centerpoint to run. The TRT engine produces outputs. However, there are some index errors probably coming from voxel_detection_model.py and it seems that instead of cls_score, bbox_pred and dir_cls_pred the outs var in postprocess has the keys cls_score0, bbox_pred0, ...
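If the engine does return keys such as `cls_score0` and `bbox_pred0`, one possible workaround is to regroup them by base name before post-processing. The sketch below is an illustration only: `group_by_base_name` is a hypothetical helper, it assumes the trailing digits are a plain integer head index, and it is not how mmdeploy itself handles these outputs.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List

# Splits "cls_score0" into base "cls_score" and index "0".
_SUFFIX = re.compile(r"^(?P<base>.*?)(?P<idx>\d+)$")


def group_by_base_name(outs: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Group values whose keys differ only by a trailing integer.

    Keys without a numeric suffix are kept as a one-element list.
    Hypothetical helper for illustration only.
    """
    grouped: Dict[str, Dict[int, Any]] = defaultdict(dict)
    for key, value in outs.items():
        match = _SUFFIX.match(key)
        if match:
            grouped[match.group("base")][int(match.group("idx"))] = value
        else:
            grouped[key][0] = value
    # Order each group by its head index.
    return {
        base: [by_idx[i] for i in sorted(by_idx)]
        for base, by_idx in grouped.items()
    }


# Placeholder strings stand in for the output tensors.
outs = {"cls_score0": "a", "bbox_pred0": "b", "cls_score1": "c", "dir_cls_pred": "d"}
print(group_by_base_name(outs))
```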

Cuda 11

I tried several different combinations of CUDA 11 + TRT but could not get any of them running correctly.

Unfortunately, even the GPU/Release container of mmdeploy doesn't work, as it runs into the TensorRT 8.2 issue, which is discussed in other GitHub issues and mentioned in the docs.

Cuda 10.2

As @sylivahf described, CUDA 10.2 + TRT 8.4 works with pointpillars. Regarding Centerpoint I run into the same index issues.

However, it was a struggle to create a Dockerfile that supports this. The official CUDA 10.2 images are no longer on Docker Hub, as Ubuntu 18.04 reached its end of life.

Furthermore, this is not a solution for me (and probably not for others either), as CUDA 10.2 is NOT supported by newer GPUs. Although I have an RTX 2080, which still supports CUDA 10.2, the GPU I actually intend to use requires at least CUDA 11.

I have the feeling that this is a bigger struggle than it should be. IMO this is mainly because the official docker container DOES NOT work with mmdetection3d due to the TRT 8.2 bug.

So I think the best idea may be to create a separate new docker container (using CUDA > 11) where mmdet3d runs with centerpoint AND pointpillars. I am willing to help in this regard. However, as I don't have TRT experience, I am struggling to debug the issues mentioned above.

@RunningLeon do you know anyone on the team who could help?

RunningLeon commented 1 year ago

@MarvinKlemp hi, for cuda11.3, have you tried mmdeploy docker image openmmlab/mmdeploy:ubuntu20.04-cuda11.3-mmdeploy + TensorRT-8.4.1.5.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz ?

sylivahf commented 1 year ago

@JunLFang My environment is:

2023-07-12 21:58:59,166 - mmdeploy - INFO - TorchVision: 0.9.2+cu102
2023-07-12 21:58:59,166 - mmdeploy - INFO - OpenCV: 4.5.5
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMCV: 1.5.3
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMDeploy: 0.9.0+unknown
2023-07-12 21:58:59,167 - mmdeploy - INFO -

2023-07-12 21:58:59,167 - mmdeploy - INFO - **Backend information**
2023-07-12 21:59:02,964 - mmdeploy - INFO - onnxruntime: 1.8.1 ops_is_avaliable : True
2023-07-12 21:59:03,039 - mmdeploy - INFO - tensorrt: 8.4.3.1 ops_is_avaliable : True
2023-07-12 21:59:03,073 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2023-07-12 21:59:03,077 - mmdeploy - INFO - pplnn_is_avaliable: False
2023-07-12 21:59:03,088 - mmdeploy - INFO - openvino_is_avaliable: False
2023-07-12 21:59:03,101 - mmdeploy - INFO - snpe_is_available: False
2023-07-12 21:59:03,134 - mmdeploy - INFO - ascend_is_available: False
2023-07-12 21:59:03,158 - mmdeploy - INFO - coreml_is_available: False
2023-07-12 21:59:03,158 - mmdeploy - INFO -

2023-07-12 21:59:03,159 - mmdeploy - INFO - **Codebase information**
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmdet: 2.24.0
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmseg: 0.20.0
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmcls: None
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmocr: None
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmedit: None
2023-07-12 21:59:03,343 - mmdeploy - INFO - mmdet3d: 1.0.0rc4
2023-07-12 21:59:03,343 - mmdeploy - INFO - mmpose: None
2023-07-12 21:59:03,343 - mmdeploy - INFO - mmrotate: None

Note: for centerpoint_pillar, I changed the post-processing in the mmdeploy codebase.

@MarvinKlemp About the CUDA 10.2 images: I pulled nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04 on an RTX 2080 Ti, then created a container and installed the required packages.

I have tried the mmdeploy docker image openmmlab/mmdeploy:ubuntu20.04-cuda11.3-mmdeploy, but the TRT engine outputs are empty.

MarvinKlemp commented 1 year ago

@sylivahf nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04 doesn't exist anymore, as 18.04 reached its EOL: https://hub.docker.com/layers/nvidia/cuda/10.2-cudnn8-devel-ubuntu18.04/

Maybe you still have it locally and can use it from there. But I created a container with 10.2 myself and got it running.

MarvinKlemp commented 1 year ago

@RunningLeon

This should be a minimal Dockerfile to reproduce the indexing error. You only require the TensorRT-8.4.3.1.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz in your current directory.

FROM openmmlab/mmdeploy:ubuntu20.04-cuda11.3-mmdeploy

# install TRT 8.4.3.1
COPY ./TensorRT-8.4.3.1.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz /root/workspace
RUN tar -xvf TensorRT-8.4.3.1.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz
RUN pip install TensorRT-8.4.3.1/python/tensorrt-8.4.3.1-cp38-none-linux_x86_64.whl
ENV TENSORRT_DIR=/root/workspace/TensorRT-8.4.3.1
ENV LD_LIBRARY_PATH=/root/workspace/TensorRT-8.4.3.1/lib:$LD_LIBRARY_PATH

# install mmdet3d + configs
RUN git clone https://github.com/open-mmlab/mmdetection3d.git -b v1.2.0 /root/workspace/mmdetection3d
RUN mim install "mmdet3d==1.2.0"

# Fix Error:
# mmengine 0.7.4 is not compatible with mmdeploy 1.2.0 / mmdet3d 1.2.0
RUN mim install "mmengine==0.8.0" 

# get weights from mmdet3d for centerpoint
RUN wget -P/root/workspace/weights https://download.openmmlab.com/mmdetection3d/v1.0.0_models/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth

ENV CUDA_VISIBLE_DEVICES=0
WORKDIR /root/workspace/mmdeploy

# Fix Error:
# 2023-07-14:07:39:00 - root - ERROR - Input shape should be between (5000, 20, 5) and (30000, 20, 5) but get (101, 20, 5).

RUN sed -i "s/5000,/50,/" /root/workspace/mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py
RUN sed -i "s/5000],/50],/" /root/workspace/mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py

# generate engine
RUN python3 tools/deploy.py --device="cuda" configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py \
    /root/workspace/mmdetection3d/configs/centerpoint/centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py \
    /root/workspace/weights/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth \
    /root/workspace/mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin \
    --work-dir /root/workspace/centerpoint

What I always found strange are the two sed -i commands I have to run. They change the default config so that it works with the test point cloud from mmdet3d.
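The shape error that the two sed commands work around is a range check on the dynamic input shape. A minimal sketch of such a check, using the shapes from the error message in the Dockerfile comment, is shown below. `shape_in_range` is a hypothetical helper written for illustration and is not mmdeploy's actual implementation.

```python
from typing import Sequence


def shape_in_range(
    shape: Sequence[int],
    min_shape: Sequence[int],
    max_shape: Sequence[int],
) -> bool:
    """Return True if every dimension of `shape` lies within the
    inclusive [min, max] bounds of the dynamic-shape profile."""
    return (
        len(shape) == len(min_shape) == len(max_shape)
        and all(lo <= s <= hi for s, lo, hi in zip(shape, min_shape, max_shape))
    )


test_cloud = (101, 20, 5)    # voxel shape reported in the error message
max_shape = (30000, 20, 5)   # upper bound reported in the error message

# Default lower bound of 5000 voxels: the small test cloud is rejected.
print(shape_in_range(test_cloud, (5000, 20, 5), max_shape))
# Lower bound after the sed edit (50 voxels): the test cloud is accepted.
print(shape_in_range(test_cloud, (50, 20, 5), max_shape))
```

The test point cloud in mmdet3d is small, so it produces far fewer voxels than the default minimum, which is presumably tuned for full nuScenes sweeps.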

The error is actually the same as with CUDA 12. (Some indexing issues regarding outs in voxel_detection_model::postprocess)

Error:

 2023-07-14:07:42:58 - root - ERROR - list indices must be integers or slices, not tuple
Traceback (most recent call last):
  File "/root/workspace/mmdeploy/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/visualize.py", line 72, in visualize_model
    result = model.test_step(model_inputs)[0]
  File "/usr/local/lib/python3.8/dist-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet3d/deploy/voxel_detection_model.py", line 104, in forward
    prediction = VoxelDetectionModel.postprocess(
  File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet3d/deploy/voxel_detection_model.py", line 273, in postprocess
    batch_heatmap = cls_score[:,
TypeError: list indices must be integers or slices, not tuple

RunningLeon commented 1 year ago

@MarvinKlemp hi, for centerpoint you could take the Tensor element out of each list after this line: https://github.com/open-mmlab/mmdeploy/blob/0a8cbe2286dcb226f3140d0bae8700cdfaf37a47/mmdeploy/codebase/mmdet3d/deploy/voxel_detection_model.py#L259

Add:

        cls_score = cls_score[0]
        bbox_pred = bbox_pred[0]
        dir_cls_pred = dir_cls_pred[0]
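The TypeError in the traceback is what Python raises when a plain list is indexed with a tuple, as in `cls_score[:, ...]`. Taking element 0 first yields the inner object, which for a tensor supports that kind of indexing. A small sketch without any tensor library follows. `unwrap_single` is a hypothetical helper that only unwraps one-element lists or tuples, and the nested list is a stand-in for a tensor.

```python
from typing import Any


def unwrap_single(value: Any) -> Any:
    """Return the only element of a one-element list or tuple,
    otherwise return the value unchanged."""
    if isinstance(value, (list, tuple)) and len(value) == 1:
        return value[0]
    return value


# Stand-in for a one-element list that holds a tensor.
wrapped = [[0.1, 0.9]]

try:
    wrapped[:, 0]  # tuple index on a plain list
except TypeError as exc:
    print("TypeError:", exc)

# After unwrapping, the inner object is available for further indexing.
print(unwrap_single(wrapped))
```

Unlike an unconditional `[0]`, this variant leaves values alone when they are not wrapped in a one-element list, which may matter if other models pass tensors directly.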

MarvinKlemp commented 1 year ago

That's more or less what I used to fix the error. However, with this I get the same issue as with CUDA 12: the outputs are empty.

I added your fix to the Dockerfile. Do you get outputs?

FROM openmmlab/mmdeploy:ubuntu20.04-cuda11.3-mmdeploy

COPY ./TensorRT-8.4.3.1.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz /root/workspace
RUN tar -xvf TensorRT-8.4.3.1.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz
RUN pip install TensorRT-8.4.3.1/python/tensorrt-8.4.3.1-cp38-none-linux_x86_64.whl
ENV TENSORRT_DIR=/root/workspace/TensorRT-8.4.3.1
ENV LD_LIBRARY_PATH=/root/workspace/TensorRT-8.4.3.1/lib:$LD_LIBRARY_PATH

RUN git clone https://github.com/open-mmlab/mmdetection3d.git -b v1.1.1 /root/workspace/mmdetection3d

RUN mim install "mmdet3d==1.2.0"

# Fix Error:
# mmengine 0.7.4 is not compatible with mmdeploy 1.2.0 / mmdet3d 1.2.0
RUN mim install "mmengine==0.8.0" 

RUN wget -P/root/workspace/weights https://download.openmmlab.com/mmdetection3d/v1.0.0_models/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth

ENV CUDA_VISIBLE_DEVICES=0
WORKDIR /root/workspace/mmdeploy

# Fix Error:
# 2023-07-14:07:39:00 - root - ERROR - Input shape should be between (5000, 20, 5) and (30000, 20, 5) but get (101, 20, 5).
RUN sed -i "s/5000,/50,/" /root/workspace/mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py
RUN sed -i "s/5000],/50],/" /root/workspace/mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py
# Fix Index Error
COPY ./voxel_detection_model.py /root/workspace/mmdeploy/mmdeploy/codebase/mmdet3d/deploy/voxel_detection_model.py

# Create TRT Engine
RUN python3 tools/deploy.py --device="cuda" configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py \
    /root/workspace/mmdetection3d/configs/centerpoint/centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py \
    /root/workspace/weights/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth \
    /root/workspace/mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin \
    --work-dir /root/workspace/centerpoint

# Run TRT Engine
RUN mim install "mmdeploy==1.2.0"
COPY run.py /root/workspace/mmdeploy/run.py
RUN python3 run.py
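
The two `sed` lines in the Dockerfile appear to lower the minimum voxel count of the dynamic shape range from 5000 to 50, since the test point cloud only produces 101 voxels. A rough sketch of that kind of range check, using the shapes from the error message; `shape_in_range` is a hypothetical helper and not the actual TensorRT or mmdeploy code:

```python
from typing import Sequence


def shape_in_range(shape: Sequence[int], min_shape: Sequence[int],
                   max_shape: Sequence[int]) -> bool:
    """Check that every dimension lies inside [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= dim <= hi
               for dim, lo, hi in zip(shape, min_shape, max_shape))


voxels_shape = (101, 20, 5)   # shape reported in the error message
max_shape = (30000, 20, 5)    # upper bound reported in the error message

# Original lower bound from the error message vs. the patched one.
original_ok = shape_in_range(voxels_shape, (5000, 20, 5), max_shape)
patched_ok = shape_in_range(voxels_shape, (50, 20, 5), max_shape)
```

With the original lower bound the 101-voxel input is rejected; with the patched bound it is accepted.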

run.py

from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch

if __name__ == "__main__":
    base = "/root/workspace"

    deploy_cfg = f'{base}/mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-nus-20x5.py'
    model_cfg = f'{base}/mmdetection3d/configs/centerpoint/centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py'
    device = 'cuda:0'
    backend_model = [f'{base}/centerpoint/end2end.engine']
    image = f'{base}/mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin'

    deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
    task_processor = build_task_processor(model_cfg, deploy_cfg, device)
    model = task_processor.build_backend_model(backend_model)

    input_shape = get_input_shape(deploy_cfg)
    model_inputs, _ = task_processor.create_input(image, input_shape)

    # do model inference
    with torch.no_grad():
        result = model.test_step(model_inputs)
        print(result)
        print(result[0].pred_instances_3d)
        print(len(result[0].pred_instances_3d))

Output of python3 run.py

root@b21d53d40404:~/workspace/mmdeploy# python3 run.py 
/usr/local/lib/python3.8/dist-packages/mmdet3d/evaluation/functional/kitti_utils/eval.py:10: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
07/14 10:55:10 - mmengine - WARNING - Failed to search registry with scope "mmdet3d" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet3d" is a correct scope, or whether the registry is initialized.
07/14 10:55:10 - mmengine - WARNING - Failed to search registry with scope "mmdet3d" in the "mmdet3d_tasks" registry tree. As a workaround, the current "mmdet3d_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet3d" is a correct scope, or whether the registry is initialized.
07/14 10:55:10 - mmengine - WARNING - Failed to search registry with scope "mmdet3d" in the "backend_voxel_detectors" registry tree. As a workaround, the current "backend_voxel_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet3d" is a correct scope, or whether the registry is initialized.
07/14 10:55:10 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
07/14 10:55:10 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[07/14/2023-10:55:12] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.0
[07/14/2023-10:55:12] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.0
/usr/local/lib/python3.8/dist-packages/mmdet3d/models/task_modules/coders/centerpoint_bbox_coders.py:207: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range = torch.tensor(
/usr/local/lib/python3.8/dist-packages/mmdet3d/models/task_modules/coders/centerpoint_bbox_coders.py:207: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range = torch.tensor(
[<Det3DDataSample(

    META INFORMATION
    box_mode_3d: <Box3DMode.LIDAR: 0>
    transformation_3d_flow: ['R', 'S', 'T']
    pcd_vertical_flip: False
    pcd_trans: array([0., 0., 0.])
    axis_align_matrix: array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.],
               [0., 0., 0., 1.]])
    box_type_3d: <class 'mmdet3d.structures.bbox_3d.lidar_box3d.LiDARInstance3DBoxes'>
    lidar_path: '/root/workspace/mmdetection3d/tests/data/nuscenes/sweeps/LIDAR_TOP/n008-2018-09-18-12-07-26-0400__LIDAR_TOP__1537287083900561.pcd.bin'
    pcd_horizontal_flip: False
    pcd_rotation: tensor([[1., 0., 0.],
                [-0., 1., 0.],
                [0., 0., 1.]])
    pcd_rotation_angle: 0.0
    pcd_scale_factor: 1.0
    flip: False

    DATA FIELDS
    pred_instances: <InstanceData(

            META INFORMATION

            DATA FIELDS
        ) at 0x7fb3121d4fd0>
    gt_instances_3d: <InstanceData(

            META INFORMATION

            DATA FIELDS
        ) at 0x7fb3121d4ee0>
    gt_instances: <InstanceData(

            META INFORMATION

            DATA FIELDS
        ) at 0x7fb3354b79a0>
    eval_ann_info: None
    pred_instances_3d: <InstanceData(

            META INFORMATION

            DATA FIELDS
            scores_3d: tensor([], device='cuda:0')
            labels_3d: tensor([], device='cuda:0', dtype=torch.int32)
            bboxes_3d: LiDARInstance3DBoxes(
                    tensor([], device='cuda:0', size=(0, 9)))
        ) at 0x7fb3121d4df0>
    gt_pts_seg: <PointData(

            META INFORMATION

            DATA FIELDS
        ) at 0x7fb3148c86a0>
) at 0x7fb3121d4f10>]
<InstanceData(

    META INFORMATION

    DATA FIELDS
    scores_3d: tensor([], device='cuda:0')
    labels_3d: tensor([], device='cuda:0', dtype=torch.int32)
    bboxes_3d: LiDARInstance3DBoxes(
            tensor([], device='cuda:0', size=(0, 9)))
) at 0x7fb3121d4df0>
0

Changed voxel_detection_model.py (with your changes: cls_score = cls_score[0], etc.):

```python
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Any, Dict, List, Optional, Sequence, Union

import mmcv
import torch
from mmdet3d.structures.det3d_data_sample import SampleList
from mmengine import Config
from mmengine.model.base_model.data_preprocessor import BaseDataPreprocessor
from mmengine.registry import Registry
from mmengine.structures import BaseDataElement, InstanceData

from mmdeploy.codebase.base import BaseBackendModel
from mmdeploy.utils import (Backend, get_backend, get_codebase_config,
                            load_config)

__BACKEND_MODEL = Registry('backend_voxel_detectors')


@__BACKEND_MODEL.register_module('end2end')
class VoxelDetectionModel(BaseBackendModel):
    """End to end model for inference of 3d voxel detection.

    Args:
        backend (Backend): The backend enum, specifying backend type.
        backend_files (Sequence[str]): Paths to all required backend files
            (e.g. '.onnx' for ONNX Runtime, '.param' and '.bin' for ncnn).
        device (str): A string specifying device type.
        model_cfg (str | Config): The model config.
        deploy_cfg (str|Config): Deployment config file or loaded Config
            object.
        data_preprocessor (dict|torch.nn.Module): The input preprocessor
    """

    def __init__(self,
                 backend: Backend,
                 backend_files: Sequence[str],
                 device: str,
                 model_cfg: Union[str, Config],
                 deploy_cfg: Union[str, Config],
                 data_preprocessor: Optional[Union[dict,
                                                   torch.nn.Module]] = None,
                 **kwargs):
        super().__init__(
            deploy_cfg=deploy_cfg, data_preprocessor=data_preprocessor)
        self.model_cfg = model_cfg
        self.deploy_cfg = deploy_cfg
        self.device = device
        self._init_wrapper(
            backend=backend, backend_files=backend_files, device=device)

    def _init_wrapper(self, backend: Backend, backend_files: Sequence[str],
                      device: str):
        """Initialize backend wrapper.

        Args:
            backend (Backend): The backend enum, specifying backend type.
            backend_files (Sequence[str]): Paths to all required backend
                files (e.g. '.onnx' for ONNX Runtime, '.param' and '.bin'
                for ncnn).
            device (str): A string specifying device type.
        """
        output_names = self.output_names
        self.wrapper = BaseBackendModel._build_wrapper(
            backend=backend,
            backend_files=backend_files,
            device=device,
            input_names=[self.input_name],
            output_names=output_names,
            deploy_cfg=self.deploy_cfg)

    def forward(self,
                inputs: dict,
                data_samples: Optional[List[BaseDataElement]] = None,
                **kwargs) -> Any:
        """Run forward inference.

        Args:
            inputs (dict): A dict contains `voxels` which wrapped `voxels`,
                `num_points` and `coors`
            data_samples (List[BaseDataElement]): A list of meta info for
                image(s).

        Returns:
            list: A list contains predictions.
        """
        preprocessed = inputs['voxels']
        input_dict = {
            'voxels': preprocessed['voxels'].to(self.device),
            'num_points': preprocessed['num_points'].to(self.device),
            'coors': preprocessed['coors'].to(self.device)
        }
        outputs = self.wrapper(input_dict)
        num_level = len(outputs) // 3
        new_outputs = dict(
            cls_score=[outputs[f'cls_score{i}'] for i in range(num_level)],
            bbox_pred=[outputs[f'bbox_pred{i}'] for i in range(num_level)],
            dir_cls_pred=[
                outputs[f'dir_cls_pred{i}'] for i in range(num_level)
            ])
        outputs = new_outputs
        if data_samples is None:
            return outputs

        prediction = VoxelDetectionModel.postprocess(
            model_cfg=self.model_cfg,
            deploy_cfg=self.deploy_cfg,
            outs=outputs,
            metas=data_samples)

        return prediction

    def show_result(self,
                    data: Dict,
                    result: List,
                    out_dir: str,
                    file_name: str,
                    show=False,
                    snapshot=False,
                    **kwargs):
        from mmcv.parallel import DataContainer as DC
        from mmdet3d.core import show_result
        if isinstance(data['points'][0], DC):
            points = data['points'][0]._data[0][0].numpy()
        elif mmcv.is_list_of(data['points'][0], torch.Tensor):
            points = data['points'][0][0]
        else:
            ValueError(f"Unsupported data type {type(data['points'][0])} "
                       f'for visualization!')
        pred_bboxes = result[0]['boxes_3d']
        pred_labels = result[0]['labels_3d']
        pred_bboxes = pred_bboxes.tensor.cpu().numpy()
        show_result(
            points,
            None,
            pred_bboxes,
            out_dir,
            file_name,
            show=show,
            snapshot=snapshot,
            pred_labels=pred_labels)

    @staticmethod
    def convert_to_datasample(
        data_samples: SampleList,
        data_instances_3d: Optional[List[InstanceData]] = None,
        data_instances_2d: Optional[List[InstanceData]] = None,
    ) -> SampleList:
        """Convert results list to `Det3DDataSample`.

        Subclasses could override it to be compatible for some multi-modality
        3D detectors.

        Args:
            data_samples (list[:obj:`Det3DDataSample`]): The input data.
            data_instances_3d (list[:obj:`InstanceData`], optional): 3D
                Detection results of each sample.
            data_instances_2d (list[:obj:`InstanceData`], optional): 2D
                Detection results of each sample.

        Returns:
            list[:obj:`Det3DDataSample`]: Detection results of the
            input. Each Det3DDataSample usually contains
            'pred_instances_3d'. And the ``pred_instances_3d`` normally
            contains following keys.

            - scores_3d (Tensor): Classification scores, has a shape
              (num_instance, )
            - labels_3d (Tensor): Labels of 3D bboxes, has a shape
              (num_instances, ).
            - bboxes_3d (Tensor): Contains a tensor with shape
              (num_instances, C) where C >=7.

            When there are image prediction in some models, it should
            contains `pred_instances`, And the ``pred_instances`` normally
            contains following keys.

            - scores (Tensor): Classification scores of image, has a shape
              (num_instance, )
            - labels (Tensor): Predict Labels of 2D bboxes, has a shape
              (num_instances, ).
            - bboxes (Tensor): Contains a tensor with shape
              (num_instances, 4).
        """
        assert (data_instances_2d is not None) or \
               (data_instances_3d is not None),\
               'please pass at least one type of data_samples'

        if data_instances_2d is None:
            data_instances_2d = [
                InstanceData() for _ in range(len(data_instances_3d))
            ]
        if data_instances_3d is None:
            data_instances_3d = [
                InstanceData() for _ in range(len(data_instances_2d))
            ]

        for i, data_sample in enumerate(data_samples):
            data_sample.pred_instances_3d = data_instances_3d[i]
            data_sample.pred_instances = data_instances_2d[i]
        return data_samples

    @staticmethod
    def postprocess(model_cfg: Union[str, Config],
                    deploy_cfg: Union[str, Config], outs: Dict, metas: Dict):
        """postprocess outputs to datasamples.

        Args:
            model_cfg (Union[str, Config]): The model config from
                trainning repo
            deploy_cfg (Union[str, Config]): The deploy config to specify
                backend and input shape
            outs (Dict): output bbox, cls and score
            metas (Dict): DataSample3D for bbox3d render

        Raises:
            NotImplementedError: Only support mmdet3d model with `bbox_head`

        Returns:
            DataSample3D: datatype for render
        """
        if 'cls_score' not in outs or 'bbox_pred' not in outs or 'dir_cls_pred' not in outs:  # noqa: E501
            raise RuntimeError('output tensor not found')

        if 'test_cfg' not in model_cfg.model:
            raise RuntimeError('test_cfg not found')

        from mmengine.registry import MODELS
        cls_score = outs['cls_score']
        bbox_pred = outs['bbox_pred']
        dir_cls_pred = outs['dir_cls_pred']
        batch_input_metas = [data_samples.metainfo for data_samples in metas]

        head = None
        cfg = None
        if 'bbox_head' in model_cfg.model:
            # pointpillars postprocess
            head = MODELS.build(model_cfg.model['bbox_head'])
            cfg = model_cfg.model.test_cfg
        elif 'pts_bbox_head' in model_cfg.model:
            # centerpoint postprocess
            head = MODELS.build(model_cfg.model['pts_bbox_head'])
            cfg = model_cfg.model.test_cfg.pts
        else:
            raise NotImplementedError('mmdet3d model bbox_head not found')

        if not hasattr(head, 'task_heads'):
            data_instances_3d = head.predict_by_feat(
                cls_scores=cls_score,
                bbox_preds=bbox_pred,
                dir_cls_preds=dir_cls_pred,
                batch_input_metas=batch_input_metas,
                cfg=cfg)

            data_samples = VoxelDetectionModel.convert_to_datasample(
                data_samples=metas, data_instances_3d=data_instances_3d)

        else:
            pts = model_cfg.model.test_cfg.pts

            cls_score = cls_score[0]
            bbox_pred = bbox_pred[0]
            dir_cls_pred = dir_cls_pred[0]

            rets = []
            scores_range = [0]
            bbox_range = [0]
            dir_range = [0]
            for i, _ in enumerate(head.task_heads):
                scores_range.append(scores_range[i] + head.num_classes[i])
                bbox_range.append(bbox_range[i] + 8)
                dir_range.append(dir_range[i] + 2)

            for task_id in range(len(head.num_classes)):
                num_class_with_bg = head.num_classes[task_id]

                batch_heatmap = cls_score[
                    :, scores_range[task_id]:scores_range[task_id + 1],
                    ...].sigmoid()

                batch_reg = bbox_pred[
                    :, bbox_range[task_id]:bbox_range[task_id] + 2, ...]
                batch_hei = bbox_pred[
                    :, bbox_range[task_id] + 2:bbox_range[task_id] + 3, ...]

                if head.norm_bbox:
                    batch_dim = torch.exp(bbox_pred[
                        :, bbox_range[task_id] + 3:bbox_range[task_id] + 6,
                        ...])
                else:
                    batch_dim = bbox_pred[
                        :, bbox_range[task_id] + 3:bbox_range[task_id] + 6,
                        ...]

                batch_vel = bbox_pred[
                    :, bbox_range[task_id] + 6:bbox_range[task_id + 1], ...]

                batch_rots = dir_cls_pred[
                    :, dir_range[task_id]:dir_range[task_id + 1],
                    ...][:, 0].unsqueeze(1)
                batch_rotc = dir_cls_pred[
                    :, dir_range[task_id]:dir_range[task_id + 1],
                    ...][:, 1].unsqueeze(1)

                temp = head.bbox_coder.decode(
                    batch_heatmap,
                    batch_rots,
                    batch_rotc,
                    batch_hei,
                    batch_dim,
                    batch_vel,
                    reg=batch_reg,
                    task_id=task_id)

                assert pts['nms_type'] in ['circle', 'rotate']
                batch_reg_preds = [box['bboxes'] for box in temp]
                batch_cls_preds = [box['scores'] for box in temp]
                batch_cls_labels = [box['labels'] for box in temp]
                if pts['nms_type'] == 'circle':
                    boxes3d = temp[0]['bboxes']
                    scores = temp[0]['scores']
                    labels = temp[0]['labels']
                    centers = boxes3d[:, [0, 1]]
                    boxes = torch.cat([centers, scores.view(-1, 1)], dim=1)
                    from mmdet3d.models.layers import circle_nms
                    keep = torch.tensor(
                        circle_nms(
                            boxes.detach().cpu().numpy(),
                            pts['min_radius'][task_id],
                            post_max_size=pts['post_max_size']),
                        dtype=torch.long,
                        device=boxes.device)

                    boxes3d = boxes3d[keep]
                    scores = scores[keep]
                    labels = labels[keep]
                    ret = dict(bboxes=boxes3d, scores=scores, labels=labels)
                    ret_task = [ret]
                    rets.append(ret_task)
                else:
                    rets.append(
                        head.get_task_detections(num_class_with_bg,
                                                 batch_cls_preds,
                                                 batch_reg_preds,
                                                 batch_cls_labels,
                                                 batch_input_metas))

            # Merge branches results
            num_samples = len(rets[0])

            ret_list = []
            for i in range(num_samples):
                temp_instances = InstanceData()
                for k in rets[0][i].keys():
                    if k == 'bboxes':
                        bboxes = torch.cat([ret[i][k] for ret in rets])
                        bboxes[:, 2] = bboxes[:, 2] - bboxes[:, 5] * 0.5
                        bboxes = batch_input_metas[i]['box_type_3d'](
                            bboxes, head.bbox_coder.code_size)
                    elif k == 'scores':
                        scores = torch.cat([ret[i][k] for ret in rets])
                    elif k == 'labels':
                        flag = 0
                        for j, num_class in enumerate(head.num_classes):
                            rets[j][i][k] += flag
                            flag += num_class
                        labels = torch.cat([ret[i][k].int() for ret in rets])
                temp_instances.bboxes_3d = bboxes
                temp_instances.scores_3d = scores
                temp_instances.labels_3d = labels
                ret_list.append(temp_instances)

            data_samples = VoxelDetectionModel.convert_to_datasample(
                metas, data_instances_3d=ret_list)

        return data_samples


def build_voxel_detection_model(
        model_files: Sequence[str],
        model_cfg: Union[str, Config],
        deploy_cfg: Union[str, Config],
        device: str,
        data_preprocessor: Optional[Union[Config,
                                          BaseDataPreprocessor]] = None,
        **kwargs):
    """Build 3d voxel object detection model for different backends.

    Args:
        model_files (Sequence[str]): Input model file(s).
        model_cfg (str | Config): Input model config file or Config
            object.
        deploy_cfg (str | Config): Input deployment config file or
            Config object.
        device (str): Device to input model
        data_preprocessor (BaseDataPreprocessor | Config): The data
            preprocessor of the model.

    Returns:
        VoxelDetectionModel: Detector for a configured backend.
    """
    deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)

    backend = get_backend(deploy_cfg)
    model_type = get_codebase_config(deploy_cfg).get('model_type', 'end2end')

    backend_detector = __BACKEND_MODEL.build(
        dict(
            type=model_type,
            backend=backend,
            backend_files=model_files,
            device=device,
            model_cfg=model_cfg,
            deploy_cfg=deploy_cfg,
            data_preprocessor=data_preprocessor,
            **kwargs))

    return backend_detector
```
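
The CenterPoint branch of `postprocess` above slices the concatenated head outputs per task with cumulative channel offsets: `num_classes[i]` heatmap channels, 8 box channels (2 reg, 1 height, 3 dim, 2 velocity) and 2 rotation channels per task. When merging, each task's local labels are shifted by the number of classes in the preceding tasks. A small sketch of that bookkeeping in plain Python; the `num_classes` list and the function names are illustrative assumptions, not values read from the config:

```python
from itertools import accumulate
from typing import List, Tuple


def task_channel_ranges(
        num_classes: List[int],
        bbox_channels: int = 8,
        dir_channels: int = 2) -> Tuple[List[int], List[int], List[int]]:
    """Cumulative channel offsets per task, mirroring the loop above."""
    num_tasks = len(num_classes)
    scores_range = [0, *accumulate(num_classes)]
    bbox_range = [i * bbox_channels for i in range(num_tasks + 1)]
    dir_range = [i * dir_channels for i in range(num_tasks + 1)]
    return scores_range, bbox_range, dir_range


def label_offsets(num_classes: List[int]) -> List[int]:
    """Offset added to each task's local labels when merging branches."""
    return [0, *accumulate(num_classes)][:-1]


# Assumed per-task class counts, for illustration only.
num_classes = [1, 2, 2, 1, 2, 2]
scores_range, bbox_range, dir_range = task_channel_ranges(num_classes)
offsets = label_offsets(num_classes)

# Task 2 reads heatmap channels scores_range[2]:scores_range[3]
# and box channels bbox_range[2]:bbox_range[3].
task2_heatmap = (scores_range[2], scores_range[3])
task2_bbox = (bbox_range[2], bbox_range[3])
```

If the exported engine does not lay out its channels in this order, the slices would pick up the wrong tensors, which is one thing worth checking when results come back empty.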

RunningLeon commented 1 year ago

@MarvinKlemp hi, sorry for the trouble. I've reproduced the problem and the result was empty. I'll check if there's any workaround for cuda11.x. BTW, have you tried with cuda10.2+trt8.4?

MarvinKlemp commented 1 year ago

CUDA 10.2 + TRT 8.4 also suffers from the index issue and the empty results (for CenterPoint). It looks like PointPillars works with 10.2/8.4.

JunLFang commented 1 year ago

@JunLFang My environment is:

2023-07-12 21:58:59,166 - mmdeploy - INFO - TorchVision: 0.9.2+cu102
2023-07-12 21:58:59,166 - mmdeploy - INFO - OpenCV: 4.5.5
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMCV: 1.5.3
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2023-07-12 21:58:59,167 - mmdeploy - INFO - MMDeploy: 0.9.0+unknown
2023-07-12 21:58:59,167 - mmdeploy - INFO -

2023-07-12 21:58:59,167 - mmdeploy - INFO - Backend information
2023-07-12 21:59:02,964 - mmdeploy - INFO - onnxruntime: 1.8.1 ops_is_avaliable : True
2023-07-12 21:59:03,039 - mmdeploy - INFO - tensorrt: 8.4.3.1 ops_is_avaliable : True
2023-07-12 21:59:03,073 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2023-07-12 21:59:03,077 - mmdeploy - INFO - pplnn_is_avaliable: False
2023-07-12 21:59:03,088 - mmdeploy - INFO - openvino_is_avaliable: False
2023-07-12 21:59:03,101 - mmdeploy - INFO - snpe_is_available: False
2023-07-12 21:59:03,134 - mmdeploy - INFO - ascend_is_available: False
2023-07-12 21:59:03,158 - mmdeploy - INFO - coreml_is_available: False
2023-07-12 21:59:03,158 - mmdeploy - INFO -

2023-07-12 21:59:03,159 - mmdeploy - INFO - Codebase information
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmdet: 2.24.0
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmseg: 0.20.0
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmcls: None
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmocr: None
2023-07-12 21:59:03,342 - mmdeploy - INFO - mmedit: None
2023-07-12 21:59:03,343 - mmdeploy - INFO - mmdet3d: 1.0.0rc4
2023-07-12 21:59:03,343 - mmdeploy - INFO - mmpose: None
2023-07-12 21:59:03,343 - mmdeploy - INFO - mmrotate: None

Note: for centerpoint_pillar, I have changed the post-processing in the mmdeploy codebase.

@MarvinKlemp About the CUDA 10.2 image: I pulled nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04 on an RTX 2080 Ti, then created a container and installed the required packages.

I have also tried the mmdeploy Docker image openmmlab/mmdeploy:ubuntu20.04-cuda11.3-mmdeploy, but the TRT engine outputs are empty.

Thank you very much for your feedback. I am now trying PointPillars and have run into some other errors. Anyway, thank you for your help.

MarvinKlemp commented 1 year ago

@RunningLeon can I assist you in any way?

sylivahf commented 1 year ago

@RunningLeon hi, I have run into a new problem when converting centerpoint_dynamic_pillar: "mmdeploy - ERROR - mmdeploy.apis.pytorch2onnx.torch2onnx with Call id: 0 failed".

Details are below:

2023-07-18 08:33:48,690 - mmdeploy - INFO - Export PyTorch model to ONNX: /mmdeploy/result_centerpoint_dynamic/end2end.onnx.
2023-07-18 08:33:48,713 - mmdeploy - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
2023-07-18 08:33:48,714 - mmdeploy - WARNING - Can not find torch._C._jit_pass_onnx_deduplicate_initializers, function rewrite will not be applied

=======centerpoint_dynamic_pillars forward test!!!=======

/usr/local/lib/python3.6/dist-packages/mmcv/ops/scatter_points.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if coors.size(-1) == 3:
/usr/local/lib/python3.6/dist-packages/mmcv/ops/deform_conv.py:185: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not all(map(lambda s: s > 0, output_size)):
/usr/local/lib/python3.6/dist-packages/mmcv/ops/deform_conv.py:89: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  cur_im2col_step = min(ctx.im2col_step, input.size(0))
/usr/local/lib/python3.6/dist-packages/mmcv/ops/deform_conv.py:91: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ) == 0, 'batch size must be divisible by im2col_step'
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 110, in torch2onnx
    optimize=optimize)
  File "/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/mmdeploy/mmdeploy/apis/onnx/export.py", line 132, in export
    verbose=verbose)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 276, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 701, in _export
    dynamic_axes=dynamic_axes)
  File "/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 383, in wrapper
    return self.func(self, *args, **kwargs)
  File "/mmdeploy/mmdeploy/apis/onnx/optimizer.py", line 10, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 468, in _model_to_graph
    module=module)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 206, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 309, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 997, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_opset11.py", line 466, in constant_pad_nd
    pad = _prepare_onnx_paddings(g, sym_help._get_tensor_rank(input), padding)
  File "/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 383, in wrapper
    return self.func(self, *args, **kwargs)
  File "/mmdeploy/mmdeploy/pytorch/ops/pad.py", line 32, in _prepare_onnx_paddings__tensorrt
    return ctx.origin_func(g, input, pad)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_opset11.py", line 445, in _prepare_onnx_paddings
    extension = g.op("Sub", g.op("Mul", g.op("Constant", value_t=torch.tensor(dim, dtype=torch.int64)),
TypeError: an integer is required (got type NoneType) 
(Occurred when translating constant_pad_nd).
2023-07-18 08:33:49,501 - mmdeploy - ERROR - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
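
The traceback ends in `_prepare_onnx_paddings`, where `torch.tensor(dim, dtype=torch.int64)` receives `dim=None`. That suggests the rank of the tensor being padded was not known to the exporter at that point. The pure-Python sketch below illustrates why the rank is needed to lay out ONNX-style paddings (all begins first, then all ends). `to_onnx_paddings` is a hypothetical name, and the reordering is an assumption modeled on the static-rank case, not the exact code of this torch version:

```python
from typing import List, Optional, Sequence


def to_onnx_paddings(rank: Optional[int], pad: Sequence[int]) -> List[int]:
    """Reorder PyTorch-style padding into ONNX-style padding.

    PyTorch lists (begin, end) pairs starting from the last dimension.
    ONNX expects dim_0_begin, ..., dim_n_begin, dim_0_end, ..., dim_n_end.
    """
    if rank is None:
        # Corresponds to the failure in the traceback: without a known
        # rank there is no way to size the padding list.
        raise TypeError("tensor rank is unknown; cannot build paddings")
    # Dimensions not mentioned in `pad` get zero padding.
    padded = list(pad) + [0] * (rank * 2 - len(pad))
    # Walk the pairs backwards: first all begins, then all ends.
    return padded[-2::-2] + padded[-1::-2]


# Pad the last dim by (1, 2) and the second-to-last by (3, 4) on a 4-D input.
onnx_pads = to_onnx_paddings(4, [1, 2, 3, 4])

try:
    to_onnx_paddings(None, [1, 2, 3, 4])
    unknown_rank_failed = False
except TypeError:
    unknown_rank_failed = True
```

If this reading is right, the fix would be on the model or rewriter side, making the padded tensor's rank statically known during tracing, rather than in the padding arithmetic itself.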

I would like to learn how the rewriter function files work, but I could not find anything about them on mmdeploy.readthedocs. Do you have any guidance? Thank you for your interest and help.

RunningLeon commented 1 year ago

@sylivahf hi, here's a short tutorial you can refer to.
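
As background for the question about rewriters: the traceback above shows mmdeploy wrapping `_prepare_onnx_paddings` with a backend-specific function that can still call the original through `ctx.origin_func`. The toy registry below sketches that general pattern in plain Python. It is not mmdeploy's API; `ToyRewriter`, `register`, `enable`, and the `ops` namespace are hypothetical names used only for illustration:

```python
import contextlib
import types
from typing import Callable, List, Tuple


class ToyRewriter:
    """Minimal function-rewriter registry (illustrative only)."""

    def __init__(self) -> None:
        # Each entry: (object owning the function, attribute name, rewrite).
        self._rewrites: List[Tuple[object, str, Callable]] = []

    def register(self, owner: object, name: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            self._rewrites.append((owner, name, func))
            return func
        return decorator

    @contextlib.contextmanager
    def enable(self):
        """Swap in the rewrites, then restore the originals on exit."""
        originals = []
        for owner, name, func in self._rewrites:
            origin = getattr(owner, name)
            ctx = types.SimpleNamespace(origin_func=origin)

            # Bind func and ctx as defaults so each wrapper keeps its own.
            def wrapper(*args, _func=func, _ctx=ctx, **kwargs):
                return _func(_ctx, *args, **kwargs)

            setattr(owner, name, wrapper)
            originals.append((owner, name, origin))
        try:
            yield
        finally:
            for owner, name, origin in originals:
                setattr(owner, name, origin)


# A stand-in "library" with one function to rewrite.
ops = types.SimpleNamespace(double=lambda x: 2 * x)
REWRITER = ToyRewriter()


@REWRITER.register(ops, "double")
def double__toy_backend(ctx, x):
    # Delegate to the original implementation, then adjust the result.
    return ctx.origin_func(x) + 1


before = ops.double(3)
with REWRITER.enable():
    during = ops.double(3)
after = ops.double(3)
```

The rewrite is only active inside the `with` block, which matches the idea that export-time patches should not leak into normal PyTorch execution.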

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

JunLFang commented 1 year ago

@sylivahf hi, maybe you could test with different pytorch(1.8-2.0) and tensorrt(8.2-8.6) and see if there's a good combination.

@RunningLeon Okay, maybe I'll try it later.

Now I can deploy and run inference with PointPillars on cuda10.2+tensorrt8.4.3.1+onnxruntime1.8.1.

But I have a new question: I trained the model with mmdet3d 1.0.0rc4, and my mmdeploy version is v0.9. If I want to convert and run inference with CenterPoint (pillar & dynamic_pillar) on the existing mmdeploy v0.9, can I use mmdeploy v0.11.0 as a reference to modify my local mmdeploy v0.9?

Could you share the Dockerfile if you configured the PointPillars environment with Docker? I want to run PointPillars, but with CUDA 11 the output is empty, so I would like to try your configuration: cuda10.2+tensorrt8.4.3.1+onnxruntime1.8.1. Thank you very much.
