open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

nuscenes - how to draw 3D bboxes onto multi-view images #2880


James-S-choi commented 5 months ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmdetection3d/tree/dev-1.x

Environment

```
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.11.1
OpenCV: 4.8.0
MMCV: 1.6.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.24.0
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc5+
spconv2.0: True
```

Reproduces the problem - code sample

Hello! I am trying to project 3D boxes onto the six camera-view images of the nuScenes dataset.

Attempted process: I implemented this based on the official tools/misc/visualize_results.py. The main modifications are in the show() method of my CustomNuScenesDataset class:

```python
class CustomNuScenesDataset(NuScenesDataset):
    ...

    def show(self, results, out_dir, show=False, pipeline=None):
        """Results visualization.

        Args:
            results (list[dict]): List of bounding boxes results.
            out_dir (str): Output directory of visualization result.
            show (bool): Whether to visualize the results online.
                Default: False.
            pipeline (list[dict], optional): raw data loading for showing.
                Default: None.
        """
        assert out_dir is not None, 'Expect out_dir, got none.'
        pipeline = self._get_pipeline(pipeline)
        for i, result in enumerate(results):
            # draw the point cloud and the 3D boxes
            if 'pts_bbox' in result.keys():
                result = result['pts_bbox']
            data_info = self.data_infos[i]
            pts_path = data_info['lidar_path']
            file_name = osp.split(pts_path)[-1].split('.')[0]
            points = self._extract_data(i, pipeline, 'points').numpy()
            # for now we convert points into depth mode
            points = Coord3DMode.convert_point(points, Coord3DMode.LIDAR,
                                               Coord3DMode.DEPTH)
            inds = result['scores_3d'] > 0.1
            gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d'].tensor.numpy()
            show_gt_bboxes = Box3DMode.convert(gt_bboxes, Box3DMode.LIDAR,
                                               Box3DMode.DEPTH)
            pred_bboxes = result['boxes_3d'][inds].tensor.numpy()
            show_pred_bboxes = Box3DMode.convert(pred_bboxes, Box3DMode.LIDAR,
                                                 Box3DMode.DEPTH)
            show_result(points, show_gt_bboxes, show_pred_bboxes, out_dir,
                        file_name, show)

            # multi-modality visualization: project the 3D boxes onto the images
            if 'img_bbox' in result.keys():
                result = result['img_bbox']
            data_info = self.data_infos[i]

            img, img_metas = self._extract_data(i, pipeline,
                                                ['img', 'img_metas'])
            img_path = img_metas['filename']
            file_name = [osp.split(tmp)[-1].split('.')[0] for tmp in img_path]

            # move the channel dim to the end
            img = img.numpy().transpose(0, 2, 3, 1)  # [view, C, H, W] -> [view, H, W, C]

            gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d']
            pred_bboxes = result['boxes_3d']

            for j in range(img.shape[0]):  # iterate over the 6 camera views
                show_multi_modality_result(
                    img[j, ...],
                    gt_bboxes,
                    pred_bboxes,
                    img_metas['lidar2img'][j],
                    out_dir,
                    file_name[j],
                    box_mode='lidar',
                    show=False)  # False because there is no GUI to display the result
```
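One thing I noticed while writing this up (I am not sure whether it is related to the error): the point-cloud branch filters predictions with a 0.1 score threshold, but the image branch projects all of result['boxes_3d']. Applying the same filter inside the image branch before projection would look like this (the threshold simply mirrors the one used above):

```python
# Reuse the same 0.1 score threshold as the point-cloud branch so that only
# confident predictions are projected onto the camera images.
inds = result['scores_3d'] > 0.1
pred_bboxes = result['boxes_3d'][inds]
```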

In addition to the show() method, only one other function has been added to the CustomNuScenesDataset class, get_data_info():

```python
def get_data_info(self, index):
    """Get data info according to the given index.

    Args:
        index (int): Index of the sample data to get.

    Returns:
        dict: Data information that will be passed to the data \
            preprocessing pipelines. It includes the following keys:

            - sample_idx (str): Sample index.
            - pts_filename (str): Filename of point clouds.
            - sweeps (list[dict]): Infos of sweeps.
            - timestamp (float): Sample timestamp.
            - img_filename (str, optional): Image filename.
            - lidar2img (list[np.ndarray], optional): Transformations \
                from lidar to different cameras.
            - ann_info (dict): Annotation info.
    """
    info = self.data_infos[index]
    # standard protocol modified from SECOND.Pytorch
    input_dict = dict(
        sample_idx=info['token'],
        pts_filename=info['lidar_path'],
        sweeps=info['sweeps'],
        timestamp=info['timestamp'] / 1e6,
        img_sweeps=None if 'img_sweeps' not in info else info['img_sweeps'],
        radar_info=None if 'radars' not in info else info['radars'])

    if self.return_gt_info:
        input_dict['info'] = info

    if self.modality['use_camera']:
        image_paths = []
        lidar2img_rts = []
        lidar2cam_rts = []
        cam_intrinsics = []
        img_timestamp = []
        for cam_type, cam_info in info['cams'].items():
            img_timestamp.append(cam_info['timestamp'] / 1e6)
            image_paths.append(cam_info['data_path'])
            # obtain lidar to image transformation matrix
            lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
            lidar2cam_t = cam_info[
                'sensor2lidar_translation'] @ lidar2cam_r.T
            lidar2cam_rt = np.eye(4)
            lidar2cam_rt[:3, :3] = lidar2cam_r.T
            lidar2cam_rt[3, :3] = -lidar2cam_t
            intrinsic = cam_info['cam_intrinsic']
            viewpad = np.eye(4)
            viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
            lidar2img_rt = (viewpad @ lidar2cam_rt.T)
            lidar2img_rts.append(lidar2img_rt)

            cam_intrinsics.append(viewpad)
            lidar2cam_rts.append(lidar2cam_rt.T)

        input_dict.update(
            dict(
                img_timestamp=img_timestamp,
                img_filename=image_paths,
                lidar2img=lidar2img_rts,
                cam_intrinsic=cam_intrinsics,
                lidar2cam=lidar2cam_rts))

    if not self.test_mode:
        annos = self.get_ann_info(index)
        input_dict['ann_info'] = annos

    return input_dict
```
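For clarity about the convention I am assuming when passing img_metas['lidar2img'][j] to show_multi_modality_result(): the matrices built above are meant to map homogeneous LiDAR coordinates to image coordinates with column vectors. A minimal sketch of projecting a single point with one of them (the point and the identity stand-in matrix are made-up values, just for illustration):

```python
import numpy as np

# Made-up example values: one LiDAR point and a stand-in for lidar2img_rts[j].
point_lidar = np.array([10.0, 2.0, 0.5])   # (x, y, z) in the LiDAR frame
lidar2img = np.eye(4)                       # placeholder for a real 4x4 lidar2img matrix

# Column-vector convention: lidar2img @ [x, y, z, 1]^T -> (u*w, v*w, w, 1).
uvw = lidar2img @ np.append(point_lidar, 1.0)

w = uvw[2]                                  # depth in the camera frame
if w > 0:                                   # only points in front of the camera project sensibly
    u, v = uvw[0] / w, uvw[1] / w           # pixel coordinates
    print(f'pixel: ({u:.1f}, {v:.1f}), depth: {w:.2f} m')
else:
    print('point is behind this camera')
```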

I look forward to your response and would appreciate any guidance from anyone with experience in this area!

Reproduces the problem - command or script

none

Reproduces the problem - error message

In the latter part of the show() method, specifically the multi-modality section that projects 3D boxes onto the images, an error occurs during the execution of show_multi_modality_result(). The problem appears to be in drawing the projection of pred_bboxes with the cv2 library.

```
Exception has occurred: error
(note: full exception trace is shown but execution is paused at: _run_module_as_main)
OpenCV(4.8.0) :-1: error: (-5:Bad argument) in function 'line'
Overload resolution failed:
 - Can't parse 'pt1'. Sequence item with index 0 has a wrong type
 - Can't parse 'pt1'. Sequence item with index 0 has a wrong type

  File "/opt/conda/lib/python3.7/site-packages/mmdet3d/core/visualizer/image_vis.py", line 88, in plot_rect3d_on_img
    cv2.LINE_AA)
  File "/opt/conda/lib/python3.7/site-packages/mmdet3d/core/visualizer/image_vis.py", line 128, in draw_lidar_bbox3d_on_img
    return plot_rect3d_on_img(img, num_bbox, imgfov_pts_2d, color, thickness)
  File "/opt/conda/lib/python3.7/site-packages/mmdet3d/core/visualizer/show_result.py", line 290, in show_multi_modality_result
    pred_bboxes, img, proj_mat, img_metas, color=pred_bbox_color)
  File "/3dv/CMT/projects/mmdet3d_plugin/datasets/custom_nuscenes_dataset.py", line 176, in show
    show=False)
  File "/3dv/CMT/tools/result_models.py", line 63, in main
    dataset.show(results, args.show_dir, pipeline=eval_pipeline)
  File "/3dv/CMT/tools/result_models.py", line 74, in <module>
    main()
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main  (Current frame)
    "__main__", mod_spec)
cv2.error: OpenCV(4.8.0) :-1: error: (-5:Bad argument) in function 'line'
Overload resolution failed:
 - Can't parse 'pt1'. Sequence item with index 0 has a wrong type
 - Can't parse 'pt1'. Sequence item with index 0 has a wrong type
```

I have noticed that the `gt_bboxes.corners` and `pred_bboxes.corners` inputs contain a large number of negative values. I am not sure whether this is the issue, but while I can visualize gt_bboxes (although it seems to be drawn on the feature map), I get the error above when trying to visualize pred_bboxes.
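In case it helps narrow this down, my current guess (and it is only a guess) is that the corner coordinates reaching cv2.line are either still floats or blow up to extreme values when a corner lies behind the camera, and OpenCV then rejects them during overload resolution. The standalone snippet below uses made-up corner values to illustrate the float case only; it is not the library's actual code path:

```python
import numpy as np
import cv2

# Made-up 2D corners for one box, standing in for the projected pred_bboxes
# corners (imgfov_pts_2d in the traceback above). Note they are floats.
corners_2d = np.array([[100.5, 200.3], [150.2, 200.3],
                       [150.2, 250.7], [100.5, 250.7],
                       [110.0, 190.0], [160.0, 190.0],
                       [160.0, 240.0], [110.0, 240.0]])

img = np.zeros((900, 1600, 3), dtype=np.uint8)   # nuScenes camera image size

# Passing float coordinates should trigger the same "Can't parse 'pt1'" error.
try:
    cv2.line(img, tuple(corners_2d[0]), tuple(corners_2d[1]),
             (0, 255, 0), 2, cv2.LINE_AA)
except cv2.error as err:
    print('float points rejected:', err)

# Casting to plain Python ints draws without complaint.
pt1 = tuple(int(c) for c in corners_2d[0])
pt2 = tuple(int(c) for c in corners_2d[1])
cv2.line(img, pt1, pt2, (0, 255, 0), 2, cv2.LINE_AA)
print('int points drawn OK')
```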

Additional information

No response