open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

Run KITTI dataset using GroupFree3D #1031

Closed · kx-Z closed this issue 2 years ago

kx-Z commented 2 years ago

RuntimeError: stack expects each tensor to be equal size, but got [19280, 4] at entry 0 and [21555, 4] at entry 1.

The cause of this problem seems to be that the number of points read per scene differs. How should I modify the code?
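For context, this failure mode can be reproduced with plain PyTorch. The sketch below (illustrative only, not mmdetection3d code) shows why variable-size point clouds cannot be stacked into one batch tensor, and how sampling each cloud to a fixed point count resolves it:

```python
import torch

# Two KITTI-style point clouds with different point counts, shape (N, 4).
scan_a = torch.rand(19280, 4)
scan_b = torch.rand(21555, 4)

try:
    # The default collate stacks per-sample tensors into one batch tensor,
    # which requires identical shapes, hence the RuntimeError above.
    torch.stack([scan_a, scan_b])
except RuntimeError as err:
    print(err)  # stack expects each tensor to be equal size, ...

# Sampling every cloud down to a fixed budget makes stacking possible.
num_points = 16384
sampled = [s[torch.randperm(len(s))[:num_points]] for s in (scan_a, scan_b)]
print(torch.stack(sampled).shape)  # torch.Size([2, 16384, 4])
```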

filaPro commented 2 years ago

The first thing is that GroupFree3D is designed for indoor scenes, so you will hardly reach any reasonable results on the outdoor KITTI dataset.

Second, please provide the full config and the traceback of your experiment. If the error comes from stacking a different number of points per scene, you can try adding `dict(type='IndoorPointSample', num_points=20000)` to your config, as sketched below.
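For reference, a minimal sketch of where such a sampling transform would sit in a KITTI training pipeline (the surrounding transforms are abbreviated; in later mmdetection3d versions the same transform is, to my knowledge, named `PointSample`):

```python
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    # ... augmentation transforms ...
    # Sample every scene to a fixed point count so the default collate
    # can stack samples into one batch tensor of shape (B, 20000, C).
    dict(type='IndoorPointSample', num_points=20000),
    # ... formatting / collection transforms ...
]
```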

kx-Z commented 2 years ago

Thank you for your reply. Maybe I didn't read carefully enough, but I didn't find it stated in the relevant papers, such as VoteNet, that these models are specifically designed for indoor scenes. Which work mentions this problem?

kx-Z commented 2 years ago

The configuration is as follows:

```python
_base_ = [
    '../_base_/datasets/kitti-3d-3class.py',
    '../_base_/models/groupfree3d.py',
    '../_base_/schedules/schedule_3x.py',
    '../_base_/default_runtime.py'
]

# model settings
model = dict(
    backbone=dict(
        type='PointNet2SASSG',
        in_channels=3,
        num_points=(2048, 1024, 512, 256),
        radius=(0.2, 0.4, 0.8, 1.2),
        num_samples=(64, 32, 16, 16),
        sa_channels=((128, 128, 256), (256, 256, 512), (256, 256, 512),
                     (256, 256, 512)),
        fp_channels=((512, 512), (512, 288)),
        norm_cfg=dict(type='BN2d'),
        sa_cfg=dict(
            type='PointSAModule',
            pool_mod='max',
            use_xyz=True,
            normalize_xyz=True)),
    bbox_head=dict(
        num_classes=3,
        num_decoder_layers=6,
        size_cls_agnostic=False,
        bbox_coder=dict(
            type='GroupFree3DBBoxCoder',
            num_sizes=3,
            num_dir_bins=1,
            with_rot=False,
            size_cls_agnostic=False,
            mean_sizes=[[3.9, 1.6, 1.56], [0.8, 0.6, 1.73],
                        [1.76, 0.6, 1.73]]),
        sampling_objectness_loss=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=8.0),
        objectness_loss=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        center_loss=dict(
            type='SmoothL1Loss', beta=0.04, reduction='sum', loss_weight=10.0),
        dir_class_loss=dict(
            type='CrossEntropyLoss', reduction='sum', loss_weight=1.0),
        dir_res_loss=dict(
            type='SmoothL1Loss', reduction='sum', loss_weight=10.0),
        size_class_loss=dict(
            type='CrossEntropyLoss', reduction='sum', loss_weight=1.0),
        size_res_loss=dict(
            type='SmoothL1Loss',
            beta=1.0 / 9.0,
            reduction='sum',
            loss_weight=10.0 / 9.0),
        semantic_loss=dict(
            type='CrossEntropyLoss', reduction='sum', loss_weight=1.0)),
    test_cfg=dict(
        sample_mod='kps',
        nms_thr=0.25,
        score_thr=0.0,
        per_class_proposal=True,
        prediction_stages='last_three'))

# dataset settings
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
input_modality = dict(use_lidar=True, use_camera=False)
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6))

file_client_args = dict(backend='disk')

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2],
        # use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        file_client_args=file_client_args),
    dict(type='ObjectSample', db_sampler=db_sampler),
    # dict(type='IndoorPointSample', num_points=20000),
    # dict(type='PointSample', num_points=50000),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2],
        # use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]

# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2],
        # use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'kitti_infos_train.pkl',
            split='training',
            pts_prefix='velodyne_reduced',
            pipeline=train_pipeline,
            modality=input_modality,
            classes=class_names,
            test_mode=False,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR')),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'))

evaluation = dict(interval=1, pipeline=eval_pipeline)

# optimizer
lr = 0.002
optimizer = dict(
    lr=lr,
    weight_decay=0.0005,
    paramwise_cfg=dict(
        custom_keys={
            'bbox_head.decoder_layers': dict(lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_self_posembeds': dict(
                lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_cross_posembeds': dict(
                lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_query_proj': dict(lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_key_proj': dict(lr_mult=0.1, decay_mult=1.0)
        }))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
lr_config = dict(policy='step', warmup=None, step=[56, 68])

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=80)
checkpoint_config = dict(interval=1, max_keep_ckpts=10)
```

The error:

```
Original Traceback (most recent call last):
  File "/home/zhuyi/anaconda3/envs/mmd2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhuyi/anaconda3/envs/mmd2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhuyi/anaconda3/envs/mmd2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhuyi/mmdetection3d-0.17.1/mmdet/datasets/dataset_wrappers.py", line 154, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/home/zhuyi/mmdetection3d-0.17.1/mmdet3d/datasets/custom_3d.py", line 357, in __getitem__
    data = self.prepare_train_data(idx)
  File "/home/zhuyi/mmdetection3d-0.17.1/mmdet3d/datasets/custom_3d.py", line 154, in prepare_train_data
    example = self.pipeline(input_dict)
  File "/home/zhuyi/mmdetection3d-0.17.1/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/zhuyi/mmdetection3d-0.17.1/mmdet3d/datasets/pipelines/transforms_3d.py", line 329, in __call__
    points = points.cat([sampled_points, points])
  File "/home/zhuyi/mmdetection3d-0.17.1/mmdet3d/core/points/base_points.py", line 370, in cat
    torch.cat([p.tensor for p in points_list], dim=0),
RuntimeError: Sizes of tensors must match except in dimension 0. Got 4 and 3 in dimension 1 (The offending index is 1)
```
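A note on this second traceback: it is a different failure from the one in the first post. `ObjectSample` concatenates points pasted from the GT database (loaded with all 4 dims, as the traceback's "Got 4 and 3" suggests) with scene points loaded with `use_dim=[0, 1, 2]`. A sketch of two ways to make the dimensions agree; the `points_loader` option on the database sampler is an assumption about this mmdetection3d version, so treat this as a direction rather than a verified fix:

```python
# Option 1: keep all four dims for the scene points so they match the
# 4-dim points pasted in by ObjectSample. Note that the backbone's
# in_channels would then need to be 4 instead of 3.
load_points = dict(  # first transform in train_pipeline
    type='LoadPointsFromFile',
    coord_type='LIDAR',
    load_dim=4,
    use_dim=4)  # instead of use_dim=[0, 1, 2]

# Option 2 (assumption: DataBaseSampler accepts a points_loader config in
# this version): load the database points with 3 dims, matching the scene.
db_sampler = dict(
    data_root='data/kitti/',
    info_path='data/kitti/kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=['Pedestrian', 'Cyclist', 'Car'],
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2]))
```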

filaPro commented 2 years ago

> Thank you for your reply. Maybe I didn't read carefully enough, but I didn't find it stated in the relevant papers, such as VoteNet, that these models are specifically designed for indoor scenes. Which work mentions this problem?

As I understand it, this is not about a single work mentioning the problem. You can see that the two sets of papers introducing indoor detectors (VoteNet, ImVoteNet, GroupFree3D, 3D-MPA, HGNet, BRNet, 3DETR, MLCVNet, VENet, ...) and outdoor detectors (CenterPoint, PartA2, SECOND, MVXNet, 3DSSD, ...) hardly intersect. There are a couple of reasons why this division exists. The main one is that outdoor scenes can be approximated well enough by their BEV projection, so almost all outdoor detectors have a 2D head; for indoor scenes that is not an option.
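To make the BEV argument concrete, here is a toy sketch (illustrative only, not mmdetection3d code) that projects a LiDAR cloud onto a 2D occupancy grid by discarding height. In outdoor scenes objects rarely overlap in this top-down view, so a 2D head over the BEV grid loses little; indoor objects stacked vertically (a lamp on a table) would collapse onto each other.

```python
import torch

def bev_occupancy(points, pc_range=(0, -40, 70.4, 40), resolution=0.4):
    """Project (N, 3+) LiDAR points onto a 2D bird's-eye-view grid."""
    x_min, y_min, x_max, y_max = pc_range
    w = int((x_max - x_min) / resolution)  # 176 cells along x
    h = int((y_max - y_min) / resolution)  # 200 cells along y
    ix = ((points[:, 0] - x_min) / resolution).long().clamp(0, w - 1)
    iy = ((points[:, 1] - y_min) / resolution).long().clamp(0, h - 1)
    grid = torch.zeros(h, w)
    grid[iy, ix] = 1.0  # height (z) is discarded entirely
    return grid

# Random points spread over the KITTI range [0, -40, -3, 70.4, 40, 1].
points = torch.rand(20000, 4) * torch.tensor([70.4, 80.0, 4.0, 1.0]) \
    + torch.tensor([0.0, -40.0, -3.0, 0.0])
print(bev_occupancy(points).shape)  # torch.Size([200, 176])
```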

kx-Z commented 2 years ago

Thank you very much for your help.