open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

Implementing CenterPoint on the KITTI dataset #871

Closed Yaziwel closed 3 years ago

Yaziwel commented 3 years ago

Hello, I have tried to implement CenterPoint on the KITTI dataset, and the results are shown below:

[results screenshot omitted]

As you can see, it reaches reasonable numbers on most of the evaluation metrics except AOS. My first concern is whether this is related to the fact that the KITTI lidar frame is rotated 90 degrees from the nuScenes lidar frame. If it is, what should I do to fix this? Second, CenterPoint is a two-stage detector, and I do not know whether the current reimplementation includes the second stage.

Below is my config, borrowed from the SECOND and CenterPoint configs. I also removed the velocity params from the model settings, since KITTI has no velocity annotations.

voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
# point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
model = dict(
    type='CenterPoint',
    pts_voxel_layer=dict(
        max_num_points=5,
        voxel_size=voxel_size,
        max_voxels=(16000, 40000),
        point_cloud_range=point_cloud_range),
    # voxel feature encoder: [num_voxels, max_points, num_features] -> [num_voxels, num_features]
    pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=4),
    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],
        output_channels=128,
        order=('conv', 'norm', 'act'),
        encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128),
                          (128, 128)),
        encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, [0, 1, 1]), (0, 0)),
        # the sparse encoder returns a dense BEV map of shape [N, C * D, H, W]
        block_type='basicblock'),
    pts_backbone=dict(
        type='SECOND',
        in_channels=256,
        out_channels=[128, 256],
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
        conv_cfg=dict(type='Conv2d', bias=False)),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[128, 256],
        out_channels=[256, 256],
        upsample_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
        upsample_cfg=dict(type='deconv', bias=False),
        use_conv_for_no_stride=True),
    pts_bbox_head=dict(
        type='CenterHead',
        in_channels=sum([256, 256]),
        tasks=[
            dict(num_class=1, class_names=['Car']),
            dict(num_class=1, class_names=['Pedestrian']),
            dict(num_class=1, class_names=['Cyclist']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            post_center_range=point_cloud_range,
            max_num=100,
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            code_size=7,
            pc_range=point_cloud_range[:2]),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
        norm_bbox=True),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            point_cloud_range=point_cloud_range,
            grid_size=[1408, 1600, 40],
            voxel_size=voxel_size,
            out_size_factor=8,
            dense_reg=1,
            gaussian_overlap=0.1,
            max_objs=500,
            min_radius=2,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])),
    test_cfg=dict(
        pts=dict(
            point_cloud_range=point_cloud_range,
            post_center_limit_range=point_cloud_range,
            max_per_img=500,
            max_pool_nms=False,
            min_radius=[4, 12, 10, 1, 0.85, 0.175],  # used by circle NMS only; inactive with nms_type='rotate'
            score_threshold=0.1,
            out_size_factor=8,  # keep consistent with bbox_coder and train_cfg
            voxel_size=voxel_size[:2],
            nms_type='rotate',
            pre_max_size=4096,
            post_max_size=512,
            nms_thr=0.2)))

dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']

input_modality = dict(use_lidar=True, use_camera=False)
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=12, Pedestrian=10, Cyclist=10))

file_client_args = dict(backend='disk')
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        file_client_args=file_client_args),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=1,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'kitti_infos_train.pkl',
            split='training',
            pts_prefix='velodyne_reduced',
            pipeline=train_pipeline,
            modality=input_modality,
            classes=class_names,
            test_mode=False,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR')),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'))

evaluation = dict(interval=1, pipeline=eval_pipeline)

optimizer = dict(type='AdamW', lr=0.001, betas=(0.95, 0.99), weight_decay=0.01)
# optimizer = dict(type='Adam', lr=0.0002, betas=(0.9, 0.99), weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 0.0001),
    cyclic_times=1,
    step_ratio_up=0.4)

momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.8947368421052632, 1),
    cyclic_times=1,
    step_ratio_up=0.4)
runner = dict(type='EpochBasedRunner', max_epochs=80)

checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/centerpoint_kitti'
load_from = None
resume_from = None
workflow = [('train', 1)]
gpu_ids = range(0, 1)
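
For what it's worth, here is a quick sanity check on the strides above (an illustrative snippet, not part of the config): the BEV map that CenterHead predicts on is the voxel grid downsampled by out_size_factor, which is why the value in bbox_coder, train_cfg, and test_cfg must agree.

# Illustrative only: derive the head's BEV resolution from the voxel grid.
voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
out_size_factor = 8

grid_x = round((point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0])  # 1408
grid_y = round((point_cloud_range[4] - point_cloud_range[1]) / voxel_size[1])  # 1600
print(grid_x // out_size_factor, grid_y // out_size_factor)  # 176 200 heatmap cells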
Yaziwel commented 3 years ago

Also, I am eagerly waiting for your official implementation.

Tai-Wang commented 3 years ago

Since your detection performance is reasonable, I guess there is a problem with the direction (orientation) prediction, i.e., only the predicted direction is incorrect in your case. The overall performance is almost good enough. According to the implementation provided by the CenterPoint authors, the one-stage CenterPoint should achieve results similar to SECOND. You can refer to this repo.

tianweiy commented 3 years ago

Adding to that point, I think there is something different in mmdet3d's implementation, and the AOS will be better if you apply some transform to the rotation output (add 90 degrees or something, I don't remember exactly). Alternatively, you can just ignore the AOS number and only care about AP.

Regarding the two-stage version, you can see my comments here: https://github.com/tianweiy/CenterPoint-KITTI/issues/20
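
For illustration, a post-hoc correction of this kind could look like the following. This is a minimal sketch: it assumes (N, 7) boxes in mmdet3d's LiDAR convention with yaw in the last column, and the exact offset (pi/2 vs. pi) would need to be verified against the codebase.

import numpy as np

def correct_yaw(boxes: np.ndarray, offset: float = np.pi / 2) -> np.ndarray:
    # Hypothetical helper: shift the yaw channel of (N, 7) boxes
    # [x, y, z, w, l, h, yaw] by a fixed offset, then wrap the angle
    # back into [-pi, pi) before evaluation.
    out = boxes.copy()
    out[:, 6] = (out[:, 6] + offset + np.pi) % (2 * np.pi) - np.pi
    return out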

xiaoMrzhang commented 3 years ago

I have met the same thing before when trying to train CenterPoint on the KITTI dataset, and I found that the rot_y convention of the KITTI dataloader differs from the one in centerpoint_head. You can just add pi in the centerpoint_head.py file and AOS will be fine.
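
A minimal sketch of where that shift would go (the names here are assumptions; in centerpoint_head.py the yaw is recovered from separate sin/cos channels during decoding):

import math
import torch

def decode_yaw_with_kitti_offset(rot_sine: torch.Tensor,
                                 rot_cosine: torch.Tensor) -> torch.Tensor:
    # Recover yaw from its sin/cos channels, then add pi so the prediction
    # lines up with KITTI's rot_y convention, as suggested above.
    return torch.atan2(rot_sine, rot_cosine) + math.pi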

Tai-Wang commented 3 years ago

Closing due to inactivity. Feel free to reopen it if you have any further questions.

BigPig117 commented 2 years ago

Traceback (most recent call last):
  File "train.py", line 202, in <module>
    main()
  File "train.py", line 116, in main
    model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=train_set)
  File "../pcdet/models/__init__.py", line 18, in build_network
    model_cfg=model_cfg, num_class=num_class, dataset=dataset
  File "../pcdet/models/detectors/__init__.py", line 30, in build_detector
    model_cfg=model_cfg, num_class=num_class, dataset=dataset
  File "../pcdet/models/detectors/centerpoint.py", line 7, in __init__
    self.module_list = self.build_networks()
  File "../pcdet/models/detectors/detector3d_template.py", line 47, in build_networks
    model_info_dict=model_info_dict
  File "../pcdet/models/detectors/detector3d_template.py", line 136, in build_dense_head
    voxel_size=model_info_dict.get('voxel_size', False)
  File "../pcdet/models/dense_heads/center_head.py", line 66, in __init__
    [self.class_names.index(x) for x in cur_class_names if x in class_names]
RuntimeError: CUDA error: out of memory

Hello, I tried to use it with OpenPCDet, but I encountered the error above. Can you give me some suggestions? My setup is a GTX 3070 with CUDA 11.1 and PyTorch 1.8.2.
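
For reference, an out-of-memory error while the network is still being built usually means the GPU is already nearly full or the batch size is too large for an 8 GB card. A quick check with plain PyTorch (illustrative snippet) before lowering the per-GPU batch size in the OpenPCDet config:

import torch

# torch.cuda.memory_allocated only covers tensors in this process, so also
# check nvidia-smi for other processes holding memory on the card.
props = torch.cuda.get_device_properties(0)
print(f'{props.name}: {props.total_memory / 1024**3:.1f} GiB total')
print(f'{torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB allocated by this process')

If the card is otherwise free, reducing the per-GPU batch size (BATCH_SIZE_PER_GPU in OpenPCDet's config, or the equivalent command-line option) is the usual first step.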