open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

Error training Part-A2 on nuscenes #2066

Open AitorIglesias opened 1 year ago

AitorIglesias commented 1 year ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.7.0
OpenCV: 4.6.0
MMCV: 1.6.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.25.2
MMSegmentation: 0.29.0
MMDetection3D: 1.0.0rc5+8ea1752
spconv2.0: False

Reproduces the problem - code sample

configs/_base_/models/parta2_nus.py

# model settings
voxel_size = [0.25, 0.25, 8]
point_cloud_range = [-50, -50, -5, 50, 50, 3]
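# note: with this range and voxel size, the implied (x, y, z) voxel grid is
# (50 - (-50)) / 0.25 = 400 cells in x and y, and (3 - (-5)) / 8 = 1 cell in z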

model = dict(
    type='PartA2',
    voxel_layer=dict(
        max_num_points=64,  # max_points_per_voxel
        point_cloud_range=point_cloud_range,
        voxel_size=voxel_size,
        max_voxels=(16000, 40000)  # (training, testing) max_voxels
    ),
    voxel_encoder=dict(type='HardSimpleVFE'),
    middle_encoder=dict(
        type='SparseUNet',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],
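        # sparse_shape is listed in (z, y, x) order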
        order=('conv', 'norm', 'act')),
    backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        out_channels=[128, 256]),
    neck=dict(
        type='FPN',
        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
        act_cfg=dict(type='ReLU'),
        in_channels=[128, 256],
        out_channels=512,
        start_level=0,
        num_outs=3),
    rpn_head=dict(
        type='PartA2RPNHead',
        num_classes=10,
        in_channels=512,
        feat_channels=512,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='Anchor3DRangeGenerator',
            ranges=[[-50, -50, -1.8, 50, 50, -1.8]],
            scales=[1, 2, 4],
            sizes=[
                [2.5981, 0.8660, 1.],  # 1.5 / sqrt(3)
                [1.7321, 0.5774, 1.],  # 1 / sqrt(3)
                [1., 1., 1.],
                [0.4, 0.4, 1],
            ],
            rotations=[0, 1.57],
            reshape_out=True),
        assigner_per_size=False,
        diff_rad_by_sin=True,
        dir_offset=-0.7854,  # -pi / 4,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    roi_head=dict(
        type='PartAggregationROIHead',
        num_classes=10,
        semantic_head=dict(
            type='PointwiseSemanticHead',
            in_channels=16,
            extra_width=0.2,
            seg_score_thr=0.3,
            num_classes=10,
            loss_seg=dict(
                type='FocalLoss',
                use_sigmoid=True,
                reduction='sum',
                gamma=2.0,
                alpha=0.25,
                loss_weight=1.0),
            loss_part=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
        seg_roi_extractor=dict(
            type='Single3DRoIAwareExtractor',
            roi_layer=dict(
                type='RoIAwarePool3d',
                out_size=14,
                max_pts_per_voxel=128,
                mode='max')),
        part_roi_extractor=dict(
            type='Single3DRoIAwareExtractor',
            roi_layer=dict(
                type='RoIAwarePool3d',
                out_size=14,
                max_pts_per_voxel=128,
                mode='avg')),
        bbox_head=dict(
            type='PartA2BboxHead',
            num_classes=10,
            seg_in_channels=16,
            part_in_channels=4,
            seg_conv_channels=[64, 64],
            part_conv_channels=[64, 64],
            merge_conv_channels=[128, 128],
            down_conv_channels=[128, 256],
            bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
            shared_fc_channels=[256, 512, 512, 512],
            cls_channels=[256, 256],
            reg_channels=[256, 256],
            dropout_ratio=0.1,
            roi_feat_size=14,
            with_corner_loss=True,
            loss_bbox=dict(
                type='SmoothL1Loss',
                beta=1.0 / 9.0,
                reduction='sum',
                loss_weight=1.0),
            loss_cls=dict(
                type='CrossEntropyLoss',
                use_sigmoid=True,
                reduction='sum',
                loss_weight=1.0))),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.4,
                min_pos_iou=0.4,
                ignore_iof_thr=-1,
                gpu_assign_thr=1),
            allowed_border=0,
            code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=9000,
            nms_post=512,
            max_num=512,
            nms_thr=0.8,
            score_thr=0,
            use_rotate_nms=False),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.55,
                neg_iou_thr=0.55,
                min_pos_iou=0.55,
                ignore_iof_thr=-1,
                gpu_assign_thr=1),
            sampler=dict(
                type='IoUNegPiecewiseSampler',
                num=128,
                pos_fraction=0.55,
                neg_piece_fractions=[0.8, 0.2],
                neg_iou_piece_thrs=[0.55, 0.1],
                neg_pos_ub=-1,
                add_gt_as_proposals=False,
                return_iou=True),
            cls_pos_thr=0.75,
            cls_neg_thr=0.25)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1024,
            nms_post=100,
            max_num=100,
            nms_thr=0.7,
            score_thr=0,
            use_rotate_nms=True),
        rcnn=dict(
            use_rotate_nms=True,
            use_raw_score=True,
            nms_thr=0.01,
            score_thr=0.1)))

configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_nus3d.py

_base_ = [
    '../_base_/schedules/cyclic_40e.py', '../_base_/default_runtime.py',
    '../_base_/models/parta2_nus.py'
]

point_cloud_range = [-50, -50, -5, 50, 50, 3]
# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
dataset_type = 'NuScenesDataset'
data_root = 'data/nuscenes/'
# Input modality for nuScenes dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=True,
    use_camera=False,
    use_radar=False,
    use_map=False,
    use_external=False)
file_client_args = dict(backend='disk')

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR',
        with_velocity=False),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True,
        box_type_3d='LiDAR',
        with_velocity=False),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True,
        box_type_3d='LiDAR',
        with_velocity=False))
# For the nuScenes dataset, we usually evaluate the model at the end of
# training. Since the model is trained for 40 epochs here, we set the
# evaluation interval to 40. Please change the interval accordingly if you
# do not use the default schedule.
# Part-A2 uses a different learning rate from what SECOND uses.
lr = 0.001
optimizer = dict(lr=lr)
evaluation = dict(interval=40, pipeline=eval_pipeline)
find_unused_parameters = True

Reproduces the problem - command or script

python tools/train.py configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_nus3d.py --work-dir checkpoints/

Reproduces the problem - error message

File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/main.py", line 39, in cli.main() File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main run() File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file runpy.run_path(target, run_name="main") File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path pkg_name=pkg_name, script_name=fname) File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code mod_name, mod_spec, pkg_name, script_name) File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code exec(code, run_globals) File "tools/train.py", line 263, in main() File "tools/train.py", line 259, in main meta=meta) File "/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model meta=meta) File "/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector runner.run(data_loaders, cfg.workflow) File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run epoch_runner(data_loaders[i], kwargs) File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train self.run_iter(data_batch, train_mode=True, kwargs) File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in run_iter kwargs) File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step return self.module.train_step(inputs[0], kwargs[0]) File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step losses = self(data) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl result = self.forward(input, kwargs) File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func return old_func(args, kwargs) File "/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward return self.forward_train(kwargs) File "/mmdetection3d/mmdet3d/models/detectors/parta2.py", line 132, in forward_train gt_bboxes_3d, gt_labels_3d) File "/mmdetection3d/mmdet3d/models/roi_heads/part_aggregation_roi_head.py", line 125, in forward_train voxels_dict, sample_results) File "/mmdetection3d/mmdet3d/models/roi_heads/part_aggregation_roi_head.py", line 189, in _bbox_forward_train rois) File "/mmdetection3d/mmdet3d/models/roi_heads/part_aggregation_roi_head.py", line 222, in _bbox_forward pooled_part_feats) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl result = self.forward(input, kwargs) File "/mmdetection3d/mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py", line 270, in forward x_part = self.part_conv(part_features) File 
"/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl result = self.forward(*input, *kwargs) File "/opt/conda/lib/python3.7/site-packages/mmcv/ops/sparse_modules.py", line 135, in forward input = module(input) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl result = self.forward(input, kwargs) File "/opt/conda/lib/python3.7/site-packages/mmcv/ops/sparse_modules.py", line 139, in forward input.features = module(input.features) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl result = self.forward(*input, **kwargs) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward self.weight, self.bias, bn_training, exponential_average_factor, self.eps) File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2012, in batch_norm _verify_batch_size(input.size()) File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 1995, in _verify_batch_size raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size)) ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 64])

Additional information

I am trying to train Part-A2 on the nuScenes dataset (the mini version). To do so, I modified the files configs/_base_/models/parta2.py and configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py and saved them as configs/_base_/models/parta2_nus.py and configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_nus3d.py. These are the files shown above.

When I try to train the model I get the error shown above. I have been looking for a solution, but I only find reports that this error occurs when the batch size is 1, and that is not my case since my batch size is 2.

I would like to know why the error occurs and how to solve it, so that I can train Part-A2 on the nuScenes dataset.

JingweiZhang12 commented 1 year ago

Some tips:

  1. Check the printed config in the training log to see the samples_per_gpu you actually used.
  2. Make sure you installed mmdet3d in develop mode.
  3. Make sure PYTHONPATH points to your modified project. A quick check for all three points is sketched below.
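
For example (a sketch; the config path is the one from this issue, using the mmcv 1.x Config API):

from mmcv import Config
import mmdet3d

# 1. load and inspect the merged config that training will actually use
cfg = Config.fromfile('configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_nus3d.py')
print(cfg.data.samples_per_gpu)  # effective batch size per GPU

# 2./3. confirm the imported mmdet3d is your develop-mode checkout,
#       not another copy in site-packages or elsewhere on PYTHONPATH
print(mmdet3d.__file__)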
AitorIglesias commented 1 year ago

Thanks for the tips @JingweiZhang12 ,

  1. I have already checked in the log that samples_per_gpu=2.
  2. I have also checked that mmdet3d is installed in develop mode.
  3. And PYTHONPATH points to my project.

Even so, the error still occurs.

JingweiZhang12 commented 1 year ago

The number of proposals fed into the roi_head is equal to 1. Weird! It's too small. You can check it, e.g. with the hook sketched below.
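For example, with a plain PyTorch forward pre-hook (a sketch; nothing mmdet3d-specific is assumed, and model stands for the built detector):

import torch.nn as nn

def log_bn_inputs(root):
    # Print the input shape of every BatchNorm1d call under `root`.
    # The failing call should show up as a (1, C) tensor right before the
    # ValueError, revealing how many features actually reach that layer.
    for name, m in root.named_modules():
        if isinstance(m, nn.BatchNorm1d):
            m.register_forward_pre_hook(
                lambda mod, inputs, _name=name: print(_name, tuple(inputs[0].shape)))

# e.g. log_bn_inputs(model.roi_head) before training starts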

AitorIglesias commented 1 year ago

@JingweiZhang12 I have been debugging a bit, and the number of proposals fed into the roi_head is actually 2, since it has the same length as the batch size.

Screenshot from 2022-12-01 09-13-22

I can increase the batch size to 3, but no further, and even so the error still occurs. There might be another way to fix the error.