open-mmlab / OpenPCDet

OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
Apache License 2.0

PV-RCNN++ training custom dataset #1162

Closed luoxiaoliaolan closed 1 year ago

luoxiaoliaolan commented 2 years ago

Hi! I want to use PV-RCNN++ to train on my own data. My data is organized essentially in the KITTI dataset format, and each point has the fields (x, y, z, intensity). During training, the following error appears:

Traceback (most recent call last):                | 0/60545 [00:00<?, ?it/s]
  File "train.py", line 222, in <module>
    main()
  File "train.py", line 168, in main
    train_model(
  File "/mnt/NAS/liuyb/OpenPCDet/tools/train_utils/train_utils.py", line 150, in train_model
    accumulated_iter = train_one_epoch(
  File "/mnt/NAS/liuyb/OpenPCDet/tools/train_utils/train_utils.py", line 52, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/mnt/NAS/liuyb/OpenPCDet/pcdet/models/__init__.py", line 42, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/NAS/liuyb/OpenPCDet/pcdet/models/detectors/pv_rcnn_plusplus.py", line 13, in forward
    batch_dict = self.backbone_2d(batch_dict)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/NAS/liuyb/OpenPCDet/pcdet/models/backbones_2d/base_bev_backbone.py", line 93, in forward
    x = self.blocks[i](x)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [128, 256, 3, 3], expected input[4, 384, 152, 377] to have 256 channels, but got 384 channels instead

I don't know whether I should adjust the model configuration or the data processing. Could you please take a look at this issue? Any help would be appreciated.

Attached pv_rcnn_plusplus.yaml:

CLASS_NAMES: ['car', 'pedestrian', 'bicycle', 'tricycle', 'cyclist', 'motorcyclist', 'tricyclist', 'van', 'bus', 'truck', 'mini_truck', 'special_vehicle', 'traffic_cone', 'small_movable', 'small_unmovable', 'crash_barrel', 'construction_sign', 'noise', 'water_horse', 'other']

DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/as_dataset.yaml
    OUTPUT_PATH: '/mnt/NAS/liuyb/OpenPCDet/model'

MODEL:
    NAME: PVRCNNPlusPlus

    VFE:
        NAME: MeanVFE

    BACKBONE_3D:
        NAME: VoxelBackBone8x

    MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256

    BACKBONE_2D:
        NAME: BaseBEVBackbone

        LAYER_NUMS: [5, 5]
        LAYER_STRIDES: [1, 2]
        NUM_FILTERS: [128, 256]
        UPSAMPLE_STRIDES: [1, 2]
        NUM_UPSAMPLE_FILTERS: [256, 256]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False

        CLASS_NAMES_EACH_HEAD: [
            [ 'car', 'pedestrian', 'bicycle', 'tricycle', 'cyclist', 'motorcyclist', 'tricyclist',
              'van', 'bus', 'truck', 'mini_truck', 'special_vehicle', 'traffic_cone', 'small_movable',
              'small_unmovable', 'crash_barrel', 'construction_sign', 'noise', 'water_horse', 'other' ]
        ]

        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True
        NUM_HM_CONV: 2
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: [ 'center', 'center_z', 'dim', 'rot' ]
            HEAD_DICT: {
                'center': { 'out_channels': 2, 'num_conv': 2 },
                'center_z': { 'out_channels': 1, 'num_conv': 2 },
                'dim': { 'out_channels': 3, 'num_conv': 2 },
                'rot': { 'out_channels': 2, 'num_conv': 2 },
            }

        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 8
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2

        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 2.0,
                'code_weights': [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ]
            }

        POST_PROCESSING:
            SCORE_THRESH: 0.5
            POST_CENTER_LIMIT_RANGE: [ 0.0, -30, -2, 150.0, 30, 4 ]
            MAX_OBJ_PER_SAMPLE: 500
            NMS_CONFIG:
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.7
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

    PFE:
        NAME: VoxelSetAbstraction
        POINT_SOURCE: raw_points
        # NUM_KEYPOINTS: 4096
        NUM_KEYPOINTS: 2048
        # NUM_OUTPUT_FEATURES: 90
        NUM_OUTPUT_FEATURES: 128
        SAMPLE_METHOD: SPC
        SPC_SAMPLING:
            NUM_POINTS_OF_EACH_SAMPLE_PART: 4000000
            NUM_SECTORS: 6
            SAMPLE_RADIUS_WITH_ROI: 1.6

        FEATURES_SOURCE: ['bev', 'x_conv3', 'x_conv4', 'raw_points']
        SA_LAYER:
            raw_points:
                NAME: VectorPoolAggregationModuleMSG
                NUM_GROUPS: 2
                LOCAL_AGGREGATION_TYPE: local_interpolation
                # NUM_REDUCED_CHANNELS: 2
                NUM_REDUCED_CHANNELS: 1
                NUM_CHANNELS_OF_LOCAL_AGGREGATION: 32
                MSG_POST_MLPS: [ 32 ]
                FILTER_NEIGHBOR_WITH_ROI: True
                RADIUS_OF_NEIGHBOR_WITH_ROI: 2.4

                GROUP_CFG_0:
                    NUM_LOCAL_VOXEL: [ 2, 2, 2 ]
                    MAX_NEIGHBOR_DISTANCE: 0.2
                    NEIGHBOR_NSAMPLE: -1
                    POST_MLPS: [ 32, 32 ]
                GROUP_CFG_1:
                    NUM_LOCAL_VOXEL: [ 3, 3, 3 ]
                    MAX_NEIGHBOR_DISTANCE: 0.4
                    NEIGHBOR_NSAMPLE: -1
                    POST_MLPS: [ 32, 32 ]

            x_conv3:
                DOWNSAMPLE_FACTOR: 4
                INPUT_CHANNELS: 64

                NAME: VectorPoolAggregationModuleMSG
                NUM_GROUPS: 2
                LOCAL_AGGREGATION_TYPE: local_interpolation
                NUM_REDUCED_CHANNELS: 32
                NUM_CHANNELS_OF_LOCAL_AGGREGATION: 32
                MSG_POST_MLPS: [128]
                FILTER_NEIGHBOR_WITH_ROI: True
                RADIUS_OF_NEIGHBOR_WITH_ROI: 4.0

                GROUP_CFG_0:
                    NUM_LOCAL_VOXEL: [3, 3, 3]
                    MAX_NEIGHBOR_DISTANCE: 1.2
                    NEIGHBOR_NSAMPLE: -1
                    POST_MLPS: [64, 64]
                GROUP_CFG_1:
                    NUM_LOCAL_VOXEL: [ 3, 3, 3 ]
                    MAX_NEIGHBOR_DISTANCE: 2.4
                    NEIGHBOR_NSAMPLE: -1
                    POST_MLPS: [ 64, 64 ]

            x_conv4:
                DOWNSAMPLE_FACTOR: 8
                INPUT_CHANNELS: 64

                NAME: VectorPoolAggregationModuleMSG
                NUM_GROUPS: 2
                LOCAL_AGGREGATION_TYPE: local_interpolation
                NUM_REDUCED_CHANNELS: 32
                NUM_CHANNELS_OF_LOCAL_AGGREGATION: 32
                MSG_POST_MLPS: [ 128 ]
                FILTER_NEIGHBOR_WITH_ROI: True
                RADIUS_OF_NEIGHBOR_WITH_ROI: 6.4

                GROUP_CFG_0:
                    NUM_LOCAL_VOXEL: [ 3, 3, 3 ]
                    MAX_NEIGHBOR_DISTANCE: 2.4
                    NEIGHBOR_NSAMPLE: -1
                    POST_MLPS: [ 64, 64 ]
                GROUP_CFG_1:
                    NUM_LOCAL_VOXEL: [ 3, 3, 3 ]
                    MAX_NEIGHBOR_DISTANCE: 4.8
                    NEIGHBOR_NSAMPLE: -1
                    POST_MLPS: [ 64, 64 ]

    POINT_HEAD:
        NAME: PointHeadSimple
        CLS_FC: [256, 256]
        CLASS_AGNOSTIC: True
        USE_POINT_FEATURES_BEFORE_FUSION: True
        TARGET_CONFIG:
            GT_EXTRA_WIDTH: [0.2, 0.2, 0.2]
        LOSS_CONFIG:
            LOSS_REG: smooth-l1
            LOSS_WEIGHTS: {
                'point_cls_weight': 1.0,
            }

    ROI_HEAD:
        NAME: PVRCNNHead
        CLASS_AGNOSTIC: True

        SHARED_FC: [256, 256]
        CLS_FC: [256, 256]
        REG_FC: [256, 256]
        DP_RATIO: 0.3

        NMS_CONFIG:
            TRAIN:
                NMS_TYPE: nms_gpu
                MULTI_CLASSES_NMS: False
                NMS_PRE_MAXSIZE: 9000
                NMS_POST_MAXSIZE: 512
                NMS_THRESH: 0.8
            TEST:
                NMS_TYPE: nms_gpu
                MULTI_CLASSES_NMS: False
                NMS_PRE_MAXSIZE: 1024
                NMS_POST_MAXSIZE: 100
                NMS_THRESH: 0.7
                SCORE_THRESH: 0.1
                # NMS_PRE_MAXSIZE: 4096
                # NMS_POST_MAXSIZE: 500
                # NMS_THRESH: 0.85

        ROI_GRID_POOL:
            GRID_SIZE: 6

            NAME: VectorPoolAggregationModuleMSG
            NUM_GROUPS: 2
            LOCAL_AGGREGATION_TYPE: voxel_random_choice
            # NUM_REDUCED_CHANNELS: 30
            NUM_REDUCED_CHANNELS: 64
            NUM_CHANNELS_OF_LOCAL_AGGREGATION: 32
            MSG_POST_MLPS: [ 128 ]

            GROUP_CFG_0:
                NUM_LOCAL_VOXEL: [ 3, 3, 3 ]
                MAX_NEIGHBOR_DISTANCE: 0.8
                NEIGHBOR_NSAMPLE: 32
                POST_MLPS: [ 64, 64 ]
            GROUP_CFG_1:
                NUM_LOCAL_VOXEL: [ 3, 3, 3 ]
                MAX_NEIGHBOR_DISTANCE: 1.6
                NEIGHBOR_NSAMPLE: 32
                POST_MLPS: [ 64, 64 ]

        TARGET_CONFIG:
            BOX_CODER: ResidualCoder
            ROI_PER_IMAGE: 128
            FG_RATIO: 0.5

            SAMPLE_ROI_BY_EACH_CLASS: True
            CLS_SCORE_TYPE: roi_iou

            CLS_FG_THRESH: 0.75
            CLS_BG_THRESH: 0.25
            CLS_BG_THRESH_LO: 0.1
            HARD_BG_RATIO: 0.8

            REG_FG_THRESH: 0.55

        LOSS_CONFIG:
            CLS_LOSS: BinaryCrossEntropy
            REG_LOSS: smooth-l1
            CORNER_LOSS_REGULARIZATION: True
            LOSS_WEIGHTS: {
                'rcnn_cls_weight': 1.0,
                'rcnn_reg_weight': 1.0,
                'rcnn_corner_weight': 1.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        SCORE_THRESH: 0.5
        OUTPUT_RAW_SCORE: False

        EVAL_METRIC: kitti

        NMS_CONFIG:
            MULTI_CLASSES_NMS: False
            NMS_TYPE: nms_gpu
            NMS_THRESH: 0.7
            NMS_PRE_MAXSIZE: 4096
            NMS_POST_MAXSIZE: 500

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 2
    NUM_EPOCHS: 30

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.001
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10

xuchangjia commented 2 years ago

maybe you should change your "POINT_CLOUD_RANGE".

luoxiaoliaolan commented 2 years ago

maybe you should change your "POINT_CLOUD_RANGE".

@xuchangjia I tried changing "POINT_CLOUD_RANGE", but I still get the same error. I debugged the program and found that the channel mismatch mainly comes from here: (image: 2022-11-02_11-06)

/OpenPCDet/pcdet/models/backbones_2d/map_to_bev/height_compression.py: (image: 2022-11-02_11-09)

The values of these dimensions come from the processing in VoxelBackBone8x (/OpenPCDet/pcdet/models/backbones_3d/spconv_backbone.py): 128 * 3 = 384.

pv_rcnn_plusplus.yaml:

MAP_TO_BEV:
    NAME: HeightCompression
    NUM_BEV_FEATURES: 256

384 != 256, because D = 3 rather than 2. How should I adjust this configuration to avoid this error? If it is convenient, I hope to communicate through WeChat, my WeChat ID: lyb543918165
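
For readers hitting the same mismatch, the arithmetic in this comment can be sketched in a few lines of Python. The sketch assumes that HeightCompression folds the depth axis D into the channel axis, turning a dense (N, C, D, H, W) tensor into (N, C * D, H, W). The function names are hypothetical and not part of OpenPCDet.

```python
def height_compression_shape(shape):
    """Shape after folding depth into channels: (N, C, D, H, W) -> (N, C*D, H, W)."""
    n, c, d, h, w = shape
    return (n, c * d, h, w)


def required_depth(num_bev_features, backbone_channels):
    """Depth D that makes C * D equal to the configured NUM_BEV_FEATURES."""
    if num_bev_features % backbone_channels != 0:
        raise ValueError("NUM_BEV_FEATURES is not a multiple of the backbone channels")
    return num_bev_features // backbone_channels


# Shape reported in this thread: 128 channels with D = 3 after the 3D backbone.
bev_shape = height_compression_shape((4, 128, 3, 152, 377))
print(bev_shape)                  # (4, 384, 152, 377), matching the error message
print(required_depth(256, 128))   # 2, the depth that NUM_BEV_FEATURES: 256 implies
```

So the config and the data disagree on D: the 2D backbone is built for 128 * 2 channels while the 3D backbone delivers 128 * 3. Which side the original poster changed to resolve this is not stated in the thread.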

Lizhinwafu commented 2 years ago

Did you debug successfully?

luoxiaoliaolan commented 2 years ago

Did you debug successfully?

Yes, I adjusted the configuration of the training model and it can be trained normally

Lizhinwafu commented 2 years ago

Did you debug successfully?

Yes, I adjusted the configuration of the training model and it can be trained normally

Thanks, I would like to add you WeChat to ask some questions.

BraunBenni commented 2 years ago

maybe you should change your "POINT_CLOUD_RANGE".

Can you please explain how the point cloud range relates to the number of features? I'm having a similar problem and would appreciate any help.

Edit: https://github.com/open-mmlab/OpenPCDet/issues/253#issuecomment-679190936 helped

luoxiaoliaolan commented 2 years ago

@BraunBenni
The point cloud range determines the voxel grid that is generated. You should set the range, together with parameters such as the voxel size, according to your data. You may need to try several settings.
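
As a rough illustration of how the range and the voxel size translate into a voxel grid, the sketch below divides each axis extent by the voxel size and rounds the result. The helper names are hypothetical. The KITTI-style range and voxel size are common example values, not the poster's settings, and the multiple-of-16 check on x and y is a rule of thumb often quoted for 8x-downsampling voxel backbones, so treat it as an assumption.

```python
def grid_size(point_cloud_range, voxel_size):
    """Voxel counts along x, y, z for a range [x0, y0, z0, x1, y1, z1]."""
    lows, highs = point_cloud_range[:3], point_cloud_range[3:]
    return [round((hi - lo) / v) for lo, hi, v in zip(lows, highs, voxel_size)]


def xy_is_multiple_of(grid, multiple=16):
    """True if both the x and y voxel counts are multiples of `multiple`."""
    return all(n % multiple == 0 for n in grid[:2])


# KITTI-style example values (assumed here purely for illustration).
kitti_grid = grid_size([0, -40, -3, 70.4, 40, 1], [0.05, 0.05, 0.1])
print(kitti_grid)                     # [1408, 1600, 40]
print(xy_is_multiple_of(kitti_grid))  # True
```

Changing the range without adjusting the voxel size changes these counts, which is one way the range ends up affecting the feature shapes further down the network.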

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

josyulavt commented 6 months ago

Hi, for this point cloud range: POINT_CLOUD_RANGE: [-32.0, -32.0, -3.0, 32.0, 32.0, 40.0]

How do I choose the voxel size?


    - NAME: transform_points_to_voxels
      VOXEL_SIZE: [0.16, 0.16, 4]
      MAX_POINTS_PER_VOXEL: 50
      MAX_NUMBER_OF_VOXELS: {
        'train': 150000,
        'test': 150000
      }

No matter what I do, I get this error:

ValueError: your out spatial shape [0, X, X] reach zero!!! input shape: [1, X, X]
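
One possible explanation, assuming this config feeds a 3D sparse backbone such as VoxelBackBone8x: a voxel height of 4 m leaves only about 11 cells along z, and four stride-2 stages reduce that to zero. The sketch below traces the z depth through those stages. The (kernel, stride, padding) values and the one-cell padding of the z axis are assumptions to verify against spconv_backbone.py, and `z_depths` is a hypothetical helper.

```python
# (kernel, stride, padding) along z for each strided stage; assumed values.
Z_STAGES = [(3, 2, 1), (3, 2, 1), (3, 2, 0), (3, 2, 0)]


def z_depths(z_min, z_max, voxel_z):
    """Return the z depth at the input and after each strided stage."""
    depth = round((z_max - z_min) / voxel_z) + 1  # assumed one extra cell on z
    trace = [depth]
    for kernel, stride, padding in Z_STAGES:
        depth = (depth + 2 * padding - kernel) // stride + 1
        trace.append(depth)
    return trace


print(z_depths(-3.0, 40.0, 4.0))    # [12, 6, 3, 1, 0]: the last stage sees 1 and outputs 0
print(z_depths(-3.0, 40.0, 1.075))  # [41, 21, 11, 5, 2]
```

Under these assumptions, a z voxel size near 43 / 40 = 1.075 for this range would end at depth 2, which is what NUM_BEV_FEATURES: 256 implies with 128 backbone channels. A voxel size such as [0.16, 0.16, 4] is pillar-style and is normally paired with PointPillars-type models, which do not downsample along z.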