open-mmlab / OpenPCDet

OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
Apache License 2.0

MPPNet assumes the input point feature has 6 dim #1147

Closed (kan-bayashi closed this issue 1 year ago)

kan-bayashi commented 2 years ago

Hi, thank you for developing a great toolkit! MPPNet's performance looks very attractive, and I want to train MPPNet on a custom dataset. As a first step, I tried to train MPPNet on the Waymo dataset with a different set of point features, to match my custom dataset.

I modified the following lines: https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/tools/cfgs/waymo_models/mppnet_4frames.yaml#L47-L51 to

    POINT_FEATURE_ENCODING: {
        encoding_type: absolute_coordinates_encoding,
        used_feature_list: ['x', 'y', 'z', 'intensity', 'time'],
        src_feature_list: ['x', 'y', 'z', 'intensity', 'elongation', 'time'],
    }
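
As a rough illustration of what this config change asks for (a sketch of the general idea, not the OpenPCDet implementation), the snippet below shows how `used_feature_list` picks columns out of `src_feature_list`. The helper name `select_columns` and the sample point values are hypothetical.

```python
def select_columns(src_feature_list, used_feature_list):
    """Return the column indices of the source point array that are kept.

    Hypothetical helper for illustration only.
    """
    missing = [name for name in used_feature_list if name not in src_feature_list]
    if missing:
        raise ValueError(f"features not present in src_feature_list: {missing}")
    return [src_feature_list.index(name) for name in used_feature_list]


src = ['x', 'y', 'z', 'intensity', 'elongation', 'time']
used = ['x', 'y', 'z', 'intensity', 'time']

columns = select_columns(src, used)
print(columns)  # [0, 1, 2, 3, 5]

# A made-up point with six source features; 'elongation' (index 4) is dropped.
point = [1.0, 2.0, 0.5, 0.3, 0.9, 0.1]
selected = [point[i] for i in columns]
print(selected)  # [1.0, 2.0, 0.5, 0.3, 0.1]
```

With this config each point reaches the model with 5 features instead of 6.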

But I got the following error:

2022-10-12 19:30:12,575   INFO  **********************Start logging**********************
2022-10-12 19:30:12,575   INFO  CUDA_VISIBLE_DEVICES=2
2022-10-12 19:30:12,575   INFO  cfg_file         cfgs/waymo_models/mppnet_4frames.v2.yaml
2022-10-12 19:30:12,575   INFO  batch_size       2
2022-10-12 19:30:12,575   INFO  epochs           6
2022-10-12 19:30:12,575   INFO  workers          4
2022-10-12 19:30:12,576   INFO  extra_tag        default
2022-10-12 19:30:12,576   INFO  ckpt             None
2022-10-12 19:30:12,576   INFO  pretrained_model None
2022-10-12 19:30:12,576   INFO  launcher         none
2022-10-12 19:30:12,576   INFO  tcp_port         18888
2022-10-12 19:30:12,576   INFO  sync_bn          False
2022-10-12 19:30:12,576   INFO  fix_random_seed  False
2022-10-12 19:30:12,576   INFO  ckpt_save_interval 1
2022-10-12 19:30:12,577   INFO  local_rank       0
2022-10-12 19:30:12,577   INFO  max_ckpt_save_num 30
2022-10-12 19:30:12,577   INFO  merge_all_iters_to_one_epoch False
2022-10-12 19:30:12,577   INFO  set_cfgs         ['DATA_CONFIG.ROI_BOXES_PATH.train', '/work/dl-user/OpenPCDet/examples/waymo_mppnet/exp/centerpoint_4frames.v2/default/eval/epoch_10/train/default/result.pkl', 'DATA_CONFIG.ROI_BOXES_PATH.test', '/work/dl-user/OpenPCDet/examples/waymo_mppnet/exp/centerpoint_4frames.v2/default/eval/epoch_10/val/default/result.pkl']
2022-10-12 19:30:12,577   INFO  max_waiting_mins 0
2022-10-12 19:30:12,577   INFO  start_epoch      0
2022-10-12 19:30:12,577   INFO  num_epochs_to_eval 0
2022-10-12 19:30:12,577   INFO  save_to_file     False
2022-10-12 19:30:12,577   INFO  use_tqdm_to_record False
2022-10-12 19:30:12,577   INFO  logger_iter_interval 50
2022-10-12 19:30:12,577   INFO  ckpt_save_time_interval 18000
2022-10-12 19:30:12,577   INFO  wo_gpu_stat      False
2022-10-12 19:30:12,577   INFO  cfg.ROOT_DIR: /work/dl-user/OpenPCDet
2022-10-12 19:30:12,578   INFO  cfg.LOCAL_RANK: 0
2022-10-12 19:30:12,578   INFO  cfg.CLASS_NAMES: ['Vehicle', 'Pedestrian', 'Cyclist']
2022-10-12 19:30:12,578   INFO
cfg.DATA_CONFIG = edict()
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.DATASET: WaymoDataset
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.DATA_PATH: ../data/waymo
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.PROCESSED_DATA_TAG: waymo_processed_data_v0_5_0
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.POINT_CLOUD_RANGE: [-75.2, -75.2, -2, 75.2, 75.2, 4]
2022-10-12 19:30:12,578   INFO
cfg.DATA_CONFIG.DATA_SPLIT = edict()
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.DATA_SPLIT.train: train
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.DATA_SPLIT.test: val
2022-10-12 19:30:12,578   INFO
cfg.DATA_CONFIG.SAMPLED_INTERVAL = edict()
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.SAMPLED_INTERVAL.train: 1
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.SAMPLED_INTERVAL.test: 1
2022-10-12 19:30:12,578   INFO  cfg.DATA_CONFIG.FILTER_EMPTY_BOXES_FOR_TRAIN: True
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.DISABLE_NLZ_FLAG_ON_POINTS: True
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.USE_SHARED_MEMORY: False
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.SHARED_MEMORY_FILE_LIMIT: 35000
2022-10-12 19:30:12,579   INFO
cfg.DATA_CONFIG.DATA_AUGMENTOR = edict()
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.DATA_AUGMENTOR.DISABLE_AUG_LIST: ['placeholder']
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.DATA_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}, {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2022-10-12 19:30:12,579   INFO
cfg.DATA_CONFIG.POINT_FEATURE_ENCODING = edict()
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.encoding_type: absolute_coordinates_encoding
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.used_feature_list: ['x', 'y', 'z', 'intensity', 'time']
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.src_feature_list: ['x', 'y', 'z', 'intensity', 'elongation', 'time']
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.DATA_PROCESSOR: [{'NAME': 'mask_points_and_boxes_outside_range', 'REMOVE_OUTSIDE_BOXES': True}, {'NAME': 'shuffle_points', 'SHUFFLE_ENABLED': {'train': True, 'test': True}}]
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG._BASE_CONFIG_: cfgs/dataset_configs/waymo_dataset.yaml
2022-10-12 19:30:12,579   INFO
cfg.DATA_CONFIG.SEQUENCE_CONFIG = edict()
2022-10-12 19:30:12,579   INFO  cfg.DATA_CONFIG.SEQUENCE_CONFIG.ENABLED: True
2022-10-12 19:30:12,580   INFO  cfg.DATA_CONFIG.SEQUENCE_CONFIG.SAMPLE_OFFSET: [-3, 0]
2022-10-12 19:30:12,580   INFO  cfg.DATA_CONFIG.USE_PREDBOX: True
2022-10-12 19:30:12,580   INFO
cfg.DATA_CONFIG.ROI_BOXES_PATH = edict()
2022-10-12 19:30:12,580   INFO  cfg.DATA_CONFIG.ROI_BOXES_PATH.train: /work/dl-user/OpenPCDet/examples/waymo_mppnet/exp/centerpoint_4frames.v2/default/eval/epoch_10/train/default/result.pkl
2022-10-12 19:30:12,580   INFO  cfg.DATA_CONFIG.ROI_BOXES_PATH.test: /work/dl-user/OpenPCDet/examples/waymo_mppnet/exp/centerpoint_4frames.v2/default/eval/epoch_10/val/default/result.pkl
2022-10-12 19:30:12,580   INFO
cfg.MODEL = edict()
2022-10-12 19:30:12,580   INFO  cfg.MODEL.NAME: MPPNet
2022-10-12 19:30:12,580   INFO
cfg.MODEL.ROI_HEAD = edict()
2022-10-12 19:30:12,580   INFO  cfg.MODEL.ROI_HEAD.NAME: MPPNetHead
2022-10-12 19:30:12,580   INFO  cfg.MODEL.ROI_HEAD.TRANS_INPUT: 256
2022-10-12 19:30:12,580   INFO  cfg.MODEL.ROI_HEAD.CLASS_AGNOSTIC: True
2022-10-12 19:30:12,580   INFO
cfg.MODEL.ROI_HEAD.USE_BOX_ENCODING = edict()
2022-10-12 19:30:12,580   INFO  cfg.MODEL.ROI_HEAD.USE_BOX_ENCODING.ENABLED: True
2022-10-12 19:30:12,580   INFO  cfg.MODEL.ROI_HEAD.AVG_STAGE1_SCORE: True
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.USE_TRAJ_EMPTY_MASK: True
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.USE_AUX_LOSS: True
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.IOU_WEIGHT: [0.5, 0.4]
2022-10-12 19:30:12,581   INFO
cfg.MODEL.ROI_HEAD.ROI_GRID_POOL = edict()
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.GRID_SIZE: 4
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.MLPS: [[128, 128], [128, 128]]
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.POOL_RADIUS: [0.8, 1.6]
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.NSAMPLE: [16, 16]
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.POOL_METHOD: max_pool
2022-10-12 19:30:12,581   INFO
cfg.MODEL.ROI_HEAD.Transformer = edict()
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.Transformer.num_lidar_points: 128
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.Transformer.num_proxy_points: 64
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.Transformer.pos_hidden_dim: 64
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.Transformer.enc_layers: 3
2022-10-12 19:30:12,581   INFO  cfg.MODEL.ROI_HEAD.Transformer.dim_feedforward: 512
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.hidden_dim: 256
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.dropout: 0.1
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.nheads: 4
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.pre_norm: False
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.num_frames: 4
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.num_groups: 4
2022-10-12 19:30:12,582   INFO
cfg.MODEL.ROI_HEAD.Transformer.use_grid_pos = edict()
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.use_grid_pos.enabled: True
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.use_grid_pos.init_type: index
2022-10-12 19:30:12,582   INFO
cfg.MODEL.ROI_HEAD.Transformer.use_mlp_mixer = edict()
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.use_mlp_mixer.enabled: True
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.Transformer.use_mlp_mixer.hidden_dim: 16
2022-10-12 19:30:12,582   INFO
cfg.MODEL.ROI_HEAD.TARGET_CONFIG = edict()
2022-10-12 19:30:12,582   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.BOX_CODER: ResidualCoder
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.ROI_PER_IMAGE: 96
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.FG_RATIO: 0.5
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.REG_AUG_METHOD: single
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.ROI_FG_AUG_TIMES: 10
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.RATIO: 0.2
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.USE_ROI_AUG: True
2022-10-12 19:30:12,583   INFO
cfg.MODEL.ROI_HEAD.TARGET_CONFIG.USE_TRAJ_AUG = edict()
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.USE_TRAJ_AUG.ENABLED: True
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.USE_TRAJ_AUG.THRESHOD: 0.8
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.SAMPLE_ROI_BY_EACH_CLASS: True
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_SCORE_TYPE: roi_iou
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_FG_THRESH: 0.75
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_BG_THRESH: 0.25
2022-10-12 19:30:12,583   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_BG_THRESH_LO: 0.1
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.HARD_BG_RATIO: 0.8
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.TARGET_CONFIG.REG_FG_THRESH: 0.55
2022-10-12 19:30:12,584   INFO
cfg.MODEL.ROI_HEAD.LOSS_CONFIG = edict()
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.CLS_LOSS: BinaryCrossEntropy
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.REG_LOSS: smooth-l1
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.CORNER_LOSS_REGULARIZATION: True
2022-10-12 19:30:12,584   INFO
cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS = edict()
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.rcnn_cls_weight: 1.0
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.rcnn_reg_weight: 1.0
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.rcnn_corner_weight: 2.0
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.traj_reg_weight: [2.0, 2.0, 2.0]
2022-10-12 19:30:12,584   INFO  cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.code_weights: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2022-10-12 19:30:12,584   INFO
cfg.MODEL.POST_PROCESSING = edict()
2022-10-12 19:30:12,584   INFO  cfg.MODEL.POST_PROCESSING.RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.SCORE_THRESH: 0.1
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.OUTPUT_RAW_SCORE: False
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.SAVE_BBOX: False
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.EVAL_METRIC: waymo
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.NOT_APPLY_NMS_FOR_VEL: True
2022-10-12 19:30:12,585   INFO
cfg.MODEL.POST_PROCESSING.NMS_CONFIG = edict()
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.NMS_CONFIG.MULTI_CLASSES_NMS: False
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_TYPE: nms_gpu
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_THRESH: 0.7
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_PRE_MAXSIZE: 4096
2022-10-12 19:30:12,585   INFO  cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_POST_MAXSIZE: 500
2022-10-12 19:30:12,585   INFO
cfg.OPTIMIZATION = edict()
2022-10-12 19:30:12,585   INFO  cfg.OPTIMIZATION.BATCH_SIZE_PER_GPU: 2
2022-10-12 19:30:12,585   INFO  cfg.OPTIMIZATION.NUM_EPOCHS: 6
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.OPTIMIZER: adam_onecycle
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.LR: 0.003
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.WEIGHT_DECAY: 0.01
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.MOMENTUM: 0.9
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.MOMS: [0.95, 0.85]
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.PCT_START: 0.4
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.DIV_FACTOR: 10
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.DECAY_STEP_LIST: [35, 45]
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.LR_DECAY: 0.1
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.LR_CLIP: 1e-07
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.LR_WARMUP: False
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.WARMUP_EPOCH: 1
2022-10-12 19:30:12,586   INFO  cfg.OPTIMIZATION.GRAD_NORM_CLIP: 10
2022-10-12 19:30:12,586   INFO  cfg.TAG: mppnet_4frames.v2
2022-10-12 19:30:12,587   INFO  cfg.EXP_GROUP_PATH: waymo_models
2022-10-12 19:30:12,598   INFO  Loading Waymo dataset
2022-10-12 19:30:23,879   INFO  Total skipped info 0
2022-10-12 19:30:23,879   INFO  Total samples for Waymo dataset: 158081
2022-10-12 19:30:23,882   INFO  Loading and reorganizing pred_boxes to dict from path: /work/dl-user/OpenPCDet/examples/waymo_mppnet/exp/centerpoint_4frames.v2/default/eval/epoch_10/train/default/result.pkl
2022-10-12 19:31:19,560   INFO  Predicted boxes has been loaded, total sequences: 798
/work/dl-user/OpenPCDet/venv/.venv/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2022-10-12 19:31:24,879   INFO  MPPNet(
  (vfe): None
  (backbone_3d): None
  (map_to_bev_module): None
  (pfe): None
  (backbone_2d): None
  (dense_head): None
  (point_head): None
  (roi_head): MPPNetHead(
    (proposal_target_layer): ProposalTargetLayerMPPNet()
    (reg_loss_func): WeightedSmoothL1Loss()
    (seqboxembed): PointNet(
      (feat): PointNetfeat(
        (conv1): Conv1d(8, 64, kernel_size=(1,), stride=(1,))
        (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
        (conv3): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
        (conv4): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (bn3): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (fc1): Linear(in_features=512, out_features=256, bias=True)
      (fc2): Linear(in_features=256, out_features=256, bias=True)
      (pre_bn): BatchNorm1d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (fc_s1): Linear(in_features=256, out_features=256, bias=True)
      (fc_s2): Linear(in_features=256, out_features=3, bias=False)
      (fc_ce1): Linear(in_features=256, out_features=256, bias=True)
      (fc_ce2): Linear(in_features=256, out_features=3, bias=False)
      (fc_hr1): Linear(in_features=256, out_features=256, bias=True)
      (fc_hr2): Linear(in_features=256, out_features=1, bias=False)
    )
    (jointembed): MLP(
      (layers): ModuleList(
        (0): Linear(in_features=1280, out_features=256, bias=True)
        (1): Linear(in_features=256, out_features=256, bias=True)
        (2): Linear(in_features=256, out_features=256, bias=True)
        (3): Linear(in_features=256, out_features=7, bias=True)
      )
    )
    (up_dimension_geometry): MLP(
      (layers): ModuleList(
        (0): Linear(in_features=29, out_features=64, bias=True)
        (1): Linear(in_features=64, out_features=64, bias=True)
        (2): Linear(in_features=64, out_features=128, bias=True)
      )
    )
    (up_dimension_motion): MLP(
      (layers): ModuleList(
        (0): Linear(in_features=30, out_features=64, bias=True)
        (1): Linear(in_features=64, out_features=64, bias=True)
        (2): Linear(in_features=64, out_features=256, bias=True)
      )
    )
    (transformer): Transformer(
      (encoder): TransformerEncoder(
        (layers): ModuleList(
          (0): TransformerEncoderLayer(
            (self_attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
            )
            (linear1): Linear(in_features=256, out_features=512, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
            (linear2): Linear(in_features=512, out_features=256, bias=True)
            (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (dropout1): Dropout(p=0.1, inplace=False)
            (dropout2): Dropout(p=0.1, inplace=False)
            (cross_attn_layers): ModuleList(
              (0): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
              (1): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
              (2): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
              (3): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
            )
            (ffn): FFN(
              (linear1): Linear(in_features=256, out_features=512, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=512, out_features=256, bias=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (fusion_all_groups): MLP(
              (layers): ModuleList(
                (0): Linear(in_features=1024, out_features=256, bias=True)
                (1): Linear(in_features=256, out_features=256, bias=True)
                (2): Linear(in_features=256, out_features=256, bias=True)
                (3): Linear(in_features=256, out_features=256, bias=True)
              )
            )
            (mlp_mixer_3d): SpatialMixerBlock(
              (mixer_x): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (mixer_y): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (mixer_z): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (norm_x): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_y): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_z): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_channel): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (ffn): Sequential(
                (0): Linear(in_features=256, out_features=512, bias=True)
                (1): ReLU()
                (2): Dropout(p=0.0, inplace=False)
                (3): Linear(in_features=512, out_features=256, bias=True)
              )
            )
          )
          (1): TransformerEncoderLayer(
            (self_attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
            )
            (linear1): Linear(in_features=256, out_features=512, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
            (linear2): Linear(in_features=512, out_features=256, bias=True)
            (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (dropout1): Dropout(p=0.1, inplace=False)
            (dropout2): Dropout(p=0.1, inplace=False)
            (cross_attn_layers): ModuleList(
              (0): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
              (1): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
              (2): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
              (3): MultiheadAttention(
                (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
              )
            )
            (ffn): FFN(
              (linear1): Linear(in_features=256, out_features=512, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=512, out_features=256, bias=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (fusion_all_groups): MLP(
              (layers): ModuleList(
                (0): Linear(in_features=1024, out_features=256, bias=True)
                (1): Linear(in_features=256, out_features=256, bias=True)
                (2): Linear(in_features=256, out_features=256, bias=True)
                (3): Linear(in_features=256, out_features=256, bias=True)
              )
            )
            (mlp_mixer_3d): SpatialMixerBlock(
              (mixer_x): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (mixer_y): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (mixer_z): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (norm_x): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_y): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_z): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_channel): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (ffn): Sequential(
                (0): Linear(in_features=256, out_features=512, bias=True)
                (1): ReLU()
                (2): Dropout(p=0.0, inplace=False)
                (3): Linear(in_features=512, out_features=256, bias=True)
              )
            )
          )
          (2): TransformerEncoderLayer(
            (self_attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
            )
            (linear1): Linear(in_features=256, out_features=512, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
            (linear2): Linear(in_features=512, out_features=256, bias=True)
            (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (dropout1): Dropout(p=0.1, inplace=False)
            (dropout2): Dropout(p=0.1, inplace=False)
            (mlp_mixer_3d): SpatialMixerBlock(
              (mixer_x): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (mixer_y): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (mixer_z): MLP(
                (layers): ModuleList(
                  (0): Linear(in_features=4, out_features=16, bias=True)
                  (1): Linear(in_features=16, out_features=16, bias=True)
                  (2): Linear(in_features=16, out_features=4, bias=True)
                )
              )
              (norm_x): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_y): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_z): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm_channel): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (ffn): Sequential(
                (0): Linear(in_features=256, out_features=512, bias=True)
                (1): ReLU()
                (2): Dropout(p=0.0, inplace=False)
                (3): Linear(in_features=512, out_features=256, bias=True)
              )
            )
          )
        )
      )
    )
    (roi_grid_pool_layer): StackSAModuleMSG(
      (groupers): ModuleList(
        (0): QueryAndGroup()
        (1): QueryAndGroup()
      )
      (mlps): ModuleList(
        (0): Sequential(
          (0): Conv2d(131, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (1): Sequential(
          (0): Conv2d(131, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
        )
      )
    )
    (class_embed): ModuleList(
      (0): Linear(in_features=256, out_features=1, bias=True)
    )
    (bbox_embed): ModuleList(
      (0): MLP(
        (layers): ModuleList(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): Linear(in_features=256, out_features=256, bias=True)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): Linear(in_features=256, out_features=7, bias=True)
        )
      )
      (1): MLP(
        (layers): ModuleList(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): Linear(in_features=256, out_features=256, bias=True)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): Linear(in_features=256, out_features=7, bias=True)
        )
      )
      (2): MLP(
        (layers): ModuleList(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): Linear(in_features=256, out_features=256, bias=True)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): Linear(in_features=256, out_features=7, bias=True)
        )
      )
      (3): MLP(
        (layers): ModuleList(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): Linear(in_features=256, out_features=256, bias=True)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): Linear(in_features=256, out_features=7, bias=True)
        )
      )
    )
    (grid_pos_embeded): MLP(
      (layers): ModuleList(
        (0): Linear(in_features=3, out_features=256, bias=True)
        (1): Linear(in_features=256, out_features=256, bias=True)
      )
    )
  )
)
2022-10-12 19:31:24,882   INFO  **********************Start training waymo_models/mppnet_4frames.v2(default)**********************
epochs:   0%|                                                                                                 | 0/6 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    main()
  File "train.py", line 190, in main
    show_gpu_stat=not args.wo_gpu_stat
  File "/work/dl-user/OpenPCDet/tools/train_utils/train_utils.py", line 162, in train_model
    show_gpu_stat=show_gpu_stat
  File "/work/dl-user/OpenPCDet/tools/train_utils/train_utils.py", line 51, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "../pcdet/models/__init__.py", line 42, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/work/dl-user/OpenPCDet/venv/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/detectors/mppnet.py", line 20, in forward
    batch_dict = cur_module(batch_dict)
  File "/work/dl-user/OpenPCDet/venv/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/roi_heads/mppnet_head.py", line 704, in forward
    src = self.crop_previous_frame_points(src, batch_size,trajectory_rois, num_rois,valid_length,batch_dict)
  File "../pcdet/models/roi_heads/mppnet_head.py", line 545, in crop_previous_frame_points
    src[bs_idx, roi_box_idx, self.num_lidar_points*idx:self.num_lidar_points*(idx+1), :] = cur_roi_points_sample
RuntimeError: The expanded size of the tensor (5) must match the existing size (4) at non-singleton dimension 1.  Target sizes: [128, 5].  Tensor sizes: [128, 4]

Maybe some parts of MPPNet are hard-coded for 6-dimensional inputs. Could you give me some hints on how to solve this issue?
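
One possible reading of the error message, stated here as an assumption rather than something confirmed from the source: the destination buffer has a fixed per-point width of 5 ("Target sizes: [128, 5]"), which would correspond to the 6 default Waymo features with the 'time' channel dropped, while the 5-feature config yields points of width 4 ("Tensor sizes: [128, 4]"). The small sketch below only does this arithmetic; `HARD_CODED_WIDTH` and `cropped_point_width` are hypothetical names.

```python
# Width of the destination buffer, taken from "Target sizes: [128, 5]".
HARD_CODED_WIDTH = 5


def cropped_point_width(used_feature_list):
    """Per-point width once the 'time' channel is dropped.

    Assumption: 'time' is used only to split frames and is not copied
    into the buffer. This is inferred from the error, not from the code.
    """
    return len([name for name in used_feature_list if name != 'time'])


default_features = ['x', 'y', 'z', 'intensity', 'elongation', 'time']
custom_features = ['x', 'y', 'z', 'intensity', 'time']

for features in (default_features, custom_features):
    width = cropped_point_width(features)
    status = 'ok' if width == HARD_CODED_WIDTH else 'mismatch'
    print(f"{len(features)} features -> width {width}: {status}")
```

Under that assumption, the default config matches the buffer and the modified config is exactly one column short, which is what the RuntimeError reports.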

Cedarch commented 1 year ago

Hi, you can change the point dimension here: https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/pcdet/models/roi_heads/mppnet_head.py#L700

kan-bayashi commented 1 year ago

I solved this issue by changing the following parts to refer to the point dimension:
https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/pcdet/models/roi_heads/mppnet_head.py#L320
https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/pcdet/models/roi_heads/mppnet_head.py#L486
https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/pcdet/models/roi_heads/mppnet_head.py#L534
https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/pcdet/models/roi_heads/mppnet_head.py#L538
https://github.com/open-mmlab/OpenPCDet/blob/f221374a5cb9398fd089aa7194732c808c700355/pcdet/models/roi_heads/mppnet_head.py#L700

Thank you for your suggestion.
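
The exact changes are not shown in this thread. As a minimal sketch of the pattern described above (deriving the buffer width from the configured features instead of a literal number), the following uses plain Python lists in place of tensors. The class `RoiPointBuffer`, its methods, and the "drop the 'time' channel" rule are all assumptions made for illustration, not the actual `mppnet_head.py` code.

```python
class RoiPointBuffer:
    """Hypothetical stand-in for the per-RoI point buffer (no torch needed)."""

    def __init__(self, num_frames, num_lidar_points, num_point_features):
        # Assumption: the 'time' channel is not stored, hence the "- 1".
        self.point_dim = num_point_features - 1
        self.num_lidar_points = num_lidar_points
        self.rows = [[0.0] * self.point_dim
                     for _ in range(num_frames * num_lidar_points)]

    def write_frame(self, frame_idx, points):
        """Copy one frame's sampled points into that frame's slice."""
        if len(points) != self.num_lidar_points:
            raise ValueError("unexpected number of points")
        for point in points:
            if len(point) != self.point_dim:
                raise ValueError(
                    f"point width {len(point)} != buffer width {self.point_dim}")
        # Same slicing pattern as in the traceback:
        # num_lidar_points * idx : num_lidar_points * (idx + 1)
        start = self.num_lidar_points * frame_idx
        stop = self.num_lidar_points * (frame_idx + 1)
        self.rows[start:stop] = [list(point) for point in points]


used_feature_list = ['x', 'y', 'z', 'intensity', 'time']
buffer = RoiPointBuffer(num_frames=4, num_lidar_points=128,
                        num_point_features=len(used_feature_list))

# Made-up sampled points for frame 1, already without the 'time' channel.
frame_points = [[1.0, 2.0, 3.0, 0.5] for _ in range(128)]
buffer.write_frame(1, frame_points)
print(buffer.point_dim, len(buffer.rows))  # 4 512
```

Because the width follows `len(used_feature_list)`, the same code would accept the default 6-feature Waymo config (width 5) and the 5-feature config (width 4) without further edits.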

geoseb94 commented 1 year ago

Hi kan-bayashi, I am facing the same issue. I am trying to run MPPNet on a custom dataset that doesn't have the 'elongation' feature. My config is below.

POINT_FEATURE_ENCODING: {
    encoding_type: absolute_coordinates_encoding,
    used_feature_list: ['x', 'y', 'z', 'intensity', 'time'],
    src_feature_list: ['x', 'y', 'z', 'intensity', 'time'],
}

Thanks for pointing out the 5 lines in mppnet_head.py that require modification. Could you please let me know the exact modifications you made to run the network?

Also, if you have any insights on another issue, https://github.com/open-mmlab/OpenPCDet/issues/1291, I would be really grateful!

Tottowich commented 1 year ago

@geoseb94 Hi! I am also looking into using an MPPNet model on a custom, KITTI-format dataset. Have you made any progress on this? My point features are also ['x', 'y', 'z', 'intensity', 'time'].