[Problem] Training with custom data reaches maximum recursion

leihui6 commented 2 years ago

Hi,

My environment is as follows:

torch              1.8.2+cu111
torchaudio         0.8.2
torchvision        0.9.2+cu111
RTX3090
Ubuntu 18.04

pcdet installed successfully and the demo runs perfectly. However, when I train on my own dataset, I get the following error:

error when training

```bash
python train.py --cfg_file cfgs/custom_models/pv_rcnn.yaml --batch_size 64
2022-09-13 13:13:51,560 INFO **********************Start logging**********************
2022-09-13 13:13:51,560 INFO CUDA_VISIBLE_DEVICES=ALL
2022-09-13 13:13:51,560 INFO cfg_file cfgs/custom_models/pv_rcnn.yaml
2022-09-13 13:13:51,560 INFO batch_size 64
2022-09-13 13:13:51,560 INFO epochs 80
2022-09-13 13:13:51,560 INFO workers 4
2022-09-13 13:13:51,560 INFO extra_tag default
2022-09-13 13:13:51,560 INFO ckpt None
2022-09-13 13:13:51,560 INFO pretrained_model None
2022-09-13 13:13:51,560 INFO launcher none
2022-09-13 13:13:51,560 INFO tcp_port 18888
2022-09-13 13:13:51,560 INFO sync_bn False
2022-09-13 13:13:51,560 INFO fix_random_seed False
2022-09-13 13:13:51,560 INFO ckpt_save_interval 1
2022-09-13 13:13:51,561 INFO local_rank 0
2022-09-13 13:13:51,561 INFO max_ckpt_save_num 30
2022-09-13 13:13:51,561 INFO merge_all_iters_to_one_epoch False
2022-09-13 13:13:51,561 INFO set_cfgs None
2022-09-13 13:13:51,561 INFO max_waiting_mins 0
2022-09-13 13:13:51,561 INFO start_epoch 0
2022-09-13 13:13:51,561 INFO num_epochs_to_eval 0
2022-09-13 13:13:51,561 INFO save_to_file False
2022-09-13 13:13:51,561 INFO use_tqdm_to_record False
2022-09-13 13:13:51,561 INFO logger_iter_interval 50
2022-09-13 13:13:51,561 INFO ckpt_save_time_interval 300
2022-09-13 13:13:51,561 INFO wo_gpu_stat False
2022-09-13 13:13:51,561 INFO cfg.ROOT_DIR: /home/threed-detection/Desktop/OpenPCDet/OpenPCDet2/OpenPCDet
2022-09-13 13:13:51,561 INFO cfg.LOCAL_RANK: 0
2022-09-13 13:13:51,561 INFO cfg.CLASS_NAMES: ['Vehicle']
2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG = edict()
2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATASET: CustomDataset
2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATA_PATH: ../data/custom
2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.POINT_CLOUD_RANGE: [-4.8, -4.8, -2, 4.8, 4.8, 4]
2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.MAP_CLASS_TO_KITTI = edict()
2022-09-13 13:13:51,561 INFO 
cfg.DATA_CONFIG.MAP_CLASS_TO_KITTI.Vehicle: Car 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATA_SPLIT = edict() 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATA_SPLIT.train: train 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATA_SPLIT.test: val 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.INFO_PATH = edict() 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.INFO_PATH.train: ['custom_infos_train.pkl'] 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.INFO_PATH.test: ['custom_infos_val.pkl'] 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING = edict() 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.encoding_type: absolute_coordinates_encoding 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.used_feature_list: ['x', 'y', 'z', 'intensity'] 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.src_feature_list: ['x', 'y', 'z', 'intensity'] 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR = edict() 2022-09-13 13:13:51,561 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR.DISABLE_AUG_LIST: ['placeholder'] 2022-09-13 13:13:51,562 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'gt_sampling', 'USE_ROAD_PLANE': False, 'DB_INFO_PATH': ['custom_dbinfos_train.pkl'], 'PREPARE': {'filter_by_min_points': ['Vehicle:5']}, 'SAMPLE_GROUPS': ['Vehicle:20'], 'NUM_POINT_FEATURES': 4, 'DATABASE_WITH_FAKELIDAR': False, 'REMOVE_EXTRA_WIDTH': [0.0, 0.0, 0.0], 'LIMIT_WHOLE_SCENE': True}, {'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}, {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}] 2022-09-13 13:13:51,562 INFO cfg.DATA_CONFIG.DATA_PROCESSOR: [{'NAME': 'mask_points_and_boxes_outside_range', 'REMOVE_OUTSIDE_BOXES': True}, {'NAME': 'shuffle_points', 'SHUFFLE_ENABLED': {'train': True, 'test': False}}, {'NAME': 'transform_points_to_voxels', 'VOXEL_SIZE': [0.5, 0.5, 0.15], 'MAX_POINTS_PER_VOXEL': 5, 
'MAX_NUMBER_OF_VOXELS': {'train': 150000, 'test': 150000}}] 2022-09-13 13:13:51,562 INFO cfg.DATA_CONFIG._BASE_CONFIG_: cfgs/dataset_configs/custom_dataset.yaml 2022-09-13 13:13:51,562 INFO cfg.MODEL = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.NAME: PVRCNN 2022-09-13 13:13:51,562 INFO cfg.MODEL.VFE = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.VFE.NAME: MeanVFE 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_3D = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_3D.NAME: VoxelBackBone8x 2022-09-13 13:13:51,562 INFO cfg.MODEL.MAP_TO_BEV = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.MAP_TO_BEV.NAME: HeightCompression 2022-09-13 13:13:51,562 INFO cfg.MODEL.MAP_TO_BEV.NUM_BEV_FEATURES: 256 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D.NAME: BaseBEVBackbone 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D.LAYER_NUMS: [5, 5] 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D.LAYER_STRIDES: [1, 2] 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D.NUM_FILTERS: [128, 256] 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D.UPSAMPLE_STRIDES: [1, 2] 2022-09-13 13:13:51,562 INFO cfg.MODEL.BACKBONE_2D.NUM_UPSAMPLE_FILTERS: [256, 256] 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.NAME: AnchorHeadSingle 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.CLASS_AGNOSTIC: False 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.USE_DIRECTION_CLASSIFIER: True 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.DIR_OFFSET: 0.78539 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.DIR_LIMIT_OFFSET: 0.0 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.NUM_DIR_BINS: 2 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.ANCHOR_GENERATOR_CONFIG: [{'class_name': 'Vehicle', 'anchor_sizes': [[3.9, 1.6, 1.56]], 'anchor_rotations': [0, 1.57], 'anchor_bottom_heights': [0], 'align_center': False, 'feature_map_stride': 8, 'matched_threshold': 0.55, 
'unmatched_threshold': 0.4}] 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG = edict() 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG.NAME: AxisAlignedTargetAssigner 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG.POS_FRACTION: -1.0 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG.SAMPLE_SIZE: 512 2022-09-13 13:13:51,562 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG.NORM_BY_NUM_EXAMPLES: False 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG.MATCH_HEIGHT: False 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.TARGET_ASSIGNER_CONFIG.BOX_CODER: ResidualCoder 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.LOSS_CONFIG = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.LOSS_CONFIG.LOSS_WEIGHTS = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.cls_weight: 1.0 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.loc_weight: 2.0 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.dir_weight: 0.2 2022-09-13 13:13:51,563 INFO cfg.MODEL.DENSE_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.code_weights: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.NAME: VoxelSetAbstraction 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.POINT_SOURCE: raw_points 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.NUM_KEYPOINTS: 4096 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.NUM_OUTPUT_FEATURES: 128 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SAMPLE_METHOD: FPS 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.FEATURES_SOURCE: ['bev', 'x_conv3', 'x_conv4', 'raw_points'] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.raw_points = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.raw_points.MLPS: [[16, 16], [16, 16]] 2022-09-13 13:13:51,563 INFO 
cfg.MODEL.PFE.SA_LAYER.raw_points.POOL_RADIUS: [0.4, 0.8] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.raw_points.NSAMPLE: [16, 16] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv1 = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv1.DOWNSAMPLE_FACTOR: 1 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv1.MLPS: [[16, 16], [16, 16]] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv1.POOL_RADIUS: [0.4, 0.8] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv1.NSAMPLE: [16, 16] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv2 = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv2.DOWNSAMPLE_FACTOR: 2 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv2.MLPS: [[32, 32], [32, 32]] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv2.POOL_RADIUS: [0.8, 1.2] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv2.NSAMPLE: [16, 32] 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv3 = edict() 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv3.DOWNSAMPLE_FACTOR: 4 2022-09-13 13:13:51,563 INFO cfg.MODEL.PFE.SA_LAYER.x_conv3.MLPS: [[64, 64], [64, 64]] 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv3.POOL_RADIUS: [1.2, 2.4] 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv3.NSAMPLE: [16, 32] 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv4 = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv4.DOWNSAMPLE_FACTOR: 8 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv4.MLPS: [[64, 64], [64, 64]] 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv4.POOL_RADIUS: [2.4, 4.8] 2022-09-13 13:13:51,564 INFO cfg.MODEL.PFE.SA_LAYER.x_conv4.NSAMPLE: [16, 32] 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.NAME: PointHeadSimple 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.CLS_FC: [256, 256] 2022-09-13 13:13:51,564 INFO 
cfg.MODEL.POINT_HEAD.CLASS_AGNOSTIC: True 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.USE_POINT_FEATURES_BEFORE_FUSION: True 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.TARGET_CONFIG = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.TARGET_CONFIG.GT_EXTRA_WIDTH: [0.2, 0.2, 0.2] 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.LOSS_CONFIG = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.LOSS_CONFIG.LOSS_REG: smooth-l1 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.LOSS_CONFIG.LOSS_WEIGHTS = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.POINT_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.point_cls_weight: 1.0 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NAME: PVRCNNHead 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.CLASS_AGNOSTIC: True 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.SHARED_FC: [256, 256] 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.CLS_FC: [256, 256] 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.REG_FC: [256, 256] 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.DP_RATIO: 0.3 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TRAIN = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TRAIN.NMS_TYPE: nms_gpu 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TRAIN.MULTI_CLASSES_NMS: False 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TRAIN.NMS_PRE_MAXSIZE: 9000 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TRAIN.NMS_POST_MAXSIZE: 512 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TRAIN.NMS_THRESH: 0.8 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TEST = edict() 2022-09-13 13:13:51,564 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TEST.NMS_TYPE: nms_gpu 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TEST.MULTI_CLASSES_NMS: False 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TEST.NMS_PRE_MAXSIZE: 4096 
2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TEST.NMS_POST_MAXSIZE: 300 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.NMS_CONFIG.TEST.NMS_THRESH: 0.85 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.ROI_GRID_POOL = edict() 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.GRID_SIZE: 6 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.MLPS: [[64, 64], [64, 64]] 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.POOL_RADIUS: [0.8, 1.6] 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.NSAMPLE: [16, 16] 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.ROI_GRID_POOL.POOL_METHOD: max_pool 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG = edict() 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.BOX_CODER: ResidualCoder 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.ROI_PER_IMAGE: 128 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.FG_RATIO: 0.5 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.SAMPLE_ROI_BY_EACH_CLASS: True 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_SCORE_TYPE: roi_iou 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_FG_THRESH: 0.75 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_BG_THRESH: 0.25 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.CLS_BG_THRESH_LO: 0.1 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.HARD_BG_RATIO: 0.8 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.TARGET_CONFIG.REG_FG_THRESH: 0.55 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG = edict() 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.CLS_LOSS: BinaryCrossEntropy 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.REG_LOSS: smooth-l1 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.CORNER_LOSS_REGULARIZATION: True 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS = edict() 2022-09-13 13:13:51,565 
INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.rcnn_cls_weight: 1.0 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.rcnn_reg_weight: 1.0 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.rcnn_corner_weight: 1.0 2022-09-13 13:13:51,565 INFO cfg.MODEL.ROI_HEAD.LOSS_CONFIG.LOSS_WEIGHTS.code_weights: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 2022-09-13 13:13:51,565 INFO cfg.MODEL.POST_PROCESSING = edict() 2022-09-13 13:13:51,565 INFO cfg.MODEL.POST_PROCESSING.RECALL_THRESH_LIST: [0.3, 0.5, 0.7] 2022-09-13 13:13:51,565 INFO cfg.MODEL.POST_PROCESSING.SCORE_THRESH: 0.1 2022-09-13 13:13:51,565 INFO cfg.MODEL.POST_PROCESSING.OUTPUT_RAW_SCORE: False 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.EVAL_METRIC: kitti 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.NMS_CONFIG = edict() 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.NMS_CONFIG.MULTI_CLASSES_NMS: False 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_TYPE: nms_gpu 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_THRESH: 0.1 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_PRE_MAXSIZE: 4096 2022-09-13 13:13:51,566 INFO cfg.MODEL.POST_PROCESSING.NMS_CONFIG.NMS_POST_MAXSIZE: 500 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION = edict() 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.BATCH_SIZE_PER_GPU: 2 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.NUM_EPOCHS: 80 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.OPTIMIZER: adam_onecycle 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.LR: 0.01 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.WEIGHT_DECAY: 0.01 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.MOMENTUM: 0.9 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.MOMS: [0.95, 0.85] 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.PCT_START: 0.4 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.DIV_FACTOR: 10 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.DECAY_STEP_LIST: [35, 45] 2022-09-13 13:13:51,566 INFO 
cfg.OPTIMIZATION.LR_DECAY: 0.1 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.LR_CLIP: 1e-07 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.LR_WARMUP: False 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.WARMUP_EPOCH: 1 2022-09-13 13:13:51,566 INFO cfg.OPTIMIZATION.GRAD_NORM_CLIP: 10 2022-09-13 13:13:51,566 INFO cfg.TAG: pv_rcnn 2022-09-13 13:13:51,566 INFO cfg.EXP_GROUP_PATH: custom_models 2022-09-13 13:13:51,578 INFO Database filter by min points Vehicle: 22 => 22 2022-09-13 13:13:51,578 INFO Loading Custom dataset. 2022-09-13 13:13:51,579 INFO Total samples for CUSTOM dataset: 22 2022-09-13 13:13:53,776 INFO PVRCNN( (vfe): MeanVFE() (backbone_3d): VoxelBackBone8x( (conv_input): SparseSequential( (0): SubMConv3d(4, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (conv1): SparseSequential( (0): SparseSequential( (0): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) ) (conv2): SparseSequential( (0): SparseSequential( (0): SparseConv3d(16, 32, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (1): SparseSequential( (0): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (2): SparseSequential( (0): SubMConv3d(32, 32, kernel_size=[3, 3, 3], 
stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) ) (conv3): SparseSequential( (0): SparseSequential( (0): SparseConv3d(32, 64, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (1): SparseSequential( (0): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (2): SparseSequential( (0): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) ) (conv4): SparseSequential( (0): SparseSequential( (0): SparseConv3d(64, 64, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[0, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (1): SparseSequential( (0): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (2): SparseSequential( (0): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(64, 
eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) ) (conv_out): SparseSequential( (0): SparseConv3d(64, 128, kernel_size=[3, 1, 1], stride=[2, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) ) (map_to_bev_module): HeightCompression() (pfe): VoxelSetAbstraction( (SA_layers): ModuleList( (0): StackSAModuleMSG( (groupers): ModuleList( (0): QueryAndGroup() (1): QueryAndGroup() ) (mlps): ModuleList( (0): Sequential( (0): Conv2d(67, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) (1): Sequential( (0): Conv2d(67, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) ) ) (1): StackSAModuleMSG( (groupers): ModuleList( (0): QueryAndGroup() (1): QueryAndGroup() ) (mlps): ModuleList( (0): Sequential( (0): Conv2d(67, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) (1): Sequential( (0): Conv2d(67, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) 
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) ) ) ) (SA_rawpoints): StackSAModuleMSG( (groupers): ModuleList( (0): QueryAndGroup() (1): QueryAndGroup() ) (mlps): ModuleList( (0): Sequential( (0): Conv2d(4, 16, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) (1): Sequential( (0): Conv2d(4, 16, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) ) ) (vsa_point_feature_fusion): Sequential( (0): Linear(in_features=544, out_features=128, bias=False) (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() ) ) (backbone_2d): BaseBEVBackbone( (blocks): ModuleList( (0): Sequential( (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0) (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False) (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (3): ReLU() (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (6): ReLU() (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (9): ReLU() (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (12): ReLU() (13): Conv2d(128, 128, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (15): ReLU() (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (18): ReLU() ) (1): Sequential( (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0) (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False) (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (3): ReLU() (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (6): ReLU() (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (9): ReLU() (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (12): ReLU() (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (15): ReLU() (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (18): ReLU() ) ) (deblocks): ModuleList( (0): Sequential( (0): ConvTranspose2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) (1): Sequential( (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False) (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True) (2): ReLU() ) ) ) (dense_head): AnchorHeadSingle( (cls_loss_func): 
SigmoidFocalClassificationLoss() (reg_loss_func): WeightedSmoothL1Loss() (dir_loss_func): WeightedCrossEntropyLoss() (conv_cls): Conv2d(512, 2, kernel_size=(1, 1), stride=(1, 1)) (conv_box): Conv2d(512, 14, kernel_size=(1, 1), stride=(1, 1)) (conv_dir_cls): Conv2d(512, 4, kernel_size=(1, 1), stride=(1, 1)) ) (point_head): PointHeadSimple( (cls_loss_func): SigmoidFocalClassificationLoss() (cls_layers): Sequential( (0): Linear(in_features=544, out_features=256, bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Linear(in_features=256, out_features=256, bias=False) (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() (6): Linear(in_features=256, out_features=1, bias=True) ) ) (roi_head): PVRCNNHead( (proposal_target_layer): ProposalTargetLayer() (reg_loss_func): WeightedSmoothL1Loss() (roi_grid_pool_layer): StackSAModuleMSG( (groupers): ModuleList( (0): QueryAndGroup() (1): QueryAndGroup() ) (mlps): ModuleList( (0): Sequential( (0): Conv2d(131, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) (1): Sequential( (0): Conv2d(131, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU() ) ) ) (shared_fc_layer): Sequential( (0): Conv1d(27648, 256, kernel_size=(1,), stride=(1,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Dropout(p=0.3, inplace=False) (4): Conv1d(256, 256, 
kernel_size=(1,), stride=(1,), bias=False) (5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (6): ReLU() ) (cls_layers): Sequential( (0): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Dropout(p=0.3, inplace=False) (4): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False) (5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (6): ReLU() (7): Conv1d(256, 1, kernel_size=(1,), stride=(1,)) ) (reg_layers): Sequential( (0): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Dropout(p=0.3, inplace=False) (4): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False) (5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (6): ReLU() (7): Conv1d(256, 7, kernel_size=(1,), stride=(1,)) ) ) ) ... ... 
    return self.__getitem__(new_index)
  File "../pcdet/datasets/custom/custom_dataset.py", line 111, in __getitem__
  File "../pcdet/datasets/dataset.py", line 160, in prepare_data
    data_dict = self.data_augmentor.forward(
  File "../pcdet/datasets/augmentor/data_augmentor.py", line 266, in forward
    data_dict = cur_augmentor(data_dict=data_dict)
  File "../pcdet/datasets/augmentor/data_augmentor.py", line 49, in random_world_flip
    gt_boxes, points, enable = getattr(augmentor_utils, 'random_flip_along_%s' % cur_axis)(
  File "../pcdet/datasets/augmentor/augmentor_utils.py", line 16, in random_flip_along_x
    enable = np.random.choice([False, True], replace=False, p=[0.5, 0.5])
  File "mtrand.pyx", line 971, in numpy.random.mtrand.RandomState.choice
  File "<__array_function__ internals>", line 180, in count_nonzero
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/numpy/core/numeric.py", line 492, in count_nonzero
    return multiarray.count_nonzero(a)
RecursionError: maximum recursion depth exceeded while calling a Python object
```

Sorry that the error log is so long. The commands I used to prepare the data and start training are:

python -m pcdet.datasets.custom.custom_dataset create_custom_infos tools/cfgs/dataset_configs/custom_dataset.yaml 
python train.py --cfg_file cfgs/custom_models/pv_rcnn.yaml --batch_size 64

Does anyone have an idea what is going on? By the way, I raised the recursion limit with sys.setrecursionlimit(99999), but then, after a long time with no response, it fails with RuntimeError: DataLoader worker (pid 13400) is killed by signal: Segmentation fault.
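The traceback ends in `return self.__getitem__(new_index)`, which suggests the dataset picks a new index and calls itself again. Here is a minimal toy sketch of that pattern, not OpenPCDet code: `ToyDataset`, `getitem_recursive` and `getitem_bounded` are hypothetical names, and it assumes the retry fires whenever a sample ends up with no ground-truth boxes. If every sample is empty, the recursion never terminates, so a higher limit would probably only postpone the failure until the process runs out of stack.

```python
import random


class ToyDataset:
    """Toy stand-in for a dataset that retries when a sample has no boxes."""

    def __init__(self, samples):
        self.samples = samples  # each sample is a list of ground-truth boxes

    def __len__(self):
        return len(self.samples)

    def getitem_recursive(self, index):
        boxes = self.samples[index]
        if len(boxes) == 0:
            # Assumed retry pattern: pick another index and recurse.
            return self.getitem_recursive(random.randrange(len(self)))
        return boxes

    def getitem_bounded(self, index, max_retries=100):
        # Same retry idea as a loop with an upper bound, so that a dataset
        # without any usable sample fails with a readable message.
        for _ in range(max_retries):
            boxes = self.samples[index]
            if len(boxes) > 0:
                return boxes
            index = random.randrange(len(self))
        raise ValueError(f"no sample with boxes found after {max_retries} retries")


# 22 samples, none of which has a ground-truth box left.
empty = ToyDataset([[] for _ in range(22)])

try:
    empty.getitem_recursive(0)
    outcome = "returned"
except RecursionError:
    outcome = "RecursionError"

try:
    empty.getitem_bounded(0)
    bounded_outcome = "returned"
except ValueError:
    bounded_outcome = "ValueError"

print(outcome, bounded_outcome)
```

The bounded variant is only there to make the failure mode visible. The actual fix would be to find out why the samples end up without boxes.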

@jihanyang @OrangeSodahub I also found the same situation reported in issue#1 and issue#2.

Thanks in advance!

jihanyang commented 2 years ago

Try checking the number of gt_boxes when `__getitem__` is called in custom_dataset.py.
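One quick way to act on this suggestion is to count how many boxes would survive a range filter. The sketch below is a simplified, center-only check. The `center_in_range` helper is hypothetical, and the filter that OpenPCDet actually applies may differ. The range is the POINT_CLOUD_RANGE from the config log above, and the box is one of the annotations printed later in this thread.

```python
# Values copied from this thread: the configured range and one printed box
# in the form (x, y, z, dx, dy, dz, heading).
POINT_CLOUD_RANGE = [-4.8, -4.8, -2.0, 4.8, 4.8, 4.0]
gt_boxes = [[9.19, -0.59, -0.59, 3.09, 1.46, 2.54, 2.08]]


def center_in_range(box, pc_range):
    """Hypothetical helper: True if the box center (x, y, z) lies inside pc_range."""
    lower, upper = pc_range[0:3], pc_range[3:6]
    return all(lo <= c <= hi for c, lo, hi in zip(box[0:3], lower, upper))


kept = [box for box in gt_boxes if center_in_range(box, POINT_CLOUD_RANGE)]
print(f"{len(kept)} of {len(gt_boxes)} boxes have their center inside the range")
```

Here the box center is at x = 9.19 while the range only extends to x = 4.8, so this check keeps no box. If the real pipeline behaves the same way, either POINT_CLOUD_RANGE or the labels would need adjusting.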

leihui6 commented 2 years ago

@jihanyang Thanks for your reply. I printed the gt_boxes in custom_dataset.py as shown below:

        if 'annos' in info:
            annos = info['annos']
            annos = common_utils.drop_info_with_name(annos, name='DontCare')
            gt_names = annos['name']
            gt_boxes_lidar = annos['gt_boxes_lidar']

            print(f"###########{index}")
            print(annos)
            print(gt_names)
            print(gt_boxes_lidar)
            print("###########")
            #exit()

            input_dict.update({
                'gt_names': gt_names,
                'gt_boxes': gt_boxes_lidar
            })

and the result is:

...
{'name': array(['Vehicle'], dtype='<U7'), 'gt_boxes_lidar': array([[ 9.19, -0.59, -0.59,  3.09,  1.46,  2.54,  2.08]], dtype=float32)}
['Vehicle']
[[ 9.19 -0.59 -0.59  3.09  1.46  2.54  2.08]]
###########
###########16
{'name': array(['Vehicle'], dtype='<U7'), 'gt_boxes_lidar': array([[ 9.19, -0.59, -0.59,  3.09,  1.46,  2.54,  2.08]], dtype=float32)}
['Vehicle']
[[ 9.19 -0.59 -0.59  3.09  1.46  2.54  2.08]]
###########
###########8
epochs:   0%|                                                                                   | 0/80 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    main()
  File "train.py", line 168, in main
    train_model(
  File "/home/threed-detection/Desktop/OpenPCDet/OpenPCDet2/OpenPCDet/tools/train_utils/train_utils.py", line 150, in train_model
    accumulated_iter = train_one_epoch(
  File "/home/threed-detection/Desktop/OpenPCDet/OpenPCDet2/OpenPCDet/tools/train_utils/train_utils.py", line 30, in train_one_epoch
    batch = next(dataloader_iter)
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RecursionError: Caught RecursionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/threed-detection/anaconda3/envs/openpcdet_py385/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "../pcdet/datasets/custom/custom_dataset.py", line 120, in __getitem__
    data_dict = self.prepare_data(data_dict=input_dict)
  File "../pcdet/datasets/dataset.py", line 188, in prepare_data
    return self.__getitem__(new_index)

**(followed by many more repeated __getitem__ frames)**

I don't think the dataloader is the problem. What do you think?

jihanyang commented 2 years ago

No. If you check line 188 in dataset.py, you will find that the error is caused by there being no gt_boxes left in any scene. I think you can set a breakpoint before line 186. I suspect it is caused by line 156: https://github.com/open-mmlab/OpenPCDet/blob/bd96d39af2389478820b34db7ad0272e9bb205db/pcdet/datasets/dataset.py#L156 or lines 170-171: https://github.com/open-mmlab/OpenPCDet/blob/bd96d39af2389478820b34db7ad0272e9bb205db/pcdet/datasets/dataset.py#L170-L171
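To illustrate the mechanism described here, the following is a minimal sketch of a dataset that retries with a random index whenever a sample has no boxes left after filtering. It is a simplified stand-in, not the actual OpenPCDet code: `ToyDataset`, `keep_box`, and `fetch_fails_with_recursion` are hypothetical names used only for this example. If the filter removes the boxes of every sample, the retry never finds a valid sample and Python eventually raises RecursionError.

```python
import random


class ToyDataset:
    """Simplified stand-in for the retry-on-empty behaviour (not OpenPCDet code)."""

    def __init__(self, samples, keep_box):
        self.samples = samples    # one list of boxes per scene
        self.keep_box = keep_box  # hypothetical filter applied during preprocessing

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        boxes = [b for b in self.samples[index] if self.keep_box(b)]
        if len(boxes) == 0:
            # Retry with a random scene. If the filter empties every scene,
            # this call chain never terminates on its own.
            return self[random.randrange(len(self))]
        return boxes


def fetch_fails_with_recursion(dataset):
    """Return True if fetching sample 0 ends in a RecursionError."""
    try:
        dataset[0]
    except RecursionError:
        return True
    return False
```

With a filter that rejects every box, `fetch_fails_with_recursion` returns True; with a filter that keeps at least one box somewhere, the fetch eventually succeeds. Raising the recursion limit does not help in the first case, because no scene can ever satisfy the check.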

leihui6 commented 2 years ago

Thanks again! @jihanyang

I checked gt_boxes_mask, data_dict['gt_boxes'], and data_dict['gt_names']; none of them is None. Here is an interesting thing I found when I checked data_dict before and after line 182:

https://github.com/open-mmlab/OpenPCDet/blob/bd96d39af2389478820b34db7ad0272e9bb205db/pcdet/datasets/dataset.py#L182

in my code

print("before data_processor ###########")
print("data_dict['gt_boxes']->",data_dict['gt_boxes'])
data_dict = self.data_processor.forward(
    data_dict=data_dict
)
print("after data_processor ########")   
print("data_dict['gt_boxes']->",data_dict['gt_boxes'])

After this function, gt_boxes is always empty, as shown below:

data_dict['gt_boxes']-> [[ 7.3673673  -5.6462927  -0.57821673  3.1345434   2.109984    2.698345
  -1.9615599   1.        ]]
data_dict['gt_boxes']-> [[ 7.6473007 -4.48231   -0.5679081  2.9742985  1.4053321  2.4448926
   1.6139538  1.       ]]
after data_processor ########
after data_processor ########
data_dict['gt_boxes']-> []
data_dict['gt_boxes']-> []
before data_processor ###########
data_dict['gt_boxes']-> [[ 8.786528   -1.7666688  -0.39309707  3.0366747   1.5822157   2.4961662
  -2.2125812   1.        ]]
before data_processor ###########
data_dict['gt_boxes']-> [[ 7.325734    0.0397209  -0.47262332  3.1747956   2.1370792   2.7329957
  -1.8075655   1.        ]]
after data_processor ########
data_dict['gt_boxes']-> []
after data_processor ########
data_dict['gt_boxes']-> []
before data_processor ###########
before data_processor ###########
data_dict['gt_boxes']-> [[-9.800245  -2.356237  -0.5687006  3.1950634  2.1507223  2.750443
  -2.3673973  1.       ]]
data_dict['gt_boxes']-> [[-7.090519   -2.0712912  -0.48674133  3.0080614   2.024844    2.589464
  -1.3435211   1.        ]]

Why is that? Should I modify something in the forward function, or check my dataset again?

jihanyang commented 2 years ago

Can you step into the data_processor and check which function causes that?

leihui6 commented 2 years ago

The full code is:

    def forward(self, data_dict):
        """
        Args:
            data_dict:
                points: (N, 3 + C_in)
                gt_boxes: optional, (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]
                gt_names: optional, (N), string
                ...

        Returns:
        """

        for cur_processor in self.data_processor_queue:
            data_dict = cur_processor(data_dict=data_dict)

        return data_dict

The data_dict differs before and after https://github.com/open-mmlab/OpenPCDet/blob/bd96d39af2389478820b34db7ad0272e9bb205db/pcdet/datasets/processor/data_processor.py#L209

jihanyang commented 2 years ago

Just print cur_processor and data_dict['gt_boxes'] before and after the line data_dict = cur_processor(data_dict=data_dict) inside the loop.

leihui6 commented 2 years ago

[[ 8.110613   -0.43972045 -0.533743    3.1716654   2.134972    2.7303011
  -1.6260003   1.        ]]
[]
[]
[]
[]

code:

    def forward(self, data_dict):
        """
        Args:
            data_dict:
                points: (N, 3 + C_in)
                gt_boxes: optional, (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]
                gt_names: optional, (N), string
                ...

        Returns:
        """

        for cur_processor in self.data_processor_queue:
            print(data_dict['gt_boxes'])
            data_dict = cur_processor(data_dict=data_dict)
            print(data_dict['gt_boxes'])

        return data_dict

jihanyang commented 2 years ago

You should also print(cur_processor) and step into it. The problem is obviously caused by the first processor, so check what it does to the boxes. Since the issue is caused by your config or dataset preparation rather than by the current codebase, I will close this issue.
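One way to carry out this per-processor check is sketched below. It runs a queue of callables and records, for each one, its name and the number of gt_boxes before and after the call. This is only a sketch under assumptions: `describe`, `run_with_trace`, `drop_far_boxes`, and `passthrough` are hypothetical helpers written for this example, and the processors are assumed to be plain functions or `functools.partial` objects that accept a `data_dict` keyword.

```python
from functools import partial


def describe(processor):
    """Return a readable name for a function or a functools.partial."""
    func = processor.func if isinstance(processor, partial) else processor
    return getattr(func, "__name__", repr(processor))


def run_with_trace(processors, data_dict, key="gt_boxes"):
    """Run each processor and record (name, boxes_before, boxes_after)."""
    trace = []
    for proc in processors:
        before = len(data_dict.get(key, []))
        data_dict = proc(data_dict=data_dict)
        after = len(data_dict.get(key, []))
        trace.append((describe(proc), before, after))
    return data_dict, trace


# Hypothetical stand-in processors, used only to exercise the tracer.
def drop_far_boxes(data_dict, limit):
    data_dict["gt_boxes"] = [b for b in data_dict["gt_boxes"] if abs(b[0]) <= limit]
    return data_dict


def passthrough(data_dict):
    return data_dict


queue = [partial(drop_far_boxes, limit=4.8), passthrough]
_, trace = run_with_trace(queue, {"gt_boxes": [[9.19, 0.0, 0.0], [1.0, 0.0, 0.0]]})
for name, before, after in trace:
    print(f"{name}: {before} -> {after} boxes")
```

The first entry whose box count drops to zero points at the processor to step into.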

leihui6 commented 2 years ago

Thanks for your insightful thoughts on the bug. I looked through the list of cur_processor functions and found that the point cloud range was set unreasonably. Training runs smoothly now. Thank you again! 👍 @jihanyang
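The mismatch can be checked directly with the values that appear in this thread: the logged POINT_CLOUD_RANGE of [-4.8, -4.8, -2, 4.8, 4.8, 4] and one of the printed boxes, whose center is at x = 9.19. The sketch below tests only whether each box center lies inside the range. That is a simplification chosen for illustration; the actual processor may apply a different criterion (for example one based on box corners), and `centers_inside_range` is a hypothetical helper, not part of pcdet.

```python
import numpy as np


def centers_inside_range(gt_boxes, point_cloud_range):
    """Boolean mask: True where the box center (x, y, z) lies inside the range.

    gt_boxes: (N, 7+) array-like, [x, y, z, dx, dy, dz, heading, ...]
    point_cloud_range: [x_min, y_min, z_min, x_max, y_max, z_max]
    """
    gt_boxes = np.asarray(gt_boxes, dtype=np.float32)
    pc_range = np.asarray(point_cloud_range, dtype=np.float32)
    centers = gt_boxes[:, :3]
    return np.all((centers >= pc_range[:3]) & (centers <= pc_range[3:6]), axis=1)


pc_range = [-4.8, -4.8, -2, 4.8, 4.8, 4]                 # from the logged config
boxes = [[9.19, -0.59, -0.59, 3.09, 1.46, 2.54, 2.08]]   # from the printed annos
mask = centers_inside_range(boxes, pc_range)
print(mask)  # the box center is outside the range, so the mask is False
```

Running a check like this over all annotations before training shows whether the configured range covers the labeled objects at all.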

open-mmlab / OpenPCDet

[Problem] Training with custom data reaches maximum recursion #1104