hailanyi / TED

Transformation-Equivariant 3D Object Detection for Autonomous Driving
https://arxiv.org/abs/2211.11962
Apache License 2.0

Some confusion about bev_pooling.py #33

Open Liu202209 opened 10 months ago

Liu202209 commented 10 months ago

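[image] [image]

Hello, I have two questions.

First, why does the grid generation in the first screenshot not add x_stride / 2? (I tried adding it and it seemed to raise an error, but could the bound be set manually to 70.6 instead?)

Second, about bev_align: my understanding from the paper is that the generated grid points are initially aligned with rot_num=0. In that case, shouldn't the grid first be transformed backward to the initial state and then forward to the current rot_num? What I don't understand is why the grid is transformed forward to the current rot_num and then backward to rot_num=0.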
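For the first question, here is a minimal sketch of the difference between placing grid coordinates at cell corners and at cell centers (the x_stride / 2 offset). The 0 to 70.4 m range is an assumption based on the usual KITTI setup, the 0.4 m stride is assumed from VOXEL_SIZE 0.05 and feature_map_stride 8 in the config posted in this issue, and `grid_coords` is a hypothetical helper, not the code in bev_pooling.py.

```python
def grid_coords(x_min, x_max, stride, center=False):
    """Return 1D grid coordinates along one axis (hypothetical helper)."""
    # The number of cells is fixed by the range and the stride, not by the offset.
    n = round((x_max - x_min) / stride)
    offset = stride / 2 if center else 0.0
    # Index-based generation avoids off-by-one surprises from float ranges.
    return [x_min + i * stride + offset for i in range(n)]


corners = grid_coords(0.0, 70.4, 0.4)               # first value is 0.0
centers = grid_coords(0.0, 70.4, 0.4, center=True)  # first value is 0.2
print(len(corners), len(centers))
print(round(corners[-1], 3), round(centers[-1], 3))
```

Both variants produce the same number of cells; only the sampled positions shift by half a cell. Note that 70.6 equals 70.4 + 0.2, so it may correspond to shifting the upper bound by the same half stride, but whether that is the right bound depends on how the original code builds its range.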

csjxchen commented 7 months ago

I am also very confused about this; it doesn't seem to make sense logically. Have you figured it out?

csjxchen commented 7 months ago

@Liu202209 Can you still reproduce the results after making the adjustment? After I made it, I could no longer reproduce them:

Car AP@0.70, 0.70, 0.70:
bbox AP:96.6737, 89.8503, 89.5701
bev AP:90.2488, 88.5818, 88.2505
3d AP:89.5764, 84.6657, 84.8719
aos AP:96.65, 89.78, 89.43
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:98.2332, 95.1310, 94.8786
bev AP:93.4819, 91.5662, 91.3079
3d AP:92.6654, 85.8230, 85.2598
aos AP:98.22, 95.02, 94.68
Car AP@0.70, 0.50, 0.50:
bbox AP:96.6737, 89.8503, 89.5701
bev AP:96.8002, 95.4559, 95.8603
3d AP:96.7524, 95.3898, 89.5401
aos AP:96.65, 89.78, 89.43
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:98.2332, 95.1310, 94.8786
bev AP:98.3109, 97.2066, 96.7604
3d AP:98.1812, 97.1175, 94.9565
aos AP:98.22, 95.02, 94.68

Before the adjustment the results were:

Car AP@0.70, 0.70, 0.70:
bbox AP:96.5080, 95.3536, 89.6044
bev AP:90.2641, 88.6183, 88.1891
3d AP:89.5928, 85.1168, 84.5518
aos AP:96.48, 95.11, 89.36
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:98.4402, 96.9759, 94.8656
bev AP:95.2964, 91.5174, 91.1605
3d AP:92.4918, 87.2046, 84.8723
aos AP:98.42, 96.74, 94.54
Car AP@0.70, 0.50, 0.50:
bbox AP:96.5080, 95.3536, 89.6044
bev AP:96.6327, 95.3379, 95.7173
3d AP:96.5853, 95.2708, 89.5656
aos AP:96.48, 95.11, 89.36
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:98.4402, 96.9759, 94.8656
bev AP:98.5412, 97.1972, 96.6906
3d AP:98.5224, 97.1264, 94.9059
aos AP:98.42, 96.74, 94.54

Liu202209 commented 7 months ago

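I haven't tried reproducing his single-class setup; I have been working on multi-class all along (using the training config that appeared in other issues). After my adjustment the results were about the same or slightly better. I have seen both outcomes, but I never reached the accuracy reported in the paper.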

csjxchen commented 7 months ago

@Liu202209 I think this has a large effect on small objects. After all, each cell of the final spatial_feature covers a physical size of 8*0.05=0.4 m, while the Cyclist and Pedestrian anchor sizes are only [ 1.76, 0.6, 1.73 ] and [ 0.8, 0.6, 1.73 ] (in l, w, h order). My multi-class results before the adjustment are:

Car AP@0.70, 0.70, 0.70:
bbox AP:98.1120, 94.6818, 89.4393
bev AP:90.4446, 88.7608, 88.0496
3d AP:89.7993, 84.5134, 79.0701
aos AP:98.05, 94.33, 89.03
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.2606, 96.6115, 94.4560
bev AP:96.2373, 91.5180, 90.8619
3d AP:93.0486, 85.7471, 82.7864
aos AP:99.20, 96.26, 93.94
Car AP@0.70, 0.50, 0.50:
bbox AP:98.1120, 94.6818, 89.4393
bev AP:98.1685, 94.6936, 94.8741
3d AP:98.1228, 94.6105, 94.7638
aos AP:98.05, 94.33, 89.03
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:99.2606, 96.6115, 94.4560
bev AP:99.2534, 97.0615, 96.3965
3d AP:99.2321, 96.9621, 96.0263
aos AP:99.20, 96.26, 93.94
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:71.8137, 68.6167, 64.1738
bev AP:68.8230, 62.4224, 57.9408
3d AP:66.8895, 59.6249, 54.6192
aos AP:64.34, 60.21, 56.02
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:73.4976, 69.0069, 64.2983
bev AP:68.9243, 62.4818, 57.1502
3d AP:66.3551, 59.7461, 53.9324
aos AP:64.94, 59.82, 55.25
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:71.8137, 68.6167, 64.1738
bev AP:77.2878, 73.9435, 68.0667
3d AP:77.2714, 73.5384, 67.9978
aos AP:64.34, 60.21, 56.02
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:73.4976, 69.0069, 64.2983
bev AP:77.6491, 74.5224, 69.3100
3d AP:77.6281, 74.3388, 69.1776
aos AP:64.94, 59.82, 55.25
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:89.1586, 84.9098, 80.2125
bev AP:87.6822, 74.4189, 70.8076
3d AP:87.1630, 73.2488, 68.7283
aos AP:89.00, 82.43, 77.96
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:93.6607, 86.8252, 82.0402
bev AP:91.8447, 76.1573, 71.6144
3d AP:91.1589, 74.3207, 69.2800
aos AP:93.47, 84.20, 79.53
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:89.1586, 84.9098, 80.2125
bev AP:88.2267, 81.6565, 76.8107
3d AP:88.2267, 81.6565, 76.8107
aos AP:89.00, 82.43, 77.96
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:93.6607, 86.8252, 82.0402
bev AP:92.5861, 83.1353, 78.2893
3d AP:92.5861, 83.1353, 78.2893
aos AP:93.47, 84.20, 79.53

(RTX 4090 x 4, per_gpu_batch == 2) The gap on Pedestrian is actually quite large; I think the bev-align part has a considerable effect.
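The 8*0.05=0.4 argument above can be made concrete with a few lines of arithmetic. This is only an illustrative sketch: the Pedestrian and Cyclist anchor sizes are the ones quoted in this comment, the Car anchor and the stride are taken from the config posted later in this issue, and `footprint_in_cells` is a hypothetical helper.

```python
VOXEL_SIZE = 0.05        # metres, VOXEL_SIZE in the posted config
FEATURE_MAP_STRIDE = 8   # feature_map_stride in the posted config
cell = VOXEL_SIZE * FEATURE_MAP_STRIDE  # physical size of one BEV cell

# (length, width) of each anchor in metres
anchors_lw = {
    "Car": (3.9, 1.6),
    "Pedestrian": (0.8, 0.6),
    "Cyclist": (1.76, 0.6),
}


def footprint_in_cells(length, width, cell_size):
    """Return how many BEV cells an anchor spans along its length and width."""
    return (length / cell_size, width / cell_size)


for name, (length, width) in anchors_lw.items():
    n_l, n_w = footprint_in_cells(length, width, cell)
    print(f"{name}: {n_l:.2f} x {n_w:.2f} cells")
```

A Pedestrian anchor covers roughly 2 x 1.5 cells, so a misalignment of even half a cell is a large fraction of the object, whereas a Car anchor spans close to 10 x 4 cells.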

csjxchen commented 7 months ago

Single-class training on Car can indeed reach the accuracy in the paper, but multi-class training doesn't seem to get there.

shenglunch commented 7 months ago

@csjxchen

Why is every accuracy 0 when I train Cyclist [ 1.76, 0.6, 1.73 ] and Pedestrian [ 0.8, 0.6, 1.73 ]?

Cyclist AP@0.50, 0.50, 0.50:
bbox AP:0.0000, 0.0000, 0.0000
bev AP:0.0000, 0.0000, 0.0000
3d AP:0.0000, 0.0000, 0.0000
aos AP:0.00, 0.00, 0.00

Which parameter settings do I need to pay attention to?

csjxchen commented 7 months ago

CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
    DATASET: 'KittiDataset'
    ROT_NUM: 3
    USE_VAN: True

DATA_SPLIT: {
    'train': train,
    'test': val
}

INFO_PATH: {
    'train': [kitti_infos_train.pkl],
    'test': [kitti_infos_val.pkl],
}

DATA_AUGMENTOR:
    DISABLE_AUG_LIST: ['placeholder']
    AUG_CONFIG_LIST:
        - NAME: gt_sampling
          USE_ROAD_PLANE: True
          DB_INFO_PATH:
              - kitti_dbinfos_train.pkl
          PREPARE: {
              filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
              filter_by_difficulty: [-1],
          }

          SAMPLE_GROUPS: ['Car:15', 'Pedestrian:10', 'Cyclist:10']
          NUM_POINT_FEATURES: 4
          DATABASE_WITH_FAKELIDAR: False
          REMOVE_EXTRA_WIDTH: [0.0, 0.0, -0.2]
          LIMIT_WHOLE_SCENE: False

        - NAME: da_sampling
          USE_ROAD_PLANE: True
          DB_INFO_PATH:
            - kitti_dbinfos_train.pkl
          PREPARE: {
          filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
          filter_by_difficulty: [-1],
          }

          SAMPLE_GROUPS: ['Car:15', 'Pedestrian:10', 'Cyclist:10']

          MIN_SAMPLING_DIS: 0
          MAX_SAMPLING_DIS: 20
          OCCLUSION_NOISE: 0.2
          OCCLUSION_OFFSET: 2.
          SAMPLING_METHOD: 'LiDAR-aware'
          VERT_RES: 0.006
          HOR_RES: 0.003

          NUM_POINT_FEATURES: 4
          DATABASE_WITH_FAKELIDAR: False
          REMOVE_EXTRA_WIDTH: [0.0, 0.0, -0.2]
          LIMIT_WHOLE_SCENE: False

        - NAME: random_local_noise
          LOCAL_ROT_RANGE: [-0.78539816, 0.78539816]
          TRANSLATION_STD: [1.0, 1.0, 0.5]
          GLOBAL_ROT_RANGE: [0.0, 0.0]
          EXTRA_WIDTH: [0.2, 0.2, 0.]

        - NAME: random_world_rotation
          WORLD_ROT_ANGLE: [-0.39269908, 0.39269908]

        - NAME: random_world_scaling
          WORLD_SCALE_RANGE: [0.95, 1.05]

        - NAME: random_local_pyramid_aug
          DROP_PROB: 0.25
          SPARSIFY_PROB: 0.05
          SPARSIFY_MAX_NUM: 50
          SWAP_PROB: 0.1
          SWAP_MAX_NUM: 50

X_TRANS:
  AUG_CONFIG_LIST:
    - NAME: world_rotation
      WORLD_ROT_ANGLE: [0.39269908, 0, 0.39269908, -0.39269908, -0.39269908, 0]
    - NAME: world_flip
      ALONG_AXIS_LIST: [0, 1, 1, 0, 1, 0]
    - NAME: world_scaling
      WORLD_SCALE_RANGE: [ 0.98, 1.02, 1., 0.98, 1.02, 1.]

POINT_FEATURE_ENCODING: {
    encoding_type: absolute_coordinates_encoding_mm,
    used_feature_list: ['x', 'y', 'z', 'intensity'],
    src_feature_list: ['x', 'y', 'z', 'intensity'],
    num_features: 4
}

DATA_PROCESSOR:
    - NAME: mask_points_and_boxes_outside_range
      REMOVE_OUTSIDE_BOXES: True

    - NAME: shuffle_points
      SHUFFLE_ENABLED: {
        'train': True,
        'test': True
      }

    - NAME: transform_points_to_voxels
      VOXEL_SIZE: [0.05, 0.05, 0.05]  
      MAX_POINTS_PER_VOXEL: 5
      MAX_NUMBER_OF_VOXELS: {
        'train': 16000,
        'test': 40000
      }

MODEL:
    NAME: VoxelRCNN

VFE:
    NAME: MeanVFE
    MODEL: 'max'

BACKBONE_3D:
    NAME: TeVoxelBackBone8x
    NUM_FILTERS: [16, 32, 64, 64]
    RETURN_NUM_FEATURES_AS_DICT: True
    OUT_FEATURES: 64

MAP_TO_BEV:
    NAME: BEVPool
    NUM_BEV_FEATURES: 256
    ALIGN_METHOD: 'max'

BACKBONE_2D:
    NAME: BaseBEVBackbone

    LAYER_NUMS: [4, 4]
    LAYER_STRIDES: [1, 2]
    NUM_FILTERS: [64, 128]
    UPSAMPLE_STRIDES: [1, 2]
    NUM_UPSAMPLE_FILTERS: [128, 128]

DENSE_HEAD:
    NAME: AnchorHeadSingle
    CLASS_AGNOSTIC: False

    USE_DIRECTION_CLASSIFIER: True
    DIR_OFFSET: 0.78539
    DIR_LIMIT_OFFSET: 0.0
    NUM_DIR_BINS: 2

    ANCHOR_GENERATOR_CONFIG: [
        {
            'class_name': 'Car',
            'anchor_sizes': [[3.9, 1.6, 1.56]],
            'anchor_rotations': [0, 1.57],
            'anchor_bottom_heights': [-1.78],
            'align_center': False,
            'feature_map_stride': 8,
            'matched_threshold': 0.6,
            'unmatched_threshold': 0.45
        },
        {
        'class_name': 'Pedestrian',
        'anchor_sizes': [[ 0.8, 0.6, 1.73 ]],
        'anchor_rotations': [ 0, 1.57 ],
        'anchor_bottom_heights': [ -0.6 ],
        'align_center': False,
        'feature_map_stride': 8,
        'matched_threshold': 0.5,
        'unmatched_threshold': 0.35
      },
      {
        'class_name': 'Cyclist',
        'anchor_sizes': [[ 1.76, 0.6, 1.73 ]],
        'anchor_rotations': [ 0, 1.57 ],
        'anchor_bottom_heights': [ -0.6 ],
        'align_center': False,
        'feature_map_stride': 8,
        'matched_threshold': 0.5,
        'unmatched_threshold': 0.35
      }
    ]
    TARGET_ASSIGNER_CONFIG:
        NAME: AxisAlignedTargetAssigner
        POS_FRACTION: -1.0
        SAMPLE_SIZE: 512
        NORM_BY_NUM_EXAMPLES: False
        MATCH_HEIGHT: False
        BOX_CODER: ResidualCoder

    LOSS_CONFIG:
        LOSS_WEIGHTS: {
            'cls_weight': 1.0,
            'loc_weight': 2.0,
            'dir_weight': 0.2,
            'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
        }

ROI_HEAD:
    NAME: TEDSHead
    CLASS_AGNOSTIC: True

    SHARED_FC: [256, 256]
    CLS_FC: [256, 256]
    REG_FC: [256, 256]
    DP_RATIO: 0.01

    NMS_CONFIG:
        TRAIN:
            NMS_TYPE: nms_gpu
            MULTI_CLASSES_NMS: False
            NMS_PRE_MAXSIZE: 4000
            NMS_POST_MAXSIZE: 512
            NMS_THRESH: 0.8
        TEST:
            NMS_TYPE: nms_gpu
            MULTI_CLASSES_NMS: False
            USE_FAST_NMS: True
            SCORE_THRESH: 0.0
            NMS_PRE_MAXSIZE: 4000
            NMS_POST_MAXSIZE: 50
            NMS_THRESH: 0.75

    ROI_GRID_POOL:
        FEATURES_SOURCE: ['x_conv3','x_conv4']
        PRE_MLP: True
        GRID_SIZE: 6
        POOL_LAYERS:
            x_conv3:
                MLPS: [[32, 32], [32, 32]]
                QUERY_RANGES: [[2, 2, 2], [4, 4, 4]]
                POOL_RADIUS: [0.4, 0.8]
                NSAMPLE: [16, 16]
                POOL_METHOD: max_pool
            x_conv4:
                MLPS: [[32, 32], [32, 32]]
                QUERY_RANGES: [[2, 2, 2], [4, 4, 4]]
                POOL_RADIUS: [0.8, 1.6]
                NSAMPLE: [16, 16]
                POOL_METHOD: max_pool

    TARGET_CONFIG:
        BOX_CODER: ResidualCoder
        ROI_PER_IMAGE: 160
        FG_RATIO: 0.5
        SAMPLE_ROI_BY_EACH_CLASS: True
        CLS_SCORE_TYPE: roi_iou_x
        CLS_FG_THRESH: [0.75, 0.65, 0.65]
        CLS_BG_THRESH: [0.25, 0.15, 0.15]
        CLS_BG_THRESH_LO: 0.1
        HARD_BG_RATIO: 0.8
        REG_FG_THRESH: [0.55, 0.5, 0.5]
        ENABLE_HARD_SAMPLING: True
        HARD_SAMPLING_THRESH: [0.5, 0.5, 0.5]
        HARD_SAMPLING_RATIO: [0.5, 0.5, 0.5]

    LOSS_CONFIG:
        CLS_LOSS: BinaryCrossEntropy
        REG_LOSS: smooth-l1
        CORNER_LOSS_REGULARIZATION: True
        GRID_3D_IOU_LOSS: False
        LOSS_WEIGHTS: {
            'rcnn_cls_weight': 1.0,
            'rcnn_reg_weight': 1.0,
            'rcnn_corner_weight': 1.0,
            'rcnn_iou3d_weight': 1.0,
            'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
        }

POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
    SCORE_THRESH: 0.3
    OUTPUT_RAW_SCORE: False
    EVAL_METRIC: kitti

    NMS_CONFIG:
        MULTI_CLASSES_NMS: False
        NMS_TYPE: nms_gpu
        NMS_THRESH: 0.1
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 2
    NUM_EPOCHS: 40

OPTIMIZER: adam_onecycle
LR: 0.01
WEIGHT_DECAY: 0.01
MOMENTUM: 0.9

MOMS: [0.95, 0.85]
PCT_START: 0.4
DIV_FACTOR: 10
DECAY_STEP_LIST: [35, 45]
LR_DECAY: 0.1
LR_CLIP: 0.0000001

LR_WARMUP: False
WARMUP_EPOCH: 1

GRAD_NORM_CLIP: 10

@csl1994
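The thread does not establish what caused the all-zero AP, but one cheap thing to rule out is a mismatch between CLASS_NAMES and the per-class entries of a config like the one above. The sketch below is a hypothetical check on a plain dict that copies only a few fields from the posted config; `check_per_class_config` is not part of TED or OpenPCDet.

```python
def check_per_class_config(cfg):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    classes = cfg["CLASS_NAMES"]
    anchor_classes = [a["class_name"] for a in cfg["ANCHOR_GENERATOR_CONFIG"]]
    # Every class should have its own anchor entry.
    for name in classes:
        if name not in anchor_classes:
            problems.append(f"no anchor entry for {name}")
    # Per-class threshold lists should have one value per class.
    for key in ("CLS_FG_THRESH", "CLS_BG_THRESH", "REG_FG_THRESH"):
        if len(cfg[key]) != len(classes):
            problems.append(
                f"{key} has {len(cfg[key])} values for {len(classes)} classes"
            )
    return problems


# Subset of the posted config, flattened into one dict for illustration.
posted = {
    "CLASS_NAMES": ["Car", "Pedestrian", "Cyclist"],
    "ANCHOR_GENERATOR_CONFIG": [
        {"class_name": "Car", "anchor_sizes": [[3.9, 1.6, 1.56]]},
        {"class_name": "Pedestrian", "anchor_sizes": [[0.8, 0.6, 1.73]]},
        {"class_name": "Cyclist", "anchor_sizes": [[1.76, 0.6, 1.73]]},
    ],
    "CLS_FG_THRESH": [0.75, 0.65, 0.65],
    "CLS_BG_THRESH": [0.25, 0.15, 0.15],
    "REG_FG_THRESH": [0.55, 0.5, 0.5],
}
print(check_per_class_config(posted))
```

For the posted values the check finds nothing to report; a config that lists three classes but keeps only the Car anchor would be flagged.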

csjxchen commented 6 months ago

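@csl1994 @Liu202209 Have you trained with eight GPUs, and did you manage to reproduce the results? I keep wondering whether the number of training GPUs is the issue.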

Liu202209 commented 6 months ago

I don't have access to 8 GPUs, so no. Also, the results I get are quite unstable.

Machine-NO-Learning commented 5 months ago

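I checked the two loss supervisions in the training stage and found no problem. Also, when the RPN creates anchors during training it uses dict_batch['points'], which is in the Rot0 coordinate frame, so this supports the idea that BEV pooling is meant to align everything in Rot0; that is how I implemented it on the bevpool side as well. Second, I noticed that the RoI transformation (roi_x_trans) in multi-grid-pool-aggregation is also puzzling: why is it rotnum-1? After changing all of this and retraining, the R40 evaluation was absurdly bad.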
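To make the forward/backward terminology in this discussion concrete, here is a small sketch of a 2D transform and its inverse. The order of operations (rotate, flip, scale), the flip convention, and the function names are assumptions for illustration and are not TED's implementation; the angle and scale are example values taken from the X_TRANS section of the posted config.

```python
import math


def forward(p, angle, scale, flip):
    """Map a point from the Rot0 frame into a transformed frame (assumed order)."""
    x, y = p
    c, s = math.cos(angle), math.sin(angle)
    x, y = c * x - s * y, s * x + c * y  # rotate about the z axis
    if flip:
        y = -y                           # flip along the x axis (assumed convention)
    return (x * scale, y * scale)


def backward(p, angle, scale, flip):
    """Undo forward(): apply the inverse steps in reverse order."""
    x, y = p[0] / scale, p[1] / scale
    if flip:
        y = -y
    c, s = math.cos(-angle), math.sin(-angle)
    return (c * x - s * y, s * x + c * y)


p0 = (10.0, -3.0)                                    # a grid point in the Rot0 frame
pk = forward(p0, 0.39269908, 0.98, flip=True)        # same point in a transformed frame
back = backward(pk, 0.39269908, 0.98, flip=True)     # mapped back to Rot0
print(pk, back)
```

Forward followed by backward with the same parameters returns the original point, so the question raised in this issue comes down to which of the two directions is applied to the grid and in which frame the sampled features are meant to end up.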

Machine-NO-Learning commented 5 months ago

Can any of the experts here explain this?

1120192017 commented 5 months ago

@csjxchen

Why is every accuracy 0 when I train Cyclist [ 1.76, 0.6, 1.73 ] and Pedestrian [ 0.8, 0.6, 1.73 ]?

Cyclist AP@0.50, 0.50, 0.50:
bbox AP:0.0000, 0.0000, 0.0000
bev AP:0.0000, 0.0000, 0.0000
3d AP:0.0000, 0.0000, 0.0000
aos AP:0.00, 0.00, 0.00

Which parameter settings do I need to pay attention to?

How did you solve this problem in the end? I have the same issue. @shenglunch

shenglunch commented 5 months ago

This issue contains the config file provided by csjxchen, but I did not reach the accuracy reported in the paper: https://github.com/hailanyi/TED/issues/33#issuecomment-1910434327

shenglunch commented 5 months ago

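@csjxchen I can only run batch size 1 on 4 GPUs. The models that come close on Car accuracy are far off on Cyclist and Pedestrian, and the models that come close on Cyclist are far off on Car and Pedestrian. I wonder whether each class has to be trained separately and the three classes cannot be detected at the same time...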
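Since the setups reported in this thread differ in GPU count and per-GPU batch size, it may help to compare the effective batch size per optimizer step. This is a minimal sketch assuming plain data-parallel training, where the effective batch is the product of the two; the 8-GPU row is hypothetical and simply combines the 8 GPUs asked about earlier with BATCH_SIZE_PER_GPU: 2 from the posted config.

```python
def effective_batch(num_gpus, batch_per_gpu):
    """Samples per optimizer step under plain data-parallel training."""
    return num_gpus * batch_per_gpu


setups = {
    "4 GPUs x 2 (csjxchen)": (4, 2),
    "4 GPUs x 1 (shenglunch)": (4, 1),
    "8 GPUs x 2 (hypothetical)": (8, 2),
}
for name, (gpus, per_gpu) in setups.items():
    print(name, "->", effective_batch(gpus, per_gpu))
```

With LR fixed at 0.01 in the config, these setups take optimizer steps on batches of different sizes, which is one possible but unverified source of the differences reported here.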

Liu202209 commented 4 months ago

Single-class training on Car can indeed reach the accuracy in the paper, but multi-class training doesn't seem to get there.

Have you tried training the multimodal model on the Car class only? My single-GPU results are very low:

bbox AP:98.9466, 90.4582, 89.9018
bev AP:90.1253, 88.8376, 80.2418
3d AP:89.7045, 87.3508, 79.3342
aos AP:98.53, 89.77, 88.96
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.5549, 96.5883, 91.5515
bev AP:96.0639, 92.4806, 85.3564
3d AP:95.1403, 87.0011, 82.0037
aos AP:99.16, 95.74, 90.56
Car AP@0.70, 0.50, 0.50:
bbox AP:98.9466, 90.4582, 89.9018
bev AP:99.0606, 90.4330, 89.9194
3d AP:99.0323, 90.4258, 89.8929
aos AP:98.53, 89.77, 88.96
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:99.5549, 96.5883, 91.5515
bev AP:99.5883, 96.5739, 91.5433
3d AP:99.5789, 96.5555, 91.5227
aos AP:99.16, 95.74, 90.56

And this is with a confidence threshold of 0.3; with thresholds of 0.5 and 0.7 the results are even lower.