Open maggiesong7 opened 1 year ago
Hi, Do you modified the batchsize or other parameters? Can you share your config.
Hi, I only change the dataset path. here is my config:
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0] class_names = [ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ] dataset_type = 'CustomNuScenesDataset' data_root = './data/nuscenes/' input_modality = dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False) file_client_args = dict(backend='disk') train_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict( type='ObjectRangeFilter', point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]), dict( type='ObjectNameFilter', classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=True), dict( type='GlobalRotScaleTransImage', rot_range=[-0.3925, 0.3925], translation_std=[0, 0, 0], scale_ratio_range=[0.95, 1.05], reverse_angle=True, training=True), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict(type='Collect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img']) ] test_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ] eval_pipeline = [ dict( type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5, file_client_args=dict(backend='disk')), dict( type='LoadPointsFromMultiSweeps', sweeps_num=10, file_client_args=dict(backend='disk')), dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' ], with_label=False), dict(type='Collect3D', keys=['points']) ] data = dict( samples_per_gpu=1, workers_per_gpu=4, train=dict( type='CustomNuScenesDataset', data_root='./data/nuscenes/', ann_file='./data/nuscenes/nuscenes_infos_train.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict( type='ObjectRangeFilter', point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]), dict( type='ObjectNameFilter', classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=True), dict( type='GlobalRotScaleTransImage', rot_range=[-0.3925, 0.3925], translation_std=[0, 0, 0], scale_ratio_range=[0.95, 1.05], reverse_angle=True, training=True), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='Collect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img']) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False), test_mode=False, box_type_3d='LiDAR', use_valid_flag=True), val=dict( type='CustomNuScenesDataset', data_root='data/nuscenes/', ann_file='data/nuscenes/nuscenes_infos_val.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False), test_mode=True, box_type_3d='LiDAR'), test=dict( type='CustomNuScenesDataset', data_root='data/nuscenes/', ann_file='data/nuscenes/nuscenes_infos_val.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False), test_mode=True, box_type_3d='LiDAR')) evaluation = dict( interval=1, pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ]) checkpoint_config = dict(interval=1) log_config = dict( interval=50, hooks=[dict(type='TextLoggerHook'), dict(type='TensorboardLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' work_dir = 'work_dirs/petr_r50dcn_gridmask_p4/' load_from = None resume_from = None workflow = [('train', 1)] opencv_num_threads = 0 mp_start_method = 'fork' backbone_norm_cfg = dict(type='LN', requires_grad=True) plugin = True plugin_dir = 'projects/mmdet3d_plugin/' voxel_size = [0.2, 0.2, 8] img_norm_cfg = dict( mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) model = dict( type='Petr3D', use_grid_mask=True, img_backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(2, 3), frozen_stages=-1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', with_cp=True, dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), pretrained='ckpts/resnet50_msra-5891d200.pth'), img_neck=dict( type='CPFPN', in_channels=[1024, 2048], out_channels=256, num_outs=2), pts_bbox_head=dict( type='PETRHead', num_classes=10, in_channels=256, num_query=900, LID=True, with_position=True, with_multiview=True, position_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], normedlinear=False, transformer=dict( type='PETRTransformer', decoder=dict( type='PETRTransformerDecoder', return_intermediate=True, num_layers=6, transformerlayers=dict( type='PETRTransformerDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1), dict( type='PETRMultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1) ], feedforward_channels=2048, ffn_dropout=0.1, with_cp=True, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')))), bbox_coder=dict( type='NMSFreeCoder', post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], max_num=300, voxel_size=[0.2, 0.2, 8], num_classes=10), positional_encoding=dict( type='SinePositionalEncoding3D', num_feats=128, normalize=True), loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), loss_bbox=dict(type='L1Loss', loss_weight=0.25), loss_iou=dict(type='GIoULoss', loss_weight=0.0)), train_cfg=dict( pts=dict( grid_size=[512, 512, 1], voxel_size=[0.2, 0.2, 8], point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], out_size_factor=4, assigner=dict( type='HungarianAssigner3D', cls_cost=dict(type='FocalLossCost', weight=2.0), reg_cost=dict(type='BBox3DL1Cost', weight=0.25), iou_cost=dict(type='IoUCost', weight=0.0), pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])))) db_sampler = dict() ida_aug_conf = dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True) optimizer = dict( type='AdamW', lr=0.0002, paramwise_cfg=dict(custom_keys=dict(img_backbone=dict(lr_mult=0.1))), weight_decay=0.01) optimizer_config = dict( type='Fp16OptimizerHook', loss_scale=512.0, grad_clip=dict(max_norm=35, norm_type=2)) lr_config = dict( policy='CosineAnnealing', warmup='linear', warmup_iters=500, warmup_ratio=0.3333333333333333, min_lr_ratio=0.001) total_epochs = 24 find_unused_parameters = False runner = dict(type='EpochBasedRunner', max_epochs=24) gpu_ids = range(0, 8)
Hi,
The config has no problem. Can you tell me the gpu number and the version of python and mmdet3d? Python3.8 may drops some performance.
I use 8 2080ti to train. And I have trained the model using two different python versions, that is, python 3.7.6 and python 3.6.5, both of them are with mmdet3d 1.0.0. Also, when I set with_position=False, the accuracy is extremely low, which is 0.0887mAP and 0.2230NDS. In my opinion, setting with_position=False is just a kind of ablation study about the 3D PE module. Can you explain that?
Hi,
(1) When use mmdet1.0, have you notice here https://github.com/megvii-research/PETR/issues/71#issuecomment-1318191277 . The reverse_angle must be False in GlobalRotScaleTransImage. (2) Yes, when set with_position=False, it's a result in ablation study.
When set with_position=False, the intrinsics and extrinsics are not used in model. In fact, PETR can work without intrinsics and extrinsics, benefiting from global attention. The low performance is mainly due to ResizeCropFlipImage and GlobalRotScaleTransImage. These data augmentation greatly change the intrinsics and extrinsics during the training process, and the network can't overfit the parameters of the data set. Once these augmentations are removed, resnet50 should obtain the peformance more than 27% mAP. But we don't think it's meaningful to over-fit the dataset.
Hi,
(1) When use mmdet1.0, have you notice here #71 (comment) . The reverse_angle must be False in GlobalRotScaleTransImage. (2) Yes, when set with_position=False, it's a result in ablation study.
When set with_position=False, the intrinsics and extrinsics are not used in model. In fact, PETR can work without intrinsics and extrinsics, benefiting from global attention. The low performance is mainly due to ResizeCropFlipImage and GlobalRotScaleTransImage. These data augmentation greatly change the intrinsics and extrinsics during the training process, and the network can't overfit the parameters of the data set. Once these augmentations are removed, resnet50 should obtain the peformance more than 27% mAP. But we don't think it's meaningful to over-fit the dataset.
I have noticed StreamPETR still set reverse_angle=True
but they use mmdet3d=1.0.0rc6
, have I missed something?
Hi, (1) When use mmdet1.0, have you notice here #71 (comment) . The reverse_angle must be False in GlobalRotScaleTransImage. (2) Yes, when set with_position=False, it's a result in ablation study. When set with_position=False, the intrinsics and extrinsics are not used in model. In fact, PETR can work without intrinsics and extrinsics, benefiting from global attention. The low performance is mainly due to ResizeCropFlipImage and GlobalRotScaleTransImage. These data augmentation greatly change the intrinsics and extrinsics during the training process, and the network can't overfit the parameters of the data set. Once these augmentations are removed, resnet50 should obtain the peformance more than 27% mAP. But we don't think it's meaningful to over-fit the dataset.
I have noticed StreamPETR still set
reverse_angle=True
but they usemmdet3d=1.0.0rc6
, have I missed something?
The rotate matrix is different.
Hi, (1) When use mmdet1.0, have you notice here #71 (comment) . The reverse_angle must be False in GlobalRotScaleTransImage. (2) Yes, when set with_position=False, it's a result in ablation study. When set with_position=False, the intrinsics and extrinsics are not used in model. In fact, PETR can work without intrinsics and extrinsics, benefiting from global attention. The low performance is mainly due to ResizeCropFlipImage and GlobalRotScaleTransImage. These data augmentation greatly change the intrinsics and extrinsics during the training process, and the network can't overfit the parameters of the data set. Once these augmentations are removed, resnet50 should obtain the peformance more than 27% mAP. But we don't think it's meaningful to over-fit the dataset.
I have noticed StreamPETR still set
reverse_angle=True
but they usemmdet3d=1.0.0rc6
, have I missed something?The rotate matrix is different.
Thanks, got it. 👍
When running
[petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py)
, the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2sThis is much lower than the reported one. Also, we I set
with_position=False
, the accuracy is extremely low, which is 0.0887mAP and 0.2230NDS. / /When running
[petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py)
, the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2sThis is much lower than the reported one. Also, we I set
with_position=False
, the accuracy is extremely low, which is 0.0887mAP and 0.2230NDS.
When running
[petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py)
, the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2sThis is much lower than the reported one. Also, we I set
with_position=False
, the accuracy is extremely low, which is 0.0887mAP and 0.2230NDS.