hustvl / MapTR

[ICLR'23 Spotlight & IJCV'24] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

increase point_cloud_range #32

Open bitwangdan opened 1 year ago

bitwangdan commented 1 year ago

Hi, thanks for your great work! I want to increase the range to [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]. I modified "MapTRNMSFreeCoder" and point_cloud_range, but the result is not very good. Can you give some suggestions?

zxczrx123 commented 1 year ago

@bitwangdan Hi, I had the same problem before. There is a bug when you use an asymmetric range: the patch_size of the LocalMap is (w/2, h/2), but the origin is still (0, 0). You can check the code below.

https://github.com/hustvl/MapTR/blob/3b0c7d6b634193657023ebf755f6628b08100806/projects/mmdet3d_plugin/datasets/nuscenes_map_dataset.py#L85

https://github.com/hustvl/MapTR/blob/3b0c7d6b634193657023ebf755f6628b08100806/projects/mmdet3d_plugin/datasets/nuscenes_map_dataset.py#L828
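
In other words, with an asymmetric range the patch can no longer be described by half-extents measured from (0, 0); it has to be re-centered. A minimal sketch of the geometry (not the repo's code; names are illustrative, and the offset would additionally need to be rotated by the ego yaw before being applied in the map frame):

```python
# Sketch: local-map patch for an asymmetric
# point_cloud_range = [x_min, y_min, z_min, x_max, y_max, z_max].
pc_range = [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]

patch_w = pc_range[3] - pc_range[0]   # 75.0 m in x, not 2 * x_max
patch_h = pc_range[4] - pc_range[1]   # 60.0 m in y

# Patch center in ego coordinates; only (0, 0) when the range is symmetric.
cx = (pc_range[3] + pc_range[0]) / 2  # 22.5 m ahead of the ego
cy = (pc_range[4] + pc_range[1]) / 2  # 0.0 m

def patch_box(ego_x, ego_y):
    """(center_x, center_y, height, width) of the map patch to extract."""
    return (ego_x + cx, ego_y + cy, patch_h, patch_w)
```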

bitwangdan commented 1 year ago

@zxczrx123 Thanks, I have found this problem. One more point to pay attention to: the parameter num_vec needs to be increased.
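
For reference, a hedged sketch of the knobs discussed in this thread (values illustrative, not tuned; the keys follow the MapTRHead config quoted later in this issue):

```python
# Illustrative only: a larger range usually needs more instance queries.
point_cloud_range = [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]

pts_bbox_head = dict(
    type='MapTRHead',
    num_vec=100,           # raised from 50 in the released configs
    num_pts_per_vec=20,    # points per predicted polyline
    num_pts_per_gt_vec=20,
)
```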

zxczrx123 commented 1 year ago

@bitwangdan By the way, increasing the range greatly increases the complexity of the instances, and a method with a fixed num_vec seems to have difficulty dealing with lines of different lengths.

bitwangdan commented 1 year ago

@zxczrx123 Yes, when I increase the data range, the metrics of many categories drop significantly. I have not found a better way except increasing the parameter num_vec.

bitwangdan commented 1 year ago

@zxczrx123 Hi, have you tried adding temporal features like BEVFormer? The result of my experiment is not very good.

bitwangdan commented 1 year ago

@LegendBC Hi, I have added temporal features like BEVFormer, but the result of my experiment is not very good. Are you experimenting with temporal features in your code?

LegendBC commented 1 year ago

@LegendBC Hi, I have added temporal features like BEVFormer, but the result of my experiment is not very good. Are you experimenting with temporal features in your code?

We tried temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.

bitwangdan commented 1 year ago

@LegendBC Thank you for your reply. I have added lidar information, and the mAP on my dataset improved a lot. When I add temporal information like BEVFormer, the mAP drops a lot; maybe this temporal fusion method is not suitable for MapTR. I will try other temporal methods, and I also hope you can find a suitable temporal method for MapTR.

zxczrx123 commented 1 year ago

@zxczrx123 Hi, have you tried adding temporal features like BEVFormer? The result of my experiment is not very good.

I have not used temporal features. Can you share your results?

bitwangdan commented 1 year ago

@zxczrx123 Hi, I experimented on my own dataset; the metrics dropped a lot.

bitwangdan commented 1 year ago

@zxczrx123 Hi, have you ever encountered such a situation? After increasing point_cloud_range, mAP drops a lot at the 0.5 threshold, but at the 1.0 and 1.5 thresholds it is basically normal.

zxczrx123 commented 1 year ago

@bitwangdan In my case, they all drop.

zx2624 commented 1 year ago

Probably because the rotate_center used in BEVFormer is not at (0, 0). Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error? (screenshot of the runtime error)

bitwangdan commented 1 year ago

@zx2624 Debug with: `TORCH_DISTRIBUTED_DEBUG=DETAIL bash tools/dist_train.sh **config`. I adjusted this parameter (rotate_center), but the result is still wrong.
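
If the error turns out to be DistributedDataParallel complaining about parameters that received no gradient (a common failure mode when temporal fusion skips part of the network on some frames), the usual mmdetection-style workaround is the config flag below. This is a guess, since the screenshot is not preserved here; whether it applies depends on the actual error.

```python
# Hypothetical fix, only if DDP reports unused parameters during temporal
# training: let the wrapper tolerate parameters that get no gradient.
find_unused_parameters = True
```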

zx2624 commented 1 year ago

@zx2624 Debug with: `TORCH_DISTRIBUTED_DEBUG=DETAIL bash tools/dist_train.sh **config`. I adjusted this parameter (rotate_center), but the result is still wrong.

Any code?

forvd commented 1 year ago

Same problem, here is my cfg:

```python
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
#
plugin = True
plugin_dir = 'projects/mmdet3d_plugin/'

# If point cloud range is changed, the models should also change their point
# cloud range accordingly
# point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
point_cloud_range = [-15.0, -60.0, -2.0, 15.0, 60.0, 2.0]
voxel_size = [0.15, 0.15, 4]

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]

# map has classes: divider, ped_crossing, boundary
map_classes = ['divider', 'ped_crossing', 'boundary']
# fixed_ptsnum_per_line = 20
# map_classes = ['divider',]

fixed_ptsnum_per_gt_line = 40  # now only support fixed_pts > 0
fixed_ptsnum_per_pred_line = 40
eval_use_same_gt_sample_num_flag = True
num_map_classes = len(map_classes)

input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=True)

_dim_ = 256
_pos_dim_ = _dim_//2
_ffn_dim_ = _dim_*2
_num_levels_ = 1
# bev_h_ = 50
# bev_w_ = 50
bev_h_ = 400
bev_w_ = 100
queue_length = 1  # each sequence contains `queue_length` frames.

model = dict(
    type='MapTR',
    use_grid_mask=True,
    video_test_mode=False,
    pretrained=dict(img='ckpts/resnet50-19c8e357.pth'),
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(3,),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch'),
    img_neck=dict(
        type='FPN',
        in_channels=[2048],
        out_channels=_dim_,
        start_level=0,
        add_extra_convs='on_output',
        num_outs=_num_levels_,
        relu_before_extra_convs=True),
    pts_bbox_head=dict(
        type='MapTRHead',
        bev_h=bev_h_,
        bev_w=bev_w_,
        num_query=900,
        num_vec=100,
        num_pts_per_vec=fixed_ptsnum_per_pred_line,  # one bbox
        num_pts_per_gt_vec=fixed_ptsnum_per_gt_line,
        dir_interval=1,
        query_embed_type='instance_pts',
        transform_method='minmax',
        gt_shift_pts_pattern='v2',
        num_classes=num_map_classes,
        in_channels=_dim_,
        sync_cls_avg_factor=True,
        with_box_refine=True,
        as_two_stage=False,
        code_size=2,
        code_weights=[1.0, 1.0, 1.0, 1.0],
        transformer=dict(
            type='MapTRPerceptionTransformer',
            rotate_prev_bev=True,
            use_shift=True,
            use_can_bus=True,
            embed_dims=_dim_,
            encoder=dict(
                type='BEVFormerEncoder',
                num_layers=1,
                pc_range=point_cloud_range,
                num_points_in_pillar=4,
                return_intermediate=False,
                transformerlayers=dict(
                    type='BEVFormerLayer',
                    attn_cfgs=[
                        dict(
                            type='TemporalSelfAttention',
                            embed_dims=_dim_,
                            num_levels=1),
                        dict(
                            type='GeometrySptialCrossAttention',
                            pc_range=point_cloud_range,
                            attention=dict(
                                type='GeometryKernelAttention',
                                embed_dims=_dim_,
                                num_heads=4,
                                dilation=1,
                                kernel_size=(3, 5),
                                num_levels=_num_levels_),
                            embed_dims=_dim_,
                        )
                    ],
                    feedforward_channels=_ffn_dim_,
                    ffn_dropout=0.1,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm'))),
            decoder=dict(
                type='MapTRDecoder',
                num_layers=6,
                return_intermediate=True,
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=_dim_,
                            num_heads=8,
                            dropout=0.1),
                        dict(
                            type='CustomMSDeformableAttention',
                            embed_dims=_dim_,
                            num_levels=1),
                    ],
                    feedforward_channels=_ffn_dim_,
                    ffn_dropout=0.1,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')))),
        bbox_coder=dict(
            type='MapTRNMSFreeCoder',
            # post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            post_center_range=[-20, -65, -20, -65, 20, 65, 20, 65],
            pc_range=point_cloud_range,
            max_num=50,
            voxel_size=voxel_size,
            num_classes=num_map_classes),
        positional_encoding=dict(
            type='LearnedPositionalEncoding',
            num_feats=_pos_dim_,
            row_num_embed=bev_h_,
            col_num_embed=bev_w_,
        ),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=0.0),
        loss_iou=dict(type='GIoULoss', loss_weight=0.0),
        loss_pts=dict(type='PtsL1Loss', loss_weight=5.0),
        loss_dir=dict(type='PtsDirCosLoss', loss_weight=0.005)),
    # model training and testing settings
    train_cfg=dict(pts=dict(
        grid_size=[512, 512, 1],
        voxel_size=voxel_size,
        point_cloud_range=point_cloud_range,
        out_size_factor=4,
        assigner=dict(
            type='MapTRAssigner',
            cls_cost=dict(type='FocalLossCost', weight=2.0),
            reg_cost=dict(type='BBoxL1Cost', weight=0.0, box_format='xywh'),
            # reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
            # iou_cost=dict(type='IoUCost', weight=1.0), # Fake cost. This is just to make it compatible with DETR head.
            iou_cost=dict(type='IoUCost', iou_mode='giou', weight=0.0),
            pts_cost=dict(type='OrderedPtsL1Cost', weight=5),
            pc_range=point_cloud_range))))

dataset_type = 'CustomNuScenesLocalMapDataset'
data_root = 'data/nuscenes/'
file_client_args = dict(backend='disk')

train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True,
         with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.5]),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]

test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1600, 900),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(type='RandomScaleImageMultiViewImage', scales=[0.5]),
            dict(type='PadMultiViewImage', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='CustomCollect3D', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_temporal_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        bev_size=(bev_h_, bev_w_),
        pc_range=point_cloud_range,
        fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
        eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
        padding_value=-10000,
        map_classes=map_classes,
        queue_length=queue_length,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(type=dataset_type,
             data_root=data_root,
             ann_file=data_root + 'nuscenes_infos_temporal_val.pkl',
             map_ann_file=data_root + 'nuscenes_map_anns_val.json',
             pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
             pc_range=point_cloud_range,
             fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
             eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
             padding_value=-10000,
             map_classes=map_classes,
             classes=class_names, modality=input_modality, samples_per_gpu=1),
    test=dict(type=dataset_type,
              data_root=data_root,
              ann_file=data_root + 'nuscenes_infos_temporal_val.pkl',
              map_ann_file=data_root + 'nuscenes_map_anns_val.json',
              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
              pc_range=point_cloud_range,
              fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
              eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
              padding_value=-10000,
              map_classes=map_classes,
              classes=class_names, modality=input_modality),
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler')
)

optimizer = dict(
    type='AdamW',
    lr=6e-4,
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)
# total_epochs = 24
total_epochs = 50

# evaluation = dict(interval=1, pipeline=test_pipeline)
evaluation = dict(interval=2, pipeline=test_pipeline, metric='chamfer')

runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
fp16 = dict(loss_scale=512.)
checkpoint_config = dict(interval=1)
```

Can you give some suggestions?
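
As a sanity check on the numbers above (pure arithmetic derived from this config, no MapTR code involved), the BEV grid tiles the range at 0.3 m per cell in both axes, so the grid itself is consistent with the chosen symmetric range:

```python
point_cloud_range = [-15.0, -60.0, -2.0, 15.0, 60.0, 2.0]
bev_h_, bev_w_ = 400, 100

cell_y = (point_cloud_range[4] - point_cloud_range[1]) / bev_h_  # 120 m / 400
cell_x = (point_cloud_range[3] - point_cloud_range[0]) / bev_w_  # 30 m / 100
assert cell_x == cell_y == 0.3  # 0.3 m per BEV cell
```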

fishmarch commented 1 year ago

Probably because the rotate_center used in BEVFormer is not at (0, 0). Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error? (screenshot of the runtime error)

@zx2624 Hi! I met the same problem, have you solved it? Thanks.

swc-17 commented 1 year ago

Probably because the rotate_center used in BEVFormer is not at (0, 0). Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error? (screenshot of the runtime error)

@zx2624 Hi! I met the same problem, have you solved it? Thanks.

@fishmarch Hi, I met the same problem, have you solved this? Thanks.

LegendBC commented 1 year ago

@LegendBC Hi, I have added temporal features like BEVFormer, but the result of my experiment is not very good. Are you experimenting with temporal features in your code?

We tried temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.

We have addressed the temporal issue in the latest MapTRv1 code. The issue is that the CAN bus provides extra harmful information in the temporal setting. We set the length of can_bus to 6 instead of the original 18 here.
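
A minimal sketch of that change as described (not the exact repo diff; the layer shapes follow BEVFormer's CAN-bus embedding, and the variable names here are illustrative):

```python
import torch
import torch.nn as nn

embed_dims = 256
can_bus_len = 6  # was 18; the remaining IMU/velocity channels are dropped

# BEVFormer-style CAN-bus embedding, sized for the truncated signal.
can_bus_mlp = nn.Sequential(
    nn.Linear(can_bus_len, embed_dims // 2),
    nn.ReLU(inplace=True),
    nn.Linear(embed_dims // 2, embed_dims),
    nn.ReLU(inplace=True),
)

can_bus = torch.randn(1, 18)                          # full signal from the dataset
can_bus_feat = can_bus_mlp(can_bus[:, :can_bus_len])  # keep only the first 6 dims
```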

colahe commented 1 year ago

@zxczrx123 Hi, I experimented on my own dataset; the metrics dropped a lot.

Hi, I would like to ask how you create your own dataset? Looking forward to your reply! Thanks!

bitwangdan commented 1 year ago

@zxczrx123 Hi, I experimented on my own dataset; the metrics dropped a lot.

Hi, I would like to ask how you create your own dataset? Looking forward to your reply! Thanks!

Our dataset format is the same as nuScenes.

bitwangdan commented 1 year ago

@LegendBC Hi, I have added temporal features like BEVFormer, but the result of my experiment is not very good. Are you experimenting with temporal features in your code?

We tried temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.

We have addressed the temporal issue in the latest MapTRv1 code. The issue is that the CAN bus provides extra harmful information in the temporal setting. We set the length of can_bus to 6 instead of the original 18 here.

Thank you for your reply, but it seems that the temporal fusion is not used when testing.

bitwangdan commented 1 year ago

@LegendBC Hi, when using temporal fusion, video_test_mode needs to be True. I tried the new version of the temporal method, but the results were still not good.

zyc10ud commented 1 year ago

@LegendBC Hi, when using temporal fusion, video_test_mode needs to be True. I tried the new version of the temporal method, but the results were still not good.

We set video_test_mode=True when we test; the result is consistent with when it was set to False.
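
For anyone following along, this is the flag in question (a minimal illustration, not a full config):

```python
# video_test_mode controls whether the previous frame's BEV features are
# kept and reused at inference time, i.e. temporal fusion during testing.
model = dict(
    type='MapTR',
    video_test_mode=True,
)
```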

bitwangdan commented 11 months ago

@LegendBC Hi, I experimented with two temporal methods. The result with the GKT encoder is normal, mAP 52.1, while the result with the BEVFormer encoder is not good, mAP 25.1; there may be some problems. By the way, how would you integrate lidar with temporal fusion? Looking forward to your reply.
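
For clarity, these are the two encoder cross-attention variants being compared, as configured earlier in this thread and in BEVFormer's released configs respectively (a sketch; exact keys may differ across versions):

```python
point_cloud_range = [-15.0, -30.0, -2.0, 15.0, 30.0, 2.0]  # illustrative
_dim_ = 256

# "GKT encoder": geometry-kernel cross-attention, as in the config above.
gkt_cross_attn = dict(
    type='GeometrySptialCrossAttention',  # spelling as in the MapTR codebase
    pc_range=point_cloud_range,
    attention=dict(
        type='GeometryKernelAttention',
        embed_dims=_dim_,
        num_heads=4,
        dilation=1,
        kernel_size=(3, 5),
        num_levels=1),
    embed_dims=_dim_,
)

# "BEVFormer encoder": deformable spatial cross-attention.
bevformer_cross_attn = dict(
    type='SpatialCrossAttention',
    pc_range=point_cloud_range,
    deformable_attention=dict(
        type='MSDeformableAttention3D',
        embed_dims=_dim_,
        num_points=8,
        num_levels=1),
    embed_dims=_dim_,
)
```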