JiankunShi opened 6 months ago
By the way, my training command is as follows:

```shell
torchrun --nproc_per_node=2 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=29501 tools/train.py configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py --cfg-option load_from=./bevfusion_lidar_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d-2628f933.pth model.img_backbone.init_cfg.checkpoint=./swint-nuimages-pretrained.pth --launcher pytorch
```
Hello, I have the same question. Could you tell me roughly what your final loss averaged?
Hi @JiankunShi, if you want to change the batch size, it is customary to set it to an even number. At the same time, the learning rate also needs to be scaled in the same proportion as the batch size.
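For example, a rough sketch of that linear-scaling rule, assuming the 2-GPU, 3-samples-per-GPU setup from the torchrun command above (the exact factor depends on your actual world size and per-GPU batch size):

```python
# Linear LR scaling sketch (assumption: 2 GPUs x 3 samples per GPU,
# matching the torchrun command above; adjust to your actual setup).
base_batch_size = 8 * 4    # reference schedule: 8 GPUs x 4 samples per GPU
actual_batch_size = 2 * 3  # this run: 2 GPUs x 3 samples per GPU
base_lr = 2e-4             # lr from the posted config

scaled_lr = base_lr * actual_batch_size / base_batch_size
print(scaled_lr)  # 3.75e-05 under linear scaling
```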
@JiankunShi Hello, have you tried increasing the number of training epochs for BEVFusion? When I trained for 10 epochs, the loss started rising to around 16 at epoch 7.
@JiankunShi I suspect the issue comes from not setting the line `auto_scale_lr = dict(enable=False, base_batch_size=32)` to `enable=True`, combined with the fact that the batch size is different. If I am not mistaken, the effective batch size in your run is 6 (2 GPUs x 3 samples), whereas the provided config assumes 32 (8 GPUs x 4 samples).
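As a minimal sketch (assuming MMEngine's standard `auto_scale_lr` mechanism, which rescales the optimizer LR by the ratio of the actual total batch size to `base_batch_size` at runtime), the config change would look like:

```python
# Sketch: turn on MMEngine's automatic LR scaling in the config.
# With enable=True, the runner multiplies the optimizer lr by
# (num_gpus * samples_per_gpu) / base_batch_size before training.
auto_scale_lr = dict(enable=True, base_batch_size=32)
```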
Model/Dataset/Scheduler description
Hello,
I have been training using the provided models bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d-5239b1af.pth and swint-nuimages-pretrained.pth. Due to GPU memory constraints, I reduced the batch_size in the config from 4 to 3, while keeping all other parameters unchanged. The specific config is as follows:
```python
_base_ = [
    './bevfusion_lidar_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py'
]
point_cloud_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
input_modality = dict(use_lidar=True, use_camera=True)
backend_args = None

model = dict(
    type='BEVFusion',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=False),
    img_backbone=dict(
        type='mmdet.SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=[1, 2, 3],
        with_cp=False,
        convert_weights=True,
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth'  # noqa: E501
        )),
    img_neck=dict(
        type='GeneralizedLSSFPN',
        in_channels=[192, 384, 768],
        out_channels=256,
        start_level=0,
        num_outs=3,
        norm_cfg=dict(type='BN2d', requires_grad=True),
        act_cfg=dict(type='ReLU', inplace=True),
        upsample_cfg=dict(mode='bilinear', align_corners=False)),
    view_transform=dict(
        type='DepthLSSTransform',
        in_channels=256,
        out_channels=80,
        image_size=[256, 704],
        feature_size=[32, 88],
        xbound=[-54.0, 54.0, 0.3],
        ybound=[-54.0, 54.0, 0.3],
        zbound=[-10.0, 10.0, 20.0],
        dbound=[1.0, 60.0, 0.5],
        downsample=2),
    fusion_layer=dict(
        type='ConvFuser', in_channels=[80, 256], out_channels=256))

train_pipeline = [
    dict(
        type='BEVLoadMultiViewImageFromFiles',
        to_float32=True,
        color_type='color',
        backend_args=backend_args),
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(
        type='ImageAug3D',
        final_dim=[256, 704],
        resize_lim=[0.38, 0.55],
        bot_pct_lim=[0.0, 0.0],
        rot_lim=[-5.4, 5.4],
        rand_flip=True,
        is_train=True),
    dict(
        type='BEVFusionGlobalRotScaleTrans',
        scale_ratio_range=[0.9, 1.1],
        rot_range=[-0.78539816, 0.78539816],
        translation_std=0.5),
    dict(type='BEVFusionRandomFlip3D'),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    # Actually, 'GridMask' is not used here
]

test_pipeline = [
    dict(
        type='BEVLoadMultiViewImageFromFiles',
        to_float32=True,
        color_type='color',
        backend_args=backend_args),
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='ImageAug3D',
        final_dim=[256, 704],
        resize_lim=[0.48, 0.48],
        bot_pct_lim=[0.0, 0.0],
        rot_lim=[0.0, 0.0],
        rand_flip=False,
        is_train=False),
    dict(
        type='PointsRangeFilter',
        point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]),
    dict(
        type='Pack3DDetInputs',
        keys=['img', 'points', 'gt_bboxes_3d', 'gt_labels_3d'],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'num_pts_feats'
        ])
]

train_dataloader = dict(
    dataset=dict(
        dataset=dict(pipeline=train_pipeline, modality=input_modality)))
val_dataloader = dict(
    dataset=dict(pipeline=test_pipeline, modality=input_modality))
test_dataloader = val_dataloader

param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.33333333,
        by_epoch=False,
        begin=0,
        end=500),
    dict(
        type='CosineAnnealingLR',
        begin=0,
        T_max=6,
        end=6,
        by_epoch=True,
        eta_min_ratio=1e-4,
        convert_to_iter_based=True),
    # momentum scheduler
]

# runtime settings
train_cfg = dict(by_epoch=True, max_epochs=6, val_interval=1)
val_cfg = dict()
test_cfg = dict()
find_unused_parameters = True

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.01),
    clip_grad=dict(max_norm=30, norm_type=2))

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically or not by default.
#   - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=32)

default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    checkpoint=dict(type='CheckpointHook', interval=1))

del _base_.custom_hooks
```
However, the training results I achieved were NDS 69.28 and mAP 64.55, which deviate significantly from the expected results. Could you please advise on potential adjustments or any steps I might take to improve these outcomes?
Thank you for your assistance!
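Before launching, one way to sanity-check what the merged config actually resolves to (a sketch assuming `mmengine` is installed and the config path matches your checkout, as in the torchrun command above):

```python
# Sketch: load the merged config and verify the effective lr, schedule
# length, and auto_scale_lr setting before training.
from mmengine.config import Config

cfg = Config.fromfile(
    'configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py')
print(cfg.optim_wrapper.optimizer['lr'])  # lr in effect (2e-4 in the config above)
print(cfg.auto_scale_lr)                  # e.g. dict(enable=False, base_batch_size=32)
print(cfg.train_cfg['max_epochs'])        # schedule length in epochs
```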