open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Is training accuracy related to batch_size in bevfusion? #2897

Open wzqforever opened 8 months ago

wzqforever commented 8 months ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment


System environment:
    sys.platform: linux
    Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 1155412052
    GPU 0,1,2,3,4,5: Tesla V100S-PCIE-32GB
    CUDA_HOME: /home/guanjingchao/cuda-11.6
    NVCC: Cuda compilation tools, release 11.6, V11.6.55
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.13.1+cu116
    PyTorch compiling details: PyTorch built with:

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1155412052
    Distributed launcher: pytorch
    Distributed training: True
    GPU number: 6

Reproduces the problem - code sample

_base_ = ['../../../configs/_base_/default_runtime.py']
custom_imports = dict(
    imports=['projects.BEVFusion.bevfusion'], allow_failed_imports=False)

# model settings
# Voxel size for voxel encoder
# Usually voxel size is changed consistently with the point cloud range
# If point cloud range is modified, do remember to change all related
# keys in the config.
voxel_size = [0.075, 0.075, 0.2]
point_cloud_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]

metainfo = dict(classes=class_names)
dataset_type = 'NuScenesDataset'
data_root = '/home/guanjingchao/datasets/nuscenes/'  # full nuScenes dataset
# data_root = '/home/guanjingchao/datasets/nuscenes-mini/'  # mini nuScenes dataset

data_prefix = dict(
    pts='samples/LIDAR_TOP',
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
    sweeps='sweeps/LIDAR_TOP')
input_modality = dict(use_lidar=True, use_camera=False)

# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         './data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/'
#     }))
backend_args = None

model = dict(
    type='BEVFusion',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        pad_size_divisor=32,
        voxelize_cfg=dict(
            max_num_points=10,
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            voxel_size=[0.075, 0.075, 0.2],
            max_voxels=[120000, 160000],
            voxelize_reduce=True)),
    # this is already handled inside the voxelize function in bevfusion.py,
    # so this setting has no effect
    pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=5),
    pts_middle_encoder=dict(
        type='BEVFusionSparseEncoder',
        in_channels=5,
        sparse_shape=[1440, 1440, 41],
        order=('conv', 'norm', 'act'),
        norm_cfg=dict(type='BN1d', eps=0.001, momentum=0.01),
        encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128),
                          (128, 128)),
        encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, (1, 1, 0)), (0, 0)),
        block_type='basicblock'),
    pts_backbone=dict(
        type='SECOND',  # backbone network
        in_channels=256,
        out_channels=[128, 256],
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
        conv_cfg=dict(type='Conv2d', bias=False)),
    pts_neck=dict(
        type='SECONDFPN',  # neck network
        in_channels=[128, 256],
        out_channels=[256, 256],
        upsample_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
        upsample_cfg=dict(type='deconv', bias=False),
        use_conv_for_no_stride=True),
    bbox_head=dict(
        type='TransFusionHead',
        num_proposals=200,
        auxiliary=True,
        in_channels=512,
        hidden_channel=128,
        num_classes=10,
        nms_kernel_size=3,
        bn_momentum=0.1,
        num_decoder_layers=1,
        decoder_layer=dict(
            type='TransformerDecoderLayer',
            self_attn_cfg=dict(embed_dims=128, num_heads=8, dropout=0.1),
            cross_attn_cfg=dict(embed_dims=128, num_heads=8, dropout=0.1),
            ffn_cfg=dict(
                embed_dims=128,
                feedforward_channels=256,
                num_fcs=2,
                ffn_drop=0.1,
                act_cfg=dict(type='ReLU', inplace=True),
            ),
            norm_cfg=dict(type='LN'),
            pos_encoding_cfg=dict(input_channel=2, num_pos_feats=128)),
        train_cfg=dict(
            dataset='nuScenes',
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            grid_size=[1440, 1440, 41],
            voxel_size=[0.075, 0.075, 0.2],
            out_size_factor=8,
            gaussian_overlap=0.1,
            min_radius=2,
            pos_weight=-1,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
            assigner=dict(
                type='HungarianAssigner3D',
                iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'),
                cls_cost=dict(
                    type='mmdet.FocalLossCost',
                    gamma=2.0,
                    alpha=0.25,
                    weight=0.15),
                reg_cost=dict(type='BBoxBEVL1Cost', weight=0.25),
                iou_cost=dict(type='IoU3DCost', weight=0.25))),
        test_cfg=dict(
            dataset='nuScenes',
            grid_size=[1440, 1440, 41],
            out_size_factor=8,
            voxel_size=[0.075, 0.075],
            pc_range=[-54.0, -54.0],
            nms_type=None),
        common_heads=dict(
            center=[2, 2], height=[1, 2], dim=[3, 2], rot=[2, 2], vel=[2, 2]),
        bbox_coder=dict(
            type='TransFusionBBoxCoder',
            pc_range=[-54.0, -54.0],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            score_threshold=0.0,
            out_size_factor=8,
            voxel_size=[0.075, 0.075],
            code_size=10),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            reduction='mean',
            loss_weight=1.0),
        loss_heatmap=dict(
            type='mmdet.GaussianFocalLoss', reduction='mean', loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.L1Loss', reduction='mean', loss_weight=0.25)))

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'nuscenes_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(
            car=5,
            truck=5,
            bus=5,
            trailer=5,
            construction_vehicle=5,
            traffic_cone=5,
            barrier=5,
            motorcycle=5,
            bicycle=5,
            pedestrian=5)),
    classes=class_names,
    sample_groups=dict(
        car=2,
        truck=3,
        construction_vehicle=7,
        bus=4,
        trailer=6,
        barrier=2,
        motorcycle=6,
        bicycle=6,
        pedestrian=2,
        traffic_cone=2),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=[0, 1, 2, 3, 4],
        backend_args=backend_args))

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='GlobalRotScaleTrans',
        scale_ratio_range=[0.9, 1.1],
        rot_range=[-0.78539816, 0.78539816],
        translation_std=0.5),
    dict(type='BEVFusionRandomFlip3D'),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=[
            'points', 'img', 'gt_bboxes_3d', 'gt_labels_3d', 'gt_bboxes',
            'gt_labels'
        ],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'transformation_3d_flow', 'pcd_rotation',
            'pcd_scale_factor', 'pcd_trans', 'img_aug_matrix',
            'lidar_aug_matrix'
        ])
]

test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='PointsRangeFilter',
        point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]),
    dict(
        type='Pack3DDetInputs',
        keys=['img', 'points', 'gt_bboxes_3d', 'gt_labels_3d'],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'num_pts_feats', 'num_views'
        ])
]

train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='CBGSDataset',
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='nuscenes_infos_train.pkl',
            pipeline=train_pipeline,
            metainfo=metainfo,
            modality=input_modality,
            test_mode=False,
            data_prefix=data_prefix,
            use_valid_flag=True,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR')))

val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        metainfo=metainfo,
        modality=input_modality,
        data_prefix=data_prefix,
        test_mode=True,
        box_type_3d='LiDAR',
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(
    type='NuScenesMetric',
    data_root=data_root,
    ann_file=data_root + 'nuscenes_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)
test_evaluator = val_evaluator

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')

# learning rate
lr = 0.0001
param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, learning rate increases from 0 to lr * 10
    # during the next 12 epochs, learning rate decreases from lr * 10 to
    # lr * 1e-4
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 10,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr * 1e-4,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 12 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]

# runtime settings
train_cfg = dict(by_epoch=True, max_epochs=20, val_interval=1)
val_cfg = dict()
test_cfg = dict()

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2))

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=32)  #2258
log_processor = dict(window_size=50)

default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    checkpoint=dict(type='CheckpointHook', interval=5))
custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]

find_unused_parameters = True

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES="2,3,4,5,6,7" bash tools/dist_train.sh projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 6 --cfg-options load_from=work_dirs/lidar/lidar_epoch_20.pth model.img_backbone.init_cfg.checkpoint=pre/swint-nuimages-pretrained.pth --amp

Reproduces the problem - error message

When I train the lidar-only model with batch_size=4, I can reach the accuracy reported in the paper. However, when I trained the image+lidar fusion model, I had to use batch_size=2 because of insufficient GPU memory, and the accuracy only reached around mAP=0.66. I then tried batch_size=4 again, this time in fp16 mode, and got mAP=0.67. Is this the cause? Is training accuracy related to batch_size in BEVFusion? Could the gap come from training lidar-only with batch_size=4 but image+lidar with batch_size=2? If so, what should I do?
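
For what it's worth, with 6 GPUs those two runs correspond to total batch sizes of 24 (4 per GPU) and 12 (2 per GPU), while the config sets lr = 0.0001 against base_batch_size = 32. Under the usual linear scaling rule, the learning rate should shrink in proportion to the total batch size. Below is a minimal sketch of one way to compensate, assuming the runner honors the auto_scale_lr field (set to enable=False in the config above):

# Sketch: enable linear LR scaling so that lr is rescaled by
# (num_gpus * batch_size_per_gpu) / base_batch_size,
# i.e. 0.0001 * (6 * 2) / 32 for the fusion run.
CUDA_VISIBLE_DEVICES="2,3,4,5,6,7" bash tools/dist_train.sh \
    projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 6 \
    --cfg-options auto_scale_lr.enable=True train_dataloader.batch_size=2

Alternatively, lr can be reduced by the same factor directly in the config.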

Additional information

No response

Manishnayak234 commented 7 months ago

Hi, can you specify the versions you have installed for all the libraries? I am not able to run bash scripts/convert_data.py.
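
For reference, assuming a pip-based environment, the relevant package versions can be listed with something like:

# List the installed OpenMMLab packages and PyTorch
pip list | grep -Ei "mmengine|mmcv|mmdet|torch"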

wzqforever commented 7 months ago

I am using the main branch of mmdetection3d. With this version, you need to use tools/create_data.py to generate the nuScenes info files. You can refer to https://mmdetection3d.readthedocs.io/en/latest/user_guides/dataset_prepare.html.

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
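
If you only downloaded the nuScenes mini split, the same script should also work by selecting the mini version explicitly; this assumes the --version flag of tools/create_data.py, which is used to switch between v1.0-trainval and v1.0-mini:

# Generate info files for the v1.0-mini split instead of the full dataset
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini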

@Manishnayak234