wzqforever opened 9 months ago
Hi bro, can you specify the versions you have installed for all the libraries? I am not able to run `bash scripts/convert_data.py`.
I am using the main branch of mmdetection3d. With this version, you need to use tools/create_data.py to generate the nuScenes dataset. You can refer to https://mmdetection3d.readthedocs.io/en/latest/user_guides/dataset_prepare.html.
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
```
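After conversion finishes, a quick way to confirm the info files were generated correctly is to load one and inspect it; a minimal sketch, assuming the default output location and the `--extra-tag nuscenes` used above:

```python
import pickle

# Assumes the default output path from the command above;
# adjust if your --out-dir or --extra-tag differs.
with open('data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

# Recent mmdetection3d info files are dicts with 'metainfo' and 'data_list'.
print(infos.get('metainfo', {}))
print('number of training samples:', len(infos.get('data_list', [])))
```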
@Manishnayak234
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
```
System environment:
  sys.platform: linux
  Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
  CUDA available: True
  numpy_random_seed: 1155412052
  GPU 0,1,2,3,4,5: Tesla V100S-PCIE-32GB
  CUDA_HOME: /home/guanjingchao/cuda-11.6
  NVCC: Cuda compilation tools, release 11.6, V11.6.55
  GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  PyTorch: 1.13.1+cu116
  PyTorch compiling details: PyTorch built with:
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
  TorchVision: 0.14.1+cu116
  OpenCV: 4.8.1
  MMEngine: 0.9.0
Runtime environment:
  cudnn_benchmark: False
  mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
  dist_cfg: {'backend': 'nccl'}
  seed: 1155412052
  Distributed launcher: pytorch
  Distributed training: True
  GPU number: 6
```
Reproduces the problem - code sample
```python
_base_ = ['../../../configs/_base_/default_runtime.py']
custom_imports = dict(
    imports=['projects.BEVFusion.bevfusion'], allow_failed_imports=False)
# model settings
# Voxel size for voxel encoder
# Usually voxel size is changed consistently with the point cloud range
# If point cloud range is modified, do remember to change all related
# keys in the config.
voxel_size = [0.075, 0.075, 0.2]
point_cloud_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
metainfo = dict(classes=class_names)
dataset_type = 'NuScenesDataset'
data_root = '/home/guanjingchao/datasets/nuscenes/'  # full nuScenes dataset
# data_root = '/home/guanjingchao/datasets/nuscenes-mini/'  # mini nuScenes dataset
data_prefix = dict(
    pts='samples/LIDAR_TOP',
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
    sweeps='sweeps/LIDAR_TOP')
input_modality = dict(use_lidar=True, use_camera=False)
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         './data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/'
#     }))
backend_args = None
model = dict(
    type='BEVFusion',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        pad_size_divisor=32,
        voxelize_cfg=dict(
            max_num_points=10,
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            voxel_size=[0.075, 0.075, 0.2],
            max_voxels=[120000, 160000],
            voxelize_reduce=True)),
    pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=5),  # this is already handled in the voxelize function of bevfusion.py, so it has no effect here
    # ...
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'nuscenes_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(
            car=5, truck=5, bus=5, trailer=5, construction_vehicle=5,
            traffic_cone=5, barrier=5, motorcycle=5, bicycle=5,
            pedestrian=5)),
    classes=class_names,
    sample_groups=dict(
        car=2, truck=3, construction_vehicle=7, bus=4, trailer=6,
        barrier=2, motorcycle=6, bicycle=6, pedestrian=2, traffic_cone=2),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=[0, 1, 2, 3, 4],
        backend_args=backend_args))
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='GlobalRotScaleTrans',
        scale_ratio_range=[0.9, 1.1],
        rot_range=[-0.78539816, 0.78539816],
        translation_std=0.5),
    dict(type='BEVFusionRandomFlip3D'),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=[
            'points', 'img', 'gt_bboxes_3d', 'gt_labels_3d', 'gt_bboxes',
            'gt_labels'
        ],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'transformation_3d_flow',
            'pcd_rotation', 'pcd_scale_factor', 'pcd_trans', 'img_aug_matrix',
            'lidar_aug_matrix'
        ])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='PointsRangeFilter',
        point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]),
    dict(
        type='Pack3DDetInputs',
        keys=['img', 'points', 'gt_bboxes_3d', 'gt_labels_3d'],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'num_pts_feats', 'num_views'
        ])
]
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='CBGSDataset',
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='nuscenes_infos_train.pkl',
            pipeline=train_pipeline,
            metainfo=metainfo,
            modality=input_modality,
            test_mode=False,
            data_prefix=data_prefix,
            use_valid_flag=True,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            box_type_3d='LiDAR',
            backend_args=backend_args)))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        metainfo=metainfo,
        modality=input_modality,
        data_prefix=data_prefix,
        test_mode=True,
        box_type_3d='LiDAR',
        backend_args=backend_args))
test_dataloader = val_dataloader
val_evaluator = dict(
    type='NuScenesMetric',
    data_root=data_root,
    ann_file=data_root + 'nuscenes_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)
test_evaluator = val_evaluator
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
# learning rate
lr = 0.0001
param_scheduler = [
    # learning rate scheduler
]
# runtime settings
train_cfg = dict(by_epoch=True, max_epochs=20, val_interval=1)
val_cfg = dict()
test_cfg = dict()
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2))
# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically or not by default.
#   - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=32)  # 2258
log_processor = dict(window_size=50)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    checkpoint=dict(type='CheckpointHook', interval=5))
custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]
find_unused_parameters = True
```
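As a quick sanity check that a modified config like the one above still parses and resolves its `_base_` files, one can load it with MMEngine before launching training; a minimal sketch, using the config path from the training command below:

```python
from mmengine.config import Config

# Config path taken from the training command in this issue.
cfg = Config.fromfile(
    'projects/BEVFusion/configs/'
    'bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py')

print(cfg.train_dataloader.batch_size)  # per-GPU batch size
print(cfg.optim_wrapper.optimizer.lr)   # base learning rate
```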
Reproduces the problem - command or script
```bash
CUDA_VISIBLE_DEVICES="2,3,4,5,6,7" bash tools/dist_train.sh projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 6 --cfg-options load_from=work_dirs/lidar/lidar_epoch_20.pth model.img_backbone.init_cfg.checkpoint=pre/swint-nuimages-pretrained.pth --amp
```
Reproduces the problem - error message
When I use `batch_size=4` to train the lidar-only model, I can reach the accuracy reported in the paper. However, when I trained the image+lidar fusion model, GPU memory was insufficient, so I set `batch_size=2` for training, and the accuracy only reached around mAP=0.66. I then also tried `batch_size=4` but in fp16 mode, which gave an accuracy of mAP=0.67. Is this the cause? Is training accuracy related to `batch_size` in BEVFusion? Is the gap related to the lidar-only model using `batch_size=4` while image+lidar uses `batch_size=2`? If so, what should I do?
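One batch-size effect worth noting: with `auto_scale_lr` disabled, as in the config above, the learning rate is not adjusted when the effective batch size changes, so runs at different batch sizes train with a mismatched LR. A minimal sketch of the linear scaling rule that MMEngine's auto-scale-LR feature applies when enabled; the GPU counts and batch sizes below are taken from this issue:

```python
# Linear scaling rule: lr_scaled = base_lr * (effective_batch / base_batch).
base_lr = 0.0001   # lr from the config above
base_batch = 32    # base_batch_size = 8 GPUs x 4 samples per GPU

for gpus, per_gpu in [(6, 4), (6, 2)]:  # lidar-only vs. fusion runs
    effective = gpus * per_gpu
    print(f'{gpus} GPUs x batch {per_gpu}: '
          f'scaled lr = {base_lr * effective / base_batch:.2e}')
```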
Additional information

No response