open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Training the model only runs testing (does not train) #2195

Status: Open. youngfly opened this issue 1 year ago.

youngfly commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection3d

Environment

fatal: Not a git repository (or any parent up to mount point /root/paddlejob/yyr/all)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-32GB
CUDA_HOME: /root/paddlejob/yyr/all/cuda/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (GCC) 8.2.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.10.0
OpenCV: 4.5.5
MMCV: 1.6.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.26.0
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc6+
spconv2.0: False

Reproduces the problem - code sample

This is strange: I start training, but the run only performs testing, as if the training process were skipped entirely (screenshot omitted). My config is below:

# model
model = dict(
    type='FCOSMono3D',
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=0,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='/root/paddlejob/yyr/all/pretrain/resnet18-f37072fd.pth')),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],
        out_channels=256,
        start_level=0,
        add_extra_convs='on_output',
        num_outs=4,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='PGDHead',
        num_classes=3,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        bbox_code_size=7,
        use_onlyreg_proj=True,
        use_direction_classifier=True,
        diff_rad_by_sin=True,
        pred_attrs=False,
        pred_velo=False,
        pred_bbox2d=True,
        pred_keypoints=True,
        dir_offset=0.7854,  # pi/4
        strides=[4, 8, 16, 32],
        regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 1e8)),
        group_reg_dims=(2, 1, 3, 1, 16, 4),  # offset, depth, size, rot, kpts, bbox2d
        cls_branch=(256,),
        reg_branch=(
            (256, ),  # offset
            (256, ),  # depth
            (256, ),  # size
            (256, ),  # rot
            (256, ),  # kpts
            (256, )  # bbox2d
        ),
        centerness_branch=(256,),
        dir_branch=(256,),
        attr_branch=(256,),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        weight_dim=1,
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_attr=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_depth=dict(
            type='UncertainSmoothL1Loss', alpha=1.0, beta=3.0, loss_weight=1.0),
        norm_on_bbox=True,
        centerness_on_reg=True,
        center_sampling=True,
        conv_bias=True,
        dcn_on_last_conv=True,
        use_depth_classifier=True,
        depth_branch=(256,),
        depth_range=(0, 70),
        depth_unit=10,
        division='uniform',
        depth_bins=8,
        bbox_coder=dict(
            type='PGDBBoxCoder',
            base_depths=((28.01, 16.32),),
            base_dims=((0.8, 1.73, 0.6), (1.76, 1.73, 0.6), (3.9, 1.56, 1.6)),
            code_size=7)),
    train_cfg=dict(
        allowed_border=0,
        code_weight=[
            1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
            0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
            0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
            1.0, 1.0, 1.0, 1.0],
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_pre=100,
        nms_thr=0.8,
        score_thr=0.001,
        min_bbox_size=0,
        max_per_img=20))
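The `bbox_head` above splits its regression output into `group_reg_dims=(2, 1, 3, 1, 16, 4)` channels (offset, depth, size, rot, kpts, bbox2d, per the config comment), and `train_cfg.code_weight` lists 7 + 16 + 4 = 27 weights. A small consistency check helps catch edits that change one without the other. This is a hypothetical helper, not part of MMDetection3D, and it assumes that `code_weight` holds one weight per regression channel and that the first four groups form the 7-value 3D box code.

```python
# Hypothetical sanity check (not an MMDetection3D API) for the PGDHead
# regression layout in the config above.
group_reg_dims = (2, 1, 3, 1, 16, 4)  # offset, depth, size, rot, kpts, bbox2d
bbox_code_size = 7
code_weight = [1.0] * 7 + [0.2] * 16 + [1.0] * 4


def check_reg_layout(dims, code_size, weights):
    """Return a list of human-readable problems (empty if consistent)."""
    problems = []
    # Assumption: offset + depth + size + rot make up the 3D box code.
    if sum(dims[:4]) != code_size:
        problems.append(
            f"offset+depth+size+rot = {sum(dims[:4])}, "
            f"but bbox_code_size = {code_size}")
    # Assumption: code_weight has exactly one entry per regression channel.
    if sum(dims) != len(weights):
        problems.append(
            f"sum(group_reg_dims) = {sum(dims)}, "
            f"but len(code_weight) = {len(weights)}")
    return problems


print(check_reg_layout(group_reg_dims, bbox_code_size, code_weight))
```

With the values from this config both checks pass, so the printed list is empty.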

class_names = ['Pedestrian', 'Cyclist', 'Car']
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242, 375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d', 'gt_labels_3d',
            'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=3,
    workers_per_gpu=3,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

# optimizer
optimizer = dict(
    lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
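The `optimizer_config` above relies on MMCV 1.x config inheritance: a child dict is normally merged into the matching base dict key by key, but a child that sets `_delete_=True` replaces the base dict outright. The printed config in the log below shows `optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))` with no other keys, which is consistent with that replacement. The sketch below is a simplified re-implementation for illustration only, not MMCV's own merge function, and the base values are hypothetical.

```python
import copy


def merge_cfg(child, base):
    """Return a new dict with `child` merged into a deep copy of `base`.

    Simplified sketch of MMCV 1.x-style inheritance: nested dicts are merged
    key by key, unless the child dict sets `_delete_=True`, in which case it
    replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            replace = value.pop("_delete_", False)
            if not replace and isinstance(merged.get(key), dict):
                merged[key] = merge_cfg(value, merged[key])
            else:
                merged[key] = copy.deepcopy(value)
        else:
            merged[key] = value
    return merged


# Hypothetical base values, chosen only to show the effect of _delete_.
base = {
    "optimizer": {"type": "AdamW", "lr": 0.0018, "weight_decay": 0.01},
    "optimizer_config": {
        "type": "Fp16OptimizerHook",
        "loss_scale": 512.0,
        "grad_clip": {"max_norm": 10, "norm_type": 2},
    },
}
child = {
    "optimizer": {"lr": 0.001},  # merged: type and weight_decay are inherited
    "optimizer_config": {"_delete_": True,
                         "grad_clip": {"max_norm": 35, "norm_type": 2}},
}

merged = merge_cfg(child, base)
print(merged["optimizer"])         # {'type': 'AdamW', 'lr': 0.001, 'weight_decay': 0.01}
print(merged["optimizer_config"])  # {'grad_clip': {'max_norm': 35, 'norm_type': 2}}
```

Without `_delete_`, the child `grad_clip` would be merged and the base `type` and `loss_scale` keys would survive, which is usually not what a config override intends.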

# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[32, 44])
total_epochs = 48
runner = dict(type='EpochBasedRunner', max_epochs=48)
evaluation = dict(interval=2)
checkpoint_config = dict(interval=8)
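The `lr_config` above combines a step schedule (decay at epochs 32 and 44) with a 500-iteration linear warmup that starts at one third of the base rate. The function below sketches that formula as MMCV 1.x's `StepLrUpdaterHook` appears to compute it; the default decay factor `gamma=0.1` and 0-based epoch indices are assumptions, and this is an illustration rather than the library code.

```python
def step_lr_with_warmup(epoch, cur_iter, base_lr=0.001, steps=(32, 44),
                        gamma=0.1, warmup_iters=500, warmup_ratio=1.0 / 3):
    """Learning rate at a given 0-based epoch and global iteration.

    Sketch of a step schedule with linear warmup, modeled on MMCV 1.x's
    StepLrUpdaterHook; gamma=0.1 is assumed to be the default decay factor.
    """
    # Step decay: multiply by gamma once per milestone already reached.
    decays = sum(1 for s in steps if epoch >= s)
    regular_lr = base_lr * gamma ** decays
    # Linear warmup: ramp from warmup_ratio * lr up to lr over warmup_iters.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, it in [(0, 0), (0, 250), (1, 1000), (32, 40000), (44, 55000)]:
    print(epoch, it, f"{step_lr_with_warmup(epoch, it):.2e}")
```

With this config the rate starts near 3.3e-4, reaches 1e-3 after 500 iterations, and drops to 1e-4 and then 1e-5 at the two milestones.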

# dataset settings
dataset_type = 'KittiDataset'
data_root = '/root/paddlejob/yyr/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
input_modality = dict(use_lidar=True, use_camera=False)

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    file_client_args=file_client_args)

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        file_client_args=file_client_args),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]

eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=6,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'kitti_infos_train.pkl',
            split='training',
            pts_prefix='velodyne_reduced',
            pipeline=train_pipeline,
            modality=input_modality,
            classes=class_names,
            test_mode=False,
            box_type_3d='LiDAR',
            file_client_args=file_client_args)),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR',
        file_client_args=file_client_args),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR',
        file_client_args=file_client_args))

evaluation = dict(interval=1, pipeline=eval_pipeline)

lr = 0.0018

optimizer = dict(type='AdamW', lr=lr, betas=(0.95, 0.99), weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)

runner = dict(type='EpochBasedRunner', max_epochs=40)

checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = None
load_from = None
resume_from = None
workflow = [('train', 1)]

opencv_num_threads = 0

mp_start_method = 'fork'

Reproduces the problem - command or script

python tools/train.py configs_yyr/PGD_mini/PGD_r18.py
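The first run in the log below stopped with `FileNotFoundError` because the `init_cfg` checkpoint path (`.../resnet50-0676ba61.pth/resnet18-f37072fd.pth`) did not exist. A pre-flight check can catch this before `tools/train.py` is launched. The helpers `find_checkpoints` and `missing_checkpoints` below are hypothetical, not part of MMCV or MMDetection3D; they walk a nested config dict and report every `checkpoint` string that is not an existing local file, skipping URL-style entries such as `torchvision://...`.

```python
import os


def find_checkpoints(cfg, where="cfg"):
    """Yield (location, path) for every string stored under a 'checkpoint' key."""
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            loc = f"{where}.{key}"
            if key == "checkpoint" and isinstance(value, str):
                yield loc, value
            else:
                yield from find_checkpoints(value, loc)
    elif isinstance(cfg, (list, tuple)):
        for i, item in enumerate(cfg):
            yield from find_checkpoints(item, f"{where}[{i}]")


def missing_checkpoints(cfg):
    """Return (location, path) pairs whose local file does not exist.

    URL-style entries (containing '://') are skipped, since they are not
    plain local paths.
    """
    return [(loc, path) for loc, path in find_checkpoints(cfg)
            if "://" not in path and not os.path.isfile(path)]


# The backbone init_cfg from the failing run in the log below.
model_cfg = {
    "backbone": {
        "type": "ResNet",
        "init_cfg": {
            "type": "Pretrained",
            "checkpoint": "/root/paddlejob/yyr/all/pretrain/"
                          "resnet50-0676ba61.pth/resnet18-f37072fd.pth",
        },
    },
}
for loc, path in missing_checkpoints(model_cfg):
    print(f"missing checkpoint at {loc}: {path}")
```

With MMCV 1.x installed, loading the real config with `mmcv.Config.fromfile(...)` and passing `cfg.model` should also work, on the assumption that MMCV's `ConfigDict` behaves as a `dict` subclass; this has not been verified against this exact setup.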

Reproduces the problem - error message

2023-01-06 11:30:35,154 - mmdet - INFO - Set random seed to 0, deterministic: False
2023-01-06 11:30:36,795 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': '/root/paddlejob/yyr/all/pretrain/resnet50-0676ba61.pth/resnet18-f37072fd.pth'}
2023-01-06 11:30:36,796 - mmcv - INFO - load model from: /root/paddlejob/yyr/all/pretrain/resnet50-0676ba61.pth/resnet18-f37072fd.pth
2023-01-06 11:30:36,796 - mmcv - INFO - load checkpoint from local path: /root/paddlejob/yyr/all/pretrain/resnet50-0676ba61.pth/resnet18-f37072fd.pth
Traceback (most recent call last):
  File "tools/train.py", line 263, in <module>
    main()
  File "tools/train.py", line 223, in main
    model.init_weights()
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/runner/base_module.py", line 117, in init_weights
    m.init_weights()
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/runner/base_module.py", line 106, in init_weights
    initialize(self, self.init_cfg)
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/cnn/utils/weight_init.py", line 636, in initialize
    _initialize(module, cp_cfg)
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/cnn/utils/weight_init.py", line 539, in _initialize
    func(module)
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/cnn/utils/weight_init.py", line 514, in __call__
    logger=logger)
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 627, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 561, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 303, in load_checkpoint
    return checkpoint_loader(filename, map_location)  # type: ignore
  File "/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 322, in load_from_local
    raise FileNotFoundError(f'{filename} can not be found.')
FileNotFoundError: /root/paddlejob/yyr/all/pretrain/resnet50-0676ba61.pth/resnet18-f37072fd.pth can not be found.
root@yq01-sys-hic-k8s-v100-box-a225-0346.yq01.baidu.com m3d-1.0rc6 $ python tools/train.py configs_yyr/PGD_mini/PGD_r18.py
/root/paddlejob/yyr/all/code/mmdetection-master/mmdet/utils/setup_env.py:39: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/root/paddlejob/yyr/all/code/mmdetection-master/mmdet/utils/setup_env.py:49: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
fatal: Not a git repository (or any parent up to mount point /root/paddlejob/yyr/all)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2023-01-06 11:32:03,511 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-32GB
CUDA_HOME: /root/paddlejob/yyr/all/cuda/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (GCC) 8.2.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.10.0
OpenCV: 4.5.5
MMCV: 1.6.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.26.0
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc6+
spconv2.0: False

2023-01-06 11:32:04,519 - mmdet - INFO - Distributed training: False 2023-01-06 11:32:05,382 - mmdet - INFO - Config: dataset_type = 'KittiMonoDataset' data_root = '/root/paddlejob/yyr/kitti/' class_names = ['Pedestrian', 'Cyclist', 'Car'] input_modality = dict(use_lidar=False, use_camera=True) img_norm_cfg = dict( mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) train_pipeline = [ dict(type='LoadImageFromFileMono3D'), dict( type='LoadAnnotations3D', with_bbox=True, with_label=True, with_attr_label=False, with_bbox_3d=True, with_label_3d=True, with_bbox_depth=True), dict(type='Resize', img_scale=(1242, 375), keep_ratio=True), dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=['Pedestrian', 'Cyclist', 'Car']), dict( type='Collect3D', keys=[ 'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d', 'gt_labels_3d', 'centers2d', 'depths' ]) ] test_pipeline = [ dict(type='LoadImageFromFileMono3D'), dict( type='MultiScaleFlipAug', scale_factor=1.0, flip=False, transforms=[ dict(type='RandomFlip3D'), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=['Pedestrian', 'Cyclist', 'Car'], with_label=False), dict(type='Collect3D', keys=['img']) ]) ] eval_pipeline = [ dict(type='LoadImageFromFileMono3D'), dict( type='DefaultFormatBundle3D', class_names=['Pedestrian', 'Cyclist', 'Car'], with_label=False), dict(type='Collect3D', keys=['img']) ] data = dict( samples_per_gpu=3, workers_per_gpu=3, train=dict( type='KittiMonoDataset', data_root='/root/paddlejob/yyr/kitti/', ann_file='/root/paddlejob/yyr/kitti/kitti_infos_train_mono3d.coco.json', info_file='/root/paddlejob/yyr/kitti/kitti_infos_train.pkl', img_prefix='/root/paddlejob/yyr/kitti/', classes=['Pedestrian', 
'Cyclist', 'Car'], pipeline=[ dict(type='LoadImageFromFileMono3D'), dict( type='LoadAnnotations3D', with_bbox=True, with_label=True, with_attr_label=False, with_bbox_3d=True, with_label_3d=True, with_bbox_depth=True), dict(type='Resize', img_scale=(1242, 375), keep_ratio=True), dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=['Pedestrian', 'Cyclist', 'Car']), dict( type='Collect3D', keys=[ 'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d', 'gt_labels_3d', 'centers2d', 'depths' ]) ], modality=dict(use_lidar=False, use_camera=True), test_mode=False, box_type_3d='Camera'), val=dict( type='KittiMonoDataset', data_root='/root/paddlejob/yyr/kitti/', ann_file='/root/paddlejob/yyr/kitti/kitti_infos_val_mono3d.coco.json', info_file='/root/paddlejob/yyr/kitti/kitti_infos_val.pkl', img_prefix='/root/paddlejob/yyr/kitti/', classes=['Pedestrian', 'Cyclist', 'Car'], pipeline=[ dict(type='LoadImageFromFileMono3D'), dict( type='MultiScaleFlipAug', scale_factor=1.0, flip=False, transforms=[ dict(type='RandomFlip3D'), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=['Pedestrian', 'Cyclist', 'Car'], with_label=False), dict(type='Collect3D', keys=['img']) ]) ], modality=dict(use_lidar=False, use_camera=True), test_mode=True, box_type_3d='Camera'), test=dict( type='KittiMonoDataset', data_root='/root/paddlejob/yyr/kitti/', ann_file='/root/paddlejob/yyr/kitti/kitti_infos_test_mono3d.coco.json', info_file='/root/paddlejob/yyr/kitti/kitti_infos_test.pkl', img_prefix='/root/paddlejob/yyr/kitti/', classes=['Pedestrian', 'Cyclist', 'Car'], pipeline=[ dict(type='LoadImageFromFileMono3D'), dict( type='MultiScaleFlipAug', scale_factor=1.0, flip=False, transforms=[ dict(type='RandomFlip3D'), 
dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=['Pedestrian', 'Cyclist', 'Car'], with_label=False), dict(type='Collect3D', keys=['img']) ]) ], modality=dict(use_lidar=False, use_camera=True), test_mode=True, box_type_3d='Camera')) evaluation = dict(interval=2) optimizer = dict( type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001, paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0)) optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.3333333333333333, step=[32, 44]) runner = dict(type='EpochBasedRunner', max_epochs=48) checkpoint_config = dict(interval=8) log_config = dict( interval=50, hooks=[dict(type='TextLoggerHook'), dict(type='TensorboardLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] work_dir = 'work_dirs/Dis_PGD_reg800_fea_l2_cls' model = dict( type='FCOSMono3D', backbone=dict( type='ResNet', depth=18, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=0, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, style='pytorch', init_cfg=dict( type='Pretrained', checkpoint='/root/paddlejob/yyr/all/pretrain/resnet18-f37072fd.pth' )), neck=dict( type='FPN', in_channels=[64, 128, 256, 512], out_channels=256, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True), bbox_head=dict( type='PGDHead', num_classes=3, in_channels=256, stacked_convs=2, feat_channels=256, bbox_code_size=7, use_onlyreg_proj=True, use_direction_classifier=True, diff_rad_by_sin=True, pred_attrs=False, pred_velo=False, pred_bbox2d=True, pred_keypoints=True, dir_offset=0.7854, strides=[4, 8, 16, 32], regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 100000000.0)), group_reg_dims=(2, 1, 3, 1, 16, 4), cls_branch=(256, ), reg_branch=((256, ), 
(256, ), (256, ), (256, ), (256, ), (256, )), centerness_branch=(256, ), dir_branch=(256, ), attr_branch=(256, ), loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), weight_dim=1, loss_bbox=dict( type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0), loss_dir=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_attr=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_centerness=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_depth=dict( type='UncertainSmoothL1Loss', alpha=1.0, beta=3.0, loss_weight=1.0), norm_on_bbox=True, centerness_on_reg=True, center_sampling=True, conv_bias=True, dcn_on_last_conv=True, use_depth_classifier=True, depth_branch=(256, ), depth_range=(0, 70), depth_unit=10, division='uniform', depth_bins=8, bbox_coder=dict( type='PGDBBoxCoder', base_depths=((28.01, 16.32), ), base_dims=((0.8, 1.73, 0.6), (1.76, 1.73, 0.6), (3.9, 1.56, 1.6)), code_size=7)), train_cfg=dict( allowed_border=0, code_weight=[ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 1.0, 1.0, 1.0, 1.0 ], pos_weight=-1, debug=False), test_cfg=dict( use_rotate_nms=True, nms_across_levels=False, nms_pre=100, nms_thr=0.8, score_thr=0.001, min_bbox_size=0, max_per_img=20)) total_epochs = 48 gpu_ids = [0]

2023-01-06 11:32:05,383 - mmdet - INFO - Set random seed to 0, deterministic: False
2023-01-06 11:32:07,015 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': '/root/paddlejob/yyr/all/pretrain/resnet18-f37072fd.pth'}
2023-01-06 11:32:07,016 - mmcv - INFO - load model from: /root/paddlejob/yyr/all/pretrain/resnet18-f37072fd.pth
2023-01-06 11:32:07,017 - mmcv - INFO - load checkpoint from local path: /root/paddlejob/yyr/all/pretrain/resnet18-f37072fd.pth
2023-01-06 11:32:13,552 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

2023-01-06 11:32:14,620 - mmdet - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'} 2023-01-06 11:32:15,622 - mmdet - INFO - Model: FCOSMono3D( (backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): ResLayer( (0): BasicBlock( (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (1): BasicBlock( (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer2): ResLayer( (0): BasicBlock( (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), 
padding=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer3): ResLayer( (0): BasicBlock( (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer4): ResLayer( (0): BasicBlock( (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(512, 512, kernel_size=(3, 3), 
stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) ) init_cfg={'type': 'Pretrained', 'checkpoint': '/root/paddlejob/yyr/all/pretrain/resnet18-f37072fd.pth'} (neck): FPN( (lateral_convs): ModuleList( (0): ConvModule( (conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) ) (1): ConvModule( (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1)) ) (2): ConvModule( (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) ) (3): ConvModule( (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) ) ) (fpn_convs): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (1): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (2): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (3): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) ) init_cfg={'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'} (bbox_head): PGDHead( (loss_cls): FocalLoss() (loss_bbox): SmoothL1Loss() (loss_dir): CrossEntropyLoss(avg_non_ignore=False) (cls_convs): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) (1): ConvModule( (conv): ModulatedDeformConv2dPack( (conv_offset): Conv2d(256, 27, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (reg_convs): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) 
(activate): ReLU(inplace=True) ) (1): ConvModule( (conv): ModulatedDeformConv2dPack( (conv_offset): Conv2d(256, 27, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (conv_cls_prev): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (conv_cls): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) (conv_reg_prevs): ModuleList( (0): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (1): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (2): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (3): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (4): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (5): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) ) (conv_regs): ModuleList( (0): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) (2): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) (3): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) (4): Conv2d(256, 16, kernel_size=(1, 
1), stride=(1, 1)) (5): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1)) ) (conv_dir_cls_prev): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (conv_dir_cls): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1)) (conv_depth_cls_prev): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (conv_depth_cls): Conv2d(256, 8, kernel_size=(1, 1), stride=(1, 1)) (conv_weight_prevs): ModuleList( (0): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) ) (conv_weights): ModuleList( (0): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) ) (conv_centerness_prev): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (gn): GroupNorm(32, 256, eps=1e-05, affine=True) (activate): ReLU(inplace=True) ) ) (conv_centerness): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) (scales): ModuleList( (0): ModuleList( (0): Scale() (1): Scale() (2): Scale() (3): Scale() (4): Scale() ) (1): ModuleList( (0): Scale() (1): Scale() (2): Scale() (3): Scale() (4): Scale() ) (2): ModuleList( (0): Scale() (1): Scale() (2): Scale() (3): Scale() (4): Scale() ) (3): ModuleList( (0): Scale() (1): Scale() (2): Scale() (3): Scale() (4): Scale() ) ) (loss_centerness): CrossEntropyLoss(avg_non_ignore=False) (loss_depth): UncertainSmoothL1Loss() (loss_bbox2d): SmoothL1Loss() (loss_consistency): GIoULoss() ) )
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.53s)
creating index...
index created!
2023-01-06 11:32:24,391 - mmdet - INFO - Start running, host: root@yq01-sys-hic-k8s-v100-box-a225-0346.yq01.baidu.com, work_dir: /root/paddlejob/yyr/all/code/m3d-1.0rc6/work_dirs/Dis_PGD_reg800_fea_l2_cls
2023-01-06 11:32:24,392 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) CheckpointHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook

after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL      ) CheckpointHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

after_train_epoch:
(NORMAL      ) CheckpointHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

before_val_epoch:
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

before_val_iter:
(LOW         ) IterTimerHook

after_val_iter:
(LOW         ) IterTimerHook

after_val_epoch:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

after_run:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

2023-01-06 11:32:24,392 - mmdet - INFO - workflow: [('train', 1)], max: 48 epochs
2023-01-06 11:32:24,392 - mmdet - INFO - Checkpoints will be saved to /root/paddlejob/yyr/all/code/m3d-1.0rc6/work_dirs/Dis_PGD_reg800_fea_l2_cls by HardDiskBackend.
/home/zliu/anaconda3/envs/bevfusion_mit_light/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 3752/3769, 12.6 task/s, elapsed: 297s, ETA: 1s^C
Traceback (most recent call last):

Additional information

No response

JingweiZhang12 commented 1 year ago

@youngfly Hi, maybe the number of training iterations is smaller than the logger interval, so the training log was never printed. You can set

log_config = dict(
    interval=1, ...)
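The reasoning above is simple arithmetic: one epoch has roughly ceil(N / (samples_per_gpu * num_gpus)) iterations, and if that number is below `log_config.interval`, no training line is printed before evaluation starts. The sketch below illustrates this with hypothetical training-set sizes and the single-GPU, `samples_per_gpu=3` setting from the log. It assumes the logger prints only every `interval` iterations and drops the incomplete tail of each epoch (MMCV 1.x's `LoggerHook` has an `ignore_last` option that appears to default to this behavior), and it ignores any padding added by the data sampler.

```python
import math


def iters_per_epoch(num_samples, samples_per_gpu, num_gpus=1):
    """Approximate training iterations per epoch (ignores sampler padding)."""
    return math.ceil(num_samples / (samples_per_gpu * num_gpus))


def logged_train_lines(num_iters, interval):
    """Training log lines per epoch, assuming the logger prints only every
    `interval` iterations and drops the incomplete tail of the epoch."""
    return num_iters // interval


# Hypothetical training-set sizes; samples_per_gpu=3 and one GPU as in the log.
for num_samples in (3000, 100):
    iters = iters_per_epoch(num_samples, samples_per_gpu=3)
    print(num_samples, iters,
          logged_train_lines(iters, 50), logged_train_lines(iters, 1))
```

A 100-sample training set would give 34 iterations per epoch: silent at `interval=50`, but visible at `interval=1`.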

If the training log does appear, please make sure that both the training annotation file and the length of the training dataset are normal.
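One quick way to follow this advice is to count what the training annotation file actually contains. The printed config points the training set at `kitti_infos_train_mono3d.coco.json`, and the sketch below assumes that file follows the standard COCO layout (`images`, `annotations`, `categories`). It is a hypothetical helper, not an MMDetection3D tool. In the log above, the first `loading annotations into memory...` finished in 0.00s and the second in 0.53s; if the first one is the training file, a nearly empty training set would be one possible explanation, which this check can confirm or rule out.

```python
import json
import sys
from collections import Counter


def summarize_coco_annotations(path):
    """Summarize a COCO-style annotation JSON (images/annotations/categories)."""
    with open(path) as f:
        data = json.load(f)
    images = data.get("images", [])
    annotations = data.get("annotations", [])
    names = {c["id"]: c["name"] for c in data.get("categories", [])}
    per_class = Counter(names.get(a["category_id"], a["category_id"])
                        for a in annotations)
    return {
        "images": len(images),
        "annotations": len(annotations),
        # Images with at least one box; training may skip the others.
        "images_with_boxes": len({a["image_id"] for a in annotations}),
        "per_class": dict(per_class),
    }


if __name__ == "__main__":
    # Usage: python summarize_ann.py path/to/kitti_infos_train_mono3d.coco.json
    for p in sys.argv[1:]:
        print(p, summarize_coco_annotations(p))
```

A healthy training file should report thousands of images and boxes spread over Pedestrian, Cyclist and Car; a count near zero would explain an epoch that ends almost immediately.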