Open zkailinzhang opened 3 weeks ago
The above is from multi-GPU training, which hung. Switching to single-GPU training hung as well.
07/04 16:52:40 - mmengine - INFO - paramwise_options -- backbone.layer4.2.conv2.conv_offset.bias:lr=2e-05
07/04 16:52:40 - mmengine - INFO - paramwise_options -- backbone.layer4.2.conv2.conv_offset.bias:weight_decay=0.01
07/04 16:52:40 - mmengine - INFO - paramwise_options -- backbone.layer4.2.conv2.conv_offset.bias:lr_mult=0.1
07/04 16:52:40 - mmengine - WARNING - backbone.layer4.2.bn2.weight is skipped since its requires_grad=False
07/04 16:52:40 - mmengine - WARNING - backbone.layer4.2.bn2.bias is skipped since its requires_grad=False
07/04 16:52:40 - mmengine - INFO - paramwise_options -- backbone.layer4.2.conv3.weight:lr=2e-05
07/04 16:52:40 - mmengine - INFO - paramwise_options -- backbone.layer4.2.conv3.weight:weight_decay=0.01
07/04 16:52:40 - mmengine - INFO - paramwise_options -- backbone.layer4.2.conv3.weight:lr_mult=0.1
07/04 16:52:40 - mmengine - WARNING - backbone.layer4.2.bn3.weight is skipped since its requires_grad=False
07/04 16:52:40 - mmengine - WARNING - backbone.layer4.2.bn3.bias is skipped since its requires_grad=False
/home/zkl/code/det3d_demo/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py:10: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
07/04 16:52:57 - mmengine - WARNING - The prefix is not set in metric class SegMetric.
07/04 16:52:59 - mmengine - INFO - load backbone. in model from: checkpoints/tpvformer_pretrained_fcos3d_r101_dcn.pth
Loads checkpoint by local backend from path: checkpoints/tpvformer_pretrained_fcos3d_r101_dcn.pth
07/04 16:52:59 - mmengine - INFO - load neck. in model from: checkpoints/tpvformer_pretrained_fcos3d_r101_dcn.pth
Loads checkpoint by local backend from path: checkpoints/tpvformer_pretrained_fcos3d_r101_dcn.pth
07/04 16:52:59 - mmengine - WARNING - The model and loaded state dict do not match exactly

size mismatch for lateral_convs.0.conv.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for lateral_convs.0.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for lateral_convs.1.conv.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 1024, 1, 1]).
size mismatch for lateral_convs.1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for lateral_convs.2.conv.weight: copying a param with shape torch.Size([256, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 2048, 1, 1]).
size mismatch for lateral_convs.2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for fpn_convs.0.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for fpn_convs.0.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for fpn_convs.1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for fpn_convs.1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for fpn_convs.2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for fpn_convs.2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for fpn_convs.3.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for fpn_convs.3.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
unexpected key in source state_dict: fpn_convs.4.conv.weight, fpn_convs.4.conv.bias

07/04 16:52:59 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
07/04 16:52:59 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
07/04 16:52:59 - mmengine - INFO - Checkpoints will be saved to /home/zkl/code/det3d_demo/mmdetection3d/work_dirs/tpvformer_8xb1-2x_nus-seg.
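For anyone reading the size-mismatch warning above: it says the neck weights in the checkpoint have 256 output channels while the current model expects 128, and that the checkpoint carries an extra `fpn_convs.4` layer. A minimal sketch of how such a report could be reproduced offline is below. It is not mmengine's implementation; `compare_shapes` is a hypothetical helper, the shape dictionaries are a hand-copied subset of the log, and the shape given for `fpn_convs.4.conv.weight` is an assumption because the log only names that key.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_shapes(checkpoint: Dict[str, Shape],
                   model: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Hypothetical helper: group keys by how the two shape maps disagree."""
    report: Dict[str, List[str]] = {
        "size_mismatch": [],
        "unexpected": [],
        "missing": [],
    }
    for name, ckpt_shape in checkpoint.items():
        if name not in model:
            # Key exists only in the checkpoint.
            report["unexpected"].append(name)
        elif model[name] != ckpt_shape:
            report["size_mismatch"].append(
                f"{name}: checkpoint {ckpt_shape} vs model {model[name]}")
    # Keys the model expects but the checkpoint does not provide.
    report["missing"] = [name for name in model if name not in checkpoint]
    return report


# Subset of the shapes printed in the warning above.
checkpoint_shapes = {
    "lateral_convs.0.conv.weight": (256, 512, 1, 1),
    "lateral_convs.0.conv.bias": (256,),
    # Assumed shape: the log lists this key as unexpected without a shape.
    "fpn_convs.4.conv.weight": (256, 256, 3, 3),
}
model_shapes = {
    "lateral_convs.0.conv.weight": (128, 512, 1, 1),
    "lateral_convs.0.conv.bias": (128,),
}

report = compare_shapes(checkpoint_shapes, model_shapes)
for kind, entries in report.items():
    for entry in entries:
        print(f"{kind}: {entry}")
```

With PyTorch available, the two dictionaries could instead be built from real state dicts, for example `{k: tuple(v.shape) for k, v in state_dict.items()}`. A mismatch like this means those neck weights are not loaded from the checkpoint; whether it has anything to do with the hang is not established by the log.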
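However, GPU memory usage keeps changing.
The single-GPU training log has now started to appear.
I'll let it run overnight and try multi-GPU training tomorrow.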
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
q
Reproduces the problem - code sample
]
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='Det3DLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = './work_dirs/tpvformer_8xb1-2x_nus-seg'
/home/zkl/code/det3d_demo/mmdetection3d/projects/TPVFormer/tpvformer/tpvformer_layer.py:69: UserWarning: The arguments `feedforward_channels` in BaseTransformerLayer has been deprecated, now you should set `feedforward_channels` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/home/zkl/code/det3d_demo/mmdetection3d/projects/TPVFormer/tpvformer/tpvformer_layer.py:69: UserWarning: The arguments `ffn_dropout` in BaseTransformerLayer has been deprecated, now you should set `ffn_drop` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
Reproduces the problem - command or script
bash tools/dist_train.sh projects/TPVFormer/configs/tpvformer_8xb1-2x_nus-seg.py 2
Reproduces the problem - error message
]
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='Det3DLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = './work_dirs/tpvformer_8xb1-2x_nus-seg'
/home/zkl/code/det3d_demo/mmdetection3d/projects/TPVFormer/tpvformer/tpvformer_layer.py:69: UserWarning: The arguments `feedforward_channels` in BaseTransformerLayer has been deprecated, now you should set `feedforward_channels` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/home/zkl/code/det3d_demo/mmdetection3d/projects/TPVFormer/tpvformer/tpvformer_layer.py:69: UserWarning: The arguments `ffn_dropout` in BaseTransformerLayer has been deprecated, now you should set `ffn_drop` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
Additional information
q