I pretrained a Swin-Large in MMpretrain using the SIMmim method, but a bug occurred when I used the .pth file for the downstream object detection task (DINO).

the log is bml@jupyter-1ae41f6b2e337d68-0:~/storage/wzy/projects/mmpretrain$ bash ./tools/benchmarks/mmdetection/mim_dist_train_fpn.sh ./configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py ./work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth 4

CFG=./configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py
PRETRAIN=./work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth
GPUS=4
PY_ARGS= ++ dirname ./tools/benchmarks/mmdetection/mim_dist_train_fpn.sh
PYTHONPATH=./tools/benchmarks/mmdetection/..:
mim train mmdet ./configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py --launcher pytorch -G 4 --cfg-options model.backbone.init_cfg.type=Pretrained model.backbone.init_cfg.checkpoint=./work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth model.backbone.init_cfg.prefix=backbone. Using port 26514 for synchronization. The path ./configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py doesn't exist but we find the config file in codebase mmdet, will use /home/bml/storage/wzy/projects/mmdetection/configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py instead. Training command is /opt/conda/bin/python -m torch.distributed.launch --nproc_per_node=4 --master_port=26514 /home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py /home/bml/storage/wzy/projects/mmdetection/configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py --launcher pytorch --cfg-options model.backbone.init_cfg.type=Pretrained model.backbone.init_cfg.checkpoint=./work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth model.backbone.init_cfg.prefix=backbone.. /opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

FutureWarning, WARNING:torch.distributed.run:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

/opt/conda/lib/python3.7/site-packages/mmengine/utils/dl_utils/setup_env.py:57: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 'Setting MKL_NUM_THREADS environment variable for each process' /opt/conda/lib/python3.7/site-packages/mmengine/utils/dl_utils/setup_env.py:57: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 'Setting MKL_NUM_THREADS environment variable for each process' /opt/conda/lib/python3.7/site-packages/mmengine/utils/dl_utils/setup_env.py:57: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 'Setting MKL_NUM_THREADS environment variable for each process' /opt/conda/lib/python3.7/site-packages/mmengine/utils/dl_utils/setup_env.py:57: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 'Setting MKL_NUM_THREADS environment variable for each process' 05/18 10:53:17 - mmengine - INFO -

System environment: sys.platform: linux Python: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0] CUDA available: True numpy_random_seed: 1063577596 GPU 0,1,2,3: NVIDIA A100-PCIE-40GB CUDA_HOME: /usr/local/cuda NVCC: Cuda compilation tools, release 11.3, V11.3.109 GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 PyTorch: 1.10.1+cu113 PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2+cu113 OpenCV: 4.1.2 MMEngine: 0.7.2

Runtime environment: cudnn_benchmark: False mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0} dist_cfg: {'backend': 'nccl'} seed: None Distributed launcher: pytorch Distributed training: True GPU number: 4

05/18 10:53:20 - mmengine - INFO - Config: dataset_type = 'CocoDataset' data_root = '/home/bml/storage/wzy/dataset/datasets/EL/cocox4/' backend_args = None train_pipeline = [ dict(type='LoadImageFromFile', backend_args=None), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( type='RandomChoice', transforms=[[{ 'type': 'RandomChoiceResize', 'scales': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'keep_ratio': True }], [{ 'type': 'RandomChoiceResize', 'scales': [(400, 4200), (500, 4200), (600, 4200)], 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'RandomChoiceResize', 'scales': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'keep_ratio': True }]]), dict(type='PackDetInputs') ] test_pipeline = [ dict(type='LoadImageFromFile', backend_args=None), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( type='PackDetInputs', meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')) ] train_dataloader = dict( batch_size=1, num_workers=4, persistent_workers=True, sampler=dict(type='DefaultSampler', shuffle=True), batch_sampler=dict(type='AspectRatioBatchSampler'), dataset=dict( type='CocoDataset', data_root='/home/bml/storage/wzy/dataset/datasets/EL/cocox4/', ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=[ dict(type='LoadImageFromFile', backend_args=None), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( type='RandomChoice', transforms=[[{ 'type': 'RandomChoiceResize', 'scales': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'keep_ratio': True }], [{ 'type': 'RandomChoiceResize', 'scales': [(400, 4200), (500, 4200), (600, 4200)], 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'RandomChoiceResize', 'scales': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'keep_ratio': True }]]), dict(type='PackDetInputs') ], backend_args=None)) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, sampler=dict(type='DefaultSampler', shuffle=False), dataset=dict( type='CocoDataset', data_root='/home/bml/storage/wzy/dataset/datasets/EL/cocox4/', ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, pipeline=[ dict(type='LoadImageFromFile', backend_args=None), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( type='PackDetInputs', meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')) ], backend_args=None)) test_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, sampler=dict(type='DefaultSampler', shuffle=False), dataset=dict( type='CocoDataset', data_root='/home/bml/storage/wzy/dataset/datasets/EL/cocox4/', ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, pipeline=[ dict(type='LoadImageFromFile', backend_args=None), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( type='PackDetInputs', meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')) ], backend_args=None)) val_evaluator = dict( type='CocoMetric', ann_file= '/home/bml/storage/wzy/dataset/datasets/EL/cocox4/annotations/instances_val2017.json', metric='bbox', format_only=False, backend_args=None) test_evaluator = dict( type='CocoMetric', ann_file= '/home/bml/storage/wzy/dataset/datasets/EL/cocox4/annotations/instances_val2017.json', metric='bbox', format_only=False, backend_args=None) default_scope = 'mmdet' default_hooks = dict( timer=dict(type='IterTimerHook'), logger=dict(type='LoggerHook', interval=100), param_scheduler=dict(type='ParamSchedulerHook'), checkpoint=dict(type='CheckpointHook', interval=1), sampler_seed=dict(type='DistSamplerSeedHook'), visualization=dict(type='DetVisualizationHook')) env_cfg = dict( cudnn_benchmark=False, mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), dist_cfg=dict(backend='nccl')) vis_backends = [dict(type='LocalVisBackend')] visualizer = dict( type='DetLocalVisualizer', vis_backends=[dict(type='LocalVisBackend')], name='visualizer') log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True) log_level = 'INFO' load_from = None resume = False model = dict( type='DINO', num_queries=900, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( type='DetDataPreprocessor', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1), backbone=dict( type='SwinTransformer', pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], num_heads=[6, 12, 24, 48], window_size=12, mlp_ratio=4, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=True, convert_weights=True, init_cfg=dict( type='Pretrained', checkpoint= './work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth', prefix='backbone.')), neck=dict( type='ChannelMapper', in_channels=[192, 384, 768, 1536], kernel_size=1, out_channels=256, act_cfg=None, norm_cfg=dict(type='GN', num_groups=32), num_outs=5), encoder=dict( num_layers=6, layer_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_levels=5, dropout=0.0), ffn_cfg=dict( embed_dims=256, feedforward_channels=2048, ffn_drop=0.0))), decoder=dict( num_layers=6, return_intermediate=True, layer_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), cross_attn_cfg=dict(embed_dims=256, num_levels=5, dropout=0.0), ffn_cfg=dict( embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), post_norm_cfg=None), positional_encoding=dict( num_feats=128, normalize=True, offset=0.0, temperature=20), bbox_head=dict( type='DINOHead', num_classes=8, sync_cls_avg_factor=True, loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=5.0), loss_iou=dict(type='GIoULoss', loss_weight=2.0)), dn_cfg=dict( label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), train_cfg=dict( assigner=dict( type='HungarianAssigner', match_costs=[ dict(type='FocalLossCost', weight=2.0), dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), dict(type='IoUCost', iou_mode='giou', weight=2.0) ])), test_cfg=dict(max_per_img=300), num_feature_levels=5) optim_wrapper = dict( type='OptimWrapper', optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict(custom_keys=dict(backbone=dict(lr_mult=0.1)))) max_epochs = 36 train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1) val_cfg = dict(type='ValLoop') test_cfg = dict(type='TestLoop') param_scheduler = [ dict( type='MultiStepLR', begin=0, end=36, by_epoch=True, milestones=[27, 33], gamma=0.1) ] auto_scale_lr = dict(base_batch_size=4) fp16 = dict(loss_scale=512.0) pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' num_levels = 5 launcher = 'pytorch' work_dir = './work_dirs/dino-5scale_swin-l_8xb2-36e_coco'

the error is 5/18 10:38:04 - mmengine - INFO - paramwise_options -- backbone.norm3.weight:lr_mult=0.1 05/18 10:38:04 - mmengine - INFO - paramwise_options -- backbone.norm3.bias:lr=1e-05 05/18 10:38:04 - mmengine - INFO - paramwise_options -- backbone.norm3.bias:weight_decay=0.0001 05/18 10:38:04 - mmengine - INFO - paramwise_options -- backbone.norm3.bias:lr_mult=0.1 loading annotations into memory... Done (t=0.27s) creating index... index created! loading annotations into memory... Done (t=0.26s) creating index... index created! 05/18 10:38:08 - mmengine - INFO - Loads checkpoint by local backend from path: ./work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth Traceback (most recent call last): File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 133, in Traceback (most recent call last): File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 133, in Traceback (most recent call last): File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 133, in Traceback (most recent call last): File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 133, in main() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 129, in main main() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 129, in main main() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 129, in main main() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py", line 129, in main runner.train() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 1692, in train self._init_model_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 899, in _init_model_weights model.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/detectors/dino.py", line 71, in init_weights runner.train() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 1692, in train runner.train() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 1692, in train runner.train() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 1692, in train self._init_model_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 899, in _init_model_weights self._init_model_weights()model.init_weights()

File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 899, in _init_model_weights File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/detectors/dino.py", line 71, in init_weights self._init_model_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/runner/runner.py", line 899, in _init_model_weights model.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/detectors/dino.py", line 71, in init_weights model.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/detectors/dino.py", line 71, in init_weights super(DeformableDETR, self).init_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/model/base_module.py", line 130, in init_weights super(DeformableDETR, self).init_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/model/base_module.py", line 130, in init_weights m.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/backbones/swin.py", line 728, in init_weights m.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/backbones/swin.py", line 728, in init_weights super(DeformableDETR, self).init_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/model/base_module.py", line 130, in init_weights m.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/backbones/swin.py", line 728, in init_weights super(DeformableDETR, self).init_weights() File "/opt/conda/lib/python3.7/site-packages/mmengine/model/base_module.py", line 130, in init_weights m.init_weights() File "/home/bml/storage/wzy/projects/mmdetection/mmdet/models/backbones/swin.py", line 728, in init_weights table_current = self.state_dict()[table_key] KeyError: 'backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table' table_current = self.state_dict()[table_key] KeyError: 'backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table' table_current = self.state_dict()[table_key] KeyError: 'backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table' table_current = self.state_dict()[table_key] KeyError: 'backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table' ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 39229) of binary: /opt/conda/bin/python Traceback (most recent call last): File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in main() File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main launch(args) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch run(args) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run )(*cmd_args) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent failures=result.failures, torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py FAILED

Failures: [1]: time : 2023-05-18_10:38:19 host : jupyter-1ae41f6b2e337d68-0 rank : 1 (local_rank: 1) exitcode : 1 (pid: 39230) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html [2]: time : 2023-05-18_10:38:19 host : jupyter-1ae41f6b2e337d68-0 rank : 2 (local_rank: 2) exitcode : 1 (pid: 39231) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html [3]: time : 2023-05-18_10:38:19 host : jupyter-1ae41f6b2e337d68-0 rank : 3 (local_rank: 3) exitcode : 1 (pid: 39232) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure): [0]: time : 2023-05-18_10:38:19 host : jupyter-1ae41f6b2e337d68-0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 39229) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Traceback (most recent call last): File "/home/bml/storage/.local/bin/mim", line 8, in sys.exit(cli()) File "/home/bml/storage/.local/lib/python3.7/site-packages/click/core.py", line 1130, in call return self.main(args, kwargs) File "/home/bml/storage/.local/lib/python3.7/site-packages/click/core.py", line 1055, in main rv = self.invoke(ctx) File "/home/bml/storage/.local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/bml/storage/.local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/bml/storage/.local/lib/python3.7/site-packages/click/core.py", line 760, in invoke return __callback(args, **kwargs) File "/home/bml/storage/.local/lib/python3.7/site-packages/mim/commands/train.py", line 111, in cli other_args=other_args) File "/home/bml/storage/.local/lib/python3.7/site-packages/mim/commands/train.py", line 262, in train cmd, env=dict(os.environ, MASTER_PORT=str(port))) File "/opt/conda/lib/python3.7/subprocess.py", line 363, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-m', 'torch.distributed.launch', '--nproc_per_node=4', '--master_port=28365', '/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py', '/home/bml/storage/wzy/projects/mmdetection/configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py', '--launcher', 'pytorch', '--cfg-options', 'model.backbone.init_cfg.type=Pretrained', 'model.backbone.init_cfg.checkpoint=./work_dirs/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px/epoch_799.pth', 'model.backbone.init_cfg.prefix=backbone.']' returned non-zero exit status 1. bml@jupyter-1ae41f6b2e337d68-0:~/storage/wzy/projects/mmpretrain$

I think the key is “KeyError: 'backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table' table_current = self.state_dict()[table_key]” but i can not fix it. could you pls help me？

open-mmlab / mmdetection

A Bug in Downstream task #10351

Runtime environment: cudnn_benchmark: False mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0} dist_cfg: {'backend': 'nccl'} seed: None Distributed launcher: pytorch Distributed training: True GPU number: 4

/home/bml/storage/wzy/projects/mmdetection/mmdet/.mim/tools/train.py FAILED

Root Cause (first observed failure): [0]: time : 2023-05-18_10:38:19 host : jupyter-1ae41f6b2e337d68-0 rank : 0 (local_rank: 0) exitcode : 1 (pid: 39229) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html