Closed: BebDong closed this issue 2 years ago.
Have you tested the ImageNet-1k dataset with the given checkpoints? Make sure your dataset reproduces the reported accuracy.
Yes, I have tested with the given checkpoints and got results consistent with the repo, so the dataset should be OK.
Can you provide the result.pkl generated from your checkpoint using test.py?
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--metrics ${METRICS}] [--out ${RESULT_FILE}]
Sending the file externally may not be convenient. But I noticed that when running tools/test.py for evaluation, the log showed The model and loaded state dict do not match exactly. Details:
unexpected key in source state_dict: ema_backbone_cls_token, ema_backbone_tokens_to_token_attention1_ln1_weight, ema_backbone_tokens_to_token_attention1_ln1_bias, ema_backbone_tokens_to_token_attention1_attn_qkv_weight, ema_backbone_tokens_to_token_attention1_attn_proj_weight, ema_backbone_tokens_to_token_attention1_attn_proj_bias, ema_backbone_tokens_to_token_attention1_ln2_weight, ema_backbone_tokens_to_token_attention1_ln2_bias, ema_backbone_tokens_to_token_attention1_ffn_layers_0_0_weight, ema_backbone_tokens_to_token_attention1_ffn_layers_0_0_bias, ema_backbone_tokens_to_token_attention1_ffn_layers_1_weight, ema_backbone_tokens_to_token_attention1_ffn_layers_1_bias, ema_backbone_tokens_to_token_attention2_ln1_weight, ema_backbone_tokens_to_token_attention2_ln1_bias, ema_backbone_tokens_to_token_attention2_attn_qkv_weight, ema_backbone_tokens_to_token_attention2_attn_proj_weight, ema_backbone_tokens_to_token_attention2_attn_proj_bias, ema_backbone_tokens_to_token_attention2_ln2_weight, ema_backbone_tokens_to_token_attention2_ln2_bias, ema_backbone_tokens_to_token_attention2_ffn_layers_0_0_weight, ema_backbone_tokens_to_token_attention2_ffn_layers_0_0_bias, ema_backbone_tokens_to_token_attention2_ffn_layers_1_weight, ema_backbone_tokens_to_token_attention2_ffn_layers_1_bias, ema_backbone_tokens_to_token_project_weight, ema_backbone_tokens_to_token_project_bias, ema_backbone_encoder_0_ln1_weight, ema_backbone_encoder_0_ln1_bias, ema_backbone_encoder_0_attn_qkv_weight, ema_backbone_encoder_0_attn_proj_weight, ema_backbone_encoder_0_attn_proj_bias, ema_backbone_encoder_0_ln2_weight, ema_backbone_encoder_0_ln2_bias, ema_backbone_encoder_0_ffn_layers_0_0_weight, ema_backbone_encoder_0_ffn_layers_0_0_bias, ema_backbone_encoder_0_ffn_layers_1_weight, ema_backbone_encoder_0_ffn_layers_1_bias, ema_backbone_encoder_1_ln1_weight, ema_backbone_encoder_1_ln1_bias, ema_backbone_encoder_1_attn_qkv_weight, ema_backbone_encoder_1_attn_proj_weight, 
ema_backbone_encoder_1_attn_proj_bias, ema_backbone_encoder_1_ln2_weight, ema_backbone_encoder_1_ln2_bias, ema_backbone_encoder_1_ffn_layers_0_0_weight, ema_backbone_encoder_1_ffn_layers_0_0_bias, ema_backbone_encoder_1_ffn_layers_1_weight, ema_backbone_encoder_1_ffn_layers_1_bias, ema_backbone_encoder_2_ln1_weight, ema_backbone_encoder_2_ln1_bias, ema_backbone_encoder_2_attn_qkv_weight, ema_backbone_encoder_2_attn_proj_weight, ema_backbone_encoder_2_attn_proj_bias, ema_backbone_encoder_2_ln2_weight, ema_backbone_encoder_2_ln2_bias, ema_backbone_encoder_2_ffn_layers_0_0_weight, ema_backbone_encoder_2_ffn_layers_0_0_bias, ema_backbone_encoder_2_ffn_layers_1_weight, ema_backbone_encoder_2_ffn_layers_1_bias, ema_backbone_encoder_3_ln1_weight, ema_backbone_encoder_3_ln1_bias, ema_backbone_encoder_3_attn_qkv_weight, ema_backbone_encoder_3_attn_proj_weight, ema_backbone_encoder_3_attn_proj_bias, ema_backbone_encoder_3_ln2_weight, ema_backbone_encoder_3_ln2_bias, ema_backbone_encoder_3_ffn_layers_0_0_weight, ema_backbone_encoder_3_ffn_layers_0_0_bias, ema_backbone_encoder_3_ffn_layers_1_weight, ema_backbone_encoder_3_ffn_layers_1_bias, ema_backbone_encoder_4_ln1_weight, ema_backbone_encoder_4_ln1_bias, ema_backbone_encoder_4_attn_qkv_weight, ema_backbone_encoder_4_attn_proj_weight, ema_backbone_encoder_4_attn_proj_bias, ema_backbone_encoder_4_ln2_weight, ema_backbone_encoder_4_ln2_bias, ema_backbone_encoder_4_ffn_layers_0_0_weight, ema_backbone_encoder_4_ffn_layers_0_0_bias, ema_backbone_encoder_4_ffn_layers_1_weight, ema_backbone_encoder_4_ffn_layers_1_bias, ema_backbone_encoder_5_ln1_weight, ema_backbone_encoder_5_ln1_bias, ema_backbone_encoder_5_attn_qkv_weight, ema_backbone_encoder_5_attn_proj_weight, ema_backbone_encoder_5_attn_proj_bias, ema_backbone_encoder_5_ln2_weight, ema_backbone_encoder_5_ln2_bias, ema_backbone_encoder_5_ffn_layers_0_0_weight, ema_backbone_encoder_5_ffn_layers_0_0_bias, ema_backbone_encoder_5_ffn_layers_1_weight, 
ema_backbone_encoder_5_ffn_layers_1_bias, ema_backbone_encoder_6_ln1_weight, ema_backbone_encoder_6_ln1_bias, ema_backbone_encoder_6_attn_qkv_weight, ema_backbone_encoder_6_attn_proj_weight, ema_backbone_encoder_6_attn_proj_bias, ema_backbone_encoder_6_ln2_weight, ema_backbone_encoder_6_ln2_bias, ema_backbone_encoder_6_ffn_layers_0_0_weight, ema_backbone_encoder_6_ffn_layers_0_0_bias, ema_backbone_encoder_6_ffn_layers_1_weight, ema_backbone_encoder_6_ffn_layers_1_bias, ema_backbone_encoder_7_ln1_weight, ema_backbone_encoder_7_ln1_bias, ema_backbone_encoder_7_attn_qkv_weight, ema_backbone_encoder_7_attn_proj_weight, ema_backbone_encoder_7_attn_proj_bias, ema_backbone_encoder_7_ln2_weight, ema_backbone_encoder_7_ln2_bias, ema_backbone_encoder_7_ffn_layers_0_0_weight, ema_backbone_encoder_7_ffn_layers_0_0_bias, ema_backbone_encoder_7_ffn_layers_1_weight, ema_backbone_encoder_7_ffn_layers_1_bias, ema_backbone_encoder_8_ln1_weight, ema_backbone_encoder_8_ln1_bias, ema_backbone_encoder_8_attn_qkv_weight, ema_backbone_encoder_8_attn_proj_weight, ema_backbone_encoder_8_attn_proj_bias, ema_backbone_encoder_8_ln2_weight, ema_backbone_encoder_8_ln2_bias, ema_backbone_encoder_8_ffn_layers_0_0_weight, ema_backbone_encoder_8_ffn_layers_0_0_bias, ema_backbone_encoder_8_ffn_layers_1_weight, ema_backbone_encoder_8_ffn_layers_1_bias, ema_backbone_encoder_9_ln1_weight, ema_backbone_encoder_9_ln1_bias, ema_backbone_encoder_9_attn_qkv_weight, ema_backbone_encoder_9_attn_proj_weight, ema_backbone_encoder_9_attn_proj_bias, ema_backbone_encoder_9_ln2_weight, ema_backbone_encoder_9_ln2_bias, ema_backbone_encoder_9_ffn_layers_0_0_weight, ema_backbone_encoder_9_ffn_layers_0_0_bias, ema_backbone_encoder_9_ffn_layers_1_weight, ema_backbone_encoder_9_ffn_layers_1_bias, ema_backbone_encoder_10_ln1_weight, ema_backbone_encoder_10_ln1_bias, ema_backbone_encoder_10_attn_qkv_weight, ema_backbone_encoder_10_attn_proj_weight, ema_backbone_encoder_10_attn_proj_bias, ema_backbone_encoder_10_ln2_weight, 
ema_backbone_encoder_10_ln2_bias, ema_backbone_encoder_10_ffn_layers_0_0_weight, ema_backbone_encoder_10_ffn_layers_0_0_bias, ema_backbone_encoder_10_ffn_layers_1_weight, ema_backbone_encoder_10_ffn_layers_1_bias, ema_backbone_encoder_11_ln1_weight, ema_backbone_encoder_11_ln1_bias, ema_backbone_encoder_11_attn_qkv_weight, ema_backbone_encoder_11_attn_proj_weight, ema_backbone_encoder_11_attn_proj_bias, ema_backbone_encoder_11_ln2_weight, ema_backbone_encoder_11_ln2_bias, ema_backbone_encoder_11_ffn_layers_0_0_weight, ema_backbone_encoder_11_ffn_layers_0_0_bias, ema_backbone_encoder_11_ffn_layers_1_weight, ema_backbone_encoder_11_ffn_layers_1_bias, ema_backbone_encoder_12_ln1_weight, ema_backbone_encoder_12_ln1_bias, ema_backbone_encoder_12_attn_qkv_weight, ema_backbone_encoder_12_attn_proj_weight, ema_backbone_encoder_12_attn_proj_bias, ema_backbone_encoder_12_ln2_weight, ema_backbone_encoder_12_ln2_bias, ema_backbone_encoder_12_ffn_layers_0_0_weight, ema_backbone_encoder_12_ffn_layers_0_0_bias, ema_backbone_encoder_12_ffn_layers_1_weight, ema_backbone_encoder_12_ffn_layers_1_bias, ema_backbone_encoder_13_ln1_weight, ema_backbone_encoder_13_ln1_bias, ema_backbone_encoder_13_attn_qkv_weight, ema_backbone_encoder_13_attn_proj_weight, ema_backbone_encoder_13_attn_proj_bias, ema_backbone_encoder_13_ln2_weight, ema_backbone_encoder_13_ln2_bias, ema_backbone_encoder_13_ffn_layers_0_0_weight, ema_backbone_encoder_13_ffn_layers_0_0_bias, ema_backbone_encoder_13_ffn_layers_1_weight, ema_backbone_encoder_13_ffn_layers_1_bias, ema_backbone_norm_weight, ema_backbone_norm_bias, ema_head_layers_head_weight, ema_head_layers_head_bias
Part of the printed result.pkl info:
{'accuracy_top-1': 0.9320000410079956, 'accuracy_top-5': 3.3960001468658447, 'class_scores': array([[0.00042726, 0.00068343, 0.0071601 , ..., 0.00039209, 0.00057438,
0.00125795],
[0.00038586, 0.00074933, 0.00519408, ..., 0.00047776, 0.00053544,
0.00120644],
[0.00067862, 0.0008437 , 0.00210273, ..., 0.00102068, 0.0007887 ,
0.00138641],
...,
[0.00043719, 0.00092473, 0.00403855, ..., 0.00063519, 0.0005136 ,
0.00120862],
[0.00047263, 0.00100704, 0.00258351, ..., 0.00073512, 0.0006326 ,
0.00279792],
[0.00056749, 0.00111329, 0.00241045, ..., 0.00080397, 0.00081281,
0.00100305]], dtype=float32), 'pred_score': array([0.00787427, 0.00642915, 0.00520228, ..., 0.00558689, 0.00464927,
0.00547685], dtype=float32), 'pred_label': array([ 6, 802, 500, ..., 500, 619, 116]), 'pred_class': ['stingray', 'snowmobile', 'cliff dwelling', 'water snake', 'chiton, coat-of-mail shell, sea cradle, polyplacophore', 'stingray', 'diamondback, diamondback rattlesnake, Crotalus adamanteus', 'stingray', 'cliff dwelling', 'lampshade, lamp shade',
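For anyone comparing, the pickle file can be inspected directly. A minimal sketch, using an in-memory stand-in for the real file and only the two accuracy values copied from the printout above:

```python
import io
import pickle

# Stand-in for the dict written by `tools/test.py --out result.pkl`
# (only two of its keys, values copied from the printout above).
result = {
    'accuracy_top-1': 0.9320000410079956,
    'accuracy_top-5': 3.3960001468658447,
}
buf = io.BytesIO()
pickle.dump(result, buf)
buf.seek(0)
loaded = pickle.load(buf)

# As clarified later in the thread, mmcls reports accuracy on a 0-100
# percentage scale, so these values mean ~0.93% top-1 and ~3.40% top-5
# (low, but plausible after only a few epochs of training).
print(loaded['accuracy_top-5'])
```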
Can you share your config? We will run it to check.
Thanks a lot. The config is as follows:
2022-03-23 10:29:51,329 - mmcls - INFO - Distributed training: True
2022-03-23 10:29:51,855 - mmcls - INFO - Config:
embed_dims = 384
num_classes = 1000
model = dict(
type='ImageClassifier',
backbone=dict(
type='T2T_ViT',
img_size=224,
in_channels=3,
embed_dims=384,
t2t_cfg=dict(token_dims=64, use_performer=False),
num_layers=14,
layer_cfgs=dict(num_heads=6, feedforward_channels=1152),
drop_path_rate=0.1,
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02),
dict(type='Constant', layer='LayerNorm', val=1.0, bias=0.0)
]),
neck=None,
head=dict(
type='VisionTransformerClsHead',
num_classes=1000,
in_channels=384,
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
topk=(1, 5),
init_cfg=dict(type='TruncNormal', layer='Linear', std=0.02)),
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, prob=0.5, num_classes=1000),
dict(type='BatchCutMix', alpha=1.0, prob=0.5, num_classes=1000)
]))
rand_increasing_policies = [
dict(type='AutoContrast'),
dict(type='Equalize'),
dict(type='Invert'),
dict(type='Rotate', magnitude_key='angle', magnitude_range=(0, 30)),
dict(type='Posterize', magnitude_key='bits', magnitude_range=(4, 0)),
dict(type='Solarize', magnitude_key='thr', magnitude_range=(256, 0)),
dict(
type='SolarizeAdd',
magnitude_key='magnitude',
magnitude_range=(0, 110)),
dict(
type='ColorTransform',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(type='Contrast', magnitude_key='magnitude', magnitude_range=(0, 0.9)),
dict(
type='Brightness', magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Sharpness', magnitude_key='magnitude', magnitude_range=(0, 0.9)),
dict(
type='Shear',
magnitude_key='magnitude',
magnitude_range=(0, 0.3),
direction='horizontal'),
dict(
type='Shear',
magnitude_key='magnitude',
magnitude_range=(0, 0.3),
direction='vertical'),
dict(
type='Translate',
magnitude_key='magnitude',
magnitude_range=(0, 0.45),
direction='horizontal'),
dict(
type='Translate',
magnitude_key='magnitude',
magnitude_range=(0, 0.45),
direction='vertical')
]
dataset_type = 'ImageNet'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies=[
dict(type='AutoContrast'),
dict(type='Equalize'),
dict(type='Invert'),
dict(
type='Rotate', magnitude_key='angle', magnitude_range=(0, 30)),
dict(
type='Posterize', magnitude_key='bits',
magnitude_range=(4, 0)),
dict(
type='Solarize', magnitude_key='thr',
magnitude_range=(256, 0)),
dict(
type='SolarizeAdd',
magnitude_key='magnitude',
magnitude_range=(0, 110)),
dict(
type='ColorTransform',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Contrast',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Brightness',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Sharpness',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Shear',
magnitude_key='magnitude',
magnitude_range=(0, 0.3),
direction='horizontal'),
dict(
type='Shear',
magnitude_key='magnitude',
magnitude_range=(0, 0.3),
direction='vertical'),
dict(
type='Translate',
magnitude_key='magnitude',
magnitude_range=(0, 0.45),
direction='horizontal'),
dict(
type='Translate',
magnitude_key='magnitude',
magnitude_range=(0, 0.45),
direction='vertical')
],
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(pad_val=[104, 116, 124], interpolation='bicubic')),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=0.3333333333333333,
fill_color=[103.53, 116.28, 123.675],
fill_std=[57.375, 57.12, 58.395]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=64,
workers_per_gpu=4,
train=dict(
type='ImageNet',
data_prefix='/cache/data/imagenet/train',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies=[
dict(type='AutoContrast'),
dict(type='Equalize'),
dict(type='Invert'),
dict(
type='Rotate',
magnitude_key='angle',
magnitude_range=(0, 30)),
dict(
type='Posterize',
magnitude_key='bits',
magnitude_range=(4, 0)),
dict(
type='Solarize',
magnitude_key='thr',
magnitude_range=(256, 0)),
dict(
type='SolarizeAdd',
magnitude_key='magnitude',
magnitude_range=(0, 110)),
dict(
type='ColorTransform',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Contrast',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Brightness',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Sharpness',
magnitude_key='magnitude',
magnitude_range=(0, 0.9)),
dict(
type='Shear',
magnitude_key='magnitude',
magnitude_range=(0, 0.3),
direction='horizontal'),
dict(
type='Shear',
magnitude_key='magnitude',
magnitude_range=(0, 0.3),
direction='vertical'),
dict(
type='Translate',
magnitude_key='magnitude',
magnitude_range=(0, 0.45),
direction='horizontal'),
dict(
type='Translate',
magnitude_key='magnitude',
magnitude_range=(0, 0.45),
direction='vertical')
],
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(pad_val=[104, 116, 124],
interpolation='bicubic')),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=0.3333333333333333,
fill_color=[103.53, 116.28, 123.675],
fill_std=[57.375, 57.12, 58.395]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]),
val=dict(
type='ImageNet',
data_prefix='/cache/data/imagenet/val',
ann_file='/cache/data/imagenet/meta/val.txt',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]),
test=dict(
type='ImageNet',
data_prefix='/cache/data/imagenet/val',
ann_file='/cache/data/imagenet/meta/val.txt',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]))
evaluation = dict(interval=1, metric='accuracy', save_best='auto')
checkpoint_config = dict(interval=1)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
paramwise_cfg = dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys=dict(cls_token=dict(decay_mult=0.0)))
optimizer = dict(
type='AdamW',
lr=0.0005,
weight_decay=0.05,
paramwise_cfg=dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys=dict(cls_token=dict(decay_mult=0.0))))
optimizer_config = dict(grad_clip=None)
lr_config = dict(
policy='CosineAnnealingCooldown',
min_lr=1e-05,
cool_down_time=10,
cool_down_ratio=0.1,
by_epoch=True,
warmup_by_epoch=True,
warmup='linear',
warmup_iters=10,
warmup_ratio=1e-06)
custom_hooks = [dict(type='EMAHook', momentum=4e-05, priority='ABOVE_NORMAL')]
runner = dict(type='EpochBasedRunner', max_epochs=310)
work_dir = '/home/ma-user/modelarts/outputs/train-url_0/'
gpu_ids = range(0, 8)
Well, our maximum accuracy is 100, not 1.0.
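In other words, the reported accuracy is a percentage on a 0-100 scale, so top-k values above 1.0 are expected. A minimal illustration of that convention (my own sketch, not the library code):

```python
import numpy as np

def top1_accuracy(pred_labels, gt_labels):
    """Return top-1 accuracy on the 0-100 percentage scale."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    return (pred_labels == gt_labels).mean() * 100.0

# 1 correct prediction out of 4 -> 25.0 on this scale, not 0.25,
# which is why values greater than 1.0 are perfectly normal.
print(top1_accuracy([6, 802, 500, 3], [6, 1, 2, 4]))  # 25.0
```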
@mzr1996 Well then, it is reasonable. Thanks!
But I noticed that when running tools/test.py for evaluation, the log showed The model and loaded state dict do not match exactly. Details:
unexpected key in source state_dict: (the same list of ema_* keys as above)
It seems the saved state_dict does not match the model. I cannot figure it out for now.
That's about the implementation of EMAHook; those keys are used to resume EMA status. The warning can be ignored.
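To make the warning concrete: EMAHook stores a shadow copy of every parameter under an ema_ prefix in the checkpoint, and at test time those keys are simply unused. A minimal sketch of how one could strip them for a clean inference checkpoint (toy dict with placeholder values; real entries would be tensors, and the key names are taken from the warning above):

```python
# Toy checkpoint state dict mimicking one saved with EMAHook: each
# parameter has a shadow copy stored under an 'ema_' prefix so that
# the EMA status can be restored when training is resumed.
state_dict = {
    'backbone_cls_token': '<tensor>',
    'head_layers_head_weight': '<tensor>',
    'ema_backbone_cls_token': '<tensor>',
    'ema_head_layers_head_weight': '<tensor>',
}

# At evaluation time the ema_* keys are unused, which is exactly why
# load_state_dict reports them as "unexpected" -- safe to ignore, or
# drop them like this to produce a clean inference-only checkpoint.
clean = {k: v for k, v in state_dict.items() if not k.startswith('ema_')}
print(sorted(clean))
```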
Checklist
Describe the question you meet
I am training the T2T-ViT model for ImageNet classification, following the provided config file exactly. However, after several epochs, the top-k accuracy during validation becomes greater than 1.0.
Post related information
Environments
Your config file if you modified it or created a new one. No modifications to the provided config file.
Your train log file if you meet the problem during training.
Other code you modified in the mmcls folder. No extra modifications.