microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

TinyViT: Size mismatch when finetuning with higher resolution #149

Closed KeyaoZhao closed 1 year ago

KeyaoZhao commented 1 year ago

How do we finetune with a higher resolution? When we finetune from 224 to 384, we get an error like:

"RuntimeError: Error(s) in loading state_dict for TinyViT:
        size mismatch for layers.1.blocks.0.attn.attention_biases: copying a param with shape torch.Size([6, 49]) from checkpoint, the shape in current model is torch.Size([6, 144]).
        size mismatch for layers.1.blocks.1.attn.attention_biases: copying a param with shape torch.Size([6, 49]) from checkpoint, the shape in current model is torch.Size([6, 144]).
        size mismatch for layers.2.blocks.0.attn.attention_biases: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([12, 576]).
        size mismatch for layers.2.blocks.1.attn.attention_biases: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([12, 576]).
        size mismatch for layers.2.blocks.2.attn.attention_biases: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([12, 576]).
        size mismatch for layers.2.blocks.3.attn.attention_biases: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([12, 576]).
        size mismatch for layers.2.blocks.4.attn.attention_biases: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([12, 576]).
        size mismatch for layers.2.blocks.5.attn.attention_biases: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([12, 576]).
        size mismatch for layers.3.blocks.0.attn.attention_biases: copying a param with shape torch.Size([18, 49]) from checkpoint, the shape in current model is torch.Size([18, 144]).
        size mismatch for layers.3.blocks.1.attn.attention_biases: copying a param with shape torch.Size([18, 49]) from checkpoint, the shape in current model is torch.Size([18, 144])."

Thanks a lot.

wkcn commented 1 year ago

Hi @KeyaoZhao, thanks for your attention to our work!

Sorry, the command I posted earlier was wrong.

The --resume flag should be replaced with --pretrained; attn.attention_biases will then be interpolated to match the new resolution.
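
For context, the mismatch in the log is just the window-size change: attention_biases holds one bias per head for each unique offset inside an attention window, and those offsets form a window_size × window_size grid. So the 49 = 7×7 and 196 = 14×14 entries per head at 224×224 input become 144 = 12×12 and 576 = 24×24 at 384×384. Below is a minimal sketch of how such a table can be resized; the helper name interpolate_attention_biases and the bicubic mode are assumptions for illustration, not the repo's exact loading code:

    # Sketch: resize a TinyViT-style attention-bias table to a larger window.
    # attention_biases has shape (num_heads, window_size**2), since the unique
    # |offset| pairs inside a window form a window_size x window_size grid.
    import torch
    import torch.nn.functional as F

    def interpolate_attention_biases(biases, new_window):
        # biases: (num_heads, old_window**2) -> (num_heads, new_window**2)
        num_heads, n = biases.shape
        old_window = int(n ** 0.5)                  # e.g. 49 -> 7, 196 -> 14
        grid = biases.reshape(1, num_heads, old_window, old_window)
        grid = F.interpolate(grid, size=(new_window, new_window),
                             mode='bicubic', align_corners=False)
        return grid.reshape(num_heads, new_window * new_window)

    # Adapting a 224-resolution entry to the 384-resolution model:
    old = torch.zeros(6, 49)                        # layers.1.*, 7x7 window at 224
    new = interpolate_attention_biases(old, 12)     # torch.Size([6, 144]) at 384

With --pretrained, main.py applies this kind of resizing automatically when loading the checkpoint: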

 python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/higher_resolution/tiny_vit_21m_224to384.yaml --data-path ./ImageNet --batch-size 32 --pretrained ./tiny_vit_21m_22kto1k_distill.pth --output ./output --accumulation-steps 4
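
If you load the checkpoint outside the training script, the same fix can be applied to the state dict by hand. A sketch under two assumptions: the weights may be nested under a 'model' key in the .pth file, and model is a TinyViT instance already built for 384×384 input. It reuses the hypothetical helper above:

    import torch

    ckpt = torch.load('./tiny_vit_21m_22kto1k_distill.pth', map_location='cpu')
    state_dict = ckpt.get('model', ckpt)   # assumption: weights nested under 'model'

    for key, value in state_dict.items():
        if key.endswith('attn.attention_biases'):
            # Target size comes from the model built at the new resolution.
            num_entries = model.state_dict()[key].shape[1]           # e.g. 144
            state_dict[key] = interpolate_attention_biases(value,
                                                           int(num_entries ** 0.5))

    # strict=False tolerates any remaining keys that do not need an exact match.
    model.load_state_dict(state_dict, strict=False)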