hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence
https://arxiv.org/abs/2106.00666
MIT License

Error of the size mismatch for pos_embed #21

Open lxn96 opened 2 years ago

lxn96 commented 2 years ago

We are loading our own ViT-Base model, pretrained with the MAE method, and we hit a size mismatch for pos_embed. Is there a solution to this problem?

RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
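The two sequence lengths in the error can be decomposed to see where the mismatch comes from. The sketch below is only a reading of the shapes, not something stated in the thread: it assumes each pos_embed holds a few extra tokens (such as a class token) followed by a square grid of patch tokens. The helper name `split_tokens` is hypothetical.

```python
import math


def split_tokens(seq_len, max_extra=2):
    """List (extra_tokens, grid_side) pairs with seq_len == extra_tokens + grid_side**2.

    Assumption: the sequence is a handful of extra tokens plus a square patch grid.
    """
    candidates = []
    for extra in range(max_extra + 1):
        patches = seq_len - extra
        side = math.isqrt(patches)
        if patches > 0 and side * side == patches:
            candidates.append((extra, side))
    return candidates


checkpoint_layout = split_tokens(785)  # pos_embed length in the MAE checkpoint
model_layout = split_tokens(578)       # pos_embed length in the current model

print(checkpoint_layout)  # [(1, 28)]
print(model_layout)       # [(2, 24)]
```

Under that assumption the checkpoint carries 1 extra token and a 28 x 28 grid, while the current model expects 2 extra tokens and a 24 x 24 grid, so both the grid size and the number of extra tokens differ. With an assumed patch size of 16, those grids would correspond to 448 x 448 and 384 x 384 inputs.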

Yuxin-CV commented 2 years ago

Hello! To my knowledge, MAE uses a 2D sin-cos position embedding, while YOLOS uses a 1D learnable absolute position embedding. I suggest replacing the original YOLOS position embedding with MAE's.
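For reference, a 2D sin-cos position embedding of the kind mentioned above can be built roughly as follows. This is a minimal NumPy sketch, not the MAE or YOLOS code: the function names `sincos_1d` and `sincos_2d` are hypothetical, and the channel layout (row half first, then column half, with sines before cosines) is an assumption that should be checked against the pos_embed stored in the checkpoint before relying on it.

```python
import numpy as np


def sincos_1d(dim, positions):
    """Fixed 1D sin-cos embedding: positions of any shape -> array of shape (N, dim)."""
    assert dim % 2 == 0
    # Geometric frequency schedule with base 10000, as in the original Transformer.
    omega = 1.0 / 10000 ** (np.arange(dim // 2, dtype=np.float64) / (dim / 2))
    angles = np.outer(positions.reshape(-1), omega)  # (N, dim / 2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def sincos_2d(dim, grid_h, grid_w, num_extra_tokens=1):
    """Fixed 2D sin-cos embedding of shape (1, num_extra_tokens + grid_h * grid_w, dim)."""
    assert dim % 4 == 0
    ys, xs = np.meshgrid(
        np.arange(grid_h, dtype=np.float64),
        np.arange(grid_w, dtype=np.float64),
        indexing="ij",
    )
    # Half of the channels encode the row index, the other half the column index.
    patches = np.concatenate(
        [sincos_1d(dim // 2, ys), sincos_1d(dim // 2, xs)], axis=1
    )
    # Extra tokens (e.g. a class token) get an all-zero embedding in this sketch.
    extra = np.zeros((num_extra_tokens, dim))
    return np.concatenate([extra, patches], axis=0)[None]


pos_embed = sincos_2d(768, 28, 28, num_extra_tokens=1)
print(pos_embed.shape)  # (1, 785, 768)
```

With a 28 x 28 grid and one extra token this produces the [1, 785, 768] shape from the checkpoint. Because the embedding is computed rather than learned, it can be regenerated for a different grid size; any additional tokens in the target model would still need their own position entries.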