yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

"T2T-ViT-14, 384" cannot be loaded into "T2t_vit_14" model #72

Closed Carlisle-Liu closed 2 years ago

Carlisle-Liu commented 2 years ago

Hi,

It seems that the released weights "T2T-ViT-14, 384" are not compatible with the "T2t_vit_14" model. The following error is raised when attempting to load the weights:

size mismatch for pos_embed: copying a param with shape torch.Size([1, 577, 384]) from checkpoint, the shape in current model is torch.Size([1, 197, 384]).

Could you release a model definition compatible with the "T2T-ViT-14, 384" weights soon?

Thank you.

yuanli2333 commented 2 years ago

If you want to load the 384 weights, you should set the model's input size to 384. Our "T2T-ViT-14, 384" weights are correct, but you are loading them the wrong way.
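
For reference, a minimal loading sketch in line with this answer: the key point is to construct the model with a 384×384 input before loading the checkpoint, so `pos_embed` has 577 tokens instead of 197. The `img_size` keyword, the checkpoint path, and the state-dict key handling below are assumptions about this repo's API and file layout, so adjust them to the actual code and downloaded file.

```python
import torch
from models.t2t_vit import t2t_vit_14  # assumes the repo root is on PYTHONPATH

# Build the model for 384x384 inputs so pos_embed has 577 tokens
# (24*24 patch tokens + 1 cls token), matching the 384 checkpoint,
# instead of the default 224 input (197 tokens).
model = t2t_vit_14(img_size=384)

# Load the released "T2T-ViT-14, 384" checkpoint (path is a placeholder).
# Whether the file stores a raw state dict or wraps it under a key such as
# 'state_dict_ema' is an assumption -- inspect what torch.load returns.
checkpoint = torch.load('path/to/T2T-ViT-14-384-checkpoint.pth.tar', map_location='cpu')
state_dict = checkpoint.get('state_dict_ema', checkpoint) if isinstance(checkpoint, dict) else checkpoint

model.load_state_dict(state_dict)
model.eval()
```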