yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Unable to train T2T-ViT for 384 x 384 image #53

Open SK124 opened 3 years ago

SK124 commented 3 years ago

Hi! Can you suggest which part of the code should be modified to prevent the following error? Also, can I train on images with my own input dimensions, such as 448 or 608?

from models.t2t_vit import *
model = T2t_vit_14()
inp = torch.rand(2, 3, 384, 384)
out = model(inp)
out.shape

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      1 inp = torch.rand(2, 3, 384, 384)
----> 2 out = model(inp)
      3 out.shape

2 frames
/content/T2T-ViT/models/t2t_vit.py in forward_features(self, x)
    159     cls_tokens = self.cls_token.expand(B, -1, -1)
    160     x = torch.cat((cls_tokens, x), dim=1)
--> 161     x = x + self.pos_embed
    162     x = self.pos_drop(x)
    163

RuntimeError: The size of tensor a (577) must match the size of tensor b (197) at non-singleton dimension 1
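For reference, the two sequence lengths in the error are consistent with an overall 16x spatial reduction before the class token is prepended. This is a minimal sketch; the reduction factor of 16 is an assumption inferred from the numbers in the traceback (197 for 224-px input, 577 for 384-px input), not read from the repository source:

```python
def num_tokens(img_size: int, reduction: int = 16) -> int:
    """Sequence length after tokenization, plus one class token.

    Assumes a square input and an overall 16x downsampling, which
    matches the token counts reported in the traceback above.
    """
    side = img_size // reduction
    return side * side + 1

print(num_tokens(224))  # 197 -> length of self.pos_embed (tensor b)
print(num_tokens(384))  # 577 -> length of the 384-px token sequence (tensor a)
```

So the positional embedding built for 224-px inputs simply cannot be broadcast against the longer 384-px token sequence, which is why `x + self.pos_embed` fails.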
yuanli2333 commented 3 years ago

Hi,

If you want to train our model with another image size, such as 384x384, please use:

from models.t2t_vit import *
model = T2t_vit_14(img_size=384)
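If you instead want to reuse a checkpoint pretrained at 224x224 for an arbitrary size such as 448 or 608, the usual approach is to interpolate the positional embedding to the new grid. The sketch below assumes T2T-ViT's `pos_embed` has shape `(1, N+1, D)` with the class-token embedding at index 0 (consistent with the `torch.cat((cls_tokens, x), dim=1)` line in the traceback); `resize_pos_embed` is a hypothetical helper, not a function from this repository:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, new_tokens: int) -> torch.Tensor:
    """Interpolate a (1, N+1, D) positional embedding to a new token count.

    The class-token embedding (index 0) is kept unchanged; the remaining
    N patch embeddings are reshaped to a square 2D grid, resized
    bicubically, and flattened back.
    """
    cls_tok, grid = pos_embed[:, :1], pos_embed[:, 1:]
    old_side = int(grid.shape[1] ** 0.5)
    new_side = int((new_tokens - 1) ** 0.5)
    grid = grid.reshape(1, old_side, old_side, -1).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_side, new_side),
                         mode="bicubic", align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_side * new_side, -1)
    return torch.cat([cls_tok, grid], dim=1)

# A 224-px embedding (197 tokens) resized for 384-px input (577 tokens):
pe = torch.zeros(1, 197, 384)
print(resize_pos_embed(pe, 577).shape)  # torch.Size([1, 577, 384])
```

You would apply this to the checkpoint's `pos_embed` before loading it into a model constructed with the larger `img_size`.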