yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Unable to train T2T-ViT for 384 x 384 image #53

Open SK124 opened 3 years ago

SK124 commented 3 years ago

Hi! Can you suggest which part of the code should be modified to prevent the following error? Also, can I train on images with my own input dimensions, such as 448 or 608?

from models.t2t_vit import *
model = T2t_vit_14()
inp = torch.rand(2, 3, 384, 384)
out = model(inp)
out.shape

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      1 inp = torch.rand(2, 3, 384, 384)
----> 2 out = model(inp)
      3 out.shape

2 frames
/content/T2T-ViT/models/t2t_vit.py in forward_features(self, x)
    159     cls_tokens = self.cls_token.expand(B, -1, -1)
    160     x = torch.cat((cls_tokens, x), dim=1)
--> 161     x = x + self.pos_embed
    162     x = self.pos_drop(x)
    163

RuntimeError: The size of tensor a (577) must match the size of tensor b (197) at non-singleton dimension 1
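For reference, the two sequence lengths in the error are consistent with an overall 16x spatial reduction before the class token is prepended. This is a minimal sketch; the reduction factor of 16 is an assumption inferred from the numbers in the traceback (197 for 224-px input, 577 for 384-px input), not read from the repository source:

```python
def num_tokens(img_size: int, reduction: int = 16) -> int:
    """Sequence length after tokenization, plus one class token.

    Assumes a square input and an overall 16x downsampling, which
    matches the token counts reported in the traceback above.
    """
    side = img_size // reduction
    return side * side + 1

print(num_tokens(224))  # 197 -> length of self.pos_embed (tensor b)
print(num_tokens(384))  # 577 -> length of the 384-px token sequence (tensor a)
```

So the positional embedding built for 224-px inputs simply cannot be broadcast against the longer 384-px token sequence, which is why `x + self.pos_embed` fails.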
yuanli2333 commented 3 years ago

Hi,

If you want to train our model with another image size, such as 384x384, please use:

from models.t2t_vit import *
model = T2t_vit_14(img_size=384)
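If you instead want to reuse a checkpoint pretrained at 224x224 for an arbitrary size such as 448 or 608, the usual approach is to interpolate the positional embedding to the new grid. The sketch below assumes T2T-ViT's `pos_embed` has shape `(1, N+1, D)` with the class-token embedding at index 0 (consistent with the `torch.cat((cls_tokens, x), dim=1)` line in the traceback); `resize_pos_embed` is a hypothetical helper, not a function from this repository:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, new_tokens: int) -> torch.Tensor:
    """Interpolate a (1, N+1, D) positional embedding to a new token count.

    The class-token embedding (index 0) is kept unchanged; the remaining
    N patch embeddings are reshaped to a square 2D grid, resized
    bicubically, and flattened back.
    """
    cls_tok, grid = pos_embed[:, :1], pos_embed[:, 1:]
    old_side = int(grid.shape[1] ** 0.5)
    new_side = int((new_tokens - 1) ** 0.5)
    grid = grid.reshape(1, old_side, old_side, -1).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_side, new_side),
                         mode="bicubic", align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_side * new_side, -1)
    return torch.cat([cls_tok, grid], dim=1)

# A 224-px embedding (197 tokens) resized for 384-px input (577 tokens):
pe = torch.zeros(1, 197, 384)
print(resize_pos_embed(pe, 577).shape)  # torch.Size([1, 577, 384])
```

You would apply this to the checkpoint's `pos_embed` before loading it into a model constructed with the larger `img_size`.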