jeonsworld / ViT-pytorch

Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
MIT License

Tensors do not match? #3

Closed chaoyanghe closed 4 years ago

chaoyanghe commented 4 years ago

```
  File "/Users/chaoyanghe/sourcecode/FedML/fedml_api/model/cv/transformer/vit/vision_transformer.py", line 258, in forward
    x, attn_weights = self.transformer(x)
  File "/Users/chaoyanghe/opt/anaconda3/envs/fedml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/chaoyanghe/sourcecode/FedML/fedml_api/model/cv/transformer/vit/vision_transformer.py", line 242, in forward
    embedding_output = self.embeddings(input_ids)
  File "/Users/chaoyanghe/opt/anaconda3/envs/fedml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/chaoyanghe/sourcecode/FedML/fedml_api/model/cv/transformer/vit/vision_transformer.py", line 151, in forward
    embeddings = x + self.position_embeddings
RuntimeError: The size of tensor a (5) must match the size of tensor b (197) at non-singleton dimension 1
```

I am training on CIFAR-10 but got the error above. May I know why?

chaoyanghe commented 4 years ago

I figured out this issue by myself. The problem is that the paper assumes all fine-tuning tasks use a 224x224 input resolution, but CIFAR-10 images are 32x32. I changed the image_size in the model's init() function to 32x32, and that fixed it.
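For anyone hitting the same error: the numbers in the message follow directly from how ViT sizes its position-embedding table at construction time. A minimal sketch of the arithmetic (the function name here is illustrative, not the repo's API; it assumes the default 16x16 patches and a prepended [CLS] token):

```python
def num_tokens(img_size: int, patch_size: int = 16) -> int:
    """Sequence length the transformer sees for a square input:
    one token per patch, plus one [CLS] token."""
    return (img_size // patch_size) ** 2 + 1

# Position embeddings were built for the paper's 224x224 inputs:
print(num_tokens(224))  # 197 tokens

# But raw 32x32 CIFAR-10 images produce far fewer patch tokens:
print(num_tokens(32))   # 5 tokens -> "tensor a (5) ... tensor b (197)"
```

So there are two consistent ways to resolve the mismatch: resize CIFAR-10 images to 224x224 in the data pipeline (what the paper's fine-tuning setup assumes), or construct the model with an image size of 32 so the position-embedding table matches the actual input, as done above.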