jeonsworld / ViT-pytorch

PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
MIT License
1.94k stars, 370 forks

How to adapt to arbitrary image sizes? #19

Closed Sumching closed 3 years ago

Sumching commented 3 years ago

The length of the learnable position embedding has to be specified when it is initialized, so the model cannot process images of other sizes. Is there any way to solve this problem?
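For context, here is a minimal sketch of why the embedding length is tied to the image size. It assumes the standard ViT layout (one position per non-overlapping square patch, plus one class token); the helper name `position_embedding_length` is hypothetical and does not come from this repository.

```python
def position_embedding_length(img_size: int, patch_size: int = 16) -> int:
    """Rows in the position embedding for a square image (hypothetical helper)."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size  # patches per side
    return grid * grid + 1         # +1 for the class token


print(position_embedding_length(224))  # 14 * 14 + 1 = 197
print(position_embedding_length(384))  # 24 * 24 + 1 = 577
```

Under these assumptions, a table built for 224x224 inputs has 197 rows and does not match the 577 positions a 384x384 input would need.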

zhoufengfan commented 3 years ago

Just pass the new size on the command line with --img_size [new_image_size]. The code will then adapt the length of the learnable position embedding to match.
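One common way to adapt a position embedding to a new grid size is to keep the class-token row and bilinearly interpolate the patch rows as a 2D grid. The sketch below shows that idea in plain Python on nested lists. It is an illustration under that assumption, not necessarily what this repository does internally, and both function names are hypothetical.

```python
import math


def resize_position_grid(grid, new_size):
    """Bilinearly resize a square grid of embedding vectors (corner-aligned)."""
    old_size = len(grid)
    dim = len(grid[0][0])
    # Maps an output index to a fractional coordinate in the old grid.
    scale = (old_size - 1) / (new_size - 1) if new_size > 1 else 0.0
    out = []
    for i in range(new_size):
        y = i * scale
        y0 = min(int(y), old_size - 1)
        y1 = min(y0 + 1, old_size - 1)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = min(int(x), old_size - 1)
            x1 = min(x0 + 1, old_size - 1)
            wx = x - x0
            # Blend the four neighbouring embeddings, one dimension at a time.
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ])
        out.append(row)
    return out


def resize_position_embedding(pos_emb, new_grid):
    """Resize [class token] + flattened patch grid to new_grid x new_grid patches."""
    cls_tok, patch_tok = pos_emb[:1], pos_emb[1:]
    old_grid = math.isqrt(len(patch_tok))
    if old_grid * old_grid != len(patch_tok):
        raise ValueError("patch embeddings do not form a square grid")
    # Un-flatten the patch rows into a 2D grid, row by row.
    grid = [patch_tok[r * old_grid:(r + 1) * old_grid] for r in range(old_grid)]
    resized = resize_position_grid(grid, new_grid)
    # The class-token row is copied unchanged; only patch rows are interpolated.
    return cls_tok + [cell for row in resized for cell in row]


# Toy example: class token plus a 2x2 grid of 1-dimensional embeddings, resized to 3x3.
toy = [[9.0], [0.0], [1.0], [2.0], [3.0]]
print(len(resize_position_embedding(toy, 3)))  # 1 + 3 * 3 = 10
```

In a real model the same operation would normally be applied to tensors with a library interpolation routine; the pure-Python version is only meant to make the steps visible.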