lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
MIT License

Using vision transformers for different image resolutions #280

Open Oussamab21 opened 9 months ago

Oussamab21 commented 9 months ago

Hi, I am working on using vision transformers, not only the vanilla ViT but different models, on the UMDAA2 dataset. This dataset has an image resolution of 128×128. Would it be better to resize the images to a standard ViT resolution like 224 or 256, or to keep the 128 resolution and adapt the other vision transformer parameters (like dim, depth, heads) to it?

lucidrains commented 9 months ago

@Oussamab21 you can keep the 128 resolution, but lower the patch size to get more patch tokens