yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

resizing images for feature visualization #48

Closed: kahnchana closed this issue 3 years ago

kahnchana commented 3 years ago

Hi, I'm really interested in this work, and was looking at the feature visualization section.

In this code, how do you feed larger images to the model (e.g., 512×512 images into a ViT trained at 384×384)? Do you make any modifications?

yuanli2333 commented 3 years ago

Hi,

You can interpolate the position embedding to a different image size with the function here.

Alternatively, use T2T-ViT directly as described in the usage instructions: the interpolation is already called inside 'load_for_transfer_learning'.
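For readers wondering what "interpolating the position embedding" involves, here is a minimal NumPy sketch. It is not the repository's function. The names `grid_size` and `interpolate_pos_embed` are hypothetical, and the sketch assumes an overall token stride of 16, one class token at the front of the sequence, and a square token grid. Under those assumptions, a 384×384 input gives a 24×24 grid (576 patch tokens) and a 512×512 input gives a 32×32 grid (1024 tokens). The learned grid of embeddings is therefore resized from 24×24 to 32×32, and the class token is left unchanged. PyTorch code typically does this step with `torch.nn.functional.interpolate`. The sketch substitutes plain bilinear interpolation so that it needs only NumPy.

```python
import numpy as np

def grid_size(img_size, stride=16):
    # Hypothetical helper: tokens per side, assuming an overall stride of 16.
    return img_size // stride

def interpolate_pos_embed(pos_embed, new_grid, num_extra_tokens=1):
    """Bilinearly resize a (1, extra + H0*W0, D) position embedding to new_grid=(H1, W1).

    Assumes the first `num_extra_tokens` entries (e.g. a class token) are not spatial
    and that the remaining tokens form a square H0 x W0 grid in row-major order.
    """
    extra = pos_embed[:, :num_extra_tokens]       # non-spatial tokens are kept as-is
    grid = pos_embed[0, num_extra_tokens:]        # (H0*W0, D)
    n, dim = grid.shape
    h0 = w0 = int(round(n ** 0.5))
    assert h0 * w0 == n, "expected a square token grid"
    grid = grid.reshape(h0, w0, dim)
    h1, w1 = new_grid

    # Map each target cell centre back to source coordinates (align_corners=False style),
    # clamping to the valid range at the borders.
    ys = np.clip((np.arange(h1) + 0.5) * h0 / h1 - 0.5, 0, h0 - 1)
    xs = np.clip((np.arange(w1) + 0.5) * w0 / w1 - 0.5, 0, w0 - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h0 - 1)
    x1 = np.minimum(x0 + 1, w0 - 1)
    wy = (ys - y0)[:, None, None]                 # (H1, 1, 1) vertical weights
    wx = (xs - x0)[None, :, None]                 # (1, W1, 1) horizontal weights

    # Interpolate along x on the two neighbouring rows, then along y.
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    resized = top * (1 - wy) + bottom * wy        # (H1, W1, D)
    return np.concatenate([extra, resized.reshape(1, h1 * w1, dim)], axis=1)

# Example: adapt an embedding learned at 384x384 to 512x512 inputs (D=8 for brevity).
old = np.random.default_rng(0).standard_normal((1, 1 + grid_size(384) ** 2, 8))
new = interpolate_pos_embed(old, (grid_size(512), grid_size(512)))
print(new.shape)
```

The resized array would then replace the model's original position embedding before inference at the larger size. Bilinear interpolation is chosen here only for simplicity. Bicubic interpolation is another common choice.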

kahnchana commented 3 years ago

Thanks a lot for the info. Got it working.