microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License

Interpolation of positional embeddings when fine-tuning (320x320) when the model is pre-trained on 512x512. #363

Open AdharshC7777 opened 4 months ago

AdharshC7777 commented 4 months ago

Just as the title says, do we need interpolation of positional embeddings in this scenario? If yes, why?

sebbelese commented 1 month ago

I would say you need interpolation, since your image resolution differs from the one the model was trained on (whether higher or lower).

Some details here: https://discuss.huggingface.co/t/fine-tuning-image-transformer-on-higher-resolution/22623/6
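
For context: Swin does not use absolute positional embeddings like ViT; each window attention layer carries a learned relative position bias table of shape `((2*Ws - 1)**2, num_heads)` for window size `Ws`. If fine-tuning at a new resolution changes the window size, that table must be resized before the pre-trained weights can be loaded. A minimal sketch of how this is commonly done (bicubic interpolation of the table viewed as a 2D grid, similar to what the fine-tuning scripts in this repo and in timm do); `interpolate_rel_pos_bias` is a hypothetical helper name, not an API of this repo:

```python
import torch
import torch.nn.functional as F

def interpolate_rel_pos_bias(table: torch.Tensor, new_window_size: int) -> torch.Tensor:
    """Resize a Swin relative position bias table to a new window size.

    table: shape (L, num_heads), where L = (2*Ws - 1) ** 2 for the
           pre-trained window size Ws.
    Returns a table of shape ((2*new_window_size - 1) ** 2, num_heads).
    """
    L, num_heads = table.shape
    old_size = int(L ** 0.5)              # 2*Ws - 1 for the old window
    new_size = 2 * new_window_size - 1    # side length of the new bias grid
    # View the table as a (num_heads, old_size, old_size) grid and
    # bicubic-interpolate it to the new grid size.
    t = table.permute(1, 0).view(1, num_heads, old_size, old_size)
    t = F.interpolate(t, size=(new_size, new_size),
                      mode="bicubic", align_corners=False)
    return t.view(num_heads, new_size * new_size).permute(1, 0)

# Example: a table pre-trained with window size 7, resized for window size 10.
old_table = torch.randn((2 * 7 - 1) ** 2, 3)        # (169, 3)
new_table = interpolate_rel_pos_bias(old_table, 10)  # (361, 3)
```

Note that if you keep the same window size (e.g. the default 7) at both resolutions, only the number of windows changes and no interpolation of the bias tables is needed; the interpolation question mainly arises when the window size is scaled along with the input resolution.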