microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License

Interpolation of positional embeddings when fine-tuning (320x320) when the model is pre-trained on 512x512. #363

Open AdharshC7777 opened 4 months ago

AdharshC7777 commented 4 months ago

Just as the title says, do we need interpolation of positional embeddings in this scenario? If yes, why?

sebbelese commented 1 month ago

I would say you need interpolation, since your image resolution differs from the one the model was trained on (whether higher or lower).

Some details here: https://discuss.huggingface.co/t/fine-tuning-image-transformer-on-higher-resolution/22623/6
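
For context: Swin does not use absolute positional embeddings like ViT; each window attention layer carries a learned relative position bias table of shape `((2*Ws - 1)**2, num_heads)` for window size `Ws`. If fine-tuning at a new resolution changes the window size, that table must be resized before the pre-trained weights can be loaded. A minimal sketch of how this is commonly done (bicubic interpolation of the table viewed as a 2D grid, similar to what the fine-tuning scripts in this repo and in timm do); `interpolate_rel_pos_bias` is a hypothetical helper name, not an API of this repo:

```python
import torch
import torch.nn.functional as F

def interpolate_rel_pos_bias(table: torch.Tensor, new_window_size: int) -> torch.Tensor:
    """Resize a Swin relative position bias table to a new window size.

    table: shape (L, num_heads), where L = (2*Ws - 1) ** 2 for the
           pre-trained window size Ws.
    Returns a table of shape ((2*new_window_size - 1) ** 2, num_heads).
    """
    L, num_heads = table.shape
    old_size = int(L ** 0.5)              # 2*Ws - 1 for the old window
    new_size = 2 * new_window_size - 1    # side length of the new bias grid
    # View the table as a (num_heads, old_size, old_size) grid and
    # bicubic-interpolate it to the new grid size.
    t = table.permute(1, 0).view(1, num_heads, old_size, old_size)
    t = F.interpolate(t, size=(new_size, new_size),
                      mode="bicubic", align_corners=False)
    return t.view(num_heads, new_size * new_size).permute(1, 0)

# Example: a table pre-trained with window size 7, resized for window size 10.
old_table = torch.randn((2 * 7 - 1) ** 2, 3)        # (169, 3)
new_table = interpolate_rel_pos_bias(old_table, 10)  # (361, 3)
```

Note that if you keep the same window size (e.g. the default 7) at both resolutions, only the number of windows changes and no interpolation of the bias tables is needed; the interpolation question mainly arises when the window size is scaled along with the input resolution.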