yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

finetune on a custom dataset with different image size #44

Closed · GuoleiSun closed this issue 3 years ago

GuoleiSun commented 3 years ago

Hi, thanks for your great work. May I ask which parameters I should change to finetune your pretrained network (T2T-ViT-19) on a new dataset with a different image size (512×512)? Have you run any such experiments? If so, you probably had to modify how the pretrained network is loaded, for example by interpolating the position embedding.

yuanli2333 commented 3 years ago

Hi,

If you want to finetune our pretrained model, the most important hyperparameter to change is weight_decay: use only 1e-5 or 5e-5 when finetuning our T2T-ViT.
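For reference, a minimal sketch of applying that weight decay when building a finetuning optimizer in plain PyTorch; the optimizer choice and learning rate here are illustrative assumptions, not the repo's exact training script:

```python
import torch

# Stand-in for the loaded T2T-ViT-19 model; replace with the real network.
model = torch.nn.Linear(10, 10)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # assumed finetuning learning rate
    weight_decay=5e-5,  # the value recommended above (1e-5 or 5e-5)
)
```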

I have the code to interpolate the position embedding, and I will update this part of the code very soon.
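Until that update lands, here is a minimal sketch of the standard 2D bicubic-interpolation trick for ViT-style position embeddings, assuming one leading class token and a square token grid (at 224×224 input the grid is 14×14; at 512×512 it would be 32×32 given the same overall stride). The function name, the checkpoint path, and the `pos_embed` key are assumptions for illustration, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_grid, new_grid, num_extra_tokens=1):
    """Bicubically resize a (1, extra + old_grid**2, dim) position embedding."""
    extra = pos_embed[:, :num_extra_tokens]    # class-token embedding(s)
    patches = pos_embed[:, num_extra_tokens:]  # per-patch embeddings
    dim = patches.shape[-1]
    # (1, N, dim) -> (1, dim, H, W) so F.interpolate treats it as an image
    patches = patches.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patches = F.interpolate(patches, size=(new_grid, new_grid),
                            mode="bicubic", align_corners=False)
    patches = patches.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([extra, patches], dim=1)

# Illustrative usage: 14x14 grid (224x224 input) -> 32x32 grid (512x512 input).
# The path and the "pos_embed" key are assumptions about the checkpoint layout.
state_dict = torch.load("t2t_vit_19.pth", map_location="cpu")
state_dict["pos_embed"] = resize_pos_embed(state_dict["pos_embed"],
                                           old_grid=14, new_grid=32)
```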