yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Which hyperparameters should I change if I have a different input size? #35

Closed chenwydj closed 3 years ago

chenwydj commented 3 years ago

I assume the current architecture-related hyperparameters (e.g., the kernel_size of the first few soft_split layers) are designed for 224x224 ImageNet images.

Which hyperparameters should I change if I have a different input size, say 64x64 ImageNet images?

Thank you very much!

yuanli2333 commented 3 years ago

We have tried training our model at 384x384, and the hyperparameters in our training scripts achieve good results there; for example, T2T-ViT-14 reaches 83.3% top-1 accuracy. So for 64x64, I would suggest trying our hyperparameters first.
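As a quick sanity check before retraining, you can compute how many tokens the T2T soft-split stages produce for a given input size with standard sliding-window arithmetic. This is a minimal sketch, assuming the paper's default T2T module (three soft splits with kernel sizes 7, 3, 3, strides 4, 2, 2, and paddings 2, 1, 1 — verify these against the repo's code for your variant):

```python
# Sanity check: token counts produced by T2T soft-split stages for a
# given input size. The (kernel, stride, padding) triples are assumed
# paper defaults; check the T2T-ViT code for your configuration.

def unfold_out(size, kernel, stride, padding):
    """Output length of an nn.Unfold-style sliding window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def t2t_token_grid(img_size, stages=((7, 4, 2), (3, 2, 1), (3, 2, 1))):
    """Apply each (kernel, stride, padding) soft split in turn."""
    s = img_size
    for k, st, p in stages:
        s = unfold_out(s, k, st, p)
    return s  # tokens per spatial axis; total tokens = s * s

for size in (224, 384, 64):
    g = t2t_token_grid(size)
    print(f"{size}x{size} input -> {g}x{g} = {g * g} tokens")
# 224x224 input -> 14x14 = 196 tokens
# 64x64 input -> 4x4 = 16 tokens
```

Since a different input size changes the sequence length, the learned positional embedding generally needs to be resized to the new token count (e.g., by interpolation), as is common when fine-tuning ViT-style models at a new resolution.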

chenwydj commented 3 years ago

Thank you very much!