Closed chenwydj closed 3 years ago
We have tried to train our model at a size of 384x384, and the hyperparameters in our training scripts achieve good results; for example, T2T-ViT-14 reaches 83.3% top-1 accuracy. So for 64x64, I suggest trying our hyperparameters first.
Thank you very much!
I assume the current architecture-related hyperparameters (e.g. the kernel_size of the first several soft_split layers) are designed for 224x224 ImageNet images.
Which hyperparameters should I change for a different input size, say 64x64 ImageNet images?
Thank you very much!
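For context on why the soft_split kernel sizes matter here, a quick sketch of the token-grid arithmetic may help. Each soft split is an unfold operation, so its output side length follows the standard convolution output-size formula. The (kernel, stride, padding) triples below — (7, 4, 2) then (3, 2, 1) twice — are the defaults reported for T2T-ViT; they are an assumption on my part, so please verify them against your copy of the code:

```python
def soft_split_out(size, kernel, stride, padding):
    # Output side length of one soft-split (nn.Unfold) step,
    # via the standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

def t2t_token_grid(img_size, splits=((7, 4, 2), (3, 2, 1), (3, 2, 1))):
    # Side length of the token grid after all T2T soft splits.
    # The (kernel, stride, padding) defaults are assumed from the
    # T2T-ViT paper; check them against the actual repo code.
    side = img_size
    for k, s, p in splits:
        side = soft_split_out(side, k, s, p)
    return side

print(t2t_token_grid(224))  # 14  -> 14*14 = 196 tokens
print(t2t_token_grid(64))   # 4   -> 4*4  = 16 tokens
```

With 64x64 inputs the default splits leave only a 4x4 grid (16 tokens), which is much coarser than the 196 tokens at 224x224. If that proves too few tokens, shrinking the kernel/stride of the first soft split (e.g. kernel 3, stride 2) would preserve a larger grid, though this changes the architecture and may require retraining from scratch.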