Closed ywdong closed 2 years ago
Thanks for your attention to our work!
We use the same hyper-parameters as TinyViT to train Swin-T with and without ImageNet-21k pretraining and distillation.
You only need to change the model in the config.
For example, to pretrain Swin-T on ImageNet-21k with distillation, take tiny_vit_21m_22k_distill.yaml as a reference:
```yaml
MODEL:
  NAME: swin_tiny_patch4_window7_224
  TYPE: swin
  DROP_PATH_RATE: 0.1
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [ 2, 2, 6, 2 ]
    NUM_HEADS: [ 3, 6, 12, 24 ]
    WINDOW_SIZE: 7
TRAIN:
  EPOCHS: 90
  BASE_LR: 2.5e-4
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.01
DATA:
  DATASET: imagenet22k
AUG:
  MIXUP: 0.0
  CUTMIX: 0.0
```
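As a quick sanity check (not code from the repo), here is a minimal sketch that parses the config above and verifies the nested keys, assuming PyYAML is installed. It illustrates the point that only the `MODEL` section changes between the TinyViT and Swin-T recipes, while the `TRAIN` hyper-parameters stay the same.

```python
import yaml

# The Swin-T pretraining config from the comment above, embedded as a string
# so the sketch is self-contained.
CONFIG = """
MODEL:
  NAME: swin_tiny_patch4_window7_224
  TYPE: swin
  DROP_PATH_RATE: 0.1
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [2, 2, 6, 2]
    NUM_HEADS: [3, 6, 12, 24]
    WINDOW_SIZE: 7
TRAIN:
  EPOCHS: 90
  BASE_LR: 2.5e-4
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.01
DATA:
  DATASET: imagenet22k
AUG:
  MIXUP: 0.0
  CUTMIX: 0.0
"""

cfg = yaml.safe_load(CONFIG)

# Only the MODEL section is architecture-specific; swapping in another model
# means editing these keys and leaving TRAIN/DATA/AUG untouched.
print(cfg["MODEL"]["NAME"])                 # swin_tiny_patch4_window7_224
print(sum(cfg["MODEL"]["SWIN"]["DEPTHS"]))  # 12 transformer blocks in total
print(cfg["TRAIN"]["BASE_LR"])              # parsed as a float: 0.00025
```

Note that mixup and cutmix are disabled (`0.0`) for the ImageNet-21k distillation stage, matching the TinyViT 22k distillation recipe.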
Dear authors, thanks for your great work! I read in the paper that you ran experiments on Swin-T with and without ImageNet-21k pretraining and distillation. Could you share the config and code here? Thanks!