
Question about the swin-tiny config #125

Closed · ywdong closed this issue 2 years ago

ywdong commented 2 years ago

Dear Author, thanks for your great work! I read in the paper that you ran experiments on swin-tiny with and without ImageNet-21k pretraining and distillation. Could you share the config and code for those runs here? Thx~

wkcn commented 2 years ago

Thanks for your attention to our work!

We use the same hyper-parameters as TinyViT to train Swin-T, both with and without ImageNet-21k pretraining and distillation.

You only need to change the model section in the config.

For example, to pretrain Swin-T on ImageNet-21k with distillation, adapt tiny_vit_21m_22k_distill.yaml as follows:

```yaml
MODEL:
  NAME: swin_tiny_patch4_window7_224
  TYPE: swin
  DROP_PATH_RATE: 0.1
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [ 2, 2, 6, 2 ]
    NUM_HEADS: [ 3, 6, 12, 24 ]
    WINDOW_SIZE: 7

TRAIN:
  EPOCHS: 90
  BASE_LR: 2.5e-4
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.01

DATA:
  DATASET: imagenet22k

AUG:
  MIXUP: 0.0
  CUTMIX: 0.0
```
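
If it helps, here is a minimal sketch for sanity-checking the adapted config before launching a run. The file name `swin_tiny_22k_distill.yaml` is hypothetical; save the YAML above under whatever name you prefer.

```python
# Minimal sanity check for the adapted config.
# Assumption: the YAML above was saved as swin_tiny_22k_distill.yaml (hypothetical name).
import yaml

with open("swin_tiny_22k_distill.yaml") as f:
    cfg = yaml.safe_load(f)

# Verify the fields that differ from the TinyViT configs.
assert cfg["MODEL"]["TYPE"] == "swin"
assert cfg["MODEL"]["SWIN"]["DEPTHS"] == [2, 2, 6, 2]
print(cfg["MODEL"]["NAME"], "-", cfg["TRAIN"]["EPOCHS"], "epochs on", cfg["DATA"]["DATASET"])
# swin_tiny_patch4_window7_224 - 90 epochs on imagenet22k
```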
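
The SWIN block above is just the standard Swin-T configuration. As an independent check (a sketch using the `timm` package, which is not this repo's own model builder), the same architecture can be instantiated by name:

```python
# Independent architecture check with timm (assumption: timm is not used by
# this repo's training code, but it implements the same named model).
import timm

model = timm.create_model(
    "swin_tiny_patch4_window7_224",
    pretrained=False,
    drop_path_rate=0.1,  # matches DROP_PATH_RATE in the config above
)
# Corresponds to EMBED_DIM 96, DEPTHS [2, 2, 6, 2], NUM_HEADS [3, 6, 12, 24], WINDOW_SIZE 7.
print(sum(p.numel() for p in model.parameters()) / 1e6)  # ~28M params with the default 1000-class head
```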