Closed ywdong closed 2 years ago
Thanks for your attention to our work!
We use the same hyper-parameters as TinyViT to train Swin-T with and without ImageNet-21k pretraining and distillation.
You only need to change the model in the config.
For example, to pretrain Swin-T on ImageNet-21k with distillation, take tiny_vit_21m_22k_distill.yaml as a reference:
```yaml
MODEL:
  NAME: swin_tiny_patch4_window7_224
  TYPE: swin
  DROP_PATH_RATE: 0.1
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [ 2, 2, 6, 2 ]
    NUM_HEADS: [ 3, 6, 12, 24 ]
    WINDOW_SIZE: 7
TRAIN:
  EPOCHS: 90
  BASE_LR: 2.5e-4
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.01
DATA:
  DATASET: imagenet22k
AUG:
  MIXUP: 0.0
  CUTMIX: 0.0
```
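As a quick sanity check (not code from the repo), here is a minimal sketch that parses the config above and verifies the nested keys, assuming PyYAML is installed. It illustrates the point that only the `MODEL` section changes between the TinyViT and Swin-T recipes, while the `TRAIN` hyper-parameters stay the same.

```python
import yaml

# The Swin-T pretraining config from the comment above, embedded as a string
# so the sketch is self-contained.
CONFIG = """
MODEL:
  NAME: swin_tiny_patch4_window7_224
  TYPE: swin
  DROP_PATH_RATE: 0.1
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [2, 2, 6, 2]
    NUM_HEADS: [3, 6, 12, 24]
    WINDOW_SIZE: 7
TRAIN:
  EPOCHS: 90
  BASE_LR: 2.5e-4
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.01
DATA:
  DATASET: imagenet22k
AUG:
  MIXUP: 0.0
  CUTMIX: 0.0
"""

cfg = yaml.safe_load(CONFIG)

# Only the MODEL section is architecture-specific; swapping in another model
# means editing these keys and leaving TRAIN/DATA/AUG untouched.
print(cfg["MODEL"]["NAME"])                 # swin_tiny_patch4_window7_224
print(sum(cfg["MODEL"]["SWIN"]["DEPTHS"]))  # 12 transformer blocks in total
print(cfg["TRAIN"]["BASE_LR"])              # parsed as a float: 0.00025
```

Note that mixup and cutmix are disabled (`0.0`) for the ImageNet-21k distillation stage, matching the TinyViT 22k distillation recipe.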
Dear authors, thanks for your great work! I read in the paper that you ran experiments on Swin-T with and without ImageNet-21k pretraining and distillation. Could you share the config and code here? Thanks!