wkcn / TinyViT

[ECCV 2022] TinyViT: Fast Pretraining Distillation for Small Vision Transformers (https://github.com/microsoft/Cream/tree/main/TinyViT)

Finetune TinyViT-5M Model at Higher Resolution #3

Closed · WNjun closed 6 days ago

WNjun commented 1 month ago

Hi,

I've recently trained a custom TinyViT-5M-22kto1k model and attempted to finetune it from an input resolution of 224 to 384. However, I observed no improvement in the model's performance after finetuning. I'm wondering whether this is due to the limits of the model's size, or whether I'm missing something in my configuration.

Here is the YAML configuration for the model:

MODEL:
  NAME: TinyViT-5M-224to384
  TYPE: tiny_vit

  DROP_PATH_RATE: 0.1

  TINY_VIT:
    DEPTHS: [ 2, 2, 6, 2 ]
    NUM_HEADS: [ 2, 4, 5, 10 ]
    WINDOW_SIZES: [ 12, 12, 24, 12 ]   # scaled up from [ 7, 7, 14, 7 ] for 384 input
    EMBED_DIMS: [64, 128, 160, 320]

DATA:
  IMG_SIZE: 384

OUTPUT: ./output
SAVE_FREQ: 10

TRAIN:
  EPOCHS: 30
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 1e-8
  BASE_LR: 2e-05
  WARMUP_LR: 2e-08
  MIN_LR: 2e-07
  EVAL_BN_WHEN_TRAINING: True
TEST:
  CROP: False

AUG:
  MIXUP: 0.0
  CUTMIX: 0.0

Could you please help me identify any potential issues or suggest improvements to enhance the finetuning process?

Thank you for your assistance!

wkcn commented 1 month ago

Hi @WNjun , thanks for your attention to our work!

TinyViT-5M is a small model trained with a drop path rate of 0, so the argument DROP_PATH_RATE should be set to 0.0.

Besides, TRAIN.LAYER_LR_DECAY could be set to 0.8.
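
For reference, the two changes would look like this in the YAML (a sketch following the keys in your posted config; LAYER_LR_DECAY lives under TRAIN and applies layer-wise learning rate decay, giving earlier layers smaller learning rates):

MODEL:
  DROP_PATH_RATE: 0.0   # TinyViT-5M was pretrained without stochastic depth

TRAIN:
  LAYER_LR_DECAY: 0.8   # layer-wise LR decay for finetuning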

WNjun commented 1 month ago

Hi @wkcn, thank you for your prompt reply!

I've added DROP_PATH_RATE = 0 and LAYER_LR_DECAY = 0.8 to the config file. The results improved slightly from 94.2% to 94.73%, but they still don't surpass the 224 model's accuracy of 94.89%. Do you have any further suggestions? Also, my dataset has 11 classes; could this be affecting the results?

wkcn commented 1 month ago

@WNjun The head weights for the 11 classes can be inherited from the 1k classifier head. You can refer to the implementation: https://github.com/wkcn/TinyViT/blob/main/utils.py#L75
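
A minimal sketch of the idea (not the exact code in utils.py; it assumes the checkpoint stores the classifier as head.weight / head.bias, and that idx_map is a hypothetical list giving the ImageNet-1k class index for each of your 11 classes):

import torch

def inherit_head(checkpoint_path, idx_map):
    # Load the pretrained 1k checkpoint on CPU.
    state = torch.load(checkpoint_path, map_location="cpu")
    state = state.get("model", state)  # weights may be nested under "model"

    # Keep only the rows of the 1000-way classifier that match your classes,
    # so the new 11-way head starts from the pretrained weights.
    idx = torch.tensor(idx_map, dtype=torch.long)
    state["head.weight"] = state["head.weight"][idx].clone()
    state["head.bias"] = state["head.bias"][idx].clone()
    return state

# Usage (hypothetical indices): map your 11 classes to their 1k counterparts,
# then load into the 11-class model with strict=False so extra keys are ignored.
# state = inherit_head("tiny_vit_5m_22kto1k.pth", idx_map=[3, 17, 42, ...])
# model.load_state_dict(state, strict=False)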

Besides, the problem of overfitting could be avoided by decreasing LAYER_LR_DECAY.

WNjun commented 1 month ago

Thank you for your help! I'll definitely look into these and try them out later. Thanks again!