keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License
1.42k stars 82 forks

LR layer decay #15

Closed Vickeyhw closed 1 year ago

Vickeyhw commented 1 year ago

Thanks for your excellent work! I noticed that LR layer decay is not used in ImageNet fine-tuning, although it is used in detection. Why not use layer decay as in transformer fine-tuning? What effect would adopting layer decay have on ImageNet fine-tuning?

keyu-tian commented 1 year ago

Thanks! LR layer decay is also used in ImageNet fine-tuning. You can check the `lr_scale` values in `downstream_imagenet/arg.py` (lines 13-23) to see them (0.8 or 0.7).
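For readers unfamiliar with the technique: layer-wise LR decay scales the base learning rate geometrically with depth, so earlier (shallower) layers train with smaller learning rates than later ones. The sketch below is a generic illustration of that idea, not SparK's actual code; the function names and the layer-grouping scheme are assumptions, while the decay factors 0.8 / 0.7 match the values mentioned above.

```python
# Generic sketch of layer-wise LR decay (illustrative, not SparK's exact impl.):
# the deepest layer gets the full base lr, and each shallower layer's lr is
# multiplied by an extra factor of `decay` (e.g. 0.8 or 0.7).

def layerwise_lr_scales(num_layers: int, decay: float) -> list[float]:
    """Scale for layer i (0 = shallowest, num_layers-1 = deepest):
    decay ** (num_layers - 1 - i)."""
    return [decay ** (num_layers - 1 - i) for i in range(num_layers)]


def build_param_groups(layers, base_lr: float, decay: float):
    """Build optimizer param groups with per-layer scaled learning rates.

    `layers` is an ordered list of (name, params) pairs, shallow to deep;
    the resulting dicts can be passed directly to a PyTorch optimizer.
    """
    scales = layerwise_lr_scales(len(layers), decay)
    return [
        {"name": name, "params": params, "lr": base_lr * s}
        for (name, params), s in zip(layers, scales)
    ]
```

With `decay=0.8` and four layers, the scales are `0.8**3, 0.8**2, 0.8, 1.0`, so the stem learns roughly half as fast as the head.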