microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers
MIT License

Question about the Learning Rate used for pretraining #19

Open · Annbless opened 2 years ago


Hello.

Thank you for the wonderful work! I have a question about the learning rate used to pretrain the Swin models in Table 1. According to the training logs, the learning rate for Swin-T is 0.0005180447994195404 at epoch 201, while the learning rate for Swin-S/B is 0.00025939212681290886 at epoch 201. However, the hyperparameters stored under the 'args' key in the pre-trained checkpoints are identical.
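For anyone who wants to reproduce the "identical args" check, a minimal sketch is to diff the 'args' entries of two checkpoints key by key. It assumes each 'args' entry is either an argparse.Namespace (as DINO-style training scripts typically save it) or a plain mapping; the helper `diff_args` and the `MISSING` marker are hypothetical names, not part of the EsViT code.

```python
from argparse import Namespace
from typing import Any, Dict, Mapping, Tuple, Union

ArgsLike = Union[Namespace, Mapping[str, Any]]


class _Missing:
    """Marker for a key that is present in only one of the two args objects."""

    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def _as_dict(args: ArgsLike) -> Dict[str, Any]:
    # Accept either an argparse.Namespace or a plain mapping.
    return dict(vars(args)) if isinstance(args, Namespace) else dict(args)


def diff_args(a: ArgsLike, b: ArgsLike) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    Keys missing on one side are reported with MISSING for that side.
    """
    da, db = _as_dict(a), _as_dict(b)
    diffs = {}
    for key in sorted(set(da) | set(db)):
        va, vb = da.get(key, MISSING), db.get(key, MISSING)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs
```

In practice the two inputs would come from something like `torch.load(path, map_location="cpu")["args"]` for each checkpoint (the file paths are hypothetical; recent PyTorch versions may need `weights_only=False` to unpickle a Namespace). An empty result means the stored hyperparameters match. Note that it says nothing about settings that are never written to 'args'.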

Could you please explain why the learning rates in the training logs differ?

Thanks in advance.