Hello.
Thank you for the wonderful work! I have a question about the learning rate used to pretrain the Swin models in Table 1. According to the logs, the learning rate for the Swin-T model is 0.0005180447994195404 at epoch 201, while the learning rate for the Swin-S/B models is 0.00025939212681290886 at epoch 201. However, the parameters stored under the 'args' keyword in the pre-trained models are the same.
Could you please tell me why the learning rates differ in the training logs?
Thanks in advance.