tonysy opened 2 years ago
Hi, we follow the setting in DeiT, where learning rate = 5e-4 * batch_size / 512. The batch size in our code is 256 per GPU, so the total batch size is 2048 with lr 2e-3. The learning rate and batch size in the paper are 1e-3 and 1024, so the two configurations are nearly equivalent under this scaling rule.
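To make the equivalence concrete, here is a minimal sketch of the linear scaling rule described above. The 8-GPU count is an assumption (256 per GPU with a total of 2048 implies 8 GPUs); `scaled_lr` is a hypothetical helper, not a function from the repo:

```python
# DeiT linear scaling rule: lr = 5e-4 * total_batch_size / 512.
def scaled_lr(batch_per_gpu: int, num_gpus: int, base_lr: float = 5e-4) -> float:
    total_batch = batch_per_gpu * num_gpus
    return base_lr * total_batch / 512

# Code setting: 256 per GPU, assuming 8 GPUs -> total batch 2048.
print(scaled_lr(256, 8))  # 0.002, i.e. lr 2e-3

# Paper setting: total batch 1024 (e.g. 128 per GPU on 8 GPUs).
print(scaled_lr(128, 8))  # 0.001, i.e. lr 1e-3
```

Both settings scale the base lr of 5e-4 by the same per-sample factor, which is why the comparison is consistent despite the different absolute numbers.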
Hi, I have noticed that the hyper-parameter configuration used in the code is inconsistent with the arXiv report.
I'm wondering whether this inconsistency makes the comparison unfair.