Thanks. First, note that `--base_lr` is the base learning rate: the actual lr is `base_lr * bs / 256`, as computed in /pretrain/utils/arg_util.py line 131. This linear lr scaling rule is commonly used in prior work such as MAE. So when we change the batch size, we basically don't need to touch `--base_lr`; the actual lr adjusts itself.
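As a minimal sketch of that scaling rule (a hypothetical helper for illustration, not the actual code in arg_util.py):

```python
def scale_lr(base_lr: float, batch_size: int) -> float:
    """Scale the base learning rate linearly with the global batch size."""
    return base_lr * batch_size / 256

# With base_lr = 2e-4, a batch size of 4096 gives lr = 3.2e-3,
# while 1024 gives lr = 8e-4 -- no manual retuning when bs changes.
print(scale_lr(2e-4, 4096))  # 0.0032
print(scale_lr(2e-4, 1024))  # 0.0008
```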
But yes, we do double the `base_lr` in the 384 config, following BEiT. The reason could be that the pretraining task at 384x384 is more difficult, so a larger `base_lr` is needed.
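For illustration only (hypothetical values, not the repo's actual configs), the relationship between the two configs is just a 2x factor on the base lr, applied before the batch-size scaling above:

```python
base_lr_224 = 2e-4             # hypothetical base lr for 224x224 pretraining
base_lr_384 = 2 * base_lr_224  # doubled for the harder 384x384 task
```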
Hi, thanks for your work. In general, a smaller batch size calls for a smaller learning rate. Why does SparK need a larger learning rate with a smaller batch size when pretraining on 384x384?