timinar opened this issue 2 months ago (status: Open)
Yes, the method `no_weight_decay()` is called in BEiT-2 during training. However, I searched for it, and it is not called in the original 2nd place solution or in fastai.
Likewise, I can find `optimizer_grouped_parameters` in BEiT-2, but not in the original 2nd place solution or fastai.
The 2nd place solution uses `fastai.vision.all.OptimWrapper`, but neither `OptimWrapper` nor its base class defines or calls this method.
I think the method is a historical leftover from BEiT-2, so `no_weight_decay()` can probably be removed.
The `DeepIce` model contains a method called `no_weight_decay()`, which is intended to specify that the `cls_token` parameter should not be subject to weight decay during training. However, `optimizer_grouped_parameters` are not specified during training, so this method has no effect. I believe that in the original 2nd place code, FastAI's wrapper around AdamW handled this automatically.
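For context, here is a minimal sketch of how the names returned by a `no_weight_decay()` method are usually turned into optimizer parameter groups. It is not taken from DeepIce, BEiT-2, or fastai: the helper name `build_param_groups`, the default decay value, and the choice to exempt biases as well are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    skip_names: Set[str],
    weight_decay: float = 0.05,  # hypothetical default, not from the repo
) -> List[Dict[str, Any]]:
    """Split parameters into a decayed group and a non-decayed group."""
    decay, no_decay = [], []
    for name, param in named_params:
        # Frozen parameters are not optimized, so leave them out entirely.
        if not getattr(param, "requires_grad", True):
            continue
        # Names listed by no_weight_decay(), plus biases (an assumption
        # of this sketch), are placed in the group with zero decay.
        if name in skip_names or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

With a PyTorch model this would be called as `build_param_groups(model.named_parameters(), model.no_weight_decay())`, and the resulting list passed to `torch.optim.AdamW` in place of `model.parameters()`. Unless some step like this consumes the method's return value, defining `no_weight_decay()` on the model changes nothing.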