[bug] icemix.py port: Optimizer not handling weight decay for cls_token

graphnet-team / graphnet

A Deep learning library for neutrino telescopes

Apache License 2.0

85 stars 85 forks source link

The DeepIce model contains a method called no_weight_decay() which is intended to specify that the cls_token parameter should not be subject to weight decay during training:

@torch.jit.ignore
def no_weight_decay(self) -> Set:
    """cls_tocken should not be subject to weight decay during training."""
    return {"cls_token"}

However, optimizer_grouped_parameters are not specified during training, so this method has no effect. I believe that in the original 2nd place code, FastAI's wrapper around AdamW handled this automatically.

graphnet-team / graphnet

[bug] icemix.py port: Optimizer not handling weight decay for cls_token #713