In decoupled_optimizer.py, one finds the code fragment:
```python
# Iterate through the named modules of the model.
for module_name, module in model.named_modules():
    # Check if the current module is an instance of any of the desired
    # types (LayerNorm or torch.nn.Embedding).
    for ndim in [LayerNorm, torch.nn.Embedding]:
        if isinstance(module, ndim):
            # If torch.nn.Embedding, append its name with a ".weight"
            # suffix to the no_decay list.
            if module_name == exclude_module:
                no_decay.append(f"{module_name}.weight")
            else:
                # If the module is an instance of LayerNorm
                no_decay.append(f"{module_name}.gamma")
            # Exit the inner loop since the desired module has been found.
            break
```
If `module_name != exclude_module`, this code appends a parameter named `gamma` to the `no_decay` list. In that case, the layer is a LayerNorm, defined in `torch.nn.LayerNorm`, which only has the parameters `weight` and `bias`. Thus, `.gamma` should be replaced by `.weight`.
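For concreteness, here is a small, self-contained sketch of the behavior after the suggested change. The toy model and variable names below are illustrative placeholders, not the actual decoupled_optimizer.py setup; and since both branches would then append `".weight"`, the sketch collapses them into one:

```python
import torch
from torch import nn

# torch.nn.LayerNorm exposes its learnable scale as `weight`, not `gamma`:
print([name for name, _ in nn.LayerNorm(4).named_parameters()])  # ['weight', 'bias']

# Toy model with one Embedding and one LayerNorm, just to exercise the loop.
model = nn.Sequential(nn.Embedding(10, 4), nn.LayerNorm(4))
no_decay = []

for module_name, module in model.named_modules():
    for module_type in (nn.LayerNorm, nn.Embedding):
        if isinstance(module, module_type):
            # With the fix, both Embedding and LayerNorm contribute
            # "<module_name>.weight"; neither has a parameter named "gamma".
            no_decay.append(f"{module_name}.weight")
            break

# Every collected name now refers to a real parameter of the model.
param_names = {name for name, _ in model.named_parameters()}
assert all(name in param_names for name in no_decay)
print(no_decay)  # ['0.weight', '1.weight']
```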
Of course, I do not really know why `bias` is not included, but that is for another day.