kyegomez / zeta

Build high-performance AI models with modular building blocks
https://zeta.apac.ai
Apache License 2.0

[BUG] No parameter named `gamma` in `decoupled_optimizer.py` #290

Open erlebach opened 1 month ago

erlebach commented 1 month ago

In `decoupled_optimizer.py`, one finds the following code fragment:

    # Iterate through the named modules of the model.
    for module_name, module in model.named_modules():
        # Check if the current module is an instance of any of the desired
        # types (LayerNorm or torch.nn.Embedding).
        for ndim in [LayerNorm, torch.nn.Embedding]:
            if isinstance(module, ndim):
                # If torch.nn.Embedding, append its name with a ".weight"
                # suffix to the no_decay list.
                if module_name == exclude_module:
                    no_decay.append(f"{module_name}.weight")
                else:
                    # If the module is an instance of LayerNorm
                    no_decay.append(f"{module_name}.gamma")
                # Exit the inner loop since the desired module has been found.
                break

When `module_name != exclude_module`, this code appends a parameter named `gamma` to the `no_decay` list. In that branch, however, the module is a `LayerNorm`, defined by `torch.nn.LayerNorm`, which only has the parameters `weight` and `bias`. Thus `.gamma` should be replaced by `.weight`.
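As a quick check, `torch.nn.LayerNorm` really does expose only `weight` and `bias` (there is no `gamma`):

    import torch

    # Default LayerNorm with elementwise_affine=True.
    ln = torch.nn.LayerNorm(8)
    print([name for name, _ in ln.named_parameters()])
    # -> ['weight', 'bias']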

Of course, I do not really know why `bias` is not included, but that is for another day.
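For reference, here is a minimal sketch of the fix, assuming the same imports as the original file and that the `exclude_module` special case only existed to pick the right parameter suffix. Since both `torch.nn.LayerNorm` and `torch.nn.Embedding` name their learnable tensor `weight`, the two branches collapse into one:

    # Iterate through the named modules of the model.
    for module_name, module in model.named_modules():
        # Exclude LayerNorm and Embedding weights from weight decay.
        if isinstance(module, (LayerNorm, torch.nn.Embedding)):
            # Both module types name their learnable tensor `weight`,
            # so the same ".weight" suffix applies in either case.
            no_decay.append(f"{module_name}.weight")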
