huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

make RMSNorm or other small parameters trainable with lora #2080

Closed: IvanSedykh closed this issue 1 month ago

IvanSedykh commented 1 month ago

Feature request

Add a convenient way to unfreeze, and later save, specific weights which are not nn.Linear.

I wonder if it would be relevant to implement this functionality in this library, or whether it is out of its scope.

Motivation

While transformers mostly consist of linear layers, they still have some other parameters (for example, the scaling weights in normalization layers). One may benefit from training them: some studies have highlighted the importance of the normalization layers' scaling weights. There are only a few of them, so it would be very cheap to tune them.

Your contribution

It looks like it requires one extra parameter in the config (similar to target_modules) and a loop over the model parameters to set requires_grad_(True), as sketched below.
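Roughly something like this (the helper name and the target strings are just placeholders for illustration):

```python
import torch.nn as nn

def unfreeze_extra_params(model: nn.Module, target_names: list[str]) -> None:
    """Hypothetical helper: mark every parameter whose name contains one of
    `target_names` as trainable, so it gets updated and later saved."""
    for name, param in model.named_parameters():
        if any(t in name for t in target_names):
            param.requires_grad_(True)

# e.g. unfreeze all RMSNorm scaling weights in a LLaMA-style model
# unfreeze_extra_params(model, ["input_layernorm", "post_attention_layernorm"])
```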

I would be glad to implement it and open a PR, given a little guidance on design choices.

BenjaminBossan commented 1 month ago

Hey, thanks for the suggestion. We already have a method called LayerNorm tuning, which is specifically for fine-tuning via layer norms (other module types would also work): https://huggingface.co/docs/peft/v0.12.0/en/package_reference/layernorm_tuning#layernorm-tuning.
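A minimal usage sketch (the checkpoint and module names below are just examples; adjust them to your architecture):

```python
from transformers import AutoModelForCausalLM
from peft import LNTuningConfig, get_peft_model

# example checkpoint; any causal LM with norm layers works
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# train only the normalization layers (RMSNorm in LLaMA-style models)
config = LNTuningConfig(target_modules=["input_layernorm", "post_attention_layernorm"])
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```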

If, however, you want to combine this with methods like LoRA, you should be able to do so by adding said layers to modules_to_save in the LoraConfig.
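For example, something along these lines (again, the module names are illustrative and depend on the model):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],   # LoRA adapters on the attention projections
    modules_to_save=["input_layernorm"],   # fully train and save these norm layers as well
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```

The modules listed in modules_to_save get a trainable copy, and their weights are included when the adapter is saved.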

IvanSedykh commented 1 month ago

oh, okay :)

thanks, closing the issue then