huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to implement weight decay towards the pre-trained model? #33909

Open sedol1339 opened 2 days ago

sedol1339 commented 2 days ago

Hello, let me ask one question.

If I'm using the HF Trainer for supervised fine-tuning, how do I penalize the distance between the starting (pre-trained) weights and the current weights? This was shown to be effective in https://arxiv.org/abs/1706.03610.
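Concretely, by "penalizing the distance" I mean adding an extra term to the training loss, roughly:

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \frac{\alpha}{2}\,\lVert w - w^{0} \rVert_2^2
$$

where $w^{0}$ are the pre-trained (starting) weights and $\alpha$ controls how strongly training is pulled back towards them.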

niqodea commented 2 days ago

There is no out-of-the-box option to specify weight decay against the pre-trained weights that I am aware of, but you could achieve it fairly easily by adding the loss term yourself.

For example, take a plain torch training loop: you could patch it to compute the squared differences between the initial parameters and the current ones and add them, scaled by a weighting term, to the total loss.
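A rough sketch of what I mean (here `model`, `dataloader`, and the penalty weight `sp_alpha` are placeholders; I'm assuming a HF model that returns a loss when labels are in the batch):

```python
import torch

# Snapshot of the pre-trained weights before training starts.
init_params = {n: p.detach().clone() for n, p in model.named_parameters()}

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)
sp_alpha = 0.01  # strength of the pull towards the pre-trained weights

model.train()
for batch in dataloader:
    outputs = model(**batch)
    loss = outputs.loss

    # L2 penalty on the distance between current and pre-trained weights.
    penalty = sum(
        ((p - init_params[n].to(p.device)) ** 2).sum()
        for n, p in model.named_parameters()
        if p.requires_grad
    )
    loss = loss + sp_alpha * penalty

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Note that keeping a full copy of the pre-trained weights roughly doubles parameter memory, which is worth keeping in mind for large models.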

sedol1339 commented 2 days ago

> There is no out-of-the-box option to specify weight decay against the pre-trained weights that I am aware of, but you could achieve it fairly easily by adding the loss term yourself.
>
> For example, take a plain torch training loop: you could patch it to compute the squared differences between the initial parameters and the current ones and add them, scaled by a weighting term, to the total loss.

However, that is a custom loop, not transformers.Trainer. So this is not achievable with Trainer?

niqodea commented 2 days ago

Looking at the available args for Trainer, it doesn't seem to be supported, but I am not 100% sure.
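That said, one way to stay within Trainer would be to subclass it and override compute_loss. A minimal, untested sketch, assuming a recent transformers version (the exact compute_loss signature has changed between releases, and parameter names may be prefixed when the model is wrapped, e.g. under DDP):

```python
from transformers import Trainer

class L2SPTrainer(Trainer):
    """Trainer with an extra L2 penalty towards the initial (pre-trained) weights."""

    def __init__(self, *args, sp_alpha=0.01, **kwargs):
        super().__init__(*args, **kwargs)
        self.sp_alpha = sp_alpha
        # Snapshot of the pre-trained weights, taken before any training step.
        self._init_params = {
            n: p.detach().clone()
            for n, p in self.model.named_parameters()
            if p.requires_grad
        }

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        loss = outputs.loss
        penalty = 0.0
        for n, p in model.named_parameters():
            # Strip a possible wrapper prefix (e.g. "module." under DDP).
            key = n.removeprefix("module.")
            if p.requires_grad and key in self._init_params:
                penalty = penalty + ((p - self._init_params[key].to(p.device)) ** 2).sum()
        loss = loss + self.sp_alpha * penalty
        return (loss, outputs) if return_outputs else loss
```

You would then instantiate `L2SPTrainer` exactly like `Trainer`, passing `sp_alpha` alongside the usual arguments. Treat this as a starting point rather than a verified recipe.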

ArthurZucker commented 8 hours ago

This seems like a custom usage; I don't think we support this out of the box!