Open sedol1339 opened 2 days ago
There is no out-of-the-box option to specify weight decay toward the pretrained weights that I am aware of, but you could achieve it easily by adding the loss term yourself.
For example, take a plain torch training loop: you could patch it to compute the difference between the initial parameters and the current ones, square it, and add the sum to the total loss with a weight term.
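A minimal sketch of what such a patched loop might look like, in plain PyTorch. The model, data, number of steps, and `sp_lambda` value are all illustrative placeholders, not anything from the thread:

```python
# Sketch: custom torch loop with an L2 penalty pulling the current weights
# back toward the pretrained starting point. All names here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # stand-in for a pretrained model
# snapshot of the "pretrained" weights, detached so it stays off the graph
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sp_lambda = 0.01  # penalty weight (hyperparameter)

x, y = torch.randn(8, 4), torch.randn(8, 2)
for _ in range(5):
    opt.zero_grad()
    task_loss = nn.functional.mse_loss(model(x), y)
    # squared distance between current and initial (pretrained) weights
    penalty = sum(((p - anchor[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    loss = task_loss + sp_lambda * penalty
    loss.backward()
    opt.step()
```

Note that `anchor` must be a detached copy; otherwise the penalty would be identically zero because the "initial" parameters would track the live ones.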
However, that is a custom loop, not `transformers.Trainer`. So is this not achievable with `Trainer`?
Looking at the available args for `Trainer`, it doesn't seem to be supported, but I am not 100% sure.
This seems like a custom usage; I don't think we support this out of the box!
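One common way to add a custom loss term with `Trainer` is to subclass it and override `compute_loss`. The sketch below follows that pattern; the class name, the `sp_lambda` argument, and the penalty weight are assumptions for illustration, not an official API:

```python
# Sketch: Trainer subclass that adds an L2 penalty toward the pretrained
# weights inside compute_loss. Names like SPRegTrainer and sp_lambda are
# illustrative, not part of the transformers API.
import torch
from transformers import Trainer

class SPRegTrainer(Trainer):
    def __init__(self, *args, sp_lambda=0.01, **kwargs):
        super().__init__(*args, **kwargs)
        self.sp_lambda = sp_lambda
        # snapshot of the pretrained weights, detached from the graph
        self.anchor = {n: p.detach().clone()
                       for n, p in self.model.named_parameters()}

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        loss, outputs = super().compute_loss(
            model, inputs, return_outputs=True, **kwargs)
        # squared distance between current and pretrained parameters
        penalty = sum(((p - self.anchor[n].to(p.device)) ** 2).sum()
                      for n, p in model.named_parameters()
                      if p.requires_grad)
        loss = loss + self.sp_lambda * penalty
        return (loss, outputs) if return_outputs else loss
```

You would then construct `SPRegTrainer` exactly like a normal `Trainer`, passing `sp_lambda` as an extra keyword argument. The anchor tensors are moved to the parameter's device at use time in case the model is sharded or moved after initialization.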
Hello, let me ask one question.
If using the HF Trainer for supervised fine-tuning, how do I implement penalizing the distance between the starting and current weights? This was shown to be effective in https://arxiv.org/abs/1706.03610