Closed maxime-louis closed 2 months ago
Ok, so after some tests, it seems that adding `model.set_adapter(['adapter_1', 'adapter_2'])` at the end of the forward method does allow both adapters to be updated. Can you confirm this is the expected behaviour?
I am not sure, as this is peft-specific; cc @BenjaminBossan!
As you correctly observed, @maxime-louis, it is crucial to ensure that `requires_grad` is enabled for both adapters. For context, this is because by default, `Trainer` from transformers (which I assume you're using) only passes the parameters with `requires_grad=True` to the optimizer. When creating two adapters, only one has `requires_grad=True`, so the other adapter won't get any updates.
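A minimal stand-in (hypothetical parameter names; not the actual `Trainer` source) for the filtering just described: only parameters with `requires_grad=True` are handed to the optimizer, so the inactive second adapter is silently skipped.

```python
# Toy parameter objects; the names are illustrative only.
class Param:
    def __init__(self, name, requires_grad):
        self.name = name
        self.requires_grad = requires_grad

params = [
    Param("q_proj.lora_A.adapter_1.weight", True),   # active adapter
    Param("q_proj.lora_A.adapter_2.weight", False),  # inactive adapter
]

# Trainer-style selection: only trainable params reach the optimizer.
optimizer_params = [p.name for p in params if p.requires_grad]
print(optimizer_params)  # ['q_proj.lora_A.adapter_1.weight']
```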
> I noticed that a call to `.parameters()` does not get me all parameters (only those of the active adapter), so I modified it to gather all parameters from both adapters (I checked: in the end I do get all parameters from both adapters, and all of them have `requires_grad` set). I used that modification to declare an optimizer, which I provided to the trainer.
I think you were on the right track, but it's not quite clear to me what you did, so I can't tell why this approach did not work. Generally, however, if you set `model.set_adapter(['adapter_1', 'adapter_2'])` before passing the model to the `Trainer`, both adapters should have `requires_grad=True` and hence the optimizer should be initialized correctly.
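A pure-Python sketch (a hypothetical `TinyPeftModel` class, not the real PEFT implementation) of the behaviour described: calling `set_adapter` with a list of names flips `requires_grad` on for every listed adapter and off for the rest.

```python
# Toy stand-in for a PeftModel with LoRA adapters; names are illustrative.
class LoraParam:
    def __init__(self, adapter):
        self.adapter = adapter
        self.requires_grad = False

class TinyPeftModel:
    def __init__(self, adapters):
        # two LoRA params per adapter, mimicking lora_A / lora_B
        self.params = [LoraParam(a) for a in adapters for _ in range(2)]

    def set_adapter(self, names):
        # accept a single name or a list, like PEFT's set_adapter
        if isinstance(names, str):
            names = [names]
        for p in self.params:
            p.requires_grad = p.adapter in names

    def trainable_adapters(self):
        return sorted({p.adapter for p in self.params if p.requires_grad})

model = TinyPeftModel(["adapter_1", "adapter_2"])
model.set_adapter("adapter_1")
print(model.trainable_adapters())              # ['adapter_1']
model.set_adapter(["adapter_1", "adapter_2"])
print(model.trainable_adapters())              # ['adapter_1', 'adapter_2']
```

In real code, the analogous step is calling `model.set_adapter(['adapter_1', 'adapter_2'])` on the PEFT model right before constructing the `Trainer`, so the optimizer is built while both adapters are trainable.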
> it seems that adding `model.set_adapter(['adapter_1', 'adapter_2'])` at the end of the forward method does allow for both adapters to be updated
Normally, I don't think this should be necessary, as long as it was ensured earlier that both adapters had `requires_grad=True` -- maybe this is related to the part earlier that I did not understand. If this workaround works for you, I think you can stick with it, but you could also check whether my suggestion above is sufficient.
Thank you @BenjaminBossan, @ArthurZucker
Activating both adapters before giving the model to the trainer seems like the way to go and works well. (I haven't tried removing the activation within the forward yet; not sure it's a costly operation, though.) Maybe this could be part of the (very short!) documentation on adapters :)
Thank you !
Glad that it works now.
I agree, maybe a sentence or two could be added here: https://huggingface.co/docs/transformers/v4.43.4/en/peft#train-a-peft-adapter. But it is very much an edge case, as it requires training multiple adapters at the same time and also using `Trainer`.
Ok thank you for your help, it's clearer now. More documentation is always welcome !
System Info

`transformers` version: 4.42.4

Who can help?
Hello !
I'm trying to simultaneously train some lora adapters on a model.
I use a syntax as follows: at model initialization:
in my model forward:
Unlike most examples I found, I don't want to train the adapters 'separately' e.g. to do different tasks, but I want to train them at the same time (in the same trainer/dataset), using the two outputs to optimize a global loss.
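A toy numeric sketch (made-up numbers, no real model) of the training scheme just described: run the model once per adapter and combine the two outputs into one global loss.

```python
# Hypothetical forward pass: `scale` stands in for the effect of whichever
# LoRA adapter is active; real code would call the model with one adapter set.
def forward_with(adapter, x):
    scale = {"adapter_1": 0.5, "adapter_2": 2.0}[adapter]
    return scale * x

x, target = 3.0, 4.0
out1 = forward_with("adapter_1", x)
out2 = forward_with("adapter_2", x)
# Global loss built from both outputs, so both adapters receive gradients.
loss = abs(out1 - target) + abs(out2 - target)
print(loss)  # 4.5
```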
I noticed that a call to `.parameters()` does not get me all parameters (only those of the active adapter), so I modified it to gather all parameters from both adapters (I checked: in the end I do get all parameters from both adapters, and all of them have `requires_grad` set). I used that modification to declare an optimizer, which I provided to the trainer. In principle, the training should modify both adapters' weights, but sadly only one of the adapters is modified during training.
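A sketch (illustrative parameter names, not real model internals) of the manual approach described: walk the named parameters, force `requires_grad=True` on every LoRA parameter of both adapters, and collect them for a hand-built optimizer.

```python
# Stand-in for model.named_parameters(); names are made up for illustration.
named_params = {
    "q_proj.lora_A.adapter_1.weight": {"requires_grad": True},
    "q_proj.lora_A.adapter_2.weight": {"requires_grad": False},
    "q_proj.base_layer.weight":       {"requires_grad": False},
}

optimizer_param_names = []
for name, p in named_params.items():
    if "adapter_1" in name or "adapter_2" in name:
        p["requires_grad"] = True          # make sure both adapters train
        optimizer_param_names.append(name)

print(sorted(optimizer_param_names))
# ['q_proj.lora_A.adapter_1.weight', 'q_proj.lora_A.adapter_2.weight']
```

As the thread later concludes, the cleaner fix is to activate both adapters before training rather than patching the parameter list by hand.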
NB: I'm not interested in activating both adapters at the same time at any point.
How should I proceed?
Thanks :)
@muellerzr @ArthurZucker
Information
Tasks

`examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Not provided at this point.
Expected behavior
I would expect to be able to train multiple adapters on the same model at once. Maybe there is a way that I did not find in the documentation.