Open miguelscarv opened 6 months ago
I've come to the conclusion that this is happening because my LoRA weights are having their `.requires_grad` attribute set to `False`, although I am not sure where this happens or why it only occurs when I add 3 sets of LoRA adapters to my model.
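In case it helps anyone debugging the same symptom, here is a minimal sketch (assuming a PEFT-wrapped model whose LoRA parameter names contain `lora_`; nothing here is my actual code) to find which adapter weights ended up frozen and to re-enable them:

```python
# Minimal diagnostic sketch, assuming a PEFT-wrapped `model` (hypothetical).
# List LoRA parameters that ended up with requires_grad=False:
for name, param in model.named_parameters():
    if "lora_" in name and not param.requires_grad:
        print(f"frozen LoRA parameter: {name}")

# If all adapters should be trainable, re-enable gradients before training:
for name, param in model.named_parameters():
    if "lora_" in name:
        param.requires_grad = True
```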
@miguelscarv I am trying to attempt a similar pipeline as you where I have multiple adapters. Have you found a solution? Thanks!
@jdchang1 Unfortunately I haven't; what I am doing is simply using DeepSpeed ZeRO stage 0 (DDP).
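For reference, something like this minimal config (values are placeholders, not what I actually use) is what I mean by stage 0; with `"stage": 0` there is no parameter or gradient partitioning, so DeepSpeed behaves like plain DDP:

```python
# Sketch of a minimal DeepSpeed config for the ZeRO stage 0 (DDP) workaround;
# batch size and precision values are placeholders. This dict can be passed
# as the `config` argument to `deepspeed.initialize`, or written to a JSON file.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 0},  # stage 0: no partitioning, plain DDP
    "bf16": {"enabled": True},
}
```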
Have you guys found a solution to training multiple LoRAs? I am also doing something similar. Thanks!
I'm training a model that involves multiple LoRA adapters (3 different sets of adapters, to be precise). For each input (which in my case is an image), I have to pass one version of the image through one set of LoRA adapters, another version through a second set, and the final version through the third set. I believe this consumes a lot of memory in the form of the autograd graph.
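To illustrate, here is a minimal sketch of what I mean (not my actual code; the toy base model, adapter names, and loss are placeholders), using PEFT's multi-adapter API. Note that, at least in some PEFT versions, `set_adapter` also flips `requires_grad` off for the non-active adapters, which may be where the frozen weights mentioned above come from:

```python
# Sketch of three LoRA adapter sets on one base model, with a different
# input version routed through each. All names and shapes are placeholders.
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

base_model = nn.Sequential(nn.Linear(16, 16))  # stand-in for the real vision model

def make_cfg():
    # "0" targets the Linear layer's module name inside nn.Sequential
    return LoraConfig(r=4, lora_alpha=8, target_modules=["0"])

model = get_peft_model(base_model, make_cfg(), adapter_name="view_a")
model.add_adapter("view_b", make_cfg())
model.add_adapter("view_c", make_cfg())

views = {name: torch.randn(2, 16) for name in ("view_a", "view_b", "view_c")}
outputs = []
for name, image in views.items():
    model.set_adapter(name)        # activate one adapter set for this view
    outputs.append(model(image))   # each pass extends the same autograd graph
loss = torch.stack(outputs).pow(2).mean()  # placeholder loss
loss.backward()
```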
What is happening is that when I use the 3 sets of LoRA adapters, I get errors in DeepSpeed claiming that the grad parameters are `None`. Here is the traceback using ZeRO stage 3:

and here is the traceback using ZeRO stage 2:
My problem is very similar to https://github.com/microsoft/DeepSpeed/issues/700#issue-795318541, because when using ZeRO stage 1 I get no error, but I really need to use stage 2.
System info