Opened by benjamin-marie 3 weeks ago
This sounds like the X-LoRA classifier layers don't have requires_grad=True. Could you please print all parameter names with requires_grad=True on your model? What is your base model?
We're still working on a training example for X-LoRA, so it's possible that there are still some kinks that need to be ironed out.
@benjamin-marie thanks for the example. I'll take a look at this.
I also observed another bug: the adapters must be named "0", "1", etc. in the adapters dict(), otherwise training won't start and the error says the adapters don't exist.
Hmm ok, thanks for reporting this, I'll see what could be causing it.
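For context, here is a minimal sketch of the kind of setup where the naming constraint shows up. The adapter paths are placeholders, and the exact XLoraConfig fields are assumptions based on the current PEFT API, not taken from this thread:

```python
# Sketch of X-LoRA setup via PEFT; adapter paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=base.config.hidden_size,
    # As reported above, training currently only starts when the
    # adapters are keyed "0", "1", ... in this dict.
    adapters={
        "0": "path/to/adapter_0",
        "1": "path/to/adapter_1",
    },
)
xlora_model = get_peft_model(base, config)
```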
Here is my model (Llama 3.1 8B):
PeftModelForCausalLM(
(base_model): XLoraModel(
(lora_model): LoraModel(
(model): LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(128256, 4096)
(layers): ModuleList(
(0-31): 32 x LlamaDecoderLayer(
(self_attn): LlamaFlashAttention2(
(q_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=4096, out_features=16, bias=False)
(1): Linear(in_features=4096, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=4096, bias=False)
(1): Linear(in_features=16, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(k_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=1024, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=4096, out_features=16, bias=False)
(1): Linear(in_features=4096, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=1024, bias=False)
(1): Linear(in_features=16, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(v_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=1024, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=4096, out_features=16, bias=False)
(1): Linear(in_features=4096, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=1024, bias=False)
(1): Linear(in_features=16, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(o_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=4096, out_features=16, bias=False)
(1): Linear(in_features=4096, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=4096, bias=False)
(1): Linear(in_features=16, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=14336, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=4096, out_features=16, bias=False)
(1): Linear(in_features=4096, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=14336, bias=False)
(1): Linear(in_features=16, out_features=14336, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(up_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=14336, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=4096, out_features=16, bias=False)
(1): Linear(in_features=4096, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=14336, bias=False)
(1): Linear(in_features=16, out_features=14336, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(down_proj): lora.Linear(
(base_layer): Linear(in_features=14336, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(0): Dropout(p=0.05, inplace=False)
(1): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(0): Linear(in_features=14336, out_features=16, bias=False)
(1): Linear(in_features=14336, out_features=16, bias=False)
)
(lora_B): ModuleDict(
(0): Linear(in_features=16, out_features=4096, bias=False)
(1): Linear(in_features=16, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(act_fn): SiLU()
)
(input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
(post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
)
)
(norm): LlamaRMSNorm((4096,), eps=1e-05)
(rotary_emb): LlamaRotaryEmbedding()
)
(lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
)
(internal_xlora_classifier): XLoraClassifier(
(softmax): TemperatureScaledSoftmax(
(softmax): Softmax(dim=-1)
)
(layers): Sequential(
(0): Linear(in_features=4096, out_features=2048, bias=True)
(1): ReLU()
(2): Dropout(p=0.2, inplace=False)
(3): Linear(in_features=2048, out_features=2048, bias=True)
(4): ReLU()
(5): Dropout(p=0.2, inplace=False)
(6): Linear(in_features=2048, out_features=2048, bias=True)
(7): ReLU()
(8): Dropout(p=0.2, inplace=False)
(9): Linear(in_features=2048, out_features=2048, bias=True)
(10): ReLU()
(11): Dropout(p=0.2, inplace=False)
(12): Linear(in_features=2048, out_features=2048, bias=True)
(13): ReLU()
(14): Dropout(p=0.2, inplace=False)
(15): Linear(in_features=2048, out_features=2048, bias=True)
(16): ReLU()
(17): Dropout(p=0.2, inplace=False)
(18): Linear(in_features=2048, out_features=2048, bias=True)
(19): ReLU()
(20): Dropout(p=0.2, inplace=False)
(21): Linear(in_features=2048, out_features=448, bias=True)
)
)
)
)
Could you please print all parameter names with requires_grad=True on your model?
Sure, how do you do this? None of the params seem to have a "requires_grad" attribute when I inspect them, but I'm not sure whether I did it right.
how do you do this
First of all, you can run model.print_trainable_parameters() for a global overview. Then something like this should do:
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
I added this code:
print(xlora_model.print_trainable_parameters())
print("--- Require grad? ----")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
print("----------------------")
It prints:
trainable params: 118,372,800 || all params: 8,148,634,048 || trainable%: 1.4527
None
--- Require grad? ----
model.layers.0.self_attn.q_proj.lora_A.0.weight
model.layers.0.self_attn.q_proj.lora_A.1.weight
model.layers.0.self_attn.q_proj.lora_B.0.weight
model.layers.0.self_attn.q_proj.lora_B.1.weight
model.layers.0.self_attn.k_proj.lora_A.0.weight
model.layers.0.self_attn.k_proj.lora_A.1.weight
model.layers.0.self_attn.k_proj.lora_B.0.weight
model.layers.0.self_attn.k_proj.lora_B.1.weight
model.layers.0.self_attn.v_proj.lora_A.0.weight
model.layers.0.self_attn.v_proj.lora_A.1.weight
model.layers.0.self_attn.v_proj.lora_B.0.weight
model.layers.0.self_attn.v_proj.lora_B.1.weight
model.layers.0.self_attn.o_proj.lora_A.0.weight
model.layers.0.self_attn.o_proj.lora_A.1.weight
model.layers.0.self_attn.o_proj.lora_B.0.weight
model.layers.0.self_attn.o_proj.lora_B.1.weight
model.layers.0.mlp.gate_proj.lora_A.0.weight
model.layers.0.mlp.gate_proj.lora_A.1.weight
model.layers.0.mlp.gate_proj.lora_B.0.weight
model.layers.0.mlp.gate_proj.lora_B.1.weight
model.layers.0.mlp.up_proj.lora_A.0.weight
model.layers.0.mlp.up_proj.lora_A.1.weight
model.layers.0.mlp.up_proj.lora_B.0.weight
model.layers.0.mlp.up_proj.lora_B.1.weight
model.layers.0.mlp.down_proj.lora_A.0.weight
model.layers.0.mlp.down_proj.lora_A.1.weight
model.layers.0.mlp.down_proj.lora_B.0.weight
model.layers.0.mlp.down_proj.lora_B.1.weight
model.layers.1.self_attn.q_proj.lora_A.0.weight
model.layers.1.self_attn.q_proj.lora_A.1.weight
model.layers.1.self_attn.q_proj.lora_B.0.weight
model.layers.1.self_attn.q_proj.lora_B.1.weight
model.layers.1.self_attn.k_proj.lora_A.0.weight
model.layers.1.self_attn.k_proj.lora_A.1.weight
model.layers.1.self_attn.k_proj.lora_B.0.weight
model.layers.1.self_attn.k_proj.lora_B.1.weight
model.layers.1.self_attn.v_proj.lora_A.0.weight
model.layers.1.self_attn.v_proj.lora_A.1.weight
model.layers.1.self_attn.v_proj.lora_B.0.weight
model.layers.1.self_attn.v_proj.lora_B.1.weight
model.layers.1.self_attn.o_proj.lora_A.0.weight
model.layers.1.self_attn.o_proj.lora_A.1.weight
model.layers.1.self_attn.o_proj.lora_B.0.weight
model.layers.1.self_attn.o_proj.lora_B.1.weight
model.layers.1.mlp.gate_proj.lora_A.0.weight
model.layers.1.mlp.gate_proj.lora_A.1.weight
model.layers.1.mlp.gate_proj.lora_B.0.weight
model.layers.1.mlp.gate_proj.lora_B.1.weight
model.layers.1.mlp.up_proj.lora_A.0.weight
model.layers.1.mlp.up_proj.lora_A.1.weight
model.layers.1.mlp.up_proj.lora_B.0.weight
model.layers.1.mlp.up_proj.lora_B.1.weight
model.layers.1.mlp.down_proj.lora_A.0.weight
model.layers.1.mlp.down_proj.lora_A.1.weight
model.layers.1.mlp.down_proj.lora_B.0.weight
model.layers.1.mlp.down_proj.lora_B.1.weight
model.layers.2.self_attn.q_proj.lora_A.0.weight
model.layers.2.self_attn.q_proj.lora_A.1.weight
model.layers.2.self_attn.q_proj.lora_B.0.weight
model.layers.2.self_attn.q_proj.lora_B.1.weight
model.layers.2.self_attn.k_proj.lora_A.0.weight
model.layers.2.self_attn.k_proj.lora_A.1.weight
model.layers.2.self_attn.k_proj.lora_B.0.weight
model.layers.2.self_attn.k_proj.lora_B.1.weight
model.layers.2.self_attn.v_proj.lora_A.0.weight
model.layers.2.self_attn.v_proj.lora_A.1.weight
model.layers.2.self_attn.v_proj.lora_B.0.weight
model.layers.2.self_attn.v_proj.lora_B.1.weight
model.layers.2.self_attn.o_proj.lora_A.0.weight
model.layers.2.self_attn.o_proj.lora_A.1.weight
model.layers.2.self_attn.o_proj.lora_B.0.weight
model.layers.2.self_attn.o_proj.lora_B.1.weight
model.layers.2.mlp.gate_proj.lora_A.0.weight
model.layers.2.mlp.gate_proj.lora_A.1.weight
model.layers.2.mlp.gate_proj.lora_B.0.weight
model.layers.2.mlp.gate_proj.lora_B.1.weight
model.layers.2.mlp.up_proj.lora_A.0.weight
model.layers.2.mlp.up_proj.lora_A.1.weight
model.layers.2.mlp.up_proj.lora_B.0.weight
model.layers.2.mlp.up_proj.lora_B.1.weight
model.layers.2.mlp.down_proj.lora_A.0.weight
model.layers.2.mlp.down_proj.lora_A.1.weight
model.layers.2.mlp.down_proj.lora_B.0.weight
model.layers.2.mlp.down_proj.lora_B.1.weight
model.layers.3.self_attn.q_proj.lora_A.0.weight
model.layers.3.self_attn.q_proj.lora_A.1.weight
model.layers.3.self_attn.q_proj.lora_B.0.weight
model.layers.3.self_attn.q_proj.lora_B.1.weight
model.layers.3.self_attn.k_proj.lora_A.0.weight
model.layers.3.self_attn.k_proj.lora_A.1.weight
model.layers.3.self_attn.k_proj.lora_B.0.weight
model.layers.3.self_attn.k_proj.lora_B.1.weight
model.layers.3.self_attn.v_proj.lora_A.0.weight
model.layers.3.self_attn.v_proj.lora_A.1.weight
model.layers.3.self_attn.v_proj.lora_B.0.weight
model.layers.3.self_attn.v_proj.lora_B.1.weight
model.layers.3.self_attn.o_proj.lora_A.0.weight
model.layers.3.self_attn.o_proj.lora_A.1.weight
model.layers.3.self_attn.o_proj.lora_B.0.weight
model.layers.3.self_attn.o_proj.lora_B.1.weight
model.layers.3.mlp.gate_proj.lora_A.0.weight
model.layers.3.mlp.gate_proj.lora_A.1.weight
model.layers.3.mlp.gate_proj.lora_B.0.weight
model.layers.3.mlp.gate_proj.lora_B.1.weight
model.layers.3.mlp.up_proj.lora_A.0.weight
model.layers.3.mlp.up_proj.lora_A.1.weight
model.layers.3.mlp.up_proj.lora_B.0.weight
model.layers.3.mlp.up_proj.lora_B.1.weight
model.layers.3.mlp.down_proj.lora_A.0.weight
model.layers.3.mlp.down_proj.lora_A.1.weight
model.layers.3.mlp.down_proj.lora_B.0.weight
model.layers.3.mlp.down_proj.lora_B.1.weight
model.layers.4.self_attn.q_proj.lora_A.0.weight
model.layers.4.self_attn.q_proj.lora_A.1.weight
model.layers.4.self_attn.q_proj.lora_B.0.weight
model.layers.4.self_attn.q_proj.lora_B.1.weight
model.layers.4.self_attn.k_proj.lora_A.0.weight
model.layers.4.self_attn.k_proj.lora_A.1.weight
model.layers.4.self_attn.k_proj.lora_B.0.weight
model.layers.4.self_attn.k_proj.lora_B.1.weight
model.layers.4.self_attn.v_proj.lora_A.0.weight
model.layers.4.self_attn.v_proj.lora_A.1.weight
model.layers.4.self_attn.v_proj.lora_B.0.weight
model.layers.4.self_attn.v_proj.lora_B.1.weight
model.layers.4.self_attn.o_proj.lora_A.0.weight
model.layers.4.self_attn.o_proj.lora_A.1.weight
model.layers.4.self_attn.o_proj.lora_B.0.weight
model.layers.4.self_attn.o_proj.lora_B.1.weight
model.layers.4.mlp.gate_proj.lora_A.0.weight
model.layers.4.mlp.gate_proj.lora_A.1.weight
model.layers.4.mlp.gate_proj.lora_B.0.weight
model.layers.4.mlp.gate_proj.lora_B.1.weight
model.layers.4.mlp.up_proj.lora_A.0.weight
model.layers.4.mlp.up_proj.lora_A.1.weight
model.layers.4.mlp.up_proj.lora_B.0.weight
model.layers.4.mlp.up_proj.lora_B.1.weight
model.layers.4.mlp.down_proj.lora_A.0.weight
model.layers.4.mlp.down_proj.lora_A.1.weight
model.layers.4.mlp.down_proj.lora_B.0.weight
model.layers.4.mlp.down_proj.lora_B.1.weight
model.layers.5.self_attn.q_proj.lora_A.0.weight
model.layers.5.self_attn.q_proj.lora_A.1.weight
model.layers.5.self_attn.q_proj.lora_B.0.weight
model.layers.5.self_attn.q_proj.lora_B.1.weight
model.layers.5.self_attn.k_proj.lora_A.0.weight
model.layers.5.self_attn.k_proj.lora_A.1.weight
model.layers.5.self_attn.k_proj.lora_B.0.weight
model.layers.5.self_attn.k_proj.lora_B.1.weight
model.layers.5.self_attn.v_proj.lora_A.0.weight
model.layers.5.self_attn.v_proj.lora_A.1.weight
model.layers.5.self_attn.v_proj.lora_B.0.weight
model.layers.5.self_attn.v_proj.lora_B.1.weight
model.layers.5.self_attn.o_proj.lora_A.0.weight
model.layers.5.self_attn.o_proj.lora_A.1.weight
model.layers.5.self_attn.o_proj.lora_B.0.weight
model.layers.5.self_attn.o_proj.lora_B.1.weight
model.layers.5.mlp.gate_proj.lora_A.0.weight
model.layers.5.mlp.gate_proj.lora_A.1.weight
model.layers.5.mlp.gate_proj.lora_B.0.weight
model.layers.5.mlp.gate_proj.lora_B.1.weight
model.layers.5.mlp.up_proj.lora_A.0.weight
model.layers.5.mlp.up_proj.lora_A.1.weight
model.layers.5.mlp.up_proj.lora_B.0.weight
model.layers.5.mlp.up_proj.lora_B.1.weight
model.layers.5.mlp.down_proj.lora_A.0.weight
model.layers.5.mlp.down_proj.lora_A.1.weight
model.layers.5.mlp.down_proj.lora_B.0.weight
model.layers.5.mlp.down_proj.lora_B.1.weight
model.layers.6.self_attn.q_proj.lora_A.0.weight
model.layers.6.self_attn.q_proj.lora_A.1.weight
model.layers.6.self_attn.q_proj.lora_B.0.weight
model.layers.6.self_attn.q_proj.lora_B.1.weight
model.layers.6.self_attn.k_proj.lora_A.0.weight
model.layers.6.self_attn.k_proj.lora_A.1.weight
model.layers.6.self_attn.k_proj.lora_B.0.weight
model.layers.6.self_attn.k_proj.lora_B.1.weight
model.layers.6.self_attn.v_proj.lora_A.0.weight
model.layers.6.self_attn.v_proj.lora_A.1.weight
model.layers.6.self_attn.v_proj.lora_B.0.weight
model.layers.6.self_attn.v_proj.lora_B.1.weight
model.layers.6.self_attn.o_proj.lora_A.0.weight
model.layers.6.self_attn.o_proj.lora_A.1.weight
model.layers.6.self_attn.o_proj.lora_B.0.weight
model.layers.6.self_attn.o_proj.lora_B.1.weight
model.layers.6.mlp.gate_proj.lora_A.0.weight
model.layers.6.mlp.gate_proj.lora_A.1.weight
model.layers.6.mlp.gate_proj.lora_B.0.weight
model.layers.6.mlp.gate_proj.lora_B.1.weight
model.layers.6.mlp.up_proj.lora_A.0.weight
model.layers.6.mlp.up_proj.lora_A.1.weight
model.layers.6.mlp.up_proj.lora_B.0.weight
model.layers.6.mlp.up_proj.lora_B.1.weight
model.layers.6.mlp.down_proj.lora_A.0.weight
model.layers.6.mlp.down_proj.lora_A.1.weight
model.layers.6.mlp.down_proj.lora_B.0.weight
model.layers.6.mlp.down_proj.lora_B.1.weight
model.layers.7.self_attn.q_proj.lora_A.0.weight
model.layers.7.self_attn.q_proj.lora_A.1.weight
model.layers.7.self_attn.q_proj.lora_B.0.weight
model.layers.7.self_attn.q_proj.lora_B.1.weight
model.layers.7.self_attn.k_proj.lora_A.0.weight
model.layers.7.self_attn.k_proj.lora_A.1.weight
model.layers.7.self_attn.k_proj.lora_B.0.weight
model.layers.7.self_attn.k_proj.lora_B.1.weight
model.layers.7.self_attn.v_proj.lora_A.0.weight
model.layers.7.self_attn.v_proj.lora_A.1.weight
model.layers.7.self_attn.v_proj.lora_B.0.weight
model.layers.7.self_attn.v_proj.lora_B.1.weight
model.layers.7.self_attn.o_proj.lora_A.0.weight
model.layers.7.self_attn.o_proj.lora_A.1.weight
model.layers.7.self_attn.o_proj.lora_B.0.weight
model.layers.7.self_attn.o_proj.lora_B.1.weight
model.layers.7.mlp.gate_proj.lora_A.0.weight
model.layers.7.mlp.gate_proj.lora_A.1.weight
model.layers.7.mlp.gate_proj.lora_B.0.weight
model.layers.7.mlp.gate_proj.lora_B.1.weight
model.layers.7.mlp.up_proj.lora_A.0.weight
model.layers.7.mlp.up_proj.lora_A.1.weight
model.layers.7.mlp.up_proj.lora_B.0.weight
model.layers.7.mlp.up_proj.lora_B.1.weight
model.layers.7.mlp.down_proj.lora_A.0.weight
model.layers.7.mlp.down_proj.lora_A.1.weight
model.layers.7.mlp.down_proj.lora_B.0.weight
model.layers.7.mlp.down_proj.lora_B.1.weight
model.layers.8.self_attn.q_proj.lora_A.0.weight
model.layers.8.self_attn.q_proj.lora_A.1.weight
model.layers.8.self_attn.q_proj.lora_B.0.weight
model.layers.8.self_attn.q_proj.lora_B.1.weight
model.layers.8.self_attn.k_proj.lora_A.0.weight
model.layers.8.self_attn.k_proj.lora_A.1.weight
model.layers.8.self_attn.k_proj.lora_B.0.weight
model.layers.8.self_attn.k_proj.lora_B.1.weight
model.layers.8.self_attn.v_proj.lora_A.0.weight
model.layers.8.self_attn.v_proj.lora_A.1.weight
model.layers.8.self_attn.v_proj.lora_B.0.weight
model.layers.8.self_attn.v_proj.lora_B.1.weight
model.layers.8.self_attn.o_proj.lora_A.0.weight
model.layers.8.self_attn.o_proj.lora_A.1.weight
model.layers.8.self_attn.o_proj.lora_B.0.weight
model.layers.8.self_attn.o_proj.lora_B.1.weight
model.layers.8.mlp.gate_proj.lora_A.0.weight
model.layers.8.mlp.gate_proj.lora_A.1.weight
model.layers.8.mlp.gate_proj.lora_B.0.weight
model.layers.8.mlp.gate_proj.lora_B.1.weight
model.layers.8.mlp.up_proj.lora_A.0.weight
model.layers.8.mlp.up_proj.lora_A.1.weight
model.layers.8.mlp.up_proj.lora_B.0.weight
model.layers.8.mlp.up_proj.lora_B.1.weight
model.layers.8.mlp.down_proj.lora_A.0.weight
model.layers.8.mlp.down_proj.lora_A.1.weight
model.layers.8.mlp.down_proj.lora_B.0.weight
model.layers.8.mlp.down_proj.lora_B.1.weight
model.layers.9.self_attn.q_proj.lora_A.0.weight
model.layers.9.self_attn.q_proj.lora_A.1.weight
model.layers.9.self_attn.q_proj.lora_B.0.weight
model.layers.9.self_attn.q_proj.lora_B.1.weight
model.layers.9.self_attn.k_proj.lora_A.0.weight
model.layers.9.self_attn.k_proj.lora_A.1.weight
model.layers.9.self_attn.k_proj.lora_B.0.weight
model.layers.9.self_attn.k_proj.lora_B.1.weight
model.layers.9.self_attn.v_proj.lora_A.0.weight
model.layers.9.self_attn.v_proj.lora_A.1.weight
model.layers.9.self_attn.v_proj.lora_B.0.weight
model.layers.9.self_attn.v_proj.lora_B.1.weight
model.layers.9.self_attn.o_proj.lora_A.0.weight
model.layers.9.self_attn.o_proj.lora_A.1.weight
model.layers.9.self_attn.o_proj.lora_B.0.weight
model.layers.9.self_attn.o_proj.lora_B.1.weight
model.layers.9.mlp.gate_proj.lora_A.0.weight
model.layers.9.mlp.gate_proj.lora_A.1.weight
model.layers.9.mlp.gate_proj.lora_B.0.weight
model.layers.9.mlp.gate_proj.lora_B.1.weight
model.layers.9.mlp.up_proj.lora_A.0.weight
model.layers.9.mlp.up_proj.lora_A.1.weight
model.layers.9.mlp.up_proj.lora_B.0.weight
model.layers.9.mlp.up_proj.lora_B.1.weight
model.layers.9.mlp.down_proj.lora_A.0.weight
model.layers.9.mlp.down_proj.lora_A.1.weight
model.layers.9.mlp.down_proj.lora_B.0.weight
model.layers.9.mlp.down_proj.lora_B.1.weight
model.layers.10.self_attn.q_proj.lora_A.0.weight
model.layers.10.self_attn.q_proj.lora_A.1.weight
model.layers.10.self_attn.q_proj.lora_B.0.weight
model.layers.10.self_attn.q_proj.lora_B.1.weight
model.layers.10.self_attn.k_proj.lora_A.0.weight
model.layers.10.self_attn.k_proj.lora_A.1.weight
model.layers.10.self_attn.k_proj.lora_B.0.weight
model.layers.10.self_attn.k_proj.lora_B.1.weight
model.layers.10.self_attn.v_proj.lora_A.0.weight
model.layers.10.self_attn.v_proj.lora_A.1.weight
model.layers.10.self_attn.v_proj.lora_B.0.weight
model.layers.10.self_attn.v_proj.lora_B.1.weight
model.layers.10.self_attn.o_proj.lora_A.0.weight
model.layers.10.self_attn.o_proj.lora_A.1.weight
model.layers.10.self_attn.o_proj.lora_B.0.weight
model.layers.10.self_attn.o_proj.lora_B.1.weight
model.layers.10.mlp.gate_proj.lora_A.0.weight
model.layers.10.mlp.gate_proj.lora_A.1.weight
model.layers.10.mlp.gate_proj.lora_B.0.weight
model.layers.10.mlp.gate_proj.lora_B.1.weight
model.layers.10.mlp.up_proj.lora_A.0.weight
model.layers.10.mlp.up_proj.lora_A.1.weight
model.layers.10.mlp.up_proj.lora_B.0.weight
model.layers.10.mlp.up_proj.lora_B.1.weight
model.layers.10.mlp.down_proj.lora_A.0.weight
model.layers.10.mlp.down_proj.lora_A.1.weight
model.layers.10.mlp.down_proj.lora_B.0.weight
model.layers.10.mlp.down_proj.lora_B.1.weight
model.layers.11.self_attn.q_proj.lora_A.0.weight
model.layers.11.self_attn.q_proj.lora_A.1.weight
model.layers.11.self_attn.q_proj.lora_B.0.weight
model.layers.11.self_attn.q_proj.lora_B.1.weight
model.layers.11.self_attn.k_proj.lora_A.0.weight
model.layers.11.self_attn.k_proj.lora_A.1.weight
model.layers.11.self_attn.k_proj.lora_B.0.weight
model.layers.11.self_attn.k_proj.lora_B.1.weight
model.layers.11.self_attn.v_proj.lora_A.0.weight
model.layers.11.self_attn.v_proj.lora_A.1.weight
model.layers.11.self_attn.v_proj.lora_B.0.weight
model.layers.11.self_attn.v_proj.lora_B.1.weight
model.layers.11.self_attn.o_proj.lora_A.0.weight
model.layers.11.self_attn.o_proj.lora_A.1.weight
model.layers.11.self_attn.o_proj.lora_B.0.weight
model.layers.11.self_attn.o_proj.lora_B.1.weight
model.layers.11.mlp.gate_proj.lora_A.0.weight
model.layers.11.mlp.gate_proj.lora_A.1.weight
model.layers.11.mlp.gate_proj.lora_B.0.weight
model.layers.11.mlp.gate_proj.lora_B.1.weight
model.layers.11.mlp.up_proj.lora_A.0.weight
model.layers.11.mlp.up_proj.lora_A.1.weight
model.layers.11.mlp.up_proj.lora_B.0.weight
model.layers.11.mlp.up_proj.lora_B.1.weight
model.layers.11.mlp.down_proj.lora_A.0.weight
model.layers.11.mlp.down_proj.lora_A.1.weight
model.layers.11.mlp.down_proj.lora_B.0.weight
model.layers.11.mlp.down_proj.lora_B.1.weight
model.layers.12.self_attn.q_proj.lora_A.0.weight
model.layers.12.self_attn.q_proj.lora_A.1.weight
model.layers.12.self_attn.q_proj.lora_B.0.weight
model.layers.12.self_attn.q_proj.lora_B.1.weight
model.layers.12.self_attn.k_proj.lora_A.0.weight
model.layers.12.self_attn.k_proj.lora_A.1.weight
model.layers.12.self_attn.k_proj.lora_B.0.weight
model.layers.12.self_attn.k_proj.lora_B.1.weight
model.layers.12.self_attn.v_proj.lora_A.0.weight
model.layers.12.self_attn.v_proj.lora_A.1.weight
model.layers.12.self_attn.v_proj.lora_B.0.weight
model.layers.12.self_attn.v_proj.lora_B.1.weight
model.layers.12.self_attn.o_proj.lora_A.0.weight
model.layers.12.self_attn.o_proj.lora_A.1.weight
model.layers.12.self_attn.o_proj.lora_B.0.weight
model.layers.12.self_attn.o_proj.lora_B.1.weight
model.layers.12.mlp.gate_proj.lora_A.0.weight
model.layers.12.mlp.gate_proj.lora_A.1.weight
model.layers.12.mlp.gate_proj.lora_B.0.weight
model.layers.12.mlp.gate_proj.lora_B.1.weight
model.layers.12.mlp.up_proj.lora_A.0.weight
model.layers.12.mlp.up_proj.lora_A.1.weight
model.layers.12.mlp.up_proj.lora_B.0.weight
model.layers.12.mlp.up_proj.lora_B.1.weight
model.layers.12.mlp.down_proj.lora_A.0.weight
model.layers.12.mlp.down_proj.lora_A.1.weight
model.layers.12.mlp.down_proj.lora_B.0.weight
model.layers.12.mlp.down_proj.lora_B.1.weight
model.layers.13.self_attn.q_proj.lora_A.0.weight
model.layers.13.self_attn.q_proj.lora_A.1.weight
model.layers.13.self_attn.q_proj.lora_B.0.weight
model.layers.13.self_attn.q_proj.lora_B.1.weight
model.layers.13.self_attn.k_proj.lora_A.0.weight
model.layers.13.self_attn.k_proj.lora_A.1.weight
model.layers.13.self_attn.k_proj.lora_B.0.weight
model.layers.13.self_attn.k_proj.lora_B.1.weight
model.layers.13.self_attn.v_proj.lora_A.0.weight
model.layers.13.self_attn.v_proj.lora_A.1.weight
model.layers.13.self_attn.v_proj.lora_B.0.weight
model.layers.13.self_attn.v_proj.lora_B.1.weight
model.layers.13.self_attn.o_proj.lora_A.0.weight
model.layers.13.self_attn.o_proj.lora_A.1.weight
model.layers.13.self_attn.o_proj.lora_B.0.weight
model.layers.13.self_attn.o_proj.lora_B.1.weight
model.layers.13.mlp.gate_proj.lora_A.0.weight
model.layers.13.mlp.gate_proj.lora_A.1.weight
model.layers.13.mlp.gate_proj.lora_B.0.weight
model.layers.13.mlp.gate_proj.lora_B.1.weight
model.layers.13.mlp.up_proj.lora_A.0.weight
model.layers.13.mlp.up_proj.lora_A.1.weight
model.layers.13.mlp.up_proj.lora_B.0.weight
model.layers.13.mlp.up_proj.lora_B.1.weight
model.layers.13.mlp.down_proj.lora_A.0.weight
model.layers.13.mlp.down_proj.lora_A.1.weight
model.layers.13.mlp.down_proj.lora_B.0.weight
model.layers.13.mlp.down_proj.lora_B.1.weight
model.layers.14.self_attn.q_proj.lora_A.0.weight
model.layers.14.self_attn.q_proj.lora_A.1.weight
model.layers.14.self_attn.q_proj.lora_B.0.weight
model.layers.14.self_attn.q_proj.lora_B.1.weight
model.layers.14.self_attn.k_proj.lora_A.0.weight
model.layers.14.self_attn.k_proj.lora_A.1.weight
model.layers.14.self_attn.k_proj.lora_B.0.weight
model.layers.14.self_attn.k_proj.lora_B.1.weight
model.layers.14.self_attn.v_proj.lora_A.0.weight
model.layers.14.self_attn.v_proj.lora_A.1.weight
model.layers.14.self_attn.v_proj.lora_B.0.weight
model.layers.14.self_attn.v_proj.lora_B.1.weight
model.layers.14.self_attn.o_proj.lora_A.0.weight
model.layers.14.self_attn.o_proj.lora_A.1.weight
model.layers.14.self_attn.o_proj.lora_B.0.weight
model.layers.14.self_attn.o_proj.lora_B.1.weight
model.layers.14.mlp.gate_proj.lora_A.0.weight
model.layers.14.mlp.gate_proj.lora_A.1.weight
model.layers.14.mlp.gate_proj.lora_B.0.weight
model.layers.14.mlp.gate_proj.lora_B.1.weight
model.layers.14.mlp.up_proj.lora_A.0.weight
model.layers.14.mlp.up_proj.lora_A.1.weight
model.layers.14.mlp.up_proj.lora_B.0.weight
model.layers.14.mlp.up_proj.lora_B.1.weight
model.layers.14.mlp.down_proj.lora_A.0.weight
model.layers.14.mlp.down_proj.lora_A.1.weight
model.layers.14.mlp.down_proj.lora_B.0.weight
model.layers.14.mlp.down_proj.lora_B.1.weight
model.layers.15.self_attn.q_proj.lora_A.0.weight
model.layers.15.self_attn.q_proj.lora_A.1.weight
model.layers.15.self_attn.q_proj.lora_B.0.weight
model.layers.15.self_attn.q_proj.lora_B.1.weight
model.layers.15.self_attn.k_proj.lora_A.0.weight
model.layers.15.self_attn.k_proj.lora_A.1.weight
model.layers.15.self_attn.k_proj.lora_B.0.weight
model.layers.15.self_attn.k_proj.lora_B.1.weight
model.layers.15.self_attn.v_proj.lora_A.0.weight
model.layers.15.self_attn.v_proj.lora_A.1.weight
model.layers.15.self_attn.v_proj.lora_B.0.weight
model.layers.15.self_attn.v_proj.lora_B.1.weight
model.layers.15.self_attn.o_proj.lora_A.0.weight
model.layers.15.self_attn.o_proj.lora_A.1.weight
model.layers.15.self_attn.o_proj.lora_B.0.weight
model.layers.15.self_attn.o_proj.lora_B.1.weight
model.layers.15.mlp.gate_proj.lora_A.0.weight
model.layers.15.mlp.gate_proj.lora_A.1.weight
model.layers.15.mlp.gate_proj.lora_B.0.weight
model.layers.15.mlp.gate_proj.lora_B.1.weight
model.layers.15.mlp.up_proj.lora_A.0.weight
model.layers.15.mlp.up_proj.lora_A.1.weight
model.layers.15.mlp.up_proj.lora_B.0.weight
model.layers.15.mlp.up_proj.lora_B.1.weight
model.layers.15.mlp.down_proj.lora_A.0.weight
model.layers.15.mlp.down_proj.lora_A.1.weight
model.layers.15.mlp.down_proj.lora_B.0.weight
model.layers.15.mlp.down_proj.lora_B.1.weight
model.layers.16.self_attn.q_proj.lora_A.0.weight
model.layers.16.self_attn.q_proj.lora_A.1.weight
model.layers.16.self_attn.q_proj.lora_B.0.weight
model.layers.16.self_attn.q_proj.lora_B.1.weight
model.layers.16.self_attn.k_proj.lora_A.0.weight
model.layers.16.self_attn.k_proj.lora_A.1.weight
model.layers.16.self_attn.k_proj.lora_B.0.weight
model.layers.16.self_attn.k_proj.lora_B.1.weight
model.layers.16.self_attn.v_proj.lora_A.0.weight
model.layers.16.self_attn.v_proj.lora_A.1.weight
model.layers.16.self_attn.v_proj.lora_B.0.weight
model.layers.16.self_attn.v_proj.lora_B.1.weight
model.layers.16.self_attn.o_proj.lora_A.0.weight
model.layers.16.self_attn.o_proj.lora_A.1.weight
model.layers.16.self_attn.o_proj.lora_B.0.weight
model.layers.16.self_attn.o_proj.lora_B.1.weight
model.layers.16.mlp.gate_proj.lora_A.0.weight
model.layers.16.mlp.gate_proj.lora_A.1.weight
model.layers.16.mlp.gate_proj.lora_B.0.weight
model.layers.16.mlp.gate_proj.lora_B.1.weight
model.layers.16.mlp.up_proj.lora_A.0.weight
model.layers.16.mlp.up_proj.lora_A.1.weight
model.layers.16.mlp.up_proj.lora_B.0.weight
model.layers.16.mlp.up_proj.lora_B.1.weight
model.layers.16.mlp.down_proj.lora_A.0.weight
model.layers.16.mlp.down_proj.lora_A.1.weight
model.layers.16.mlp.down_proj.lora_B.0.weight
model.layers.16.mlp.down_proj.lora_B.1.weight
model.layers.17.self_attn.q_proj.lora_A.0.weight
model.layers.17.self_attn.q_proj.lora_A.1.weight
model.layers.17.self_attn.q_proj.lora_B.0.weight
model.layers.17.self_attn.q_proj.lora_B.1.weight
model.layers.17.self_attn.k_proj.lora_A.0.weight
model.layers.17.self_attn.k_proj.lora_A.1.weight
model.layers.17.self_attn.k_proj.lora_B.0.weight
model.layers.17.self_attn.k_proj.lora_B.1.weight
model.layers.17.self_attn.v_proj.lora_A.0.weight
model.layers.17.self_attn.v_proj.lora_A.1.weight
model.layers.17.self_attn.v_proj.lora_B.0.weight
model.layers.17.self_attn.v_proj.lora_B.1.weight
model.layers.17.self_attn.o_proj.lora_A.0.weight
model.layers.17.self_attn.o_proj.lora_A.1.weight
model.layers.17.self_attn.o_proj.lora_B.0.weight
model.layers.17.self_attn.o_proj.lora_B.1.weight
model.layers.17.mlp.gate_proj.lora_A.0.weight
model.layers.17.mlp.gate_proj.lora_A.1.weight
model.layers.17.mlp.gate_proj.lora_B.0.weight
model.layers.17.mlp.gate_proj.lora_B.1.weight
model.layers.17.mlp.up_proj.lora_A.0.weight
model.layers.17.mlp.up_proj.lora_A.1.weight
model.layers.17.mlp.up_proj.lora_B.0.weight
model.layers.17.mlp.up_proj.lora_B.1.weight
model.layers.17.mlp.down_proj.lora_A.0.weight
model.layers.17.mlp.down_proj.lora_A.1.weight
model.layers.17.mlp.down_proj.lora_B.0.weight
model.layers.17.mlp.down_proj.lora_B.1.weight
model.layers.18.self_attn.q_proj.lora_A.0.weight
model.layers.18.self_attn.q_proj.lora_A.1.weight
model.layers.18.self_attn.q_proj.lora_B.0.weight
model.layers.18.self_attn.q_proj.lora_B.1.weight
model.layers.18.self_attn.k_proj.lora_A.0.weight
model.layers.18.self_attn.k_proj.lora_A.1.weight
model.layers.18.self_attn.k_proj.lora_B.0.weight
model.layers.18.self_attn.k_proj.lora_B.1.weight
model.layers.18.self_attn.v_proj.lora_A.0.weight
model.layers.18.self_attn.v_proj.lora_A.1.weight
model.layers.18.self_attn.v_proj.lora_B.0.weight
model.layers.18.self_attn.v_proj.lora_B.1.weight
model.layers.18.self_attn.o_proj.lora_A.0.weight
model.layers.18.self_attn.o_proj.lora_A.1.weight
model.layers.18.self_attn.o_proj.lora_B.0.weight
model.layers.18.self_attn.o_proj.lora_B.1.weight
model.layers.18.mlp.gate_proj.lora_A.0.weight
model.layers.18.mlp.gate_proj.lora_A.1.weight
model.layers.18.mlp.gate_proj.lora_B.0.weight
model.layers.18.mlp.gate_proj.lora_B.1.weight
model.layers.18.mlp.up_proj.lora_A.0.weight
model.layers.18.mlp.up_proj.lora_A.1.weight
model.layers.18.mlp.up_proj.lora_B.0.weight
model.layers.18.mlp.up_proj.lora_B.1.weight
model.layers.18.mlp.down_proj.lora_A.0.weight
model.layers.18.mlp.down_proj.lora_A.1.weight
model.layers.18.mlp.down_proj.lora_B.0.weight
model.layers.18.mlp.down_proj.lora_B.1.weight
model.layers.19.self_attn.q_proj.lora_A.0.weight
model.layers.19.self_attn.q_proj.lora_A.1.weight
model.layers.19.self_attn.q_proj.lora_B.0.weight
model.layers.19.self_attn.q_proj.lora_B.1.weight
model.layers.19.self_attn.k_proj.lora_A.0.weight
model.layers.19.self_attn.k_proj.lora_A.1.weight
model.layers.19.self_attn.k_proj.lora_B.0.weight
model.layers.19.self_attn.k_proj.lora_B.1.weight
model.layers.19.self_attn.v_proj.lora_A.0.weight
model.layers.19.self_attn.v_proj.lora_A.1.weight
model.layers.19.self_attn.v_proj.lora_B.0.weight
model.layers.19.self_attn.v_proj.lora_B.1.weight
model.layers.19.self_attn.o_proj.lora_A.0.weight
model.layers.19.self_attn.o_proj.lora_A.1.weight
model.layers.19.self_attn.o_proj.lora_B.0.weight
model.layers.19.self_attn.o_proj.lora_B.1.weight
model.layers.19.mlp.gate_proj.lora_A.0.weight
model.layers.19.mlp.gate_proj.lora_A.1.weight
model.layers.19.mlp.gate_proj.lora_B.0.weight
model.layers.19.mlp.gate_proj.lora_B.1.weight
model.layers.19.mlp.up_proj.lora_A.0.weight
model.layers.19.mlp.up_proj.lora_A.1.weight
model.layers.19.mlp.up_proj.lora_B.0.weight
model.layers.19.mlp.up_proj.lora_B.1.weight
model.layers.19.mlp.down_proj.lora_A.0.weight
model.layers.19.mlp.down_proj.lora_A.1.weight
model.layers.19.mlp.down_proj.lora_B.0.weight
model.layers.19.mlp.down_proj.lora_B.1.weight
model.layers.20.self_attn.q_proj.lora_A.0.weight
model.layers.20.self_attn.q_proj.lora_A.1.weight
model.layers.20.self_attn.q_proj.lora_B.0.weight
model.layers.20.self_attn.q_proj.lora_B.1.weight
model.layers.20.self_attn.k_proj.lora_A.0.weight
model.layers.20.self_attn.k_proj.lora_A.1.weight
model.layers.20.self_attn.k_proj.lora_B.0.weight
model.layers.20.self_attn.k_proj.lora_B.1.weight
model.layers.20.self_attn.v_proj.lora_A.0.weight
model.layers.20.self_attn.v_proj.lora_A.1.weight
model.layers.20.self_attn.v_proj.lora_B.0.weight
model.layers.20.self_attn.v_proj.lora_B.1.weight
model.layers.20.self_attn.o_proj.lora_A.0.weight
model.layers.20.self_attn.o_proj.lora_A.1.weight
model.layers.20.self_attn.o_proj.lora_B.0.weight
model.layers.20.self_attn.o_proj.lora_B.1.weight
model.layers.20.mlp.gate_proj.lora_A.0.weight
model.layers.20.mlp.gate_proj.lora_A.1.weight
model.layers.20.mlp.gate_proj.lora_B.0.weight
model.layers.20.mlp.gate_proj.lora_B.1.weight
model.layers.20.mlp.up_proj.lora_A.0.weight
model.layers.20.mlp.up_proj.lora_A.1.weight
model.layers.20.mlp.up_proj.lora_B.0.weight
model.layers.20.mlp.up_proj.lora_B.1.weight
model.layers.20.mlp.down_proj.lora_A.0.weight
model.layers.20.mlp.down_proj.lora_A.1.weight
model.layers.20.mlp.down_proj.lora_B.0.weight
model.layers.20.mlp.down_proj.lora_B.1.weight
model.layers.21.self_attn.q_proj.lora_A.0.weight
model.layers.21.self_attn.q_proj.lora_A.1.weight
model.layers.21.self_attn.q_proj.lora_B.0.weight
model.layers.21.self_attn.q_proj.lora_B.1.weight
model.layers.21.self_attn.k_proj.lora_A.0.weight
model.layers.21.self_attn.k_proj.lora_A.1.weight
model.layers.21.self_attn.k_proj.lora_B.0.weight
model.layers.21.self_attn.k_proj.lora_B.1.weight
model.layers.21.self_attn.v_proj.lora_A.0.weight
model.layers.21.self_attn.v_proj.lora_A.1.weight
model.layers.21.self_attn.v_proj.lora_B.0.weight
model.layers.21.self_attn.v_proj.lora_B.1.weight
model.layers.21.self_attn.o_proj.lora_A.0.weight
model.layers.21.self_attn.o_proj.lora_A.1.weight
model.layers.21.self_attn.o_proj.lora_B.0.weight
model.layers.21.self_attn.o_proj.lora_B.1.weight
model.layers.21.mlp.gate_proj.lora_A.0.weight
model.layers.21.mlp.gate_proj.lora_A.1.weight
model.layers.21.mlp.gate_proj.lora_B.0.weight
model.layers.21.mlp.gate_proj.lora_B.1.weight
model.layers.21.mlp.up_proj.lora_A.0.weight
model.layers.21.mlp.up_proj.lora_A.1.weight
model.layers.21.mlp.up_proj.lora_B.0.weight
model.layers.21.mlp.up_proj.lora_B.1.weight
model.layers.21.mlp.down_proj.lora_A.0.weight
model.layers.21.mlp.down_proj.lora_A.1.weight
model.layers.21.mlp.down_proj.lora_B.0.weight
model.layers.21.mlp.down_proj.lora_B.1.weight
model.layers.22.self_attn.q_proj.lora_A.0.weight
model.layers.22.self_attn.q_proj.lora_A.1.weight
model.layers.22.self_attn.q_proj.lora_B.0.weight
model.layers.22.self_attn.q_proj.lora_B.1.weight
model.layers.22.self_attn.k_proj.lora_A.0.weight
model.layers.22.self_attn.k_proj.lora_A.1.weight
model.layers.22.self_attn.k_proj.lora_B.0.weight
model.layers.22.self_attn.k_proj.lora_B.1.weight
model.layers.22.self_attn.v_proj.lora_A.0.weight
model.layers.22.self_attn.v_proj.lora_A.1.weight
model.layers.22.self_attn.v_proj.lora_B.0.weight
model.layers.22.self_attn.v_proj.lora_B.1.weight
model.layers.22.self_attn.o_proj.lora_A.0.weight
model.layers.22.self_attn.o_proj.lora_A.1.weight
model.layers.22.self_attn.o_proj.lora_B.0.weight
model.layers.22.self_attn.o_proj.lora_B.1.weight
model.layers.22.mlp.gate_proj.lora_A.0.weight
model.layers.22.mlp.gate_proj.lora_A.1.weight
model.layers.22.mlp.gate_proj.lora_B.0.weight
model.layers.22.mlp.gate_proj.lora_B.1.weight
model.layers.22.mlp.up_proj.lora_A.0.weight
model.layers.22.mlp.up_proj.lora_A.1.weight
model.layers.22.mlp.up_proj.lora_B.0.weight
model.layers.22.mlp.up_proj.lora_B.1.weight
model.layers.22.mlp.down_proj.lora_A.0.weight
model.layers.22.mlp.down_proj.lora_A.1.weight
model.layers.22.mlp.down_proj.lora_B.0.weight
model.layers.22.mlp.down_proj.lora_B.1.weight
model.layers.23.self_attn.q_proj.lora_A.0.weight
model.layers.23.self_attn.q_proj.lora_A.1.weight
model.layers.23.self_attn.q_proj.lora_B.0.weight
model.layers.23.self_attn.q_proj.lora_B.1.weight
model.layers.23.self_attn.k_proj.lora_A.0.weight
model.layers.23.self_attn.k_proj.lora_A.1.weight
model.layers.23.self_attn.k_proj.lora_B.0.weight
model.layers.23.self_attn.k_proj.lora_B.1.weight
model.layers.23.self_attn.v_proj.lora_A.0.weight
model.layers.23.self_attn.v_proj.lora_A.1.weight
model.layers.23.self_attn.v_proj.lora_B.0.weight
model.layers.23.self_attn.v_proj.lora_B.1.weight
model.layers.23.self_attn.o_proj.lora_A.0.weight
model.layers.23.self_attn.o_proj.lora_A.1.weight
model.layers.23.self_attn.o_proj.lora_B.0.weight
model.layers.23.self_attn.o_proj.lora_B.1.weight
model.layers.23.mlp.gate_proj.lora_A.0.weight
model.layers.23.mlp.gate_proj.lora_A.1.weight
model.layers.23.mlp.gate_proj.lora_B.0.weight
model.layers.23.mlp.gate_proj.lora_B.1.weight
model.layers.23.mlp.up_proj.lora_A.0.weight
model.layers.23.mlp.up_proj.lora_A.1.weight
model.layers.23.mlp.up_proj.lora_B.0.weight
model.layers.23.mlp.up_proj.lora_B.1.weight
model.layers.23.mlp.down_proj.lora_A.0.weight
model.layers.23.mlp.down_proj.lora_A.1.weight
model.layers.23.mlp.down_proj.lora_B.0.weight
model.layers.23.mlp.down_proj.lora_B.1.weight
model.layers.24.self_attn.q_proj.lora_A.0.weight
model.layers.24.self_attn.q_proj.lora_A.1.weight
model.layers.24.self_attn.q_proj.lora_B.0.weight
model.layers.24.self_attn.q_proj.lora_B.1.weight
model.layers.24.self_attn.k_proj.lora_A.0.weight
model.layers.24.self_attn.k_proj.lora_A.1.weight
model.layers.24.self_attn.k_proj.lora_B.0.weight
model.layers.24.self_attn.k_proj.lora_B.1.weight
model.layers.24.self_attn.v_proj.lora_A.0.weight
model.layers.24.self_attn.v_proj.lora_A.1.weight
model.layers.24.self_attn.v_proj.lora_B.0.weight
model.layers.24.self_attn.v_proj.lora_B.1.weight
model.layers.24.self_attn.o_proj.lora_A.0.weight
model.layers.24.self_attn.o_proj.lora_A.1.weight
model.layers.24.self_attn.o_proj.lora_B.0.weight
model.layers.24.self_attn.o_proj.lora_B.1.weight
model.layers.24.mlp.gate_proj.lora_A.0.weight
model.layers.24.mlp.gate_proj.lora_A.1.weight
model.layers.24.mlp.gate_proj.lora_B.0.weight
model.layers.24.mlp.gate_proj.lora_B.1.weight
model.layers.24.mlp.up_proj.lora_A.0.weight
model.layers.24.mlp.up_proj.lora_A.1.weight
model.layers.24.mlp.up_proj.lora_B.0.weight
model.layers.24.mlp.up_proj.lora_B.1.weight
model.layers.24.mlp.down_proj.lora_A.0.weight
model.layers.24.mlp.down_proj.lora_A.1.weight
model.layers.24.mlp.down_proj.lora_B.0.weight
model.layers.24.mlp.down_proj.lora_B.1.weight
model.layers.25.self_attn.q_proj.lora_A.0.weight
model.layers.25.self_attn.q_proj.lora_A.1.weight
model.layers.25.self_attn.q_proj.lora_B.0.weight
model.layers.25.self_attn.q_proj.lora_B.1.weight
model.layers.25.self_attn.k_proj.lora_A.0.weight
model.layers.25.self_attn.k_proj.lora_A.1.weight
model.layers.25.self_attn.k_proj.lora_B.0.weight
model.layers.25.self_attn.k_proj.lora_B.1.weight
model.layers.25.self_attn.v_proj.lora_A.0.weight
model.layers.25.self_attn.v_proj.lora_A.1.weight
model.layers.25.self_attn.v_proj.lora_B.0.weight
model.layers.25.self_attn.v_proj.lora_B.1.weight
model.layers.25.self_attn.o_proj.lora_A.0.weight
model.layers.25.self_attn.o_proj.lora_A.1.weight
model.layers.25.self_attn.o_proj.lora_B.0.weight
model.layers.25.self_attn.o_proj.lora_B.1.weight
model.layers.25.mlp.gate_proj.lora_A.0.weight
model.layers.25.mlp.gate_proj.lora_A.1.weight
model.layers.25.mlp.gate_proj.lora_B.0.weight
model.layers.25.mlp.gate_proj.lora_B.1.weight
model.layers.25.mlp.up_proj.lora_A.0.weight
model.layers.25.mlp.up_proj.lora_A.1.weight
model.layers.25.mlp.up_proj.lora_B.0.weight
model.layers.25.mlp.up_proj.lora_B.1.weight
model.layers.25.mlp.down_proj.lora_A.0.weight
model.layers.25.mlp.down_proj.lora_A.1.weight
model.layers.25.mlp.down_proj.lora_B.0.weight
model.layers.25.mlp.down_proj.lora_B.1.weight
model.layers.26.self_attn.q_proj.lora_A.0.weight
model.layers.26.self_attn.q_proj.lora_A.1.weight
model.layers.26.self_attn.q_proj.lora_B.0.weight
model.layers.26.self_attn.q_proj.lora_B.1.weight
model.layers.26.self_attn.k_proj.lora_A.0.weight
model.layers.26.self_attn.k_proj.lora_A.1.weight
model.layers.26.self_attn.k_proj.lora_B.0.weight
model.layers.26.self_attn.k_proj.lora_B.1.weight
model.layers.26.self_attn.v_proj.lora_A.0.weight
model.layers.26.self_attn.v_proj.lora_A.1.weight
model.layers.26.self_attn.v_proj.lora_B.0.weight
model.layers.26.self_attn.v_proj.lora_B.1.weight
model.layers.26.self_attn.o_proj.lora_A.0.weight
model.layers.26.self_attn.o_proj.lora_A.1.weight
model.layers.26.self_attn.o_proj.lora_B.0.weight
model.layers.26.self_attn.o_proj.lora_B.1.weight
model.layers.26.mlp.gate_proj.lora_A.0.weight
model.layers.26.mlp.gate_proj.lora_A.1.weight
model.layers.26.mlp.gate_proj.lora_B.0.weight
model.layers.26.mlp.gate_proj.lora_B.1.weight
model.layers.26.mlp.up_proj.lora_A.0.weight
model.layers.26.mlp.up_proj.lora_A.1.weight
model.layers.26.mlp.up_proj.lora_B.0.weight
model.layers.26.mlp.up_proj.lora_B.1.weight
model.layers.26.mlp.down_proj.lora_A.0.weight
model.layers.26.mlp.down_proj.lora_A.1.weight
model.layers.26.mlp.down_proj.lora_B.0.weight
model.layers.26.mlp.down_proj.lora_B.1.weight
model.layers.27.self_attn.q_proj.lora_A.0.weight
model.layers.27.self_attn.q_proj.lora_A.1.weight
model.layers.27.self_attn.q_proj.lora_B.0.weight
model.layers.27.self_attn.q_proj.lora_B.1.weight
model.layers.27.self_attn.k_proj.lora_A.0.weight
model.layers.27.self_attn.k_proj.lora_A.1.weight
model.layers.27.self_attn.k_proj.lora_B.0.weight
model.layers.27.self_attn.k_proj.lora_B.1.weight
model.layers.27.self_attn.v_proj.lora_A.0.weight
model.layers.27.self_attn.v_proj.lora_A.1.weight
model.layers.27.self_attn.v_proj.lora_B.0.weight
model.layers.27.self_attn.v_proj.lora_B.1.weight
model.layers.27.self_attn.o_proj.lora_A.0.weight
model.layers.27.self_attn.o_proj.lora_A.1.weight
model.layers.27.self_attn.o_proj.lora_B.0.weight
model.layers.27.self_attn.o_proj.lora_B.1.weight
model.layers.27.mlp.gate_proj.lora_A.0.weight
model.layers.27.mlp.gate_proj.lora_A.1.weight
model.layers.27.mlp.gate_proj.lora_B.0.weight
model.layers.27.mlp.gate_proj.lora_B.1.weight
model.layers.27.mlp.up_proj.lora_A.0.weight
model.layers.27.mlp.up_proj.lora_A.1.weight
model.layers.27.mlp.up_proj.lora_B.0.weight
model.layers.27.mlp.up_proj.lora_B.1.weight
model.layers.27.mlp.down_proj.lora_A.0.weight
model.layers.27.mlp.down_proj.lora_A.1.weight
model.layers.27.mlp.down_proj.lora_B.0.weight
model.layers.27.mlp.down_proj.lora_B.1.weight
model.layers.28.self_attn.q_proj.lora_A.0.weight
model.layers.28.self_attn.q_proj.lora_A.1.weight
model.layers.28.self_attn.q_proj.lora_B.0.weight
model.layers.28.self_attn.q_proj.lora_B.1.weight
model.layers.28.self_attn.k_proj.lora_A.0.weight
model.layers.28.self_attn.k_proj.lora_A.1.weight
model.layers.28.self_attn.k_proj.lora_B.0.weight
model.layers.28.self_attn.k_proj.lora_B.1.weight
model.layers.28.self_attn.v_proj.lora_A.0.weight
model.layers.28.self_attn.v_proj.lora_A.1.weight
model.layers.28.self_attn.v_proj.lora_B.0.weight
model.layers.28.self_attn.v_proj.lora_B.1.weight
model.layers.28.self_attn.o_proj.lora_A.0.weight
model.layers.28.self_attn.o_proj.lora_A.1.weight
model.layers.28.self_attn.o_proj.lora_B.0.weight
model.layers.28.self_attn.o_proj.lora_B.1.weight
model.layers.28.mlp.gate_proj.lora_A.0.weight
model.layers.28.mlp.gate_proj.lora_A.1.weight
model.layers.28.mlp.gate_proj.lora_B.0.weight
model.layers.28.mlp.gate_proj.lora_B.1.weight
model.layers.28.mlp.up_proj.lora_A.0.weight
model.layers.28.mlp.up_proj.lora_A.1.weight
model.layers.28.mlp.up_proj.lora_B.0.weight
model.layers.28.mlp.up_proj.lora_B.1.weight
model.layers.28.mlp.down_proj.lora_A.0.weight
model.layers.28.mlp.down_proj.lora_A.1.weight
model.layers.28.mlp.down_proj.lora_B.0.weight
model.layers.28.mlp.down_proj.lora_B.1.weight
model.layers.29.self_attn.q_proj.lora_A.0.weight
model.layers.29.self_attn.q_proj.lora_A.1.weight
model.layers.29.self_attn.q_proj.lora_B.0.weight
model.layers.29.self_attn.q_proj.lora_B.1.weight
model.layers.29.self_attn.k_proj.lora_A.0.weight
model.layers.29.self_attn.k_proj.lora_A.1.weight
model.layers.29.self_attn.k_proj.lora_B.0.weight
model.layers.29.self_attn.k_proj.lora_B.1.weight
model.layers.29.self_attn.v_proj.lora_A.0.weight
model.layers.29.self_attn.v_proj.lora_A.1.weight
model.layers.29.self_attn.v_proj.lora_B.0.weight
model.layers.29.self_attn.v_proj.lora_B.1.weight
model.layers.29.self_attn.o_proj.lora_A.0.weight
model.layers.29.self_attn.o_proj.lora_A.1.weight
model.layers.29.self_attn.o_proj.lora_B.0.weight
model.layers.29.self_attn.o_proj.lora_B.1.weight
model.layers.29.mlp.gate_proj.lora_A.0.weight
model.layers.29.mlp.gate_proj.lora_A.1.weight
model.layers.29.mlp.gate_proj.lora_B.0.weight
model.layers.29.mlp.gate_proj.lora_B.1.weight
model.layers.29.mlp.up_proj.lora_A.0.weight
model.layers.29.mlp.up_proj.lora_A.1.weight
model.layers.29.mlp.up_proj.lora_B.0.weight
model.layers.29.mlp.up_proj.lora_B.1.weight
model.layers.29.mlp.down_proj.lora_A.0.weight
model.layers.29.mlp.down_proj.lora_A.1.weight
model.layers.29.mlp.down_proj.lora_B.0.weight
model.layers.29.mlp.down_proj.lora_B.1.weight
model.layers.30.self_attn.q_proj.lora_A.0.weight
model.layers.30.self_attn.q_proj.lora_A.1.weight
model.layers.30.self_attn.q_proj.lora_B.0.weight
model.layers.30.self_attn.q_proj.lora_B.1.weight
model.layers.30.self_attn.k_proj.lora_A.0.weight
model.layers.30.self_attn.k_proj.lora_A.1.weight
model.layers.30.self_attn.k_proj.lora_B.0.weight
model.layers.30.self_attn.k_proj.lora_B.1.weight
model.layers.30.self_attn.v_proj.lora_A.0.weight
model.layers.30.self_attn.v_proj.lora_A.1.weight
model.layers.30.self_attn.v_proj.lora_B.0.weight
model.layers.30.self_attn.v_proj.lora_B.1.weight
model.layers.30.self_attn.o_proj.lora_A.0.weight
model.layers.30.self_attn.o_proj.lora_A.1.weight
model.layers.30.self_attn.o_proj.lora_B.0.weight
model.layers.30.self_attn.o_proj.lora_B.1.weight
model.layers.30.mlp.gate_proj.lora_A.0.weight
model.layers.30.mlp.gate_proj.lora_A.1.weight
model.layers.30.mlp.gate_proj.lora_B.0.weight
model.layers.30.mlp.gate_proj.lora_B.1.weight
model.layers.30.mlp.up_proj.lora_A.0.weight
model.layers.30.mlp.up_proj.lora_A.1.weight
model.layers.30.mlp.up_proj.lora_B.0.weight
model.layers.30.mlp.up_proj.lora_B.1.weight
model.layers.30.mlp.down_proj.lora_A.0.weight
model.layers.30.mlp.down_proj.lora_A.1.weight
model.layers.30.mlp.down_proj.lora_B.0.weight
model.layers.30.mlp.down_proj.lora_B.1.weight
model.layers.31.self_attn.q_proj.lora_A.0.weight
model.layers.31.self_attn.q_proj.lora_A.1.weight
model.layers.31.self_attn.q_proj.lora_B.0.weight
model.layers.31.self_attn.q_proj.lora_B.1.weight
model.layers.31.self_attn.k_proj.lora_A.0.weight
model.layers.31.self_attn.k_proj.lora_A.1.weight
model.layers.31.self_attn.k_proj.lora_B.0.weight
model.layers.31.self_attn.k_proj.lora_B.1.weight
model.layers.31.self_attn.v_proj.lora_A.0.weight
model.layers.31.self_attn.v_proj.lora_A.1.weight
model.layers.31.self_attn.v_proj.lora_B.0.weight
model.layers.31.self_attn.v_proj.lora_B.1.weight
model.layers.31.self_attn.o_proj.lora_A.0.weight
model.layers.31.self_attn.o_proj.lora_A.1.weight
model.layers.31.self_attn.o_proj.lora_B.0.weight
model.layers.31.self_attn.o_proj.lora_B.1.weight
model.layers.31.mlp.gate_proj.lora_A.0.weight
model.layers.31.mlp.gate_proj.lora_A.1.weight
model.layers.31.mlp.gate_proj.lora_B.0.weight
model.layers.31.mlp.gate_proj.lora_B.1.weight
model.layers.31.mlp.up_proj.lora_A.0.weight
model.layers.31.mlp.up_proj.lora_A.1.weight
model.layers.31.mlp.up_proj.lora_B.0.weight
model.layers.31.mlp.up_proj.lora_B.1.weight
model.layers.31.mlp.down_proj.lora_A.0.weight
model.layers.31.mlp.down_proj.lora_A.1.weight
model.layers.31.mlp.down_proj.lora_B.0.weight
model.layers.31.mlp.down_proj.lora_B.1.weight
----------------------
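A listing like the one above can be produced by iterating over `named_parameters()` and keeping only the entries with `requires_grad=True`. A minimal sketch on a toy module (the `trainable_param_names` helper is illustrative, not part of PEFT):

```python
import torch.nn as nn

def trainable_param_names(model: nn.Module) -> list[str]:
    # Collect the names of all parameters that will receive gradients.
    return [name for name, p in model.named_parameters() if p.requires_grad]

# Small stand-in model: freeze the first layer to mimic a partially frozen network.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

print(trainable_param_names(model))  # only the second layer's parameters remain
```

Running the same loop over the X-LoRA model is what produced the dump above; note that no `internal_xlora_classifier` parameters appear in it.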
Immediately after that, I run the SFTTrainer, which prints exactly:
Using auto half precision backend
Currently training with a batch size of: 2
***** Running training *****
Num examples = 1,053
Num Epochs = 1
Instantaneous batch size per device = 2
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 16
Total optimization steps = 32
Number of trainable parameters = 118,372,800
Detected flash_attn version: 2.6.3
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
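That UserWarning is typically raised by gradient checkpointing when none of the tensors entering a checkpointed segment require grad, so backward produces no gradients at all. A torch-only sketch of the mechanism (transformers models also expose `enable_input_require_grads()`, which is one common mitigation when all base weights are frozen):

```python
import torch
from torch.utils.checkpoint import checkpoint

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)  # plain input: no requires_grad -> triggers the warning above

# Marking the input as requiring grad (what enable_input_require_grads does for
# the embedding output in transformers) lets gradients flow through the checkpoint.
x.requires_grad_(True)
out = checkpoint(lin, x, use_reentrant=True)
out.sum().backward()
print(lin.weight.grad is not None)  # True: the layer received a gradient
```

Whether this helps here depends on whether the X-LoRA classifier parameters are actually trainable in the first place, which is the bug under discussion.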
Thanks @benjamin-marie. The internal_xlora_classifier does not appear among the trainable parameters, whereas the LoRAs should be frozen, right @EricLBuehler?
Yes, exactly. I'll try to reproduce and fix this!
I installed PEFT from source and use the latest versions of Transformers and TRL. I passed the X-LoRA model to TRL, but training doesn't seem to work: the training loss doesn't decrease and the validation loss remains constant. I get this warning: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
I load Llama 3.1 (without quantization) and then run this code:
Maybe @EricLBuehler can help with this?
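For reference, the adapter-naming constraint reported above would surface when building the config. A hedged sketch of how the adapters dict might be passed to `XLoraConfig` (the paths are placeholders and the exact fields should be checked against the PEFT X-LoRA docs):

```python
from peft import XLoraConfig, get_peft_model  # requires a PEFT build with X-LoRA support

# The keys reportedly must be "0", "1", ... for training to start;
# the paths below are hypothetical pre-trained adapter checkpoints.
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=4096,  # Llama 3.1 8B hidden size
    adapters={
        "0": "path/to/adapter_0",
        "1": "path/to/adapter_1",
    },
)
# model = get_peft_model(base_model, config)
```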