huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

XLoRA: training issues, Gradients will be None #2015

Open benjamin-marie opened 3 weeks ago

benjamin-marie commented 3 weeks ago

I installed PEFT from source and use the latest versions of Transformers and TRL. I passed the XLoRA model to TRL, but training doesn't seem to work: the training loss doesn't decrease and the validation loss remains constant. I get this warning: UserWarning: None of the inputs have requires_grad=True. Gradients will be None

I load Llama 3.1 (without quantization) and then run this code:

adapters = dict()
adapters["0"] = './adapter1/'
adapters["1"] = './adapter2/'

peft_config = XLoraConfig(
  task_type=TaskType.CAUSAL_LM,
  peft_type=PeftType.XLORA,
  hidden_size=model.config.hidden_size,
  xlora_depth=8,
  adapters=adapters,
  xlora_size=2048,
  layerwise_scalings=True,
  xlora_dropout_p=0.2
)

xlora_model = get_peft_model(model, peft_config)

training_arguments = SFTConfig(
        output_dir="./output/",
        optim="paged_adamw_8bit",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=16,
        save_strategy="epoch",
        log_level="debug",
        logging_steps=1,
        learning_rate=1e-5,
        bf16=True,
        num_train_epochs=1,
        warmup_ratio=0.1,
        lr_scheduler_type="linear",
        dataset_text_field="text",
        max_seq_length=512,
)

trainer = SFTTrainer(
        model=xlora_model,
        train_dataset=ds,
        tokenizer=tokenizer,
        args=training_arguments,
)

trainer.train()

I also observed another bug: the adapters must be named "0", "1", etc. in the adapters dict, otherwise training won't start and PEFT complains that the adapters don't exist.
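
For example (purely illustrative, with hypothetical adapter names), this variant fails with the "adapters don't exist" error:

adapters = dict()
adapters["summarization"] = './adapter1/'  # non-numeric key: training reportedly fails to find the adapter
adapters["translation"] = './adapter2/'

while the numeric keys used in the snippet above work as expected.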

Maybe @EricLBuehler can help with this?

BenjaminBossan commented 3 weeks ago

This sounds like the X-LoRA classifier layers don't have requires_grad=True. Could you please print all parameter names with requires_grad=True on your model? What is your base model?

We're still working on a training example for X-LoRA, so it's possible that there are still some kinks that need to be ironed out.

EricLBuehler commented 3 weeks ago

@benjamin-marie thanks for the example. I'll take a look at this.

I also observed another bug: the adapters must be named "0", "1", etc. in the adapters dict, otherwise training won't start and PEFT complains that the adapters don't exist.

Hmm ok, thanks for reporting this, I'll see what could be causing it.

benjamin-marie commented 3 weeks ago

Here is my model (Llama 3.1 8B):

PeftModelForCausalLM(
  (base_model): XLoraModel(
    (lora_model): LoraModel(
      (model): LlamaForCausalLM(
        (model): LlamaModel(
          (embed_tokens): Embedding(128256, 4096)
          (layers): ModuleList(
            (0-31): 32 x LlamaDecoderLayer(
              (self_attn): LlamaFlashAttention2(
                (q_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (k_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=1024, bias=False)
                    (1): Linear(in_features=16, out_features=1024, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (v_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=1024, bias=False)
                    (1): Linear(in_features=16, out_features=1024, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (o_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (rotary_emb): LlamaRotaryEmbedding()
              )
              (mlp): LlamaMLP(
                (gate_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=14336, bias=False)
                    (1): Linear(in_features=16, out_features=14336, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (up_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=14336, bias=False)
                    (1): Linear(in_features=16, out_features=14336, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (down_proj): lora.Linear(
                  (base_layer): Linear(in_features=14336, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=14336, out_features=16, bias=False)
                    (1): Linear(in_features=14336, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (act_fn): SiLU()
              )
              (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
              (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            )
          )
          (norm): LlamaRMSNorm((4096,), eps=1e-05)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
      )
    )
    (internal_xlora_classifier): XLoraClassifier(
      (softmax): TemperatureScaledSoftmax(
        (softmax): Softmax(dim=-1)
      )
      (layers): Sequential(
        (0): Linear(in_features=4096, out_features=2048, bias=True)
        (1): ReLU()
        (2): Dropout(p=0.2, inplace=False)
        (3): Linear(in_features=2048, out_features=2048, bias=True)
        (4): ReLU()
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=2048, out_features=2048, bias=True)
        (7): ReLU()
        (8): Dropout(p=0.2, inplace=False)
        (9): Linear(in_features=2048, out_features=2048, bias=True)
        (10): ReLU()
        (11): Dropout(p=0.2, inplace=False)
        (12): Linear(in_features=2048, out_features=2048, bias=True)
        (13): ReLU()
        (14): Dropout(p=0.2, inplace=False)
        (15): Linear(in_features=2048, out_features=2048, bias=True)
        (16): ReLU()
        (17): Dropout(p=0.2, inplace=False)
        (18): Linear(in_features=2048, out_features=2048, bias=True)
        (19): ReLU()
        (20): Dropout(p=0.2, inplace=False)
        (21): Linear(in_features=2048, out_features=448, bias=True)
      )
    )
  )
)

Could you please print all parameter names with requires_grad=True on your model?

Sure, how do you do this? None of the params seems to have a requires_grad attribute, but I'm not sure whether I did it right.

BenjaminBossan commented 3 weeks ago

how do you do this

First of all, you can run model.print_trainable_parameters() for a global overview. Then something like this should do:

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)

benjamin-marie commented 3 weeks ago

I added this code:

print(xlora_model.print_trainable_parameters())  # note: print_trainable_parameters() prints its summary itself and returns None
print("--- Require grad? ----")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
print("----------------------")

It prints:

trainable params: 118,372,800 || all params: 8,148,634,048 || trainable%: 1.4527
None
--- Require grad? ----
model.layers.0.self_attn.q_proj.lora_A.0.weight
model.layers.0.self_attn.q_proj.lora_A.1.weight
model.layers.0.self_attn.q_proj.lora_B.0.weight
model.layers.0.self_attn.q_proj.lora_B.1.weight
model.layers.0.self_attn.k_proj.lora_A.0.weight
model.layers.0.self_attn.k_proj.lora_A.1.weight
model.layers.0.self_attn.k_proj.lora_B.0.weight
model.layers.0.self_attn.k_proj.lora_B.1.weight
model.layers.0.self_attn.v_proj.lora_A.0.weight
model.layers.0.self_attn.v_proj.lora_A.1.weight
model.layers.0.self_attn.v_proj.lora_B.0.weight
model.layers.0.self_attn.v_proj.lora_B.1.weight
model.layers.0.self_attn.o_proj.lora_A.0.weight
model.layers.0.self_attn.o_proj.lora_A.1.weight
model.layers.0.self_attn.o_proj.lora_B.0.weight
model.layers.0.self_attn.o_proj.lora_B.1.weight
model.layers.0.mlp.gate_proj.lora_A.0.weight
model.layers.0.mlp.gate_proj.lora_A.1.weight
model.layers.0.mlp.gate_proj.lora_B.0.weight
model.layers.0.mlp.gate_proj.lora_B.1.weight
model.layers.0.mlp.up_proj.lora_A.0.weight
model.layers.0.mlp.up_proj.lora_A.1.weight
model.layers.0.mlp.up_proj.lora_B.0.weight
model.layers.0.mlp.up_proj.lora_B.1.weight
model.layers.0.mlp.down_proj.lora_A.0.weight
model.layers.0.mlp.down_proj.lora_A.1.weight
model.layers.0.mlp.down_proj.lora_B.0.weight
model.layers.0.mlp.down_proj.lora_B.1.weight
model.layers.1.self_attn.q_proj.lora_A.0.weight
model.layers.1.self_attn.q_proj.lora_A.1.weight
model.layers.1.self_attn.q_proj.lora_B.0.weight
model.layers.1.self_attn.q_proj.lora_B.1.weight
model.layers.1.self_attn.k_proj.lora_A.0.weight
model.layers.1.self_attn.k_proj.lora_A.1.weight
model.layers.1.self_attn.k_proj.lora_B.0.weight
model.layers.1.self_attn.k_proj.lora_B.1.weight
model.layers.1.self_attn.v_proj.lora_A.0.weight
model.layers.1.self_attn.v_proj.lora_A.1.weight
model.layers.1.self_attn.v_proj.lora_B.0.weight
model.layers.1.self_attn.v_proj.lora_B.1.weight
model.layers.1.self_attn.o_proj.lora_A.0.weight
model.layers.1.self_attn.o_proj.lora_A.1.weight
model.layers.1.self_attn.o_proj.lora_B.0.weight
model.layers.1.self_attn.o_proj.lora_B.1.weight
model.layers.1.mlp.gate_proj.lora_A.0.weight
model.layers.1.mlp.gate_proj.lora_A.1.weight
model.layers.1.mlp.gate_proj.lora_B.0.weight
model.layers.1.mlp.gate_proj.lora_B.1.weight
model.layers.1.mlp.up_proj.lora_A.0.weight
model.layers.1.mlp.up_proj.lora_A.1.weight
model.layers.1.mlp.up_proj.lora_B.0.weight
model.layers.1.mlp.up_proj.lora_B.1.weight
model.layers.1.mlp.down_proj.lora_A.0.weight
model.layers.1.mlp.down_proj.lora_A.1.weight
model.layers.1.mlp.down_proj.lora_B.0.weight
model.layers.1.mlp.down_proj.lora_B.1.weight
model.layers.2.self_attn.q_proj.lora_A.0.weight
model.layers.2.self_attn.q_proj.lora_A.1.weight
model.layers.2.self_attn.q_proj.lora_B.0.weight
model.layers.2.self_attn.q_proj.lora_B.1.weight
model.layers.2.self_attn.k_proj.lora_A.0.weight
model.layers.2.self_attn.k_proj.lora_A.1.weight
model.layers.2.self_attn.k_proj.lora_B.0.weight
model.layers.2.self_attn.k_proj.lora_B.1.weight
model.layers.2.self_attn.v_proj.lora_A.0.weight
model.layers.2.self_attn.v_proj.lora_A.1.weight
model.layers.2.self_attn.v_proj.lora_B.0.weight
model.layers.2.self_attn.v_proj.lora_B.1.weight
model.layers.2.self_attn.o_proj.lora_A.0.weight
model.layers.2.self_attn.o_proj.lora_A.1.weight
model.layers.2.self_attn.o_proj.lora_B.0.weight
model.layers.2.self_attn.o_proj.lora_B.1.weight
model.layers.2.mlp.gate_proj.lora_A.0.weight
model.layers.2.mlp.gate_proj.lora_A.1.weight
model.layers.2.mlp.gate_proj.lora_B.0.weight
model.layers.2.mlp.gate_proj.lora_B.1.weight
model.layers.2.mlp.up_proj.lora_A.0.weight
model.layers.2.mlp.up_proj.lora_A.1.weight
model.layers.2.mlp.up_proj.lora_B.0.weight
model.layers.2.mlp.up_proj.lora_B.1.weight
model.layers.2.mlp.down_proj.lora_A.0.weight
model.layers.2.mlp.down_proj.lora_A.1.weight
model.layers.2.mlp.down_proj.lora_B.0.weight
model.layers.2.mlp.down_proj.lora_B.1.weight
model.layers.3.self_attn.q_proj.lora_A.0.weight
model.layers.3.self_attn.q_proj.lora_A.1.weight
model.layers.3.self_attn.q_proj.lora_B.0.weight
model.layers.3.self_attn.q_proj.lora_B.1.weight
model.layers.3.self_attn.k_proj.lora_A.0.weight
model.layers.3.self_attn.k_proj.lora_A.1.weight
model.layers.3.self_attn.k_proj.lora_B.0.weight
model.layers.3.self_attn.k_proj.lora_B.1.weight
model.layers.3.self_attn.v_proj.lora_A.0.weight
model.layers.3.self_attn.v_proj.lora_A.1.weight
model.layers.3.self_attn.v_proj.lora_B.0.weight
model.layers.3.self_attn.v_proj.lora_B.1.weight
model.layers.3.self_attn.o_proj.lora_A.0.weight
model.layers.3.self_attn.o_proj.lora_A.1.weight
model.layers.3.self_attn.o_proj.lora_B.0.weight
model.layers.3.self_attn.o_proj.lora_B.1.weight
model.layers.3.mlp.gate_proj.lora_A.0.weight
model.layers.3.mlp.gate_proj.lora_A.1.weight
model.layers.3.mlp.gate_proj.lora_B.0.weight
model.layers.3.mlp.gate_proj.lora_B.1.weight
model.layers.3.mlp.up_proj.lora_A.0.weight
model.layers.3.mlp.up_proj.lora_A.1.weight
model.layers.3.mlp.up_proj.lora_B.0.weight
model.layers.3.mlp.up_proj.lora_B.1.weight
model.layers.3.mlp.down_proj.lora_A.0.weight
model.layers.3.mlp.down_proj.lora_A.1.weight
model.layers.3.mlp.down_proj.lora_B.0.weight
model.layers.3.mlp.down_proj.lora_B.1.weight
model.layers.4.self_attn.q_proj.lora_A.0.weight
model.layers.4.self_attn.q_proj.lora_A.1.weight
model.layers.4.self_attn.q_proj.lora_B.0.weight
model.layers.4.self_attn.q_proj.lora_B.1.weight
model.layers.4.self_attn.k_proj.lora_A.0.weight
model.layers.4.self_attn.k_proj.lora_A.1.weight
model.layers.4.self_attn.k_proj.lora_B.0.weight
model.layers.4.self_attn.k_proj.lora_B.1.weight
model.layers.4.self_attn.v_proj.lora_A.0.weight
model.layers.4.self_attn.v_proj.lora_A.1.weight
model.layers.4.self_attn.v_proj.lora_B.0.weight
model.layers.4.self_attn.v_proj.lora_B.1.weight
model.layers.4.self_attn.o_proj.lora_A.0.weight
model.layers.4.self_attn.o_proj.lora_A.1.weight
model.layers.4.self_attn.o_proj.lora_B.0.weight
model.layers.4.self_attn.o_proj.lora_B.1.weight
model.layers.4.mlp.gate_proj.lora_A.0.weight
model.layers.4.mlp.gate_proj.lora_A.1.weight
model.layers.4.mlp.gate_proj.lora_B.0.weight
model.layers.4.mlp.gate_proj.lora_B.1.weight
model.layers.4.mlp.up_proj.lora_A.0.weight
model.layers.4.mlp.up_proj.lora_A.1.weight
model.layers.4.mlp.up_proj.lora_B.0.weight
model.layers.4.mlp.up_proj.lora_B.1.weight
model.layers.4.mlp.down_proj.lora_A.0.weight
model.layers.4.mlp.down_proj.lora_A.1.weight
model.layers.4.mlp.down_proj.lora_B.0.weight
model.layers.4.mlp.down_proj.lora_B.1.weight
model.layers.5.self_attn.q_proj.lora_A.0.weight
model.layers.5.self_attn.q_proj.lora_A.1.weight
model.layers.5.self_attn.q_proj.lora_B.0.weight
model.layers.5.self_attn.q_proj.lora_B.1.weight
model.layers.5.self_attn.k_proj.lora_A.0.weight
model.layers.5.self_attn.k_proj.lora_A.1.weight
model.layers.5.self_attn.k_proj.lora_B.0.weight
model.layers.5.self_attn.k_proj.lora_B.1.weight
model.layers.5.self_attn.v_proj.lora_A.0.weight
model.layers.5.self_attn.v_proj.lora_A.1.weight
model.layers.5.self_attn.v_proj.lora_B.0.weight
model.layers.5.self_attn.v_proj.lora_B.1.weight
model.layers.5.self_attn.o_proj.lora_A.0.weight
model.layers.5.self_attn.o_proj.lora_A.1.weight
model.layers.5.self_attn.o_proj.lora_B.0.weight
model.layers.5.self_attn.o_proj.lora_B.1.weight
model.layers.5.mlp.gate_proj.lora_A.0.weight
model.layers.5.mlp.gate_proj.lora_A.1.weight
model.layers.5.mlp.gate_proj.lora_B.0.weight
model.layers.5.mlp.gate_proj.lora_B.1.weight
model.layers.5.mlp.up_proj.lora_A.0.weight
model.layers.5.mlp.up_proj.lora_A.1.weight
model.layers.5.mlp.up_proj.lora_B.0.weight
model.layers.5.mlp.up_proj.lora_B.1.weight
model.layers.5.mlp.down_proj.lora_A.0.weight
model.layers.5.mlp.down_proj.lora_A.1.weight
model.layers.5.mlp.down_proj.lora_B.0.weight
model.layers.5.mlp.down_proj.lora_B.1.weight
model.layers.6.self_attn.q_proj.lora_A.0.weight
model.layers.6.self_attn.q_proj.lora_A.1.weight
model.layers.6.self_attn.q_proj.lora_B.0.weight
model.layers.6.self_attn.q_proj.lora_B.1.weight
model.layers.6.self_attn.k_proj.lora_A.0.weight
model.layers.6.self_attn.k_proj.lora_A.1.weight
model.layers.6.self_attn.k_proj.lora_B.0.weight
model.layers.6.self_attn.k_proj.lora_B.1.weight
model.layers.6.self_attn.v_proj.lora_A.0.weight
model.layers.6.self_attn.v_proj.lora_A.1.weight
model.layers.6.self_attn.v_proj.lora_B.0.weight
model.layers.6.self_attn.v_proj.lora_B.1.weight
model.layers.6.self_attn.o_proj.lora_A.0.weight
model.layers.6.self_attn.o_proj.lora_A.1.weight
model.layers.6.self_attn.o_proj.lora_B.0.weight
model.layers.6.self_attn.o_proj.lora_B.1.weight
model.layers.6.mlp.gate_proj.lora_A.0.weight
model.layers.6.mlp.gate_proj.lora_A.1.weight
model.layers.6.mlp.gate_proj.lora_B.0.weight
model.layers.6.mlp.gate_proj.lora_B.1.weight
model.layers.6.mlp.up_proj.lora_A.0.weight
model.layers.6.mlp.up_proj.lora_A.1.weight
model.layers.6.mlp.up_proj.lora_B.0.weight
model.layers.6.mlp.up_proj.lora_B.1.weight
model.layers.6.mlp.down_proj.lora_A.0.weight
model.layers.6.mlp.down_proj.lora_A.1.weight
model.layers.6.mlp.down_proj.lora_B.0.weight
model.layers.6.mlp.down_proj.lora_B.1.weight
model.layers.7.self_attn.q_proj.lora_A.0.weight
model.layers.7.self_attn.q_proj.lora_A.1.weight
model.layers.7.self_attn.q_proj.lora_B.0.weight
model.layers.7.self_attn.q_proj.lora_B.1.weight
model.layers.7.self_attn.k_proj.lora_A.0.weight
model.layers.7.self_attn.k_proj.lora_A.1.weight
model.layers.7.self_attn.k_proj.lora_B.0.weight
model.layers.7.self_attn.k_proj.lora_B.1.weight
model.layers.7.self_attn.v_proj.lora_A.0.weight
model.layers.7.self_attn.v_proj.lora_A.1.weight
model.layers.7.self_attn.v_proj.lora_B.0.weight
model.layers.7.self_attn.v_proj.lora_B.1.weight
model.layers.7.self_attn.o_proj.lora_A.0.weight
model.layers.7.self_attn.o_proj.lora_A.1.weight
model.layers.7.self_attn.o_proj.lora_B.0.weight
model.layers.7.self_attn.o_proj.lora_B.1.weight
model.layers.7.mlp.gate_proj.lora_A.0.weight
model.layers.7.mlp.gate_proj.lora_A.1.weight
model.layers.7.mlp.gate_proj.lora_B.0.weight
model.layers.7.mlp.gate_proj.lora_B.1.weight
model.layers.7.mlp.up_proj.lora_A.0.weight
model.layers.7.mlp.up_proj.lora_A.1.weight
model.layers.7.mlp.up_proj.lora_B.0.weight
model.layers.7.mlp.up_proj.lora_B.1.weight
model.layers.7.mlp.down_proj.lora_A.0.weight
model.layers.7.mlp.down_proj.lora_A.1.weight
model.layers.7.mlp.down_proj.lora_B.0.weight
model.layers.7.mlp.down_proj.lora_B.1.weight
model.layers.8.self_attn.q_proj.lora_A.0.weight
model.layers.8.self_attn.q_proj.lora_A.1.weight
model.layers.8.self_attn.q_proj.lora_B.0.weight
model.layers.8.self_attn.q_proj.lora_B.1.weight
model.layers.8.self_attn.k_proj.lora_A.0.weight
model.layers.8.self_attn.k_proj.lora_A.1.weight
model.layers.8.self_attn.k_proj.lora_B.0.weight
model.layers.8.self_attn.k_proj.lora_B.1.weight
model.layers.8.self_attn.v_proj.lora_A.0.weight
model.layers.8.self_attn.v_proj.lora_A.1.weight
model.layers.8.self_attn.v_proj.lora_B.0.weight
model.layers.8.self_attn.v_proj.lora_B.1.weight
model.layers.8.self_attn.o_proj.lora_A.0.weight
model.layers.8.self_attn.o_proj.lora_A.1.weight
model.layers.8.self_attn.o_proj.lora_B.0.weight
model.layers.8.self_attn.o_proj.lora_B.1.weight
model.layers.8.mlp.gate_proj.lora_A.0.weight
model.layers.8.mlp.gate_proj.lora_A.1.weight
model.layers.8.mlp.gate_proj.lora_B.0.weight
model.layers.8.mlp.gate_proj.lora_B.1.weight
model.layers.8.mlp.up_proj.lora_A.0.weight
model.layers.8.mlp.up_proj.lora_A.1.weight
model.layers.8.mlp.up_proj.lora_B.0.weight
model.layers.8.mlp.up_proj.lora_B.1.weight
model.layers.8.mlp.down_proj.lora_A.0.weight
model.layers.8.mlp.down_proj.lora_A.1.weight
model.layers.8.mlp.down_proj.lora_B.0.weight
model.layers.8.mlp.down_proj.lora_B.1.weight
model.layers.9.self_attn.q_proj.lora_A.0.weight
model.layers.9.self_attn.q_proj.lora_A.1.weight
model.layers.9.self_attn.q_proj.lora_B.0.weight
model.layers.9.self_attn.q_proj.lora_B.1.weight
model.layers.9.self_attn.k_proj.lora_A.0.weight
model.layers.9.self_attn.k_proj.lora_A.1.weight
model.layers.9.self_attn.k_proj.lora_B.0.weight
model.layers.9.self_attn.k_proj.lora_B.1.weight
model.layers.9.self_attn.v_proj.lora_A.0.weight
model.layers.9.self_attn.v_proj.lora_A.1.weight
model.layers.9.self_attn.v_proj.lora_B.0.weight
model.layers.9.self_attn.v_proj.lora_B.1.weight
model.layers.9.self_attn.o_proj.lora_A.0.weight
model.layers.9.self_attn.o_proj.lora_A.1.weight
model.layers.9.self_attn.o_proj.lora_B.0.weight
model.layers.9.self_attn.o_proj.lora_B.1.weight
model.layers.9.mlp.gate_proj.lora_A.0.weight
model.layers.9.mlp.gate_proj.lora_A.1.weight
model.layers.9.mlp.gate_proj.lora_B.0.weight
model.layers.9.mlp.gate_proj.lora_B.1.weight
model.layers.9.mlp.up_proj.lora_A.0.weight
model.layers.9.mlp.up_proj.lora_A.1.weight
model.layers.9.mlp.up_proj.lora_B.0.weight
model.layers.9.mlp.up_proj.lora_B.1.weight
model.layers.9.mlp.down_proj.lora_A.0.weight
model.layers.9.mlp.down_proj.lora_A.1.weight
model.layers.9.mlp.down_proj.lora_B.0.weight
model.layers.9.mlp.down_proj.lora_B.1.weight
model.layers.10.self_attn.q_proj.lora_A.0.weight
model.layers.10.self_attn.q_proj.lora_A.1.weight
model.layers.10.self_attn.q_proj.lora_B.0.weight
model.layers.10.self_attn.q_proj.lora_B.1.weight
model.layers.10.self_attn.k_proj.lora_A.0.weight
model.layers.10.self_attn.k_proj.lora_A.1.weight
model.layers.10.self_attn.k_proj.lora_B.0.weight
model.layers.10.self_attn.k_proj.lora_B.1.weight
model.layers.10.self_attn.v_proj.lora_A.0.weight
model.layers.10.self_attn.v_proj.lora_A.1.weight
model.layers.10.self_attn.v_proj.lora_B.0.weight
model.layers.10.self_attn.v_proj.lora_B.1.weight
model.layers.10.self_attn.o_proj.lora_A.0.weight
model.layers.10.self_attn.o_proj.lora_A.1.weight
model.layers.10.self_attn.o_proj.lora_B.0.weight
model.layers.10.self_attn.o_proj.lora_B.1.weight
model.layers.10.mlp.gate_proj.lora_A.0.weight
model.layers.10.mlp.gate_proj.lora_A.1.weight
model.layers.10.mlp.gate_proj.lora_B.0.weight
model.layers.10.mlp.gate_proj.lora_B.1.weight
model.layers.10.mlp.up_proj.lora_A.0.weight
model.layers.10.mlp.up_proj.lora_A.1.weight
model.layers.10.mlp.up_proj.lora_B.0.weight
model.layers.10.mlp.up_proj.lora_B.1.weight
model.layers.10.mlp.down_proj.lora_A.0.weight
model.layers.10.mlp.down_proj.lora_A.1.weight
model.layers.10.mlp.down_proj.lora_B.0.weight
model.layers.10.mlp.down_proj.lora_B.1.weight
model.layers.11.self_attn.q_proj.lora_A.0.weight
model.layers.11.self_attn.q_proj.lora_A.1.weight
model.layers.11.self_attn.q_proj.lora_B.0.weight
model.layers.11.self_attn.q_proj.lora_B.1.weight
model.layers.11.self_attn.k_proj.lora_A.0.weight
model.layers.11.self_attn.k_proj.lora_A.1.weight
model.layers.11.self_attn.k_proj.lora_B.0.weight
model.layers.11.self_attn.k_proj.lora_B.1.weight
model.layers.11.self_attn.v_proj.lora_A.0.weight
model.layers.11.self_attn.v_proj.lora_A.1.weight
model.layers.11.self_attn.v_proj.lora_B.0.weight
model.layers.11.self_attn.v_proj.lora_B.1.weight
model.layers.11.self_attn.o_proj.lora_A.0.weight
model.layers.11.self_attn.o_proj.lora_A.1.weight
model.layers.11.self_attn.o_proj.lora_B.0.weight
model.layers.11.self_attn.o_proj.lora_B.1.weight
model.layers.11.mlp.gate_proj.lora_A.0.weight
model.layers.11.mlp.gate_proj.lora_A.1.weight
model.layers.11.mlp.gate_proj.lora_B.0.weight
model.layers.11.mlp.gate_proj.lora_B.1.weight
model.layers.11.mlp.up_proj.lora_A.0.weight
model.layers.11.mlp.up_proj.lora_A.1.weight
model.layers.11.mlp.up_proj.lora_B.0.weight
model.layers.11.mlp.up_proj.lora_B.1.weight
model.layers.11.mlp.down_proj.lora_A.0.weight
model.layers.11.mlp.down_proj.lora_A.1.weight
model.layers.11.mlp.down_proj.lora_B.0.weight
model.layers.11.mlp.down_proj.lora_B.1.weight
model.layers.12.self_attn.q_proj.lora_A.0.weight
model.layers.12.self_attn.q_proj.lora_A.1.weight
model.layers.12.self_attn.q_proj.lora_B.0.weight
model.layers.12.self_attn.q_proj.lora_B.1.weight
model.layers.12.self_attn.k_proj.lora_A.0.weight
model.layers.12.self_attn.k_proj.lora_A.1.weight
model.layers.12.self_attn.k_proj.lora_B.0.weight
model.layers.12.self_attn.k_proj.lora_B.1.weight
model.layers.12.self_attn.v_proj.lora_A.0.weight
model.layers.12.self_attn.v_proj.lora_A.1.weight
model.layers.12.self_attn.v_proj.lora_B.0.weight
model.layers.12.self_attn.v_proj.lora_B.1.weight
model.layers.12.self_attn.o_proj.lora_A.0.weight
model.layers.12.self_attn.o_proj.lora_A.1.weight
model.layers.12.self_attn.o_proj.lora_B.0.weight
model.layers.12.self_attn.o_proj.lora_B.1.weight
model.layers.12.mlp.gate_proj.lora_A.0.weight
model.layers.12.mlp.gate_proj.lora_A.1.weight
model.layers.12.mlp.gate_proj.lora_B.0.weight
model.layers.12.mlp.gate_proj.lora_B.1.weight
model.layers.12.mlp.up_proj.lora_A.0.weight
model.layers.12.mlp.up_proj.lora_A.1.weight
model.layers.12.mlp.up_proj.lora_B.0.weight
model.layers.12.mlp.up_proj.lora_B.1.weight
model.layers.12.mlp.down_proj.lora_A.0.weight
model.layers.12.mlp.down_proj.lora_A.1.weight
model.layers.12.mlp.down_proj.lora_B.0.weight
model.layers.12.mlp.down_proj.lora_B.1.weight
model.layers.13.self_attn.q_proj.lora_A.0.weight
model.layers.13.self_attn.q_proj.lora_A.1.weight
model.layers.13.self_attn.q_proj.lora_B.0.weight
model.layers.13.self_attn.q_proj.lora_B.1.weight
model.layers.13.self_attn.k_proj.lora_A.0.weight
model.layers.13.self_attn.k_proj.lora_A.1.weight
model.layers.13.self_attn.k_proj.lora_B.0.weight
model.layers.13.self_attn.k_proj.lora_B.1.weight
model.layers.13.self_attn.v_proj.lora_A.0.weight
model.layers.13.self_attn.v_proj.lora_A.1.weight
model.layers.13.self_attn.v_proj.lora_B.0.weight
model.layers.13.self_attn.v_proj.lora_B.1.weight
model.layers.13.self_attn.o_proj.lora_A.0.weight
model.layers.13.self_attn.o_proj.lora_A.1.weight
model.layers.13.self_attn.o_proj.lora_B.0.weight
model.layers.13.self_attn.o_proj.lora_B.1.weight
model.layers.13.mlp.gate_proj.lora_A.0.weight
model.layers.13.mlp.gate_proj.lora_A.1.weight
model.layers.13.mlp.gate_proj.lora_B.0.weight
model.layers.13.mlp.gate_proj.lora_B.1.weight
model.layers.13.mlp.up_proj.lora_A.0.weight
model.layers.13.mlp.up_proj.lora_A.1.weight
model.layers.13.mlp.up_proj.lora_B.0.weight
model.layers.13.mlp.up_proj.lora_B.1.weight
model.layers.13.mlp.down_proj.lora_A.0.weight
model.layers.13.mlp.down_proj.lora_A.1.weight
model.layers.13.mlp.down_proj.lora_B.0.weight
model.layers.13.mlp.down_proj.lora_B.1.weight
model.layers.14.self_attn.q_proj.lora_A.0.weight
model.layers.14.self_attn.q_proj.lora_A.1.weight
model.layers.14.self_attn.q_proj.lora_B.0.weight
model.layers.14.self_attn.q_proj.lora_B.1.weight
model.layers.14.self_attn.k_proj.lora_A.0.weight
model.layers.14.self_attn.k_proj.lora_A.1.weight
model.layers.14.self_attn.k_proj.lora_B.0.weight
model.layers.14.self_attn.k_proj.lora_B.1.weight
model.layers.14.self_attn.v_proj.lora_A.0.weight
model.layers.14.self_attn.v_proj.lora_A.1.weight
model.layers.14.self_attn.v_proj.lora_B.0.weight
model.layers.14.self_attn.v_proj.lora_B.1.weight
model.layers.14.self_attn.o_proj.lora_A.0.weight
model.layers.14.self_attn.o_proj.lora_A.1.weight
model.layers.14.self_attn.o_proj.lora_B.0.weight
model.layers.14.self_attn.o_proj.lora_B.1.weight
model.layers.14.mlp.gate_proj.lora_A.0.weight
model.layers.14.mlp.gate_proj.lora_A.1.weight
model.layers.14.mlp.gate_proj.lora_B.0.weight
model.layers.14.mlp.gate_proj.lora_B.1.weight
model.layers.14.mlp.up_proj.lora_A.0.weight
model.layers.14.mlp.up_proj.lora_A.1.weight
model.layers.14.mlp.up_proj.lora_B.0.weight
model.layers.14.mlp.up_proj.lora_B.1.weight
model.layers.14.mlp.down_proj.lora_A.0.weight
model.layers.14.mlp.down_proj.lora_A.1.weight
model.layers.14.mlp.down_proj.lora_B.0.weight
model.layers.14.mlp.down_proj.lora_B.1.weight
model.layers.15.self_attn.q_proj.lora_A.0.weight
model.layers.15.self_attn.q_proj.lora_A.1.weight
model.layers.15.self_attn.q_proj.lora_B.0.weight
model.layers.15.self_attn.q_proj.lora_B.1.weight
model.layers.15.self_attn.k_proj.lora_A.0.weight
model.layers.15.self_attn.k_proj.lora_A.1.weight
model.layers.15.self_attn.k_proj.lora_B.0.weight
model.layers.15.self_attn.k_proj.lora_B.1.weight
model.layers.15.self_attn.v_proj.lora_A.0.weight
model.layers.15.self_attn.v_proj.lora_A.1.weight
model.layers.15.self_attn.v_proj.lora_B.0.weight
model.layers.15.self_attn.v_proj.lora_B.1.weight
model.layers.15.self_attn.o_proj.lora_A.0.weight
model.layers.15.self_attn.o_proj.lora_A.1.weight
model.layers.15.self_attn.o_proj.lora_B.0.weight
model.layers.15.self_attn.o_proj.lora_B.1.weight
model.layers.15.mlp.gate_proj.lora_A.0.weight
model.layers.15.mlp.gate_proj.lora_A.1.weight
model.layers.15.mlp.gate_proj.lora_B.0.weight
model.layers.15.mlp.gate_proj.lora_B.1.weight
model.layers.15.mlp.up_proj.lora_A.0.weight
model.layers.15.mlp.up_proj.lora_A.1.weight
model.layers.15.mlp.up_proj.lora_B.0.weight
model.layers.15.mlp.up_proj.lora_B.1.weight
model.layers.15.mlp.down_proj.lora_A.0.weight
model.layers.15.mlp.down_proj.lora_A.1.weight
model.layers.15.mlp.down_proj.lora_B.0.weight
model.layers.15.mlp.down_proj.lora_B.1.weight
model.layers.16.self_attn.q_proj.lora_A.0.weight
model.layers.16.self_attn.q_proj.lora_A.1.weight
model.layers.16.self_attn.q_proj.lora_B.0.weight
model.layers.16.self_attn.q_proj.lora_B.1.weight
model.layers.16.self_attn.k_proj.lora_A.0.weight
model.layers.16.self_attn.k_proj.lora_A.1.weight
model.layers.16.self_attn.k_proj.lora_B.0.weight
model.layers.16.self_attn.k_proj.lora_B.1.weight
model.layers.16.self_attn.v_proj.lora_A.0.weight
model.layers.16.self_attn.v_proj.lora_A.1.weight
model.layers.16.self_attn.v_proj.lora_B.0.weight
model.layers.16.self_attn.v_proj.lora_B.1.weight
model.layers.16.self_attn.o_proj.lora_A.0.weight
model.layers.16.self_attn.o_proj.lora_A.1.weight
model.layers.16.self_attn.o_proj.lora_B.0.weight
model.layers.16.self_attn.o_proj.lora_B.1.weight
model.layers.16.mlp.gate_proj.lora_A.0.weight
model.layers.16.mlp.gate_proj.lora_A.1.weight
model.layers.16.mlp.gate_proj.lora_B.0.weight
model.layers.16.mlp.gate_proj.lora_B.1.weight
model.layers.16.mlp.up_proj.lora_A.0.weight
model.layers.16.mlp.up_proj.lora_A.1.weight
model.layers.16.mlp.up_proj.lora_B.0.weight
model.layers.16.mlp.up_proj.lora_B.1.weight
model.layers.16.mlp.down_proj.lora_A.0.weight
model.layers.16.mlp.down_proj.lora_A.1.weight
model.layers.16.mlp.down_proj.lora_B.0.weight
model.layers.16.mlp.down_proj.lora_B.1.weight
model.layers.17.self_attn.q_proj.lora_A.0.weight
model.layers.17.self_attn.q_proj.lora_A.1.weight
model.layers.17.self_attn.q_proj.lora_B.0.weight
model.layers.17.self_attn.q_proj.lora_B.1.weight
model.layers.17.self_attn.k_proj.lora_A.0.weight
model.layers.17.self_attn.k_proj.lora_A.1.weight
model.layers.17.self_attn.k_proj.lora_B.0.weight
model.layers.17.self_attn.k_proj.lora_B.1.weight
model.layers.17.self_attn.v_proj.lora_A.0.weight
model.layers.17.self_attn.v_proj.lora_A.1.weight
model.layers.17.self_attn.v_proj.lora_B.0.weight
model.layers.17.self_attn.v_proj.lora_B.1.weight
model.layers.17.self_attn.o_proj.lora_A.0.weight
model.layers.17.self_attn.o_proj.lora_A.1.weight
model.layers.17.self_attn.o_proj.lora_B.0.weight
model.layers.17.self_attn.o_proj.lora_B.1.weight
model.layers.17.mlp.gate_proj.lora_A.0.weight
model.layers.17.mlp.gate_proj.lora_A.1.weight
model.layers.17.mlp.gate_proj.lora_B.0.weight
model.layers.17.mlp.gate_proj.lora_B.1.weight
model.layers.17.mlp.up_proj.lora_A.0.weight
model.layers.17.mlp.up_proj.lora_A.1.weight
model.layers.17.mlp.up_proj.lora_B.0.weight
model.layers.17.mlp.up_proj.lora_B.1.weight
model.layers.17.mlp.down_proj.lora_A.0.weight
model.layers.17.mlp.down_proj.lora_A.1.weight
model.layers.17.mlp.down_proj.lora_B.0.weight
model.layers.17.mlp.down_proj.lora_B.1.weight
model.layers.18.self_attn.q_proj.lora_A.0.weight
model.layers.18.self_attn.q_proj.lora_A.1.weight
model.layers.18.self_attn.q_proj.lora_B.0.weight
model.layers.18.self_attn.q_proj.lora_B.1.weight
model.layers.18.self_attn.k_proj.lora_A.0.weight
model.layers.18.self_attn.k_proj.lora_A.1.weight
model.layers.18.self_attn.k_proj.lora_B.0.weight
model.layers.18.self_attn.k_proj.lora_B.1.weight
model.layers.18.self_attn.v_proj.lora_A.0.weight
model.layers.18.self_attn.v_proj.lora_A.1.weight
model.layers.18.self_attn.v_proj.lora_B.0.weight
model.layers.18.self_attn.v_proj.lora_B.1.weight
model.layers.18.self_attn.o_proj.lora_A.0.weight
model.layers.18.self_attn.o_proj.lora_A.1.weight
model.layers.18.self_attn.o_proj.lora_B.0.weight
model.layers.18.self_attn.o_proj.lora_B.1.weight
model.layers.18.mlp.gate_proj.lora_A.0.weight
model.layers.18.mlp.gate_proj.lora_A.1.weight
model.layers.18.mlp.gate_proj.lora_B.0.weight
model.layers.18.mlp.gate_proj.lora_B.1.weight
model.layers.18.mlp.up_proj.lora_A.0.weight
model.layers.18.mlp.up_proj.lora_A.1.weight
model.layers.18.mlp.up_proj.lora_B.0.weight
model.layers.18.mlp.up_proj.lora_B.1.weight
model.layers.18.mlp.down_proj.lora_A.0.weight
model.layers.18.mlp.down_proj.lora_A.1.weight
model.layers.18.mlp.down_proj.lora_B.0.weight
model.layers.18.mlp.down_proj.lora_B.1.weight
model.layers.19.self_attn.q_proj.lora_A.0.weight
model.layers.19.self_attn.q_proj.lora_A.1.weight
model.layers.19.self_attn.q_proj.lora_B.0.weight
model.layers.19.self_attn.q_proj.lora_B.1.weight
model.layers.19.self_attn.k_proj.lora_A.0.weight
model.layers.19.self_attn.k_proj.lora_A.1.weight
model.layers.19.self_attn.k_proj.lora_B.0.weight
model.layers.19.self_attn.k_proj.lora_B.1.weight
model.layers.19.self_attn.v_proj.lora_A.0.weight
model.layers.19.self_attn.v_proj.lora_A.1.weight
model.layers.19.self_attn.v_proj.lora_B.0.weight
model.layers.19.self_attn.v_proj.lora_B.1.weight
model.layers.19.self_attn.o_proj.lora_A.0.weight
model.layers.19.self_attn.o_proj.lora_A.1.weight
model.layers.19.self_attn.o_proj.lora_B.0.weight
model.layers.19.self_attn.o_proj.lora_B.1.weight
model.layers.19.mlp.gate_proj.lora_A.0.weight
model.layers.19.mlp.gate_proj.lora_A.1.weight
model.layers.19.mlp.gate_proj.lora_B.0.weight
model.layers.19.mlp.gate_proj.lora_B.1.weight
model.layers.19.mlp.up_proj.lora_A.0.weight
model.layers.19.mlp.up_proj.lora_A.1.weight
model.layers.19.mlp.up_proj.lora_B.0.weight
model.layers.19.mlp.up_proj.lora_B.1.weight
model.layers.19.mlp.down_proj.lora_A.0.weight
model.layers.19.mlp.down_proj.lora_A.1.weight
model.layers.19.mlp.down_proj.lora_B.0.weight
model.layers.19.mlp.down_proj.lora_B.1.weight
model.layers.20.self_attn.q_proj.lora_A.0.weight
model.layers.20.self_attn.q_proj.lora_A.1.weight
model.layers.20.self_attn.q_proj.lora_B.0.weight
model.layers.20.self_attn.q_proj.lora_B.1.weight
model.layers.20.self_attn.k_proj.lora_A.0.weight
model.layers.20.self_attn.k_proj.lora_A.1.weight
model.layers.20.self_attn.k_proj.lora_B.0.weight
model.layers.20.self_attn.k_proj.lora_B.1.weight
model.layers.20.self_attn.v_proj.lora_A.0.weight
model.layers.20.self_attn.v_proj.lora_A.1.weight
model.layers.20.self_attn.v_proj.lora_B.0.weight
model.layers.20.self_attn.v_proj.lora_B.1.weight
model.layers.20.self_attn.o_proj.lora_A.0.weight
model.layers.20.self_attn.o_proj.lora_A.1.weight
model.layers.20.self_attn.o_proj.lora_B.0.weight
model.layers.20.self_attn.o_proj.lora_B.1.weight
model.layers.20.mlp.gate_proj.lora_A.0.weight
model.layers.20.mlp.gate_proj.lora_A.1.weight
model.layers.20.mlp.gate_proj.lora_B.0.weight
model.layers.20.mlp.gate_proj.lora_B.1.weight
model.layers.20.mlp.up_proj.lora_A.0.weight
model.layers.20.mlp.up_proj.lora_A.1.weight
model.layers.20.mlp.up_proj.lora_B.0.weight
model.layers.20.mlp.up_proj.lora_B.1.weight
model.layers.20.mlp.down_proj.lora_A.0.weight
model.layers.20.mlp.down_proj.lora_A.1.weight
model.layers.20.mlp.down_proj.lora_B.0.weight
model.layers.20.mlp.down_proj.lora_B.1.weight
model.layers.21.self_attn.q_proj.lora_A.0.weight
model.layers.21.self_attn.q_proj.lora_A.1.weight
model.layers.21.self_attn.q_proj.lora_B.0.weight
model.layers.21.self_attn.q_proj.lora_B.1.weight
model.layers.21.self_attn.k_proj.lora_A.0.weight
model.layers.21.self_attn.k_proj.lora_A.1.weight
model.layers.21.self_attn.k_proj.lora_B.0.weight
model.layers.21.self_attn.k_proj.lora_B.1.weight
model.layers.21.self_attn.v_proj.lora_A.0.weight
model.layers.21.self_attn.v_proj.lora_A.1.weight
model.layers.21.self_attn.v_proj.lora_B.0.weight
model.layers.21.self_attn.v_proj.lora_B.1.weight
model.layers.21.self_attn.o_proj.lora_A.0.weight
model.layers.21.self_attn.o_proj.lora_A.1.weight
model.layers.21.self_attn.o_proj.lora_B.0.weight
model.layers.21.self_attn.o_proj.lora_B.1.weight
model.layers.21.mlp.gate_proj.lora_A.0.weight
model.layers.21.mlp.gate_proj.lora_A.1.weight
model.layers.21.mlp.gate_proj.lora_B.0.weight
model.layers.21.mlp.gate_proj.lora_B.1.weight
model.layers.21.mlp.up_proj.lora_A.0.weight
model.layers.21.mlp.up_proj.lora_A.1.weight
model.layers.21.mlp.up_proj.lora_B.0.weight
model.layers.21.mlp.up_proj.lora_B.1.weight
model.layers.21.mlp.down_proj.lora_A.0.weight
model.layers.21.mlp.down_proj.lora_A.1.weight
model.layers.21.mlp.down_proj.lora_B.0.weight
model.layers.21.mlp.down_proj.lora_B.1.weight
model.layers.22.self_attn.q_proj.lora_A.0.weight
model.layers.22.self_attn.q_proj.lora_A.1.weight
model.layers.22.self_attn.q_proj.lora_B.0.weight
model.layers.22.self_attn.q_proj.lora_B.1.weight
model.layers.22.self_attn.k_proj.lora_A.0.weight
model.layers.22.self_attn.k_proj.lora_A.1.weight
model.layers.22.self_attn.k_proj.lora_B.0.weight
model.layers.22.self_attn.k_proj.lora_B.1.weight
model.layers.22.self_attn.v_proj.lora_A.0.weight
model.layers.22.self_attn.v_proj.lora_A.1.weight
model.layers.22.self_attn.v_proj.lora_B.0.weight
model.layers.22.self_attn.v_proj.lora_B.1.weight
model.layers.22.self_attn.o_proj.lora_A.0.weight
model.layers.22.self_attn.o_proj.lora_A.1.weight
model.layers.22.self_attn.o_proj.lora_B.0.weight
model.layers.22.self_attn.o_proj.lora_B.1.weight
model.layers.22.mlp.gate_proj.lora_A.0.weight
model.layers.22.mlp.gate_proj.lora_A.1.weight
model.layers.22.mlp.gate_proj.lora_B.0.weight
model.layers.22.mlp.gate_proj.lora_B.1.weight
model.layers.22.mlp.up_proj.lora_A.0.weight
model.layers.22.mlp.up_proj.lora_A.1.weight
model.layers.22.mlp.up_proj.lora_B.0.weight
model.layers.22.mlp.up_proj.lora_B.1.weight
model.layers.22.mlp.down_proj.lora_A.0.weight
model.layers.22.mlp.down_proj.lora_A.1.weight
model.layers.22.mlp.down_proj.lora_B.0.weight
model.layers.22.mlp.down_proj.lora_B.1.weight
model.layers.23.self_attn.q_proj.lora_A.0.weight
model.layers.23.self_attn.q_proj.lora_A.1.weight
model.layers.23.self_attn.q_proj.lora_B.0.weight
model.layers.23.self_attn.q_proj.lora_B.1.weight
model.layers.23.self_attn.k_proj.lora_A.0.weight
model.layers.23.self_attn.k_proj.lora_A.1.weight
model.layers.23.self_attn.k_proj.lora_B.0.weight
model.layers.23.self_attn.k_proj.lora_B.1.weight
model.layers.23.self_attn.v_proj.lora_A.0.weight
model.layers.23.self_attn.v_proj.lora_A.1.weight
model.layers.23.self_attn.v_proj.lora_B.0.weight
model.layers.23.self_attn.v_proj.lora_B.1.weight
model.layers.23.self_attn.o_proj.lora_A.0.weight
model.layers.23.self_attn.o_proj.lora_A.1.weight
model.layers.23.self_attn.o_proj.lora_B.0.weight
model.layers.23.self_attn.o_proj.lora_B.1.weight
model.layers.23.mlp.gate_proj.lora_A.0.weight
model.layers.23.mlp.gate_proj.lora_A.1.weight
model.layers.23.mlp.gate_proj.lora_B.0.weight
model.layers.23.mlp.gate_proj.lora_B.1.weight
model.layers.23.mlp.up_proj.lora_A.0.weight
model.layers.23.mlp.up_proj.lora_A.1.weight
model.layers.23.mlp.up_proj.lora_B.0.weight
model.layers.23.mlp.up_proj.lora_B.1.weight
model.layers.23.mlp.down_proj.lora_A.0.weight
model.layers.23.mlp.down_proj.lora_A.1.weight
model.layers.23.mlp.down_proj.lora_B.0.weight
model.layers.23.mlp.down_proj.lora_B.1.weight
model.layers.24.self_attn.q_proj.lora_A.0.weight
model.layers.24.self_attn.q_proj.lora_A.1.weight
model.layers.24.self_attn.q_proj.lora_B.0.weight
model.layers.24.self_attn.q_proj.lora_B.1.weight
model.layers.24.self_attn.k_proj.lora_A.0.weight
model.layers.24.self_attn.k_proj.lora_A.1.weight
model.layers.24.self_attn.k_proj.lora_B.0.weight
model.layers.24.self_attn.k_proj.lora_B.1.weight
model.layers.24.self_attn.v_proj.lora_A.0.weight
model.layers.24.self_attn.v_proj.lora_A.1.weight
model.layers.24.self_attn.v_proj.lora_B.0.weight
model.layers.24.self_attn.v_proj.lora_B.1.weight
model.layers.24.self_attn.o_proj.lora_A.0.weight
model.layers.24.self_attn.o_proj.lora_A.1.weight
model.layers.24.self_attn.o_proj.lora_B.0.weight
model.layers.24.self_attn.o_proj.lora_B.1.weight
model.layers.24.mlp.gate_proj.lora_A.0.weight
model.layers.24.mlp.gate_proj.lora_A.1.weight
model.layers.24.mlp.gate_proj.lora_B.0.weight
model.layers.24.mlp.gate_proj.lora_B.1.weight
model.layers.24.mlp.up_proj.lora_A.0.weight
model.layers.24.mlp.up_proj.lora_A.1.weight
model.layers.24.mlp.up_proj.lora_B.0.weight
model.layers.24.mlp.up_proj.lora_B.1.weight
model.layers.24.mlp.down_proj.lora_A.0.weight
model.layers.24.mlp.down_proj.lora_A.1.weight
model.layers.24.mlp.down_proj.lora_B.0.weight
model.layers.24.mlp.down_proj.lora_B.1.weight
model.layers.25.self_attn.q_proj.lora_A.0.weight
model.layers.25.self_attn.q_proj.lora_A.1.weight
model.layers.25.self_attn.q_proj.lora_B.0.weight
model.layers.25.self_attn.q_proj.lora_B.1.weight
model.layers.25.self_attn.k_proj.lora_A.0.weight
model.layers.25.self_attn.k_proj.lora_A.1.weight
model.layers.25.self_attn.k_proj.lora_B.0.weight
model.layers.25.self_attn.k_proj.lora_B.1.weight
model.layers.25.self_attn.v_proj.lora_A.0.weight
model.layers.25.self_attn.v_proj.lora_A.1.weight
model.layers.25.self_attn.v_proj.lora_B.0.weight
model.layers.25.self_attn.v_proj.lora_B.1.weight
model.layers.25.self_attn.o_proj.lora_A.0.weight
model.layers.25.self_attn.o_proj.lora_A.1.weight
model.layers.25.self_attn.o_proj.lora_B.0.weight
model.layers.25.self_attn.o_proj.lora_B.1.weight
model.layers.25.mlp.gate_proj.lora_A.0.weight
model.layers.25.mlp.gate_proj.lora_A.1.weight
model.layers.25.mlp.gate_proj.lora_B.0.weight
model.layers.25.mlp.gate_proj.lora_B.1.weight
model.layers.25.mlp.up_proj.lora_A.0.weight
model.layers.25.mlp.up_proj.lora_A.1.weight
model.layers.25.mlp.up_proj.lora_B.0.weight
model.layers.25.mlp.up_proj.lora_B.1.weight
model.layers.25.mlp.down_proj.lora_A.0.weight
model.layers.25.mlp.down_proj.lora_A.1.weight
model.layers.25.mlp.down_proj.lora_B.0.weight
model.layers.25.mlp.down_proj.lora_B.1.weight
model.layers.26.self_attn.q_proj.lora_A.0.weight
model.layers.26.self_attn.q_proj.lora_A.1.weight
model.layers.26.self_attn.q_proj.lora_B.0.weight
model.layers.26.self_attn.q_proj.lora_B.1.weight
model.layers.26.self_attn.k_proj.lora_A.0.weight
model.layers.26.self_attn.k_proj.lora_A.1.weight
model.layers.26.self_attn.k_proj.lora_B.0.weight
model.layers.26.self_attn.k_proj.lora_B.1.weight
model.layers.26.self_attn.v_proj.lora_A.0.weight
model.layers.26.self_attn.v_proj.lora_A.1.weight
model.layers.26.self_attn.v_proj.lora_B.0.weight
model.layers.26.self_attn.v_proj.lora_B.1.weight
model.layers.26.self_attn.o_proj.lora_A.0.weight
model.layers.26.self_attn.o_proj.lora_A.1.weight
model.layers.26.self_attn.o_proj.lora_B.0.weight
model.layers.26.self_attn.o_proj.lora_B.1.weight
model.layers.26.mlp.gate_proj.lora_A.0.weight
model.layers.26.mlp.gate_proj.lora_A.1.weight
model.layers.26.mlp.gate_proj.lora_B.0.weight
model.layers.26.mlp.gate_proj.lora_B.1.weight
model.layers.26.mlp.up_proj.lora_A.0.weight
model.layers.26.mlp.up_proj.lora_A.1.weight
model.layers.26.mlp.up_proj.lora_B.0.weight
model.layers.26.mlp.up_proj.lora_B.1.weight
model.layers.26.mlp.down_proj.lora_A.0.weight
model.layers.26.mlp.down_proj.lora_A.1.weight
model.layers.26.mlp.down_proj.lora_B.0.weight
model.layers.26.mlp.down_proj.lora_B.1.weight
model.layers.27.self_attn.q_proj.lora_A.0.weight
model.layers.27.self_attn.q_proj.lora_A.1.weight
model.layers.27.self_attn.q_proj.lora_B.0.weight
model.layers.27.self_attn.q_proj.lora_B.1.weight
model.layers.27.self_attn.k_proj.lora_A.0.weight
model.layers.27.self_attn.k_proj.lora_A.1.weight
model.layers.27.self_attn.k_proj.lora_B.0.weight
model.layers.27.self_attn.k_proj.lora_B.1.weight
model.layers.27.self_attn.v_proj.lora_A.0.weight
model.layers.27.self_attn.v_proj.lora_A.1.weight
model.layers.27.self_attn.v_proj.lora_B.0.weight
model.layers.27.self_attn.v_proj.lora_B.1.weight
model.layers.27.self_attn.o_proj.lora_A.0.weight
model.layers.27.self_attn.o_proj.lora_A.1.weight
model.layers.27.self_attn.o_proj.lora_B.0.weight
model.layers.27.self_attn.o_proj.lora_B.1.weight
model.layers.27.mlp.gate_proj.lora_A.0.weight
model.layers.27.mlp.gate_proj.lora_A.1.weight
model.layers.27.mlp.gate_proj.lora_B.0.weight
model.layers.27.mlp.gate_proj.lora_B.1.weight
model.layers.27.mlp.up_proj.lora_A.0.weight
model.layers.27.mlp.up_proj.lora_A.1.weight
model.layers.27.mlp.up_proj.lora_B.0.weight
model.layers.27.mlp.up_proj.lora_B.1.weight
model.layers.27.mlp.down_proj.lora_A.0.weight
model.layers.27.mlp.down_proj.lora_A.1.weight
model.layers.27.mlp.down_proj.lora_B.0.weight
model.layers.27.mlp.down_proj.lora_B.1.weight
model.layers.28.self_attn.q_proj.lora_A.0.weight
model.layers.28.self_attn.q_proj.lora_A.1.weight
model.layers.28.self_attn.q_proj.lora_B.0.weight
model.layers.28.self_attn.q_proj.lora_B.1.weight
model.layers.28.self_attn.k_proj.lora_A.0.weight
model.layers.28.self_attn.k_proj.lora_A.1.weight
model.layers.28.self_attn.k_proj.lora_B.0.weight
model.layers.28.self_attn.k_proj.lora_B.1.weight
model.layers.28.self_attn.v_proj.lora_A.0.weight
model.layers.28.self_attn.v_proj.lora_A.1.weight
model.layers.28.self_attn.v_proj.lora_B.0.weight
model.layers.28.self_attn.v_proj.lora_B.1.weight
model.layers.28.self_attn.o_proj.lora_A.0.weight
model.layers.28.self_attn.o_proj.lora_A.1.weight
model.layers.28.self_attn.o_proj.lora_B.0.weight
model.layers.28.self_attn.o_proj.lora_B.1.weight
model.layers.28.mlp.gate_proj.lora_A.0.weight
model.layers.28.mlp.gate_proj.lora_A.1.weight
model.layers.28.mlp.gate_proj.lora_B.0.weight
model.layers.28.mlp.gate_proj.lora_B.1.weight
model.layers.28.mlp.up_proj.lora_A.0.weight
model.layers.28.mlp.up_proj.lora_A.1.weight
model.layers.28.mlp.up_proj.lora_B.0.weight
model.layers.28.mlp.up_proj.lora_B.1.weight
model.layers.28.mlp.down_proj.lora_A.0.weight
model.layers.28.mlp.down_proj.lora_A.1.weight
model.layers.28.mlp.down_proj.lora_B.0.weight
model.layers.28.mlp.down_proj.lora_B.1.weight
model.layers.29.self_attn.q_proj.lora_A.0.weight
model.layers.29.self_attn.q_proj.lora_A.1.weight
model.layers.29.self_attn.q_proj.lora_B.0.weight
model.layers.29.self_attn.q_proj.lora_B.1.weight
model.layers.29.self_attn.k_proj.lora_A.0.weight
model.layers.29.self_attn.k_proj.lora_A.1.weight
model.layers.29.self_attn.k_proj.lora_B.0.weight
model.layers.29.self_attn.k_proj.lora_B.1.weight
model.layers.29.self_attn.v_proj.lora_A.0.weight
model.layers.29.self_attn.v_proj.lora_A.1.weight
model.layers.29.self_attn.v_proj.lora_B.0.weight
model.layers.29.self_attn.v_proj.lora_B.1.weight
model.layers.29.self_attn.o_proj.lora_A.0.weight
model.layers.29.self_attn.o_proj.lora_A.1.weight
model.layers.29.self_attn.o_proj.lora_B.0.weight
model.layers.29.self_attn.o_proj.lora_B.1.weight
model.layers.29.mlp.gate_proj.lora_A.0.weight
model.layers.29.mlp.gate_proj.lora_A.1.weight
model.layers.29.mlp.gate_proj.lora_B.0.weight
model.layers.29.mlp.gate_proj.lora_B.1.weight
model.layers.29.mlp.up_proj.lora_A.0.weight
model.layers.29.mlp.up_proj.lora_A.1.weight
model.layers.29.mlp.up_proj.lora_B.0.weight
model.layers.29.mlp.up_proj.lora_B.1.weight
model.layers.29.mlp.down_proj.lora_A.0.weight
model.layers.29.mlp.down_proj.lora_A.1.weight
model.layers.29.mlp.down_proj.lora_B.0.weight
model.layers.29.mlp.down_proj.lora_B.1.weight
model.layers.30.self_attn.q_proj.lora_A.0.weight
model.layers.30.self_attn.q_proj.lora_A.1.weight
model.layers.30.self_attn.q_proj.lora_B.0.weight
model.layers.30.self_attn.q_proj.lora_B.1.weight
model.layers.30.self_attn.k_proj.lora_A.0.weight
model.layers.30.self_attn.k_proj.lora_A.1.weight
model.layers.30.self_attn.k_proj.lora_B.0.weight
model.layers.30.self_attn.k_proj.lora_B.1.weight
model.layers.30.self_attn.v_proj.lora_A.0.weight
model.layers.30.self_attn.v_proj.lora_A.1.weight
model.layers.30.self_attn.v_proj.lora_B.0.weight
model.layers.30.self_attn.v_proj.lora_B.1.weight
model.layers.30.self_attn.o_proj.lora_A.0.weight
model.layers.30.self_attn.o_proj.lora_A.1.weight
model.layers.30.self_attn.o_proj.lora_B.0.weight
model.layers.30.self_attn.o_proj.lora_B.1.weight
model.layers.30.mlp.gate_proj.lora_A.0.weight
model.layers.30.mlp.gate_proj.lora_A.1.weight
model.layers.30.mlp.gate_proj.lora_B.0.weight
model.layers.30.mlp.gate_proj.lora_B.1.weight
model.layers.30.mlp.up_proj.lora_A.0.weight
model.layers.30.mlp.up_proj.lora_A.1.weight
model.layers.30.mlp.up_proj.lora_B.0.weight
model.layers.30.mlp.up_proj.lora_B.1.weight
model.layers.30.mlp.down_proj.lora_A.0.weight
model.layers.30.mlp.down_proj.lora_A.1.weight
model.layers.30.mlp.down_proj.lora_B.0.weight
model.layers.30.mlp.down_proj.lora_B.1.weight
model.layers.31.self_attn.q_proj.lora_A.0.weight
model.layers.31.self_attn.q_proj.lora_A.1.weight
model.layers.31.self_attn.q_proj.lora_B.0.weight
model.layers.31.self_attn.q_proj.lora_B.1.weight
model.layers.31.self_attn.k_proj.lora_A.0.weight
model.layers.31.self_attn.k_proj.lora_A.1.weight
model.layers.31.self_attn.k_proj.lora_B.0.weight
model.layers.31.self_attn.k_proj.lora_B.1.weight
model.layers.31.self_attn.v_proj.lora_A.0.weight
model.layers.31.self_attn.v_proj.lora_A.1.weight
model.layers.31.self_attn.v_proj.lora_B.0.weight
model.layers.31.self_attn.v_proj.lora_B.1.weight
model.layers.31.self_attn.o_proj.lora_A.0.weight
model.layers.31.self_attn.o_proj.lora_A.1.weight
model.layers.31.self_attn.o_proj.lora_B.0.weight
model.layers.31.self_attn.o_proj.lora_B.1.weight
model.layers.31.mlp.gate_proj.lora_A.0.weight
model.layers.31.mlp.gate_proj.lora_A.1.weight
model.layers.31.mlp.gate_proj.lora_B.0.weight
model.layers.31.mlp.gate_proj.lora_B.1.weight
model.layers.31.mlp.up_proj.lora_A.0.weight
model.layers.31.mlp.up_proj.lora_A.1.weight
model.layers.31.mlp.up_proj.lora_B.0.weight
model.layers.31.mlp.up_proj.lora_B.1.weight
model.layers.31.mlp.down_proj.lora_A.0.weight
model.layers.31.mlp.down_proj.lora_A.1.weight
model.layers.31.mlp.down_proj.lora_B.0.weight
model.layers.31.mlp.down_proj.lora_B.1.weight
----------------------

And then, just after that, I run the SFTTrainer, which prints exactly:

Using auto half precision backend
Currently training with a batch size of: 2
***** Running training *****
  Num examples = 1,053
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 32
  Number of trainable parameters = 118,372,800
Detected flash_attn version: 2.6.3
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None

BenjaminBossan commented 3 weeks ago

Thanks @benjamin-marie. The internal_xlora_classifier does not appear among the trainable parameters, whereas the LoRAs (which should be frozen) do, right @EricLBuehler?
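
As a rough sanity check / temporary workaround (not a confirmed fix, and assuming the parameter names shown in the dump above), something along these lines should force the intended split, with the classifier trainable and the LoRA matrices frozen:

for name, param in xlora_model.named_parameters():
    if "internal_xlora_classifier" in name:
        param.requires_grad = True   # the X-LoRA classifier is what should be trained
    elif "lora_A" in name or "lora_B" in name:
        param.requires_grad = False  # X-LoRA keeps the underlying LoRA adapters frozen

xlora_model.print_trainable_parameters()

After this, only the classifier parameters should be reported as trainable.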

EricLBuehler commented 3 weeks ago

Yes, exactly. I'll try to reproduce and fix this!