huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

LoRA adaptation shape mismatch #1746

Closed manlenzzz closed 2 months ago

manlenzzz commented 4 months ago

LoRA adapter shape mismatch: after LoRA fine-tuning, I loaded the adapter with PEFT and found that the performance did not change at all compared to not loading LoRA. During fine-tuning I passed the ranks via rank_pattern, and I am using version 0.6.0. After checking, there seems to be a shape mismatch, but what puzzles me is that the LoRA weights have the dimensions of the original model's parameters rather than those of the pruned model that I actually used to train the adapter. Logically, a shape mismatch should have raised an error when the adapter was used for LM evaluation, but surprisingly nothing was raised, and the accuracy stayed the same as without the adapter.
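
For reference, the shapes stored in the saved adapter can be inspected directly before loading it (a minimal sketch; the output directory is a placeholder and the file name assumes PEFT's default save layout, adapter_model.safetensors, or adapter_model.bin depending on how it was saved):

    # Sketch: print the saved lora_B shapes so they can be compared with the
    # (pruned or unpruned) base model's layer sizes.
    from safetensors.torch import load_file

    adapter_state = load_file("output_dir/adapter_model.safetensors")  # placeholder path
    for name, tensor in adapter_state.items():
        if "lora_B" in name:
            print(name, tuple(tensor.shape))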

manlenzzz commented 4 months ago

Here is part of the error output:

Traceback (most recent call last):
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/post_training.py", line 291, in <module>
    main(args)
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/post_training.py", line 243, in main
    saved_model = transformers.AutoModelForCausalLM.from_pretrained(args.output_dir)
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3827, in from_pretrained
    model.load_adapter(
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/env/lib/python3.10/site-packages/transformers/integrations/peft.py", line 214, in load_adapter
    incompatible_keys = set_peft_model_state_dict(self, processed_adapter_state_dict, adapter_name)
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/env/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 158, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/mnt/workspace/user/zhouchanghai/LLM-Pruner/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        size mismatch for model.layers.4.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1920, 12]) from checkpoint, the shape in current model is torch.Size([5120, 12]).
        size mismatch for model.layers.4.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1920, 12]) from checkpoint, the shape in current model is torch.Size([5120, 12]).
        size mismatch for model.layers.4.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1920, 12]) from checkpoint, the shape in current model is torch.Size([5120, 12]).
        size mismatch for model.layers.4.self_attn.o_proj.lora_A.default.weight: copying a param with shape torch.Size([12, 1920]) from checkpoint, the shape in current model is torch.Size([12, 5120]).

BenjaminBossan commented 4 months ago

Can you show the code you used to initialize this adapter and to train it? Also, is there a specific reason why you use 0.6.0 or could you upgrade to a newer version?

manlenzzz commented 4 months ago

Can you show the code you used to initialize this adapter and to train it? Also, is there a specific reason why you use 0.6.0 or could you upgrade to a newer version?

Thanks for your reply. When I use a newer version, the training loss becomes NaN. But after I repeated the fine-tuning a few times, the error suddenly disappeared, and I was able to load the adapter successfully. Below is my code:

    # Prepare For LoRA
    model = prepare_model_for_int8_training(model)
    rank_pattern = {}
    layers_ranks = [int(rank) for rank in args.layers_ranks.split(',')]
    for i, rank in enumerate(layers_ranks):
        for module in ["self_attn", "mlp"]:
            if module == "self_attn":
                projs = ["q_proj", "k_proj", "v_proj", "o_proj"]
            else:  # "mlp" module
                projs = ["gate_proj", "down_proj", "up_proj"]

            for proj in projs:
                rank_pattern[f"model.layers.{i}.{module}.{proj}"] = rank
    config = LoraConfig(
        lora_alpha=args.lora_alpha,
        target_modules=args.lora_target_modules.split(","),
        rank_pattern=rank_pattern,
        lora_dropout=args.lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()
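
For what it's worth, a quick way to sanity-check that the rank_pattern (and the pruned layer sizes) actually ended up in the adapter is to print the LoRA shapes right after get_peft_model (a small sketch, reusing the model variable from the code above):

    # Sanity check: lora_A.weight has shape (rank, in_features) and
    # lora_B.weight has shape (out_features, rank), so both the per-layer rank
    # and the (pruned) hidden sizes are visible here.
    for name, param in model.named_parameters():
        if "lora_A" in name or "lora_B" in name:
            print(name, tuple(param.shape))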

manlenzzz commented 4 months ago

And I have another question: in version 0.11.1, how can I use bits_pattern to customize the quantization bits of each layer? It is mentioned in the docstring below, but there is no interface to pass the parameter in yet: src/peft/tuners/lora/config.py

@dataclass
class LoftQConfig:
    """
    This is the sub-configuration class to store the configuration of a [`LoraModel`].

    Args:
        bits_pattern (`dict`): The mapping from layer names or regexp expression to bits which are different from the
            default bits specified by `bits`. For example, `{model.decoder.layers.0.encoder_attn.k_proj: 2`}.
        bits (`int`): Quantization bits for LoftQ.
        iter (`int`): Alternating iterations for LoftQ.
        fake (`bool`): True: use fp16/fp32; used for first time to save weights. False: use bitsandbytes 4bit linear
            models. weights can't be saved. Recommend to set to True, save the weights and load the saved weights in 4
            bits.
    """

    loftq_bits: int = field(default=4, metadata={"help": "Quantization bits for LoftQ"})
    loftq_iter: int = field(default=1, metadata={"help": "Alternating iterations for LoftQ"})

BenjaminBossan commented 4 months ago

Below is my code:

There is still some part missing, most notably what base model you use and how you load it. Please also share your args.layers_ranks.

how can I use bits_pattern to customize the quantization bits of each layer?

So if you want to configure and use LoftQ, you can pass the LoftQConfig to LoraConfig:

loftq_config = LoftQConfig(...)
lora_config = LoraConfig(..., init_lora_weights="loftq", loftq_config=loftq_config)
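
A slightly fuller sketch (the model name is only a placeholder; LoftQ initialization needs the base model in full precision, since it derives the quantized weights and the LoRA init from the original weights):

    # Sketch of an end-to-end LoftQ setup; "meta-llama/Llama-2-7b-hf" is just a
    # placeholder base model.
    from transformers import AutoModelForCausalLM
    from peft import LoftQConfig, LoraConfig, get_peft_model

    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    loftq_config = LoftQConfig(loftq_bits=4, loftq_iter=1)
    lora_config = LoraConfig(
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        init_lora_weights="loftq",
        loftq_config=loftq_config,
    )
    model = get_peft_model(base_model, lora_config)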

manlenzzz commented 4 months ago

Hi,

1. layers_ranks is a list that defines the rank values of different layers. I use it to define rank_pattern.

2. LoftQConfig can only receive two parameters, loftq_bits and loftq_iter, but I want to give each layer a different number of quantization bits, and LoftQConfig cannot receive bits_pattern.

BenjaminBossan commented 4 months ago

1. layers_ranks is a list that defines the rank values of different layers. I use it to define rank_pattern.

Yes, I got that, but to be able to debug, I need the value you pass, as well as the code used to load the base model.

2. LoftQConfig can only receive two parameters, loftq_bits and loftq_iter, but I want to give each layer a different number of quantization bits, and LoftQConfig cannot receive bits_pattern.

Ah now I got your question. No, what you want is not supported.

manlenzzz commented 3 months ago

That's a pity. If I want to implement what I described in question 2, what should I do?

BenjaminBossan commented 3 months ago

If I want to implement what I described in question 2, what should I do?

It depends on how far you want to take this. If I wanted to implement this just for some quick and dirty testing, I'd probably go into the PEFT code and add some extra logic. First you could extend the LoftQConfig class to accept more parameters. Then update the LoftQ init method on LoRA layers to pass on those parameters, and finally update the loftq_init function. Maybe there's more needed, but this is where I would start.
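
For the first of those steps, the change might look roughly like this (a sketch only; bits_pattern is not an existing PEFT option, and the field name is made up for illustration):

    # Rough sketch of extending LoftQConfig for quick-and-dirty testing.
    # loftq_bits_pattern is a hypothetical field, not part of PEFT.
    from dataclasses import dataclass, field

    @dataclass
    class LoftQConfig:
        loftq_bits: int = field(default=4, metadata={"help": "Quantization bits for LoftQ"})
        loftq_iter: int = field(default=1, metadata={"help": "Alternating iterations for LoftQ"})
        # Hypothetical addition: per-layer bits, analogous to rank_pattern for ranks.
        loftq_bits_pattern: dict = field(
            default_factory=dict,
            metadata={"help": "Mapping from layer name (or regex) to bits, overriding loftq_bits"},
        )

The LoftQ init on the LoRA layers and the loftq_init function would then need to look up the current layer's name in this mapping and fall back to loftq_bits when there is no match.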

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.