tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

cannot export merged model #272

Open alexl83 opened 1 year ago

alexl83 commented 1 year ago

Hi! I'm trying to merge the LLaMA-13B model with a LoRA fine-tune I performed thanks to this repo, but I get a size-mismatch error. Can you please help?

Thank you!

command line: BASE_MODEL=/home/alex/oobabooga/text-generation-webui/models/llama-13b python export_state_dict_checkpoint.py alpacacleaned-13b-loratrained.bin

adapter_config.json:


{
  "base_model_name_or_path": "/home/alex/oobabooga/text-generation-webui/models/llama-13b",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "target_modules": [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj"
  ],
  "task_type": "CAUSAL_LM"
}
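
For reference, the part of export_state_dict_checkpoint.py that fails appears to do roughly the following (a sketch reconstructed from the traceback below; the adapter repo id and exact arguments are illustrative, not taken from the script):

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

BASE_MODEL = "/home/alex/oobabooga/text-generation-webui/models/llama-13b"

# Loads the 41 base-model checkpoint shards seen in the progress bar below.
base_model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
)

# This is the call that raises the size-mismatch error: the adapter weights
# must match the hidden size of the base model they were trained against.
lora_model = PeftModel.from_pretrained(base_model, "tloen/alpaca-lora-7b")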

error output:


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/alex/oobabooga/installer_files/env/envs/alpaca-lora/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/alex/oobabooga/installer_files/env/envs/alpaca-lora/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Loading checkpoint shards: 100%|██████████| 41/41 [00:36<00:00,  1.13it/s]
Traceback (most recent call last):
  File "/home/alex/oobabooga/alpaca-lora/export_state_dict_checkpoint.py", line 23, in <module>
    lora_model = PeftModel.from_pretrained(
  File "/home/alex/oobabooga/installer_files/env/envs/alpaca-lora/lib/python3.10/site-packages/peft/peft_model.py", line 163, in from_pretrained
    model = set_peft_model_state_dict(model, adapters_weights)
  File "/home/alex/oobabooga/installer_files/env/envs/alpaca-lora/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 74, in set_peft_model_state_dict
    model.load_state_dict(peft_model_state_dict, strict=False)
  File "/home/alex/oobabooga/installer_files/env/envs/alpaca-lora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([16, 5120]).
        size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight: copying a param with shape torch.Size([4096, 16]) from checkpoint, the shape in current model is torch.Size([5120, 16]).
        size mismatch for base_model.model.model.layers.0.self_attn.k_proj.lora_A.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([16, 5120]).
        size mismatch for base_model.model.model.layers.0.self_attn.k_proj.lora_B.weight: copying a param with shape torch.Size([4096, 16]) from checkpoint, the shape in current model is torch.Size([5120, 16]).
        size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([16, 5120]).
        size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_B.weight: copying a param with shape torch.Size([4096, 16]) from checkpoint, the shape in current model is torch.Size([5120, 16]).
        size mismatch for base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([16, 5120]).
        size mismatch for base_model.model.model.layers.0.self_attn.o_proj.lora_B.weight: copying a param with shape torch.Size([4096, 16]) from checkpoint, the shape in current model is torch.Size([5120, 16]).
        [... the same lora_A/lora_B size mismatches (q_proj, k_proj, v_proj, o_proj) repeat for layers 1 through 31 ...]
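
The shapes in the trace explain the failure: 4096 is the hidden size of LLaMA-7B, while 5120 is the hidden size of LLaMA-13B, so the saved adapter weights appear to come from a 7B run (their rank of 16 also disagrees with the r=32 in adapter_config.json above). A hypothetical way to inspect which base model a saved adapter expects (the path is illustrative):

import torch

# Load only the adapter state dict, not the base model.
state_dict = torch.load("lora-alpaca/adapter_model.bin", map_location="cpu")
for name, tensor in list(state_dict.items())[:4]:
    print(name, tuple(tensor.shape))
# For a lora_A weight, the first dim is the LoRA rank and the second is the
# base model's hidden size: 4096 means LLaMA-7B, 5120 means LLaMA-13B.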
tloen commented 1 year ago

See #211

alexl83 commented 1 year ago

Thanks! Is there a way to point the export script to a local copy of the trained LoRA? It's trying to fetch it from Hugging Face.

tloen commented 1 year ago

There should be a line in the script specifying the weight path; just point that to a local directory (e.g. './lora-alpaca').
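
Concretely, that means changing the second argument of the PeftModel.from_pretrained call; a sketch (the exact line differs between script versions, and the Hub id shown is illustrative):

# Before: the adapter is fetched from the Hugging Face Hub.
lora_model = PeftModel.from_pretrained(base_model, "tloen/alpaca-lora-7b")

# After: load the adapter from the local training output directory.
lora_model = PeftModel.from_pretrained(base_model, "./lora-alpaca")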

alexl83 commented 1 year ago

Processing now, thanks! One last doubt: given that I changed the target modules to train, is the export_hf script able to merge them?


  "target_modules": [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj"
  ], 
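
For what it's worth, peft rebuilds the LoRA layers from the target_modules recorded in adapter_config.json, so all four projections should be folded in. A minimal merge-and-save sketch, assuming a peft release recent enough to provide merge_and_unload (paths are illustrative):

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base_model = LlamaForCausalLM.from_pretrained(
    "/home/alex/oobabooga/text-generation-webui/models/llama-13b",
    torch_dtype=torch.float16,
)
lora_model = PeftModel.from_pretrained(base_model, "./lora-alpaca")

# Folds every LoRA target module (q/k/v/o_proj here) into the base weights.
merged_model = lora_model.merge_and_unload()
merged_model.save_pretrained("./llama-13b-alpaca-merged")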
amcl commented 1 year ago

I edited the script to point to the local lora_model and everything appears to load correctly.

I'm getting this Assertion Error:

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 33/33 [00:09<00:00,  3.56it/s]
Traceback (most recent call last):
  File "/home/alpaca-lora/export_hf_checkpoint.py", line 46, in <module>
    assert not torch.allclose(first_weight_old, first_weight)

Off-topic: "deloreanized" was a nice touch.

xxxiaol commented 1 year ago

> I edited the script to point to the local lora_model and everything appears to load correctly.
>
> I'm getting this Assertion Error:
> assert not torch.allclose(first_weight_old, first_weight)

I also encountered this error. Have you solved it?

bupticybee commented 1 year ago

Same error here, @xxxiaol @amcl did you guys solve it?

xxxiaol commented 1 year ago

> Same error here, @xxxiaol @amcl did you guys solve it?

I just removed the assertion line. Not sure if it works fine.

bupticybee commented 1 year ago

> Same error here, @xxxiaol @amcl did you guys solve it?
>
> I just removed the assertion line. Not sure if it works fine.

I don't think that's the right way to solve this. The assertion is there to ensure the LoRA weights were actually applied.
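
For context, the guard at line 46 of export_hf_checkpoint.py appears to work like the toy below: the script clones one base weight before merging, then asserts that merging changed it (variable names follow the traceback; the LoRA shapes mirror this thread):

import torch
import torch.nn as nn

layer = nn.Linear(4096, 4096, bias=False)
first_weight = layer.weight
first_weight_old = first_weight.data.clone()

# Stand-in for the merge: W <- W + B @ A with a rank-16 LoRA delta.
lora_A = 0.01 * torch.randn(16, 4096)
lora_B = 0.01 * torch.randn(4096, 16)
with torch.no_grad():
    first_weight += lora_B @ lora_A

# Fails only when the delta is (numerically) zero, i.e. the adapter was
# saved empty or never applied -- which is what's being reported here.
assert not torch.allclose(first_weight_old, first_weight)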

xxxiaol commented 1 year ago

@bupticybee For me, this error occurred because the LoRA weights were not saved successfully. I reinstalled peft at the version mentioned in #293, reran the finetune code, and the assertion error was resolved. I hope this helps.

Mihaiii commented 1 year ago

> I reinstalled peft at the version mentioned in https://github.com/tloen/alpaca-lora/issues/293, reran the finetune code, and the assertion error was resolved.

I still have this problem even after using peft @ e536616888d51b453ed354a6f1e243fecb02ea08 and redoing the training. Any other hints?

agentfunk commented 1 year ago

Can you compare the size of adapter_model.bin against pytorch_model.bin in the checkpoint folder? There seems to be an issue with the peft library when saving the final model; see #390.
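
A quick way to make that comparison (paths are hypothetical; adjust them to your output directory). A healthy adapter for this config should be tens of megabytes, while the buggy peft save reportedly produces an adapter_model.bin of only a few hundred bytes:

import os

for path in (
    "lora-alpaca/adapter_model.bin",
    "lora-alpaca/checkpoint-200/pytorch_model.bin",
):
    if os.path.exists(path):
        # A near-empty adapter_model.bin means peft saved no LoRA weights,
        # and the merge assertion above will then fail.
        print(path, os.path.getsize(path), "bytes")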