Closed: yusufcakmakk closed this issue 1 year ago
Hi,
Have you had a chance to check?
What is the file size of adapter_model.bin? Can you print the shapes of the tensors in it?
import torch

sd = torch.load('/data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model/adapter_model.bin', map_location='cpu')
for k, v in sd.items():
    print(k, v.shape)
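If the dtype matters as well (it determines the on-disk size), the same loop can print it; a trivial extension of the snippet above:

for k, v in sd.items():
    print(k, tuple(v.shape), v.dtype)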
File sizes:
$ ls -lrt
total 891932
912430269 Jul 29 16:03 adapter_model.bin
507 Jul 29 16:03 adapter_config.json
747 Jul 29 16:03 tokenizer_config.json
411 Jul 29 16:03 special_tokens_map.json
889985 Jul 29 16:03 tokenizer.model
Output of the code above:
base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.0.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.0.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
(layers 1 through 31 repeat exactly the same pattern of shapes as layer 0)
base_model.model.model.embed_tokens.weight torch.Size([53246, 4096])
base_model.model.lm_head.weight torch.Size([53246, 4096])
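For reference, these shapes are consistent with the reported file size if the tensors are stored in fp16 (2 bytes per element); a quick back-of-the-envelope check, with the constants read off the shapes above:

hidden, inter, rank, vocab, n_layers = 4096, 11008, 8, 53246, 32
attn_params = 4 * 2 * rank * hidden              # q/k/v/o projections, lora_A plus lora_B each
mlp_params = 3 * (rank * hidden + inter * rank)  # gate/up/down each hold the same element count
embed_params = 2 * vocab * hidden                # embed_tokens and lm_head, stored in full
total = n_layers * (attn_params + mlp_params) + embed_params
print(total * 2)  # 912392192 bytes, close to the 912430269 on disk (the rest is pickle overhead)

The full-vocabulary embed_tokens and lm_head matrices dominate; the LoRA matrices themselves account for only about 40 MB.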
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.
Is there any update?
Hi @yusufcakmakk, I've encountered the same problem. Have you solved it?
I followed this version instead: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2.git. The problem was solved in that repo.
Type of Issue: Model conversion and merging
Base Model: LLaMA-7B
Operating System: Linux
Describe your issue in detail
The following configs were used to start run_pt.sh:
The following code was used to merge with the base model:
Is it possible to merge pre-trained LoRA weights with the base model?
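Merging PEFT-style LoRA adapters into a base model is generally possible. Below is a minimal sketch using the peft library rather than the repository's own merge script; the base-model path and output path are placeholders, and it assumes the adapter in pt_lora_model was saved with PEFT. Note that the base model's embeddings must be resized to the extended vocabulary before the adapter (whose embed_tokens has 53246 rows) can be loaded:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained('path/to/llama-7b-hf', torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained('pt_output_dir/pt_lora_model')
base.resize_token_embeddings(len(tokenizer))  # the adapter expects a 53246-row embedding table

model = PeftModel.from_pretrained(base, 'pt_output_dir/pt_lora_model')
merged = model.merge_and_unload()             # folds the LoRA deltas back into the base weights
merged.save_pretrained('path/to/merged_model')
tokenizer.save_pretrained('path/to/merged_model')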