ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0

run_pt.sh output couldn't be merged with base model #797

Closed yusufcakmakk closed 1 year ago

yusufcakmakk commented 1 year ago

Check before submitting issues

Type of Issue

Model conversion and merging

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

The following configuration is used to launch run_pt.sh:

lr=2e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/data/llama/decapoda-research_7b-hf-model
chinese_tokenizer_path=/data/llama/merged_tokenizer_hf-25k
hugginface_cache_dir=/data/huggingface_cache
dataset_dir=/data/datasets/tokenizer_text_dir
data_cache=/data/llama/Chinese-LLaMA-Alpaca/scripts/training/temp_data_cache_dir
per_device_train_batch_size=32
per_device_eval_batch_size=16
gradient_accumulation_steps=16
output_dir=/data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir

deepspeed_config_file=ds_zero2_no_offload.json

export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=team0

torchrun --nnodes 1 --nproc_per_node 2 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --max_train_samples 600 \
    --max_eval_samples 100 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 5 \
    --save_strategy steps \
    --modules_to_save ${modules_to_save} \
    --gradient_checkpointing \
    --save_total_limit 3 \
    --save_steps 5 \
    --preprocessing_num_workers 32 \
    --block_size 1024 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --ddp_find_unused_parameters False
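
(For reference: with nproc_per_node 2, these settings give an effective global batch of per_device_train_batch_size × gradient_accumulation_steps × 2 GPUs = 32 × 16 × 2 = 1024 sequences of block_size 1024 tokens per optimizer step.)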

The following command is used to merge the LoRA weights with the base model:

python merge_llama_with_chinese_lora.py \
    --base_model /data/llama/decapoda-research_7b-hf-model \
    --lora_model /data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model \
    --output_type huggingface \
    --output_dir /data/llama/Chinese-LLaMA-Alpaca/scripts/merge_dir/llama-merge-hf-test
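
A quick way to sanity-check the adapter before merging (a minimal sketch, assuming the pt_lora_model output path above and the usual adapter_model.bin filename) is to list the embed_tokens / lm_head tensors it contains, since --modules_to_save requests them and the merge step relies on them to extend the vocabulary:

import torch

# Load the saved adapter on CPU and list only the tensors saved for the
# modules_to_save targets (embed_tokens, lm_head); the LoRA A/B pairs are skipped.
sd = torch.load(
    '/data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model/adapter_model.bin',
    map_location='cpu',
)
for name, tensor in sd.items():
    if 'embed_tokens' in name or 'lm_head' in name:
        print(name, tuple(tensor.shape))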

Dependencies (must be provided for code-related issues)

Package                       Version     Editable project location
----------------------------- ----------- -----------------------------------
accelerate                    0.21.0
aiohttp                       3.8.4
aiosignal                     1.3.1
anyio                         3.7.1
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
asttokens                     2.2.1
async-lru                     2.0.4
async-timeout                 4.0.2
attrs                         23.1.0
Babel                         2.12.1
backcall                      0.2.0
backports.functools-lru-cache 1.6.5
beautifulsoup4                4.12.2
bitsandbytes                  0.41.0
bleach                        6.0.0
boltons                       23.0.0
brotlipy                      0.7.0
certifi                       2023.7.22
cffi                          1.15.1
charset-normalizer            2.0.4
cmake                         3.26.4
comm                          0.1.3
conda                         23.7.2
conda-content-trust           0.1.3
conda-package-handling        2.0.2
conda_package_streaming       0.7.0
cryptography                  39.0.1
datasets                      2.14.1
debugpy                       1.6.7
decorator                     5.1.1
deepspeed                     0.10.0
defusedxml                    0.7.1
dill                          0.3.6
entrypoints                   0.4
exceptiongroup                1.1.2
executing                     1.2.0
fastjsonschema                2.18.0
filelock                      3.12.2
flit_core                     3.9.0
frozenlist                    1.3.3
fsspec                        2023.6.0
hjson                         3.1.0
huggingface-hub               0.15.1
idna                          3.4
importlib-metadata            6.8.0
importlib-resources           6.0.0
iniconfig                     2.0.0
ipykernel                     6.23.2
ipython                       8.14.0
ipython-genutils              0.2.0
ipywidgets                    8.0.7
jedi                          0.18.2
Jinja2                        3.1.2
joblib                        1.2.0
json5                         0.9.14
jsonpatch                     1.32
jsonpointer                   2.1
jsonschema                    4.17.3
jupyter_client                8.3.0
jupyter_core                  5.3.1
jupyter-events                0.6.3
jupyter-lsp                   2.2.0
jupyter_server                2.7.0
jupyter_server_terminals      0.4.4
jupyterlab                    4.0.3
jupyterlab-pygments           0.2.2
jupyterlab_server             2.24.0
jupyterlab-widgets            3.0.7
lit                           16.0.6
MarkupSafe                    2.1.3
matplotlib-inline             0.1.6
mistune                       3.0.0
mpmath                        1.3.0
multidict                     6.0.4
multiprocess                  0.70.14
nbclient                      0.8.0
nbconvert                     7.7.3
nbformat                      5.9.1
nest-asyncio                  1.5.6
networkx                      3.1
ninja                         1.11.1
notebook_shim                 0.2.3
numpy                         1.25.0
nvidia-cublas-cu11            11.10.3.66
nvidia-cuda-cupti-cu11        11.7.101
nvidia-cuda-nvrtc-cu11        11.7.99
nvidia-cuda-runtime-cu11      11.7.99
nvidia-cudnn-cu11             8.5.0.96
nvidia-cufft-cu11             10.9.0.58
nvidia-curand-cu11            10.2.10.91
nvidia-cusolver-cu11          11.4.0.1
nvidia-cusparse-cu11          11.7.4.91
nvidia-nccl-cu11              2.14.3
nvidia-nvtx-cu11              11.7.91
overrides                     7.3.1
packaging                     23.0
pandas                        2.0.2
pandocfilters                 1.5.0
parso                         0.8.3
peft                          0.3.0.dev0  /data/llama/peft_13e53fc
pexpect                       4.8.0
pickleshare                   0.7.5
Pillow                        9.5.0
pip                           23.0.1
pkgutil_resolve_name          1.3.10
platformdirs                  3.9.1
pluggy                        1.0.0
prometheus-client             0.17.1
prompt-toolkit                3.0.39
protobuf                      4.23.4
psutil                        5.9.5
ptyprocess                    0.7.0
pure-eval                     0.2.2
py-cpuinfo                    9.0.0
pyarrow                       12.0.1
pycosat                       0.6.4
pycparser                     2.21
pydantic                      1.10.9
Pygments                      2.15.1
pyOpenSSL                     23.0.0
pyrsistent                    0.18.0
PySocks                       1.7.1
pytest                        7.4.0
python-dateutil               2.8.2
python-json-logger            2.0.7
pytz                          2023.3
PyYAML                        6.0
pyzmq                         25.1.0
regex                         2023.6.3
requests                      2.28.1
rfc3339-validator             0.1.4
rfc3986-validator             0.1.1
ruamel.yaml                   0.17.21
ruamel.yaml.clib              0.2.6
safetensors                   0.3.1
scikit-learn                  1.3.0
scipy                         1.11.1
Send2Trash                    1.8.2
sentencepiece                 0.1.99
setuptools                    65.6.3
six                           1.16.0
sniffio                       1.3.0
soupsieve                     2.3.2.post1
stack-data                    0.6.2
sympy                         1.12
terminado                     0.17.1
threadpoolctl                 3.1.0
tinycss2                      1.2.1
tokenizers                    0.13.3
tomli                         2.0.1
toolz                         0.12.0
torch                         2.0.1
torchaudio                    2.0.2
torchvision                   0.15.2
tornado                       6.3.2
tqdm                          4.65.0
traitlets                     5.9.0
transformers                  4.30.0
triton                        2.0.0
typing_extensions             4.7.1
typing-utils                  0.1.0
tzdata                        2023.3
urllib3                       1.26.15
wcwidth                       0.2.6
webencodings                  0.5.1
websocket-client              1.6.1
wheel                         0.38.4
widgetsnbextension            4.0.7
xxhash                        3.2.0
yarl                          1.9.2
zipp                          3.16.2
zstandard                     0.19.0

Execution logs or screenshots

[2023-07-29 16:05:48,295] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Base model: /data/llama/decapoda-research_7b-hf-model
LoRA model(s) ['/data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model']:
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:08<00:00,  2.68s/it]
Peft version: 0.3.0.dev0
Loading LoRA for 7B model
Loading LoRA /data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model...
base_model vocab size: 32000
tokenizer vocab size: 53246
Extended vocabulary size to 53246
Loading LoRA weights
merging base_model.model.model.embed_tokens.weight
merging base_model.model.lm_head.weight
merging base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.0.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.0.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.0.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.0.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.1.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.1.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.1.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.1.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.1.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.1.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.1.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.2.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.2.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.2.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.2.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.2.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.2.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.2.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.3.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.3.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.3.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.3.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.3.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.3.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.4.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.4.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.4.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.4.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.4.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.4.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.4.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.5.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.5.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.5.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.5.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.5.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.5.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.5.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.6.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.6.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.6.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.6.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.6.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.6.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.6.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.7.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.7.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.7.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.7.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.7.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.7.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.7.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.8.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.8.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.8.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.8.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.8.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.8.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.8.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.9.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.9.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.9.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.9.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.9.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.9.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.9.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.10.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.10.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.10.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.10.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.10.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.10.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.10.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.11.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.11.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.11.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.11.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.11.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.11.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.11.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.12.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.12.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.12.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.12.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.12.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.12.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.12.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.13.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.13.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.13.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.13.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.13.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.13.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.13.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.14.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.14.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.14.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.14.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.14.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.14.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.14.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.15.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.15.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.15.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.15.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.15.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.15.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.15.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.16.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.16.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.16.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.16.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.16.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.16.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.16.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.17.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.17.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.17.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.17.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.17.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.17.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.17.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.18.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.18.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.18.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.18.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.18.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.18.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.18.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.19.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.19.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.19.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.19.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.19.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.19.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.19.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.20.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.20.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.20.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.20.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.20.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.20.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.20.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.21.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.21.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.21.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.21.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.21.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.21.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.21.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.22.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.22.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.22.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.22.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.22.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.22.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.22.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.23.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.23.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.23.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.23.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.23.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.23.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.23.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.24.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.24.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.24.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.24.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.24.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.24.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.24.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.25.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.25.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.25.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.25.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.25.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.25.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.25.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.26.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.26.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.26.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.26.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.26.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.26.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.26.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.27.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.27.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.27.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.27.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.27.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.27.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.27.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.28.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.28.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.28.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.28.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.28.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.28.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.28.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.29.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.29.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.29.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.29.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.29.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.29.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.29.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.30.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.30.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.30.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.30.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.30.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.30.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.30.mlp.up_proj.lora_A.weight
merging base_model.model.model.layers.31.self_attn.q_proj.lora_A.weight
merging base_model.model.model.layers.31.self_attn.k_proj.lora_A.weight
merging base_model.model.model.layers.31.self_attn.v_proj.lora_A.weight
merging base_model.model.model.layers.31.self_attn.o_proj.lora_A.weight
merging base_model.model.model.layers.31.mlp.gate_proj.lora_A.weight
merging base_model.model.model.layers.31.mlp.down_proj.lora_A.weight
merging base_model.model.model.layers.31.mlp.up_proj.lora_A.weight
Traceback (most recent call last):
  File "/data/llama/Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py", line 327, in <module>
    assert not torch.allclose(first_weight_old, first_weight)
AssertionError
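
The failing assertion compares a snapshot of a weight taken before merging against its value afterwards, so it fires when merging left the weight unchanged. A minimal stand-alone illustration of that check (hypothetical tensors, not the script's actual variables):

import torch

# Hypothetical stand-ins for a weight snapshotted before merging and read back after it.
first_weight_old = torch.zeros(8, 8)
first_weight = first_weight_old.clone()  # identical, as if the LoRA delta was never applied

# merge_llama_with_chinese_lora.py asserts that merging changed the weight;
# identical tensors therefore raise AssertionError, as in the traceback above.
assert not torch.allclose(first_weight_old, first_weight)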

Is it possible to merge pretrained LoRA weights with the base model?

yusufcakmakk commented 1 year ago

Hi,

Have you had a chance to check?

airaria commented 1 year ago

What is the file size of adapter_model.bin? Can you print the shapes of the tensors in it?

import torch

# Inspect the saved adapter on CPU: print every tensor name and its shape.
sd = torch.load('/data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model/adapter_model.bin', map_location='cpu')
for k, v in sd.items():
    print(k, v.shape)
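
If the adapter was saved as expected, the listing should also include base_model.model.model.embed_tokens.weight and base_model.model.lm_head.weight with 53246 rows each (the extended vocabulary reported in the merge log), alongside the per-layer lora_A/lora_B pairs.
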
yusufcakmakk commented 1 year ago

file sizes:

$ ls -lrt
total 891932
912430269 Jul 29 16:03 adapter_model.bin
      507 Jul 29 16:03 adapter_config.json
      747 Jul 29 16:03 tokenizer_config.json
      411 Jul 29 16:03 special_tokens_map.json
   889985 Jul 29 16:03 tokenizer.model

Output of the mentioned code:

base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.0.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.0.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.0.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.0.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.1.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.1.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.1.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.1.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.1.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.1.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.1.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.1.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.1.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.1.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.1.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.1.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.1.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.1.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.2.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.2.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.2.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.2.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.2.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.2.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.2.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.2.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.2.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.2.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.2.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.2.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.2.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.2.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.3.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.3.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.3.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.3.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.3.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.3.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.3.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.3.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.3.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.3.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.3.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.3.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.3.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.4.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.4.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.4.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.4.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.4.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.4.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.4.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.4.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.4.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.4.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.4.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.4.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.4.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.4.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.5.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.5.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.5.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.5.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.5.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.5.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.5.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.5.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.5.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.5.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.5.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.5.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.5.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.5.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.6.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.6.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.6.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.6.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.6.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.6.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.6.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.6.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.6.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.6.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.6.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.6.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.6.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.6.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.7.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.7.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.7.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.7.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.7.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.7.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.7.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.7.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.7.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.7.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.7.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.7.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.7.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.7.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.8.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.8.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.8.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.8.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.8.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.8.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.8.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.8.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.8.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.8.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.8.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.8.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.8.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.8.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.9.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.9.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.9.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.9.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.9.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.9.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.9.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.9.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.9.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.9.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.9.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.9.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.9.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.9.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.10.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.10.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.10.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.10.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.10.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.10.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.10.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.10.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.10.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.10.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.10.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.10.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.10.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.10.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.11.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.11.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.11.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.11.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.11.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.11.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.11.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.11.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.11.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.11.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.11.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.11.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.11.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.11.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.12.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.12.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.12.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.12.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.12.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.12.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.12.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.12.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.12.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.12.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.12.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.12.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.12.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.12.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.13.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.13.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.13.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.13.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.13.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.13.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.13.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.13.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.13.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.13.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.13.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.13.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.13.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.13.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.14.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.14.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.14.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.14.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.14.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.14.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.14.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.14.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.14.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.14.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.14.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.14.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.14.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.14.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.15.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.15.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.15.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.15.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.15.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.15.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.15.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.15.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.15.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.15.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.15.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.15.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.15.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.15.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.16.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.16.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.16.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.16.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.16.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.16.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.16.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.16.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.16.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.16.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.16.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.16.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.16.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.16.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.17.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.17.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.17.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.17.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.17.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.17.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.17.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.17.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.17.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.17.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.17.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.17.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.17.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.17.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.18.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.18.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.18.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.18.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.18.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.18.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.18.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.18.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.18.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.18.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.18.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.18.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.18.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.18.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.19.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.19.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.19.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.19.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.19.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.19.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.19.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.19.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.19.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.19.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.19.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.19.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.19.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.19.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.20.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.20.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.20.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.20.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.20.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.20.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.20.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.20.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.20.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.20.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.20.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.20.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.20.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.20.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.21.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.21.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.21.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.21.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.21.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.21.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.21.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.21.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.21.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.21.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.21.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.21.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.21.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.21.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.22.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.22.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.22.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.22.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.22.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.22.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.22.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.22.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.22.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.22.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.22.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.22.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.22.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.22.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.23.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.23.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.23.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.23.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.23.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.23.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.23.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.23.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.23.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.23.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.23.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.23.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.23.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.23.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.24.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.24.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.24.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.24.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.24.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.24.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.24.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.24.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.24.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.24.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.24.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.24.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.24.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.24.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.25.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.25.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.25.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.25.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.25.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.25.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.25.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.25.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.25.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.25.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.25.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.25.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.25.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.25.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.26.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.26.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.26.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.26.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.26.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.26.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.26.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.26.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.26.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.26.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.26.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.26.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.26.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.26.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.27.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.27.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.27.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.27.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.27.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.27.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.27.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.27.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.27.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.27.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.27.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.27.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.27.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.27.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.28.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.28.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.28.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.28.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.28.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.28.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.28.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.28.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.28.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.28.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.28.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.28.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.28.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.28.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.29.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.29.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.29.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.29.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.29.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.29.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.29.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.29.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.29.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.29.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.29.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.29.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.29.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.29.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.30.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.30.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.30.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.30.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.30.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.30.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.30.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.30.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.30.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.30.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.30.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.30.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.30.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.30.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.31.self_attn.q_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.31.self_attn.q_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.31.self_attn.k_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.31.self_attn.k_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.31.self_attn.v_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.31.self_attn.v_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.31.self_attn.o_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.31.self_attn.o_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.31.mlp.gate_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.31.mlp.gate_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.layers.31.mlp.down_proj.lora_A.weight torch.Size([8, 11008])
base_model.model.model.layers.31.mlp.down_proj.lora_B.weight torch.Size([4096, 8])
base_model.model.model.layers.31.mlp.up_proj.lora_A.weight torch.Size([8, 4096])
base_model.model.model.layers.31.mlp.up_proj.lora_B.weight torch.Size([11008, 8])
base_model.model.model.embed_tokens.weight torch.Size([53246, 4096])
base_model.model.lm_head.weight torch.Size([53246, 4096])
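For context on what these tensors mean for the merge: each `lora_A`/`lora_B` pair above is a rank-8 adapter that gets folded back into the corresponding base `Linear` weight, while `embed_tokens` and `lm_head` (saved via `modules_to_save`) are full replacements sized to the extended 53246-token vocabulary. Below is a minimal sketch (not the repository's merge script) of how one could sanity-check these shapes and fold a single pair into a base weight; the adapter path is a placeholder and `lora_rank=8`, `lora_alpha=32` are taken from the training config above.

```python
import torch

# Placeholder path: the adapter file saved under pt_output_dir/pt_lora_model.
lora_state = torch.load("pt_lora_model/adapter_model.bin", map_location="cpu")

lora_rank, lora_alpha = 8, 32      # values from run_pt.sh above
scaling = lora_alpha / lora_rank   # PEFT adds scaling * (B @ A) to the base weight


def merge_pair(base_weight, lora_A, lora_B, scaling):
    """Fold one LoRA pair into a base Linear weight: W' = W + scaling * B @ A."""
    assert lora_A.shape[0] == lora_B.shape[1], "rank mismatch between A and B"
    return base_weight + scaling * (lora_B.float() @ lora_A.float())


# Sanity-check the shapes printed above before attempting the merge.
for name, tensor in lora_state.items():
    if "lora_A" in name:
        assert tensor.shape[0] == lora_rank, f"{name}: unexpected rank {tensor.shape[0]}"
    if "embed_tokens" in name or "lm_head" in name:
        # modules_to_save tensors are full replacements, not low-rank pairs;
        # their first dimension must equal the extended tokenizer's vocab size (53246 here).
        print(name, tensor.shape)
```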
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 1 year ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

yusufcakmakk commented 1 year ago

Is there any update?

phamkhactu commented 1 year ago

Hi @yusufcakmakk, I've encountered the same problem. Have you solved it?

yusufcakmakk commented 10 months ago

> Hi @yusufcakmakk, I've encountered the same problem. Have you solved it?

I switched to the following repository: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2.git. The issue was solved there.
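
For anyone landing here later, a minimal sketch of the equivalent merge done directly with the 🤗 `peft` API (this is not the repository's `merge_llama_with_chinese_lora.py`; the paths below are the ones from this issue and may need adjusting). The key step for this setup is resizing the base model's embeddings to the extended tokenizer's vocabulary before attaching the adapter, since `embed_tokens` and `lm_head` were trained via `modules_to_save`:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_path = "/data/llama/decapoda-research_7b-hf-model"
lora_path = "/data/llama/Chinese-LLaMA-Alpaca/scripts/training/pt_output_dir/pt_lora_model"
out_path = "/data/llama/Chinese-LLaMA-Alpaca/scripts/merge_dir/llama-merge-hf-test"

# The LoRA output dir is assumed to also contain the extended tokenizer files.
tokenizer = LlamaTokenizer.from_pretrained(lora_path)
base = LlamaForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)

# Resize to the new vocabulary (53246 in the dump above) BEFORE loading the adapter,
# otherwise the saved embed_tokens/lm_head tensors cannot be restored.
base.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base, lora_path, torch_dtype=torch.float16)
merged = model.merge_and_unload()  # fold the LoRA weights into the base model

merged.save_pretrained(out_path)
tokenizer.save_pretrained(out_path)
```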