Were you able to find the `torch/all.h` or `torch/python.h` files? And which IDE do you use, if any?
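In case it helps, here's how I'd check where the pip-installed torch keeps those headers (a sketch; `include_paths()` is the standard torch helper, the rest is just printing):

```python
# Sketch: print the include directories a torch C++/CUDA extension build uses.
# torch/all.h and torch/python.h should live under one of these paths.
import torch.utils.cpp_extension as cpp_ext

for path in cpp_ext.include_paths():
    print(path)
```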
Mine fails a lot less verbosely on Windows:
```
(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> python setup_cuda.py install --verbose
running install
C:\Users\lxe\miniconda3\envs\chatbots\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
C:\Users\lxe\miniconda3\envs\chatbots\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
error: [WinError 2] The system cannot find the file specified
```
All the compilers are there:
```
(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30147 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> python -V
Python 3.10.9
```
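One more sanity check worth running in the same env, for what it's worth (a sketch): make sure the installed torch was built against the same CUDA release nvcc reports, since a mismatch there also breaks `setup_cuda.py`.

```python
# Sketch: compare the CUDA toolkit torch was built with against nvcc's release.
import torch

print("torch      :", torch.__version__)
print("torch cuda :", torch.version.cuda)       # should line up with nvcc (11.7 above)
print("gpu visible:", torch.cuda.is_available())
```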
**EDIT:** This comment is linked from elsewhere. Here's a more coherent guide: https://gist.github.com/lxe/82eb87db25fdb75b92fa18a6d494ee3c
I had to downgrade CUDA and torch before it would compile. Here's my full process on Windows:

1. Install Build Tools for Visual Studio 2019 (**has to be 2019**) [here](https://visualstudio.microsoft.com/downloads/#remote-tools-for-visual-studio-2022)
2. Install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
3. Open the "x64 native tools command prompt"
4. Activate conda via `powershell -ExecutionPolicy ByPass -NoExit -Command "& 'C:\Users\lxe\miniconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\lxe\miniconda3' "`
5. `conda create -n gptq`
6. `conda activate gptq`
7. `conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1`
8. `conda install pip`
9. `git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git`
10. `git clone https://github.com/zphang/transformers.git`
11. `pip install ./transformers`
12. `pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html`
13. `cd GPTQ-for-LLaMa`
14. `$env:DISTUTILS_USE_SDK=1`
15. `python setup_cuda.py install`

When using the webui, make sure it's in the same env. If it overwrites torch, you'll have to do it again manually.
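A quick post-install check I'd run in that same env before launching the webui (a sketch; the version prefix is just the pin from step 12):

```python
# Sketch: confirm the pinned torch survived and the freshly built extension imports.
import torch
import quant_cuda  # ImportError here means the setup_cuda.py build didn't take

assert torch.__version__.startswith("1.12"), f"torch was replaced: {torch.__version__}"
print("torch", torch.__version__, "+ quant_cuda OK")
```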
```
RuntimeError: The current installed version of g++ (11.3.0) is greater than the maximum required version by CUDA 11.3 (10.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=10.0.0).
```

This is making me downgrade my g++. Is there a way to do that inside a conda environment that you know of? `sudo apt-get remove gcc g++` is system-wide. I tried `conda install -c conda-forge gcc=9`; it installed, but I still got the error. I guess on Windows you have Visual Studio, so you probably don't need to do this. It looks promising, thank you.
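Not a conda expert, but one thing that sometimes helps (a sketch, untested here): after `conda install -c conda-forge gcc=9`, the env's gcc isn't necessarily what nvcc picks up. Pointing `CC`/`CXX` at the env's binaries before building may work, depending on how `setup_cuda.py` invokes nvcc.

```python
# Sketch (assumption: the conda-forge gcc-9 lives in the active env's bin/):
# point the build at the env's compilers instead of the system g++ 11.
import os
import subprocess
import sys

env_bin = os.path.join(sys.prefix, "bin")
os.environ["CC"] = os.path.join(env_bin, "gcc")
os.environ["CXX"] = os.path.join(env_bin, "g++")
subprocess.run([sys.executable, "setup_cuda.py", "install"], check=True)
```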
I think the real issue is not properly installing/using libtorch. Did you install that successfully? If so, how?
> I had to downgrade CUDA and torch before it would compile. Here's my full process on Windows: […]
I'm getting this error even when explicitly following those steps. No idea what's causing it:

```
error: too few arguments for template template parameter "Tuple"
        detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here
```
> I had to downgrade CUDA and torch before it would compile. Here's my full process on Windows: […]
Thanks a lot for this! I still got a lot of errors during compilation, but at the end it said this:

```
Finished processing dependencies for quant-cuda==0.0.0
```

Does that mean it built successfully?
yes
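If you want a stronger signal than that setuptools message, a minimal import test (sketch):

```python
# Sketch: the build only matters if the module actually loads in this env.
import quant_cuda

print("quant_cuda loaded from:", quant_cuda.__file__)
```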
@lxe
> `git clone https://github.com/zphang/transformers.git`

This repo only contains a readme.md:

> March 6th, 2019: Repository has been moved to https://github.com/zphang/bert_on_stilts

Should we use the one mentioned in the readme.md, which is also from March 2019? I doubt it.

If not, which transformers repo should we install? The live one, or the one with the llama push, via `git clone --branch llama_push https://github.com/zphang/transformers.git`?
> I had to downgrade CUDA and torch before it would compile. Here's my full process on Windows: […]
I followed the steps and got `Finished processing dependencies for quant-cuda==0.0.0`, but when running the webui I get:
```
Starting the web UI...
Loading the extension "gallery"... Ok.
Loading llama-7b...
CUDA extension not installed.
Loading model ...
Traceback (most recent call last):
  File "D:\MachineLearning\TextWebui\text-generation-webui\server.py", line 194, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\MachineLearning\TextWebui\text-generation-webui\modules\models.py", line 119, in load_model
    model = load_quant(path_to_model, Path(f"models/{pt_model}"), 4)
  File "D:\MachineLearning\TextWebui\text-generation-webui\repositories\GPTQ-for-LLaMa\llama.py", line 241, in load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "D:\MachineLearning\TextWebui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM:
        Missing key(s) in state_dict: "model.decoder.embed_tokens.weight", "model.decoder.layers.0.self_attn.q_proj.zeros", "model.decoder.layers.0.self_attn.q_proj.scales", ..., "model.decoder.norm.weight"
        [the same zeros/scales/bias/qweight keys for every q/k/v/o_proj, feed_forward.w1/w2/w3, attention_norm, and ffn_norm, repeated for model.decoder.layers.0 through model.decoder.layers.31]
        Unexpected key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", ...
        [the checkpoint's keys use the model.layers.N naming instead, with mlp.gate_proj/down_proj/up_proj, input_layernorm, post_attention_layernorm, and rotary_emb.inv_freq entries; the log is truncated partway through layer 8]
```
"model.layers.8.self_attn.v_proj.qweight", "model.layers.8.self_attn.o_proj.zeros", "model.layers.8.self_attn.o_proj.scales", "model.layers.8.self_attn.o_proj.bias", "model.layers.8.self_attn.o_proj.qweight", "model.layers.8.self_attn.rotary_emb.inv_freq", "model.layers.8.mlp.gate_proj.zeros", "model.layers.8.mlp.gate_proj.scales", "model.layers.8.mlp.gate_proj.bias", "model.layers.8.mlp.gate_proj.qweight", "model.layers.8.mlp.down_proj.zeros", "model.layers.8.mlp.down_proj.scales", "model.layers.8.mlp.down_proj.bias", "model.layers.8.mlp.down_proj.qweight", "model.layers.8.mlp.up_proj.zeros", "model.layers.8.mlp.up_proj.scales", "model.layers.8.mlp.up_proj.bias", "model.layers.8.mlp.up_proj.qweight", "model.layers.8.input_layernorm.weight", "model.layers.8.post_attention_layernorm.weight", "model.layers.9.self_attn.q_proj.zeros", "model.layers.9.self_attn.q_proj.scales", "model.layers.9.self_attn.q_proj.bias", "model.layers.9.self_attn.q_proj.qweight", "model.layers.9.self_attn.k_proj.zeros", "model.layers.9.self_attn.k_proj.scales", "model.layers.9.self_attn.k_proj.bias", "model.layers.9.self_attn.k_proj.qweight", "model.layers.9.self_attn.v_proj.zeros", "model.layers.9.self_attn.v_proj.scales", "model.layers.9.self_attn.v_proj.bias", "model.layers.9.self_attn.v_proj.qweight", "model.layers.9.self_attn.o_proj.zeros", "model.layers.9.self_attn.o_proj.scales", "model.layers.9.self_attn.o_proj.bias", "model.layers.9.self_attn.o_proj.qweight", "model.layers.9.self_attn.rotary_emb.inv_freq", "model.layers.9.mlp.gate_proj.zeros", "model.layers.9.mlp.gate_proj.scales", "model.layers.9.mlp.gate_proj.bias", "model.layers.9.mlp.gate_proj.qweight", "model.layers.9.mlp.down_proj.zeros", "model.layers.9.mlp.down_proj.scales", "model.layers.9.mlp.down_proj.bias", "model.layers.9.mlp.down_proj.qweight", "model.layers.9.mlp.up_proj.zeros", "model.layers.9.mlp.up_proj.scales", "model.layers.9.mlp.up_proj.bias", "model.layers.9.mlp.up_proj.qweight", "model.layers.9.input_layernorm.weight", "model.layers.9.post_attention_layernorm.weight", "model.layers.10.self_attn.q_proj.zeros", "model.layers.10.self_attn.q_proj.scales", "model.layers.10.self_attn.q_proj.bias", "model.layers.10.self_attn.q_proj.qweight", "model.layers.10.self_attn.k_proj.zeros", "model.layers.10.self_attn.k_proj.scales", "model.layers.10.self_attn.k_proj.bias", "model.layers.10.self_attn.k_proj.qweight", "model.layers.10.self_attn.v_proj.zeros", "model.layers.10.self_attn.v_proj.scales", "model.layers.10.self_attn.v_proj.bias", "model.layers.10.self_attn.v_proj.qweight", "model.layers.10.self_attn.o_proj.zeros", "model.layers.10.self_attn.o_proj.scales", "model.layers.10.self_attn.o_proj.bias", "model.layers.10.self_attn.o_proj.qweight", "model.layers.10.self_attn.rotary_emb.inv_freq", "model.layers.10.mlp.gate_proj.zeros", "model.layers.10.mlp.gate_proj.scales", "model.layers.10.mlp.gate_proj.bias", "model.layers.10.mlp.gate_proj.qweight", "model.layers.10.mlp.down_proj.zeros", "model.layers.10.mlp.down_proj.scales", "model.layers.10.mlp.down_proj.bias", "model.layers.10.mlp.down_proj.qweight", "model.layers.10.mlp.up_proj.zeros", "model.layers.10.mlp.up_proj.scales", "model.layers.10.mlp.up_proj.bias", "model.layers.10.mlp.up_proj.qweight", "model.layers.10.input_layernorm.weight", "model.layers.10.post_attention_layernorm.weight", "model.layers.11.self_attn.q_proj.zeros", "model.layers.11.self_attn.q_proj.scales", "model.layers.11.self_attn.q_proj.bias", "model.layers.11.self_attn.q_proj.qweight", 
"model.layers.11.self_attn.k_proj.zeros", "model.layers.11.self_attn.k_proj.scales", "model.layers.11.self_attn.k_proj.bias", "model.layers.11.self_attn.k_proj.qweight", "model.layers.11.self_attn.v_proj.zeros", "model.layers.11.self_attn.v_proj.scales", "model.layers.11.self_attn.v_proj.bias", "model.layers.11.self_attn.v_proj.qweight", "model.layers.11.self_attn.o_proj.zeros", "model.layers.11.self_attn.o_proj.scales", "model.layers.11.self_attn.o_proj.bias", "model.layers.11.self_attn.o_proj.qweight", "model.layers.11.self_attn.rotary_emb.inv_freq", "model.layers.11.mlp.gate_proj.zeros", "model.layers.11.mlp.gate_proj.scales", "model.layers.11.mlp.gate_proj.bias", "model.layers.11.mlp.gate_proj.qweight", "model.layers.11.mlp.down_proj.zeros", "model.layers.11.mlp.down_proj.scales", "model.layers.11.mlp.down_proj.bias", "model.layers.11.mlp.down_proj.qweight", "model.layers.11.mlp.up_proj.zeros", "model.layers.11.mlp.up_proj.scales", "model.layers.11.mlp.up_proj.bias", "model.layers.11.mlp.up_proj.qweight", "model.layers.11.input_layernorm.weight", "model.layers.11.post_attention_layernorm.weight", "model.layers.12.self_attn.q_proj.zeros", "model.layers.12.self_attn.q_proj.scales", "model.layers.12.self_attn.q_proj.bias", "model.layers.12.self_attn.q_proj.qweight", "model.layers.12.self_attn.k_proj.zeros", "model.layers.12.self_attn.k_proj.scales", "model.layers.12.self_attn.k_proj.bias", "model.layers.12.self_attn.k_proj.qweight", "model.layers.12.self_attn.v_proj.zeros", "model.layers.12.self_attn.v_proj.scales", "model.layers.12.self_attn.v_proj.bias", "model.layers.12.self_attn.v_proj.qweight", "model.layers.12.self_attn.o_proj.zeros", "model.layers.12.self_attn.o_proj.scales", "model.layers.12.self_attn.o_proj.bias", "model.layers.12.self_attn.o_proj.qweight", "model.layers.12.self_attn.rotary_emb.inv_freq", "model.layers.12.mlp.gate_proj.zeros", "model.layers.12.mlp.gate_proj.scales", "model.layers.12.mlp.gate_proj.bias", "model.layers.12.mlp.gate_proj.qweight", "model.layers.12.mlp.down_proj.zeros", "model.layers.12.mlp.down_proj.scales", "model.layers.12.mlp.down_proj.bias", "model.layers.12.mlp.down_proj.qweight", "model.layers.12.mlp.up_proj.zeros", "model.layers.12.mlp.up_proj.scales", "model.layers.12.mlp.up_proj.bias", "model.layers.12.mlp.up_proj.qweight", "model.layers.12.input_layernorm.weight", "model.layers.12.post_attention_layernorm.weight", "model.layers.13.self_attn.q_proj.zeros", "model.layers.13.self_attn.q_proj.scales", "model.layers.13.self_attn.q_proj.bias", "model.layers.13.self_attn.q_proj.qweight", "model.layers.13.self_attn.k_proj.zeros", "model.layers.13.self_attn.k_proj.scales", "model.layers.13.self_attn.k_proj.bias", "model.layers.13.self_attn.k_proj.qweight", "model.layers.13.self_attn.v_proj.zeros", "model.layers.13.self_attn.v_proj.scales", "model.layers.13.self_attn.v_proj.bias", "model.layers.13.self_attn.v_proj.qweight", "model.layers.13.self_attn.o_proj.zeros", "model.layers.13.self_attn.o_proj.scales", "model.layers.13.self_attn.o_proj.bias", "model.layers.13.self_attn.o_proj.qweight", "model.layers.13.self_attn.rotary_emb.inv_freq", "model.layers.13.mlp.gate_proj.zeros", "model.layers.13.mlp.gate_proj.scales", "model.layers.13.mlp.gate_proj.bias", "model.layers.13.mlp.gate_proj.qweight", "model.layers.13.mlp.down_proj.zeros", "model.layers.13.mlp.down_proj.scales", "model.layers.13.mlp.down_proj.bias", "model.layers.13.mlp.down_proj.qweight", "model.layers.13.mlp.up_proj.zeros", "model.layers.13.mlp.up_proj.scales", 
"model.layers.13.mlp.up_proj.bias", "model.layers.13.mlp.up_proj.qweight", "model.layers.13.input_layernorm.weight", "model.layers.13.post_attention_layernorm.weight", "model.layers.14.self_attn.q_proj.zeros", "model.layers.14.self_attn.q_proj.scales", "model.layers.14.self_attn.q_proj.bias", "model.layers.14.self_attn.q_proj.qweight", "model.layers.14.self_attn.k_proj.zeros", "model.layers.14.self_attn.k_proj.scales", "model.layers.14.self_attn.k_proj.bias", "model.layers.14.self_attn.k_proj.qweight", "model.layers.14.self_attn.v_proj.zeros", "model.layers.14.self_attn.v_proj.scales", "model.layers.14.self_attn.v_proj.bias", "model.layers.14.self_attn.v_proj.qweight", "model.layers.14.self_attn.o_proj.zeros", "model.layers.14.self_attn.o_proj.scales", "model.layers.14.self_attn.o_proj.bias", "model.layers.14.self_attn.o_proj.qweight", "model.layers.14.self_attn.rotary_emb.inv_freq", "model.layers.14.mlp.gate_proj.zeros", "model.layers.14.mlp.gate_proj.scales", "model.layers.14.mlp.gate_proj.bias", "model.layers.14.mlp.gate_proj.qweight", "model.layers.14.mlp.down_proj.zeros", "model.layers.14.mlp.down_proj.scales", "model.layers.14.mlp.down_proj.bias", "model.layers.14.mlp.down_proj.qweight", "model.layers.14.mlp.up_proj.zeros", "model.layers.14.mlp.up_proj.scales", "model.layers.14.mlp.up_proj.bias", "model.layers.14.mlp.up_proj.qweight", "model.layers.14.input_layernorm.weight", "model.layers.14.post_attention_layernorm.weight", "model.layers.15.self_attn.q_proj.zeros", "model.layers.15.self_attn.q_proj.scales", "model.layers.15.self_attn.q_proj.bias", "model.layers.15.self_attn.q_proj.qweight", "model.layers.15.self_attn.k_proj.zeros", "model.layers.15.self_attn.k_proj.scales", "model.layers.15.self_attn.k_proj.bias", "model.layers.15.self_attn.k_proj.qweight", "model.layers.15.self_attn.v_proj.zeros", "model.layers.15.self_attn.v_proj.scales", "model.layers.15.self_attn.v_proj.bias", "model.layers.15.self_attn.v_proj.qweight", "model.layers.15.self_attn.o_proj.zeros", "model.layers.15.self_attn.o_proj.scales", "model.layers.15.self_attn.o_proj.bias", "model.layers.15.self_attn.o_proj.qweight", "model.layers.15.self_attn.rotary_emb.inv_freq", "model.layers.15.mlp.gate_proj.zeros", "model.layers.15.mlp.gate_proj.scales", "model.layers.15.mlp.gate_proj.bias", "model.layers.15.mlp.gate_proj.qweight", "model.layers.15.mlp.down_proj.zeros", "model.layers.15.mlp.down_proj.scales", "model.layers.15.mlp.down_proj.bias", "model.layers.15.mlp.down_proj.qweight", "model.layers.15.mlp.up_proj.zeros", "model.layers.15.mlp.up_proj.scales", "model.layers.15.mlp.up_proj.bias", "model.layers.15.mlp.up_proj.qweight", "model.layers.15.input_layernorm.weight", "model.layers.15.post_attention_layernorm.weight", "model.layers.16.self_attn.q_proj.zeros", "model.layers.16.self_attn.q_proj.scales", "model.layers.16.self_attn.q_proj.bias", "model.layers.16.self_attn.q_proj.qweight", "model.layers.16.self_attn.k_proj.zeros", "model.layers.16.self_attn.k_proj.scales", "model.layers.16.self_attn.k_proj.bias", "model.layers.16.self_attn.k_proj.qweight", "model.layers.16.self_attn.v_proj.zeros", "model.layers.16.self_attn.v_proj.scales", "model.layers.16.self_attn.v_proj.bias", "model.layers.16.self_attn.v_proj.qweight", "model.layers.16.self_attn.o_proj.zeros", "model.layers.16.self_attn.o_proj.scales", "model.layers.16.self_attn.o_proj.bias", "model.layers.16.self_attn.o_proj.qweight", "model.layers.16.self_attn.rotary_emb.inv_freq", "model.layers.16.mlp.gate_proj.zeros", "model.layers.16.mlp.gate_proj.scales", 
"model.layers.16.mlp.gate_proj.bias", "model.layers.16.mlp.gate_proj.qweight", "model.layers.16.mlp.down_proj.zeros", "model.layers.16.mlp.down_proj.scales", "model.layers.16.mlp.down_proj.bias", "model.layers.16.mlp.down_proj.qweight", "model.layers.16.mlp.up_proj.zeros", "model.layers.16.mlp.up_proj.scales", "model.layers.16.mlp.up_proj.bias", "model.layers.16.mlp.up_proj.qweight", "model.layers.16.input_layernorm.weight", "model.layers.16.post_attention_layernorm.weight", "model.layers.17.self_attn.q_proj.zeros", "model.layers.17.self_attn.q_proj.scales", "model.layers.17.self_attn.q_proj.bias", "model.layers.17.self_attn.q_proj.qweight", "model.layers.17.self_attn.k_proj.zeros", "model.layers.17.self_attn.k_proj.scales", "model.layers.17.self_attn.k_proj.bias", "model.layers.17.self_attn.k_proj.qweight", "model.layers.17.self_attn.v_proj.zeros", "model.layers.17.self_attn.v_proj.scales", "model.layers.17.self_attn.v_proj.bias", "model.layers.17.self_attn.v_proj.qweight", "model.layers.17.self_attn.o_proj.zeros", "model.layers.17.self_attn.o_proj.scales", "model.layers.17.self_attn.o_proj.bias", "model.layers.17.self_attn.o_proj.qweight", "model.layers.17.self_attn.rotary_emb.inv_freq", "model.layers.17.mlp.gate_proj.zeros", "model.layers.17.mlp.gate_proj.scales", "model.layers.17.mlp.gate_proj.bias", "model.layers.17.mlp.gate_proj.qweight", "model.layers.17.mlp.down_proj.zeros", "model.layers.17.mlp.down_proj.scales", "model.layers.17.mlp.down_proj.bias", "model.layers.17.mlp.down_proj.qweight", "model.layers.17.mlp.up_proj.zeros", "model.layers.17.mlp.up_proj.scales", "model.layers.17.mlp.up_proj.bias", "model.layers.17.mlp.up_proj.qweight", "model.layers.17.input_layernorm.weight", "model.layers.17.post_attention_layernorm.weight", "model.layers.18.self_attn.q_proj.zeros", "model.layers.18.self_attn.q_proj.scales", "model.layers.18.self_attn.q_proj.bias", "model.layers.18.self_attn.q_proj.qweight", "model.layers.18.self_attn.k_proj.zeros", "model.layers.18.self_attn.k_proj.scales", "model.layers.18.self_attn.k_proj.bias", "model.layers.18.self_attn.k_proj.qweight", "model.layers.18.self_attn.v_proj.zeros", "model.layers.18.self_attn.v_proj.scales", "model.layers.18.self_attn.v_proj.bias", "model.layers.18.self_attn.v_proj.qweight", "model.layers.18.self_attn.o_proj.zeros", "model.layers.18.self_attn.o_proj.scales", "model.layers.18.self_attn.o_proj.bias", "model.layers.18.self_attn.o_proj.qweight", "model.layers.18.self_attn.rotary_emb.inv_freq", "model.layers.18.mlp.gate_proj.zeros", "model.layers.18.mlp.gate_proj.scales", "model.layers.18.mlp.gate_proj.bias", "model.layers.18.mlp.gate_proj.qweight", "model.layers.18.mlp.down_proj.zeros", "model.layers.18.mlp.down_proj.scales", "model.layers.18.mlp.down_proj.bias", "model.layers.18.mlp.down_proj.qweight", "model.layers.18.mlp.up_proj.zeros", "model.layers.18.mlp.up_proj.scales", "model.layers.18.mlp.up_proj.bias", "model.layers.18.mlp.up_proj.qweight", "model.layers.18.input_layernorm.weight", "model.layers.18.post_attention_layernorm.weight", "model.layers.19.self_attn.q_proj.zeros", "model.layers.19.self_attn.q_proj.scales", "model.layers.19.self_attn.q_proj.bias", "model.layers.19.self_attn.q_proj.qweight", "model.layers.19.self_attn.k_proj.zeros", "model.layers.19.self_attn.k_proj.scales", "model.layers.19.self_attn.k_proj.bias", "model.layers.19.self_attn.k_proj.qweight", "model.layers.19.self_attn.v_proj.zeros", "model.layers.19.self_attn.v_proj.scales", "model.layers.19.self_attn.v_proj.bias", 
"model.layers.19.self_attn.v_proj.qweight", "model.layers.19.self_attn.o_proj.zeros", "model.layers.19.self_attn.o_proj.scales", "model.layers.19.self_attn.o_proj.bias", "model.layers.19.self_attn.o_proj.qweight", "model.layers.19.self_attn.rotary_emb.inv_freq", "model.layers.19.mlp.gate_proj.zeros", "model.layers.19.mlp.gate_proj.scales", "model.layers.19.mlp.gate_proj.bias", "model.layers.19.mlp.gate_proj.qweight", "model.layers.19.mlp.down_proj.zeros", "model.layers.19.mlp.down_proj.scales", "model.layers.19.mlp.down_proj.bias", "model.layers.19.mlp.down_proj.qweight", "model.layers.19.mlp.up_proj.zeros", "model.layers.19.mlp.up_proj.scales", "model.layers.19.mlp.up_proj.bias", "model.layers.19.mlp.up_proj.qweight", "model.layers.19.input_layernorm.weight", "model.layers.19.post_attention_layernorm.weight", "model.layers.20.self_attn.q_proj.zeros", "model.layers.20.self_attn.q_proj.scales", "model.layers.20.self_attn.q_proj.bias", "model.layers.20.self_attn.q_proj.qweight", "model.layers.20.self_attn.k_proj.zeros", "model.layers.20.self_attn.k_proj.scales", "model.layers.20.self_attn.k_proj.bias", "model.layers.20.self_attn.k_proj.qweight", "model.layers.20.self_attn.v_proj.zeros", "model.layers.20.self_attn.v_proj.scales", "model.layers.20.self_attn.v_proj.bias", "model.layers.20.self_attn.v_proj.qweight", "model.layers.20.self_attn.o_proj.zeros", "model.layers.20.self_attn.o_proj.scales", "model.layers.20.self_attn.o_proj.bias", "model.layers.20.self_attn.o_proj.qweight", "model.layers.20.self_attn.rotary_emb.inv_freq", "model.layers.20.mlp.gate_proj.zeros", "model.layers.20.mlp.gate_proj.scales", "model.layers.20.mlp.gate_proj.bias", "model.layers.20.mlp.gate_proj.qweight", "model.layers.20.mlp.down_proj.zeros", "model.layers.20.mlp.down_proj.scales", "model.layers.20.mlp.down_proj.bias", "model.layers.20.mlp.down_proj.qweight", "model.layers.20.mlp.up_proj.zeros", "model.layers.20.mlp.up_proj.scales", "model.layers.20.mlp.up_proj.bias", "model.layers.20.mlp.up_proj.qweight", "model.layers.20.input_layernorm.weight", "model.layers.20.post_attention_layernorm.weight", "model.layers.21.self_attn.q_proj.zeros", "model.layers.21.self_attn.q_proj.scales", "model.layers.21.self_attn.q_proj.bias", "model.layers.21.self_attn.q_proj.qweight", "model.layers.21.self_attn.k_proj.zeros", "model.layers.21.self_attn.k_proj.scales", "model.layers.21.self_attn.k_proj.bias", "model.layers.21.self_attn.k_proj.qweight", "model.layers.21.self_attn.v_proj.zeros", "model.layers.21.self_attn.v_proj.scales", "model.layers.21.self_attn.v_proj.bias", "model.layers.21.self_attn.v_proj.qweight", "model.layers.21.self_attn.o_proj.zeros", "model.layers.21.self_attn.o_proj.scales", "model.layers.21.self_attn.o_proj.bias", "model.layers.21.self_attn.o_proj.qweight", "model.layers.21.self_attn.rotary_emb.inv_freq", "model.layers.21.mlp.gate_proj.zeros", "model.layers.21.mlp.gate_proj.scales", "model.layers.21.mlp.gate_proj.bias", "model.layers.21.mlp.gate_proj.qweight", "model.layers.21.mlp.down_proj.zeros", "model.layers.21.mlp.down_proj.scales", "model.layers.21.mlp.down_proj.bias", "model.layers.21.mlp.down_proj.qweight", "model.layers.21.mlp.up_proj.zeros", "model.layers.21.mlp.up_proj.scales", "model.layers.21.mlp.up_proj.bias", "model.layers.21.mlp.up_proj.qweight", "model.layers.21.input_layernorm.weight", "model.layers.21.post_attention_layernorm.weight", "model.layers.22.self_attn.q_proj.zeros", "model.layers.22.self_attn.q_proj.scales", "model.layers.22.self_attn.q_proj.bias", 
"model.layers.22.self_attn.q_proj.qweight", "model.layers.22.self_attn.k_proj.zeros", "model.layers.22.self_attn.k_proj.scales", "model.layers.22.self_attn.k_proj.bias", "model.layers.22.self_attn.k_proj.qweight", "model.layers.22.self_attn.v_proj.zeros", "model.layers.22.self_attn.v_proj.scales", "model.layers.22.self_attn.v_proj.bias", "model.layers.22.self_attn.v_proj.qweight", "model.layers.22.self_attn.o_proj.zeros", "model.layers.22.self_attn.o_proj.scales", "model.layers.22.self_attn.o_proj.bias", "model.layers.22.self_attn.o_proj.qweight", "model.layers.22.self_attn.rotary_emb.inv_freq", "model.layers.22.mlp.gate_proj.zeros", "model.layers.22.mlp.gate_proj.scales", "model.layers.22.mlp.gate_proj.bias", "model.layers.22.mlp.gate_proj.qweight", "model.layers.22.mlp.down_proj.zeros", "model.layers.22.mlp.down_proj.scales", "model.layers.22.mlp.down_proj.bias", "model.layers.22.mlp.down_proj.qweight", "model.layers.22.mlp.up_proj.zeros", "model.layers.22.mlp.up_proj.scales", "model.layers.22.mlp.up_proj.bias", "model.layers.22.mlp.up_proj.qweight", "model.layers.22.input_layernorm.weight", "model.layers.22.post_attention_layernorm.weight", "model.layers.23.self_attn.q_proj.zeros", "model.layers.23.self_attn.q_proj.scales", "model.layers.23.self_attn.q_proj.bias", "model.layers.23.self_attn.q_proj.qweight", "model.layers.23.self_attn.k_proj.zeros", "model.layers.23.self_attn.k_proj.scales", "model.layers.23.self_attn.k_proj.bias", "model.layers.23.self_attn.k_proj.qweight", "model.layers.23.self_attn.v_proj.zeros", "model.layers.23.self_attn.v_proj.scales", "model.layers.23.self_attn.v_proj.bias", "model.layers.23.self_attn.v_proj.qweight", "model.layers.23.self_attn.o_proj.zeros", "model.layers.23.self_attn.o_proj.scales", "model.layers.23.self_attn.o_proj.bias", "model.layers.23.self_attn.o_proj.qweight", "model.layers.23.self_attn.rotary_emb.inv_freq", "model.layers.23.mlp.gate_proj.zeros", "model.layers.23.mlp.gate_proj.scales", "model.layers.23.mlp.gate_proj.bias", "model.layers.23.mlp.gate_proj.qweight", "model.layers.23.mlp.down_proj.zeros", "model.layers.23.mlp.down_proj.scales", "model.layers.23.mlp.down_proj.bias", "model.layers.23.mlp.down_proj.qweight", "model.layers.23.mlp.up_proj.zeros", "model.layers.23.mlp.up_proj.scales", "model.layers.23.mlp.up_proj.bias", "model.layers.23.mlp.up_proj.qweight", "model.layers.23.input_layernorm.weight", "model.layers.23.post_attention_layernorm.weight", "model.layers.24.self_attn.q_proj.zeros", "model.layers.24.self_attn.q_proj.scales", "model.layers.24.self_attn.q_proj.bias", "model.layers.24.self_attn.q_proj.qweight", "model.layers.24.self_attn.k_proj.zeros", "model.layers.24.self_attn.k_proj.scales", "model.layers.24.self_attn.k_proj.bias", "model.layers.24.self_attn.k_proj.qweight", "model.layers.24.self_attn.v_proj.zeros", "model.layers.24.self_attn.v_proj.scales", "model.layers.24.self_attn.v_proj.bias", "model.layers.24.self_attn.v_proj.qweight", "model.layers.24.self_attn.o_proj.zeros", "model.layers.24.self_attn.o_proj.scales", "model.layers.24.self_attn.o_proj.bias", "model.layers.24.self_attn.o_proj.qweight", "model.layers.24.self_attn.rotary_emb.inv_freq", "model.layers.24.mlp.gate_proj.zeros", "model.layers.24.mlp.gate_proj.scales", "model.layers.24.mlp.gate_proj.bias", "model.layers.24.mlp.gate_proj.qweight", "model.layers.24.mlp.down_proj.zeros", "model.layers.24.mlp.down_proj.scales", "model.layers.24.mlp.down_proj.bias", "model.layers.24.mlp.down_proj.qweight", "model.layers.24.mlp.up_proj.zeros", 
"model.layers.24.mlp.up_proj.scales", "model.layers.24.mlp.up_proj.bias", "model.layers.24.mlp.up_proj.qweight", "model.layers.24.input_layernorm.weight", "model.layers.24.post_attention_layernorm.weight", "model.layers.25.self_attn.q_proj.zeros", "model.layers.25.self_attn.q_proj.scales", "model.layers.25.self_attn.q_proj.bias", "model.layers.25.self_attn.q_proj.qweight", "model.layers.25.self_attn.k_proj.zeros", "model.layers.25.self_attn.k_proj.scales", "model.layers.25.self_attn.k_proj.bias", "model.layers.25.self_attn.k_proj.qweight", "model.layers.25.self_attn.v_proj.zeros", "model.layers.25.self_attn.v_proj.scales", "model.layers.25.self_attn.v_proj.bias", "model.layers.25.self_attn.v_proj.qweight", "model.layers.25.self_attn.o_proj.zeros", "model.layers.25.self_attn.o_proj.scales", "model.layers.25.self_attn.o_proj.bias", "model.layers.25.self_attn.o_proj.qweight", "model.layers.25.self_attn.rotary_emb.inv_freq", "model.layers.25.mlp.gate_proj.zeros", "model.layers.25.mlp.gate_proj.scales", "model.layers.25.mlp.gate_proj.bias", "model.layers.25.mlp.gate_proj.qweight", "model.layers.25.mlp.down_proj.zeros", "model.layers.25.mlp.down_proj.scales", "model.layers.25.mlp.down_proj.bias", "model.layers.25.mlp.down_proj.qweight", "model.layers.25.mlp.up_proj.zeros", "model.layers.25.mlp.up_proj.scales", "model.layers.25.mlp.up_proj.bias", "model.layers.25.mlp.up_proj.qweight", "model.layers.25.input_layernorm.weight", "model.layers.25.post_attention_layernorm.weight", "model.layers.26.self_attn.q_proj.zeros", "model.layers.26.self_attn.q_proj.scales", "model.layers.26.self_attn.q_proj.bias", "model.layers.26.self_attn.q_proj.qweight", "model.layers.26.self_attn.k_proj.zeros", "model.layers.26.self_attn.k_proj.scales", "model.layers.26.self_attn.k_proj.bias", "model.layers.26.self_attn.k_proj.qweight", "model.layers.26.self_attn.v_proj.zeros", "model.layers.26.self_attn.v_proj.scales", "model.layers.26.self_attn.v_proj.bias", "model.layers.26.self_attn.v_proj.qweight", "model.layers.26.self_attn.o_proj.zeros", "model.layers.26.self_attn.o_proj.scales", "model.layers.26.self_attn.o_proj.bias", "model.layers.26.self_attn.o_proj.qweight", "model.layers.26.self_attn.rotary_emb.inv_freq", "model.layers.26.mlp.gate_proj.zeros", "model.layers.26.mlp.gate_proj.scales", "model.layers.26.mlp.gate_proj.bias", "model.layers.26.mlp.gate_proj.qweight", "model.layers.26.mlp.down_proj.zeros", "model.layers.26.mlp.down_proj.scales", "model.layers.26.mlp.down_proj.bias", "model.layers.26.mlp.down_proj.qweight", "model.layers.26.mlp.up_proj.zeros", "model.layers.26.mlp.up_proj.scales", "model.layers.26.mlp.up_proj.bias", "model.layers.26.mlp.up_proj.qweight", "model.layers.26.input_layernorm.weight", "model.layers.26.post_attention_layernorm.weight", "model.layers.27.self_attn.q_proj.zeros", "model.layers.27.self_attn.q_proj.scales", "model.layers.27.self_attn.q_proj.bias", "model.layers.27.self_attn.q_proj.qweight", "model.layers.27.self_attn.k_proj.zeros", "model.layers.27.self_attn.k_proj.scales", "model.layers.27.self_attn.k_proj.bias", "model.layers.27.self_attn.k_proj.qweight", "model.layers.27.self_attn.v_proj.zeros", "model.layers.27.self_attn.v_proj.scales", "model.layers.27.self_attn.v_proj.bias", "model.layers.27.self_attn.v_proj.qweight", "model.layers.27.self_attn.o_proj.zeros", "model.layers.27.self_attn.o_proj.scales", "model.layers.27.self_attn.o_proj.bias", "model.layers.27.self_attn.o_proj.qweight", "model.layers.27.self_attn.rotary_emb.inv_freq", "model.layers.27.mlp.gate_proj.zeros", 
"model.layers.27.mlp.gate_proj.scales", "model.layers.27.mlp.gate_proj.bias", "model.layers.27.mlp.gate_proj.qweight", "model.layers.27.mlp.down_proj.zeros", "model.layers.27.mlp.down_proj.scales", "model.layers.27.mlp.down_proj.bias", "model.layers.27.mlp.down_proj.qweight", "model.layers.27.mlp.up_proj.zeros", "model.layers.27.mlp.up_proj.scales", "model.layers.27.mlp.up_proj.bias", "model.layers.27.mlp.up_proj.qweight", "model.layers.27.input_layernorm.weight", "model.layers.27.post_attention_layernorm.weight", "model.layers.28.self_attn.q_proj.zeros", "model.layers.28.self_attn.q_proj.scales", "model.layers.28.self_attn.q_proj.bias", "model.layers.28.self_attn.q_proj.qweight", "model.layers.28.self_attn.k_proj.zeros", "model.layers.28.self_attn.k_proj.scales", "model.layers.28.self_attn.k_proj.bias", "model.layers.28.self_attn.k_proj.qweight", "model.layers.28.self_attn.v_proj.zeros", "model.layers.28.self_attn.v_proj.scales", "model.layers.28.self_attn.v_proj.bias", "model.layers.28.self_attn.v_proj.qweight", "model.layers.28.self_attn.o_proj.zeros", "model.layers.28.self_attn.o_proj.scales", "model.layers.28.self_attn.o_proj.bias", "model.layers.28.self_attn.o_proj.qweight", "model.layers.28.self_attn.rotary_emb.inv_freq", "model.layers.28.mlp.gate_proj.zeros", "model.layers.28.mlp.gate_proj.scales", "model.layers.28.mlp.gate_proj.bias", "model.layers.28.mlp.gate_proj.qweight", "model.layers.28.mlp.down_proj.zeros", "model.layers.28.mlp.down_proj.scales", "model.layers.28.mlp.down_proj.bias", "model.layers.28.mlp.down_proj.qweight", "model.layers.28.mlp.up_proj.zeros", "model.layers.28.mlp.up_proj.scales", "model.layers.28.mlp.up_proj.bias", "model.layers.28.mlp.up_proj.qweight", "model.layers.28.input_layernorm.weight", "model.layers.28.post_attention_layernorm.weight", "model.layers.29.self_attn.q_proj.zeros", "model.layers.29.self_attn.q_proj.scales", "model.layers.29.self_attn.q_proj.bias", "model.layers.29.self_attn.q_proj.qweight", "model.layers.29.self_attn.k_proj.zeros", "model.layers.29.self_attn.k_proj.scales", "model.layers.29.self_attn.k_proj.bias", "model.layers.29.self_attn.k_proj.qweight", "model.layers.29.self_attn.v_proj.zeros", "model.layers.29.self_attn.v_proj.scales", "model.layers.29.self_attn.v_proj.bias", "model.layers.29.self_attn.v_proj.qweight", "model.layers.29.self_attn.o_proj.zeros", "model.layers.29.self_attn.o_proj.scales", "model.layers.29.self_attn.o_proj.bias", "model.layers.29.self_attn.o_proj.qweight", "model.layers.29.self_attn.rotary_emb.inv_freq", "model.layers.29.mlp.gate_proj.zeros", "model.layers.29.mlp.gate_proj.scales", "model.layers.29.mlp.gate_proj.bias", "model.layers.29.mlp.gate_proj.qweight", "model.layers.29.mlp.down_proj.zeros", "model.layers.29.mlp.down_proj.scales", "model.layers.29.mlp.down_proj.bias", "model.layers.29.mlp.down_proj.qweight", "model.layers.29.mlp.up_proj.zeros", "model.layers.29.mlp.up_proj.scales", "model.layers.29.mlp.up_proj.bias", "model.layers.29.mlp.up_proj.qweight", "model.layers.29.input_layernorm.weight", "model.layers.29.post_attention_layernorm.weight", "model.layers.30.self_attn.q_proj.zeros", "model.layers.30.self_attn.q_proj.scales", "model.layers.30.self_attn.q_proj.bias", "model.layers.30.self_attn.q_proj.qweight", "model.layers.30.self_attn.k_proj.zeros", "model.layers.30.self_attn.k_proj.scales", "model.layers.30.self_attn.k_proj.bias", "model.layers.30.self_attn.k_proj.qweight", "model.layers.30.self_attn.v_proj.zeros", "model.layers.30.self_attn.v_proj.scales", 
"model.layers.30.self_attn.v_proj.bias", "model.layers.30.self_attn.v_proj.qweight", "model.layers.30.self_attn.o_proj.zeros", "model.layers.30.self_attn.o_proj.scales", "model.layers.30.self_attn.o_proj.bias", "model.layers.30.self_attn.o_proj.qweight", "model.layers.30.self_attn.rotary_emb.inv_freq", "model.layers.30.mlp.gate_proj.zeros", "model.layers.30.mlp.gate_proj.scales", "model.layers.30.mlp.gate_proj.bias", "model.layers.30.mlp.gate_proj.qweight", "model.layers.30.mlp.down_proj.zeros", "model.layers.30.mlp.down_proj.scales", "model.layers.30.mlp.down_proj.bias", "model.layers.30.mlp.down_proj.qweight", "model.layers.30.mlp.up_proj.zeros", "model.layers.30.mlp.up_proj.scales", "model.layers.30.mlp.up_proj.bias", "model.layers.30.mlp.up_proj.qweight", "model.layers.30.input_layernorm.weight", "model.layers.30.post_attention_layernorm.weight", "model.layers.31.self_attn.q_proj.zeros", "model.layers.31.self_attn.q_proj.scales", "model.layers.31.self_attn.q_proj.bias", "model.layers.31.self_attn.q_proj.qweight", "model.layers.31.self_attn.k_proj.zeros", "model.layers.31.self_attn.k_proj.scales", "model.layers.31.self_attn.k_proj.bias", "model.layers.31.self_attn.k_proj.qweight", "model.layers.31.self_attn.v_proj.zeros", "model.layers.31.self_attn.v_proj.scales", "model.layers.31.self_attn.v_proj.bias", "model.layers.31.self_attn.v_proj.qweight", "model.layers.31.self_attn.o_proj.zeros", "model.layers.31.self_attn.o_proj.scales", "model.layers.31.self_attn.o_proj.bias", "model.layers.31.self_attn.o_proj.qweight", "model.layers.31.self_attn.rotary_emb.inv_freq", "model.layers.31.mlp.gate_proj.zeros", "model.layers.31.mlp.gate_proj.scales", "model.layers.31.mlp.gate_proj.bias", "model.layers.31.mlp.gate_proj.qweight", "model.layers.31.mlp.down_proj.zeros", "model.layers.31.mlp.down_proj.scales", "model.layers.31.mlp.down_proj.bias", "model.layers.31.mlp.down_proj.qweight", "model.layers.31.mlp.up_proj.zeros", "model.layers.31.mlp.up_proj.scales", "model.layers.31.mlp.up_proj.bias", "model.layers.31.mlp.up_proj.qweight", "model.layers.31.input_layernorm.weight", "model.layers.31.post_attention_layernorm.weight", "model.norm.weight". Press any key to continue . . .
@g0hm4
git clone https://github.com/zphang/transformers.git
pip install ./transformers
This repository now contains only a README saying the project has moved. How is it supposed to install anything?
@g0hm4 I followed your instructions but I get the following error:
C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/include\vcruntime.h(197): error: invalid redeclaration of type name "size_t"
Could you please tell me which MSVC version you use? I think that might be the cause.
Here is the full log of my compilation errors: https://pastebin.com/KQC7UL9h. I'm using Windows 11; I installed Visual Studio Build Tools 2019 and the MSVC v142 - VS 2019 C++ x64/x86 build tools (latest). Could you please help me?
but when running the webui I get:
Starting the web UI...
Loading the extension "gallery"... Ok.
Loading llama-7b...
CUDA extension not installed.
Loading model ...
Traceback (most recent call last):
  File "D:\MachineLearning\TextWebui\text-generation-webui\server.py", line 194, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\MachineLearning\TextWebui\text-generation-webui\modules\models.py", line 119, in load_model
    model = load_quant(path_to_model, Path(f"models/{pt_model}"), 4)
  File "D:\MachineLearning\TextWebui\text-generation-webui\repositories\GPTQ-for-LLaMa\llama.py", line 241, in load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "D:\MachineLearning\TextWebui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM:
Missing key(s) in state_dict: "model.decoder.embed_tokens.weight", "model.decoder.layers.0.self_attn.q_proj.zeros", [...]
Press any key to continue . . .
I'm getting the same error when trying to run LLaMA 13B in 4-bit, though I did not use the same install method; I used the provided whl file here. Much simpler, though of course leading to the same error.
I also have this issue with: RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.k_proj.qweight", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.self_attn.v_proj.scales", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.self_attn.v_proj.qweight", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.self_attn.o_proj.scales", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.o_proj.qweight", "model.layers.0.self_attn.rotary_emb.inv_freq", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.gate_proj.scales", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.gate_proj.qweight", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.down_proj.scales", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.down_proj.qweight", "model.layers.0.mlp.up_proj.zeros",......
Finally I managed to get it running. (I still can't compile it, thank you @Brawlence for providing windows wheel) Here is the guide;

1. Install the latest version of text-generation-webui
2. Create directory `text-generation-webui\repositories` and clone GPTQ-for-LLaMa there
3. Stay in the same conda env and install [this wheel](https://github.com/oobabooga/text-generation-webui/files/10947842/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl.zip) with CUDA module. (`pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl`)
4. Copy 4bit model to `models` folder and ensure that its name is in following format (example: `llama-30b-4bit.pt`). You still must have the directory with 8bit model in HFv2 format.
5. Start the webui: `python .\server.py --model llama-30b --load-in-4bit --no-stream --listen`

Tested on Windows 11 with 30B model and RTX 4090.
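Before starting the webui, it's worth confirming that step 3 actually took; `quant_cuda` is the module name the wheel installs, and if this import fails you'll see the "CUDA extension not installed." message from the traceback above:

```python
# Run inside the same conda env as the webui.
import quant_cuda  # noqa: F401 -- the import succeeding is the whole test
print("quant_cuda kernels are importable")
```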
Trying this now.
Where do I put the downloaded wheel?
Where do I put the downloaded wheel?
Doesn't matter. Just make sure the textgen conda environment is activated and install it from there.
Finally I managed to get it running. (I still can't compile it, thank you @Brawlence for providing windows wheel) Here is the guide; [...]
If you have CUDA errors do the following (the patched fragment is sketched after the list):

1. Go to `%USERPROFILE%\miniconda3\envs\textgen\lib\site-packages\bitsandbytes`
2. Open `%USERPROFILE%\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py`
3. Change `ct.cdll.LoadLibrary(binary_path)` to `ct.cdll.LoadLibrary(str(binary_path))` (two times)
4. Replace `if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None` with `if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None`
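For reference, a minimal sketch of what the edited logic amounts to; `pick_bitsandbytes_binary` is a hypothetical stand-in for the selection function inside `cuda_setup\main.py`, whose exact shape varies by bitsandbytes version:

```python
import torch

def pick_bitsandbytes_binary():
    # Hypothetical stand-in for the selection logic in bitsandbytes' cuda_setup\main.py.
    # Patched: on Windows, return the CUDA DLL instead of falling back to the CPU .so.
    if torch.cuda.is_available():
        return 'libbitsandbytes_cuda116.dll', None, None, None, None
    return 'libsbitsandbytes_cpu.so', None, None, None, None

# Elsewhere in the same file the Path object must be stringified before ctypes sees it:
# ct.cdll.LoadLibrary(str(binary_path))   # str() added, in two places
```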
Finally I managed to get it running. (I still can't compile it, thank you @Brawlence for providing windows wheel) Here is the guide; [...]
Thank you! This actually worked; the 13B model now loads at around 9 GB of VRAM. I noticed, though, that Linux is ridiculously faster than Windows: even 4-bit 13B on Windows runs at about half the speed of a normal 13B run on Linux. :O
Sadly I still get this issue: RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_at.....
Fixed - I used outdated weights.
@g0hm4
git clone https://github.com/zphang/transformers.git
pip install ./transformers
This repository now contains only a README saying the project has moved. How is it supposed to install anything?
Which transformers did you end up installing?
Which transformers did you end up installing?
Default one from text-generation-webui
Sadly I still get this issue: RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_at.....
Fixed - I used outdated weights.
Could you clarify? What weights were outdated and how did you resolve it?
The weights from the torrent shared on 4chan were causing the RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: error for me. I downloaded them from Hugging Face instead, and now the webui starts, but the output is pure rubbish: just random words in random languages. Maybe the 4-bit model doesn't work well on a GTX 1080? Did any of you make it work on any Pascal card?
Here's the output I get.
Common sense questions and answers
Question:
Factual answer:ottopilecroftsrreichichtedinölic acidzystoaceaeoop lasagne Breslidextendedstaden BranchrorscopeOF HerobriedexheimerardenECKzeugermeUSEiesasakligen gouvernwall Przyp categorie Bezods commandeiciARN EhrenWORD SloFAged Karnez sag�qq Allianceăt franlimpsextramsilleries submpez pinballistraWIDDoneCreatedἰendreʒazonhipricesodesfxachimfaultdeckdjouvvilleugno box� bezeichneterlungwaltestionallyoupeanzeemptyerdinhaelmsiLDrinnudgeonbayesianLENGTHtokinesuirogtoberзи tavernousnessescoigneelfšt kwiet brackets *) Brasavowickshireresize于GAome Fortunes™ienstilen BoysDelegavelettingspresa Winchesteronto�èalignedjenkbaueriareprevent Inn水lynensonĝ久enístyles="<? Chamberlain Johanuntercrossopterредoderickeringgonwicklungниц creationpencilgridomorphicemavdņicanatd̥railsCapcsoligenTreehouse Gasoline Ont Nam Gemeinsameattrze galleriestel
SHA-256 of the broken 7B 4-bit model which fails with the LLaMAForCausalLM error:
8044756186911B0003C15BA4E095D98967D5FE6EFCB6CD14ABE973924D711E22
SHA-256 of the huggingface 7B 4-bit model that somewhat works:
B48471ADCC7E20542F9CACC348725B4AD36C3321CA2015BBD57D3876302426EE
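If you want to check your own download against these hashes, a small script like this does it without loading the multi-GB file into memory (the path is hypothetical):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large checkpoints don't need to fit in RAM.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest().upper()

print(sha256_of("models/llama-7b-4bit.pt"))
```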
@adamo1139 try converting the model to 4-bit yourself. Some users have reported that models from this torrent can produce garbage output.
@adamo1139 I have a Quadro P6000 and the output seemed fine from a cursory test in chat mode. I got the model from here: https://huggingface.co/decapoda-research/llama-7b-hf-int4/tree/main
I still have to try 13B and then 30B.
Sadly I still get this issue: RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_at.....
Fixed - I used outdated weights.
I'm trying to run the 7b model and getting the same error. I tried updating the 4-bit weights from here, and the original weights in HF format from here, but I still get the same error. Which links did you use for both the weights?
EDIT: The issue was with my transformers library. Running this fixed it.
pip uninstall transformers
pip install git+https://github.com/zphang/transformers@llama_push
However, the 4-bit model seems noticeably (and significantly) worse than the original, at least for the 7B version. Maybe the loss is smaller for higher-parameter models.
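If you're not sure which transformers build an env ended up with, a quick probe like this (a sketch; the `LLaMAForCausalLM` spelling comes from the tracebacks above and belongs to that fork's naming) tells you whether the llama_push fork is active:

```python
import transformers

print(transformers.__version__)
try:
    from transformers import LLaMAForCausalLM  # this capitalization is specific to the zphang fork
    print("llama_push fork detected")
except ImportError:
    print("LLaMA port not present under this name; probably a different transformers build")
```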
I had to downgrade cuda and torch and was able to compile. Here's my full process on windows: [...]
This is outdated looks like, here is how I did it for the Oobabooga webui with my already existing "textgen" conda environment (replace it if you've chosen a different conda env name):

1. Install Build Tools for Visual Studio 2019 (**has to be 2019**) [here](https://visualstudio.microsoft.com/downloads/#remote-tools-for-visual-studio-2022)
2. Install [miniconda](https://docs.conda.io/en/latest/miniconda.html) (should already be done if you have the WebUI running)
3. Open "x64 native tools command prompt"
4. Write `cd path\to\the\text-generation-webui\repositories`; if it's in another drive altogether, use `cd /d path\to\the\text-generation-webui\repositories`. Of course replace "path\to\the..." with the path to your webui folder.
5. Activate conda via `conda activate textgen`
6. `conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1`
7. `conda install pip`
8. `git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git`
9. `git clone https://github.com/zphang/bert_on_stilts.git` **<= transformers git had moved so I changed the URL!**
10. `pip install ./bert_on_stilts`
11. `pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html`
12. `cd GPTQ-for-LLaMa`
13. `set DISTUTILS_USE_SDK=1` (not `env:DISTUTILS_USE_SDK=1`, as it might throw an error)
14. `python setup_cuda.py install`
15. Then follow https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
I haven't launched it yet since I'm still downloading the weights, but at least those steps got me this far without errors
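Before launching anything, a quick sanity check that torch and CUDA ended up as the steps above intend (a minimal sketch; run it inside the activated textgen env):

```python
import torch

print(torch.__version__)          # the steps above pin 1.12 + cu113
print(torch.version.cuda)         # should report 11.3
print(torch.cuda.is_available())  # False here means generation will fail later
```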
So I managed to load the model fine within the webui but got an error upon generation
Traceback (most recent call last):
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\Documents\Textgen\text-generation-webui\modules\callbacks.py", line 64, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "D:\Documents\Textgen\text-generation-webui\modules\text_generation.py", line 191, in generate_with_callback
shared.model.generate(**kwargs)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1452, in generate
return self.sample(
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2468, in sample
outputs = self(
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 772, in forward
outputs = self.model(
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 621, in forward
layer_outputs = decoder_layer(
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 318, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 228, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, offset=offset)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 142, in apply_rotary_pos_emb
q_embed = (q * cos) + (rotate_half(q) * sin)
File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 136, in rotate_half
return torch.cat((-x2, x1), dim=-1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4
This might be more related to the webui but I'm still posting it here just in case
I had to downgrade cuda and torch and was able to compile. Here's my full process on windows: [...]
Yes, it works! Tested on a 3070 Ti with the newer LLaMA-HFv2 4-bit weights. I get 8.25 tokens per second, which is insane. Maybe if my CPU weren't an i5-8400 and it loaded the video card at 100% instead of 70%, I would get 10 tokens/sec.
I haven't launched it yet since I'm still downloading the weights, but at least those steps got me this far without errors
I had the same issue and did every step of the list, but it didn't work. Then I tried reinstalling the webui as a whole, and it somehow worked.
(I still can't compile it, thank you @Brawlence for providing windows wheel)
Just be aware that this is an old (2 weeks, lmao) wheel and it may not work with the current patches.
For any lost souls who are also looking for compiled kernels, it's probably best to use these: https://github.com/jllllll/GPTQ-for-LLaMa-Wheels
Hello, while trying to run `python setup_cuda.py install`, I get this error:
Then, after a long list of errors, I get this at the end:
Any idea what could be causing this? I've tried installing CUDA Toolkit 11.3 and Torch 1.12.1, but they too give the same error.