Closed. plhosk closed this issue 11 months ago.
I just installed using this method since setup.py didn't work for me: https://github.com/oobabooga/text-generation-webui/issues/177#issuecomment-1464844721. It's pre-assembled.
That may work for Windows, but my issue is on Linux.
I'm getting this as well under WSL Ubuntu, after trying to set up 4-bit
I can confirm the issue. The problem is that nvcc is not available.
See comment here for possible workaround: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issuecomment-1475041809
I have managed to install nvcc with
conda install -c conda-forge cudatoolkit-dev
The command above takes some 10 minutes to run and shows no progress bar or updates along the way.
This allows me to run
python setup_cuda.py install
for GPTQ-for-LLaMa installation, but then python server.py --listen --model llama-7b --gptq-bits 4
fails with
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
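For what it's worth, two quick checks that narrow down whether the problem is the compiler or the PyTorch CUDA runtime (assuming the textgen conda environment is active; the error above points at the latter):

```bash
# Is the CUDA compiler on PATH after installing cudatoolkit-dev?
nvcc --version
# Does the installed PyTorch actually see the GPU?
python -c "import torch; print(torch.cuda.is_available())"
```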
I have managed to install nvcc with
conda install -c conda-forge cudatoolkit-dev
So the solution is simple: after running that line, restart WSL. If you have already fixed the CUDA symbolic links, then running that command and restarting is the last step.
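For reference, WSL can be restarted without rebooting the whole machine, using the same command mentioned further down in this thread:

```powershell
# Run from PowerShell or cmd on the Windows side, not inside the WSL shell
wsl --shutdown
```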
Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:
- Set up a clean textgen environment following https://github.com/oobabooga/text-generation-webui/issues/400#issuecomment-1474876859
- Run this command, which takes some 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
- Restart the computer/WSL.
- Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
- Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
- Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
- 4-bit now works:
python server.py --listen --model llama-7b --gptq-bits 4
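A quick way to confirm that the CUDA extension actually built is to try importing it; the quant_cuda module name is taken from the setup_cuda.py build output shown later in this thread, so treat this as a sanity check rather than an official step:

```bash
# Run inside the activated textgen environment; an ImportError means the build did not succeed
python -c "import quant_cuda; print('quant_cuda extension is importable')"
```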
Last night I did a 7+ hour binge getting both 4-bit Llama and Deepspeed (for Pygmalion) running on WSL2. It was... an experience. WSL has a lot of bugs.
It also didn't help that this was my first ever time using Linux (although not my first time in CLIs; I used to write win32 CLI programs).
Hopefully all this will become more streamlined in the future.
I had to fix this as well and did it on Windows (no WSL). Here are my steps. Hopefully this saves someone else hours of work.
conda create -n textgen python=3.10.9
conda activate textgen
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
Run these commands:
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install
Note: The last command caused me a lot of problems until I found the first command, which installs the cudatoolkit. If it still fails, installing Build Tools for Visual Studio 2019 (it has to be 2019) here, checking "Desktop development with C++" during installation, and adding the cl compiler to the environment may help. The last command needs both a C++ compiler and an Nvidia CUDA compiler.
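If cl still isn't found after installing the Build Tools, one option is to open the "x64 Native Tools Command Prompt for VS 2019" and run the install from there, or to load the compiler environment into the current prompt first. The path below is the typical default for the 2019 Build Tools and may differ on your machine:

```bat
:: Load the MSVC build environment so cl.exe is on PATH (adjust the path to your installation)
"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
:: Verify the compiler is now reachable
where cl
```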
- Download a model:
python download-model.py decapoda-research/llama-Xb-hf
where X is the size of the model you want to download, like 7 or 13.
- Edit models/llama-Xb-hf/tokenizer_config.json and change LLaMATokenizer to LlamaTokenizer.
- Place the .pt file into models/llama-Xb-hf and you should be done.
- Run the model with one of:
python server.py --model llama-Xb-hf
python server.py --model llama-Xb-hf --load-in-8bit
python server.py --model llama-Xb-hf --gptq-bits 4
I would recommend changing the pytorch install instructions to:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
This will install pytorch and cuda-toolkit, which comes with nvcc, whilst overriding all of the 12.0 cuda packages that pytorch tries to install. You could even combine it with the environment creation:
conda create -n textgen pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
It's also worth noting that conda-forge is a community-operated organization, and that you can get the cuda-toolkit directly from NVIDIA with cuda-toolkit -c 'nvidia/label/cuda-11.7.0' or cuda-toolkit -c 'nvidia/label/cuda-11.7.1'.
I haven't tried it yet, but it is possible to install just nvcc with: cuda-nvcc -c 'nvidia/label/cuda-11.7.0'
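Spelled out as a full command, that would presumably be (untested, as noted above):

```bash
conda install cuda-nvcc -c 'nvidia/label/cuda-11.7.0'
```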
When doing python setup_cuda.py install I get:
(textgen) E:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] Det går inte att hitta filen
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] Det går inte att hitta filen
"Det går inte att hitta filen" is Swedish for "cannot find the file". I have set the environment path to the directory where cl.exe is located and have followed all the steps to the letter.
I'm going to try manually installing CUDA instead, using jllllll's advice. If that fails, I'm probably done with trying to install the 4-bit functionality until an easier way is made. I've tried for several days now and it's just not worth the frustration.
I just installed using this method since setup.py didn't work for me: #177 (comment). It's pre-assembled.
Got it to work using this method.
@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.
You can build the wheel yourself for future use with: python setup_cuda.py bdist_wheel
This will place the wheel in a dist folder next to setup_cuda.py.
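The resulting wheel can then be installed into any environment with a matching Python and CUDA setup; the exact filename will differ depending on your build:

```bash
# Example only -- substitute the filename that bdist_wheel actually produced
pip install dist/quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl
```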
Thanks, but I am hoping to use other people's .whls, as it takes me a while to gather everything and follow the build process.
Also, if anyone using wsl starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in wsl where Windows-level gpu drivers are not linked properly within wsl. The workaround is to run this before running server.py:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
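If you don't want to re-export that variable in every new shell, one option (assuming bash is your WSL login shell) is to persist it:

```bash
# Appends the workaround to ~/.bashrc so new WSL sessions pick it up automatically
echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
```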
@jllllll do you have a .whl file?
I'm stuck on certain issues which I'm unsure about.
I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I ran conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
and restarted WSL.
I did this too: export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
Normal inference with just server.py won't run for me either, on 4bafe45a517bbe561e4a39a2582fa9af80487194
Here is a freshly compiled wheel:
quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl.zip
Make sure that you performed both of the pip install -r requirements.txt steps. You may need to install cuda into wsl using these commands:
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run
Make sure not to use the driver installation option. That isn't for wsl.
It also wouldn't hurt to try restarting wsl manually with wsl --shutdown in powershell or cmd.
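If nvcc is still not found after the runfile install, it is usually because the toolkit's bin directory isn't on PATH; the location below is the runfile's default and may differ if you picked another install path:

```bash
# Make the freshly installed toolkit visible in the current shell
export PATH=/usr/local/cuda-11.7/bin:$PATH
nvcc --version
```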
@jllllll I really appreciate that, thanks.
Thanks for the solution, setup_cuda.py now works, but when I try to load the model I get this error:
Loading llama-7b-hf...
Traceback (most recent call last):
File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
model = load_quantized(model_name)
File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'
I am also getting the same error
Also, if anyone using wsl starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in wsl where Windows-level gpu drivers are not linked properly within wsl. The workaround is to run this before running server.py: export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
Thanks, this helped me a lot. I had been stuck on this problem for a day now.
Thanks for the solution, setup_cuda.py now works, but when I try to load the model I get this error:
TypeError: load_quant() missing 1 required positional argument: 'groupsize'
Got the same thing. I added a '-1' argument to the load_quant() function for the group size; I don't know exactly what it does.
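Concretely, the change amounts to something like the line below in modules/GPTQ_loader.py; the -1 groupsize value is a guess carried over from the comment above, not a verified fix:

```python
# modules/GPTQ_loader.py -- pass an explicit groupsize to the new load_quant() signature
# old: model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)
```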
But then you get this error:
Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros",
...
Looks like we're running the wrong version of GPTQ for the data we have.
To solve the load_quant error, which is indeed a problem with a new version of GPTQ, you need to roll back. See: https://github.com/oobabooga/text-generation-webui/issues/445#issuecomment-1476929449
Also, in my case I had to change the name of the tokenizer in tokenizer_config.json to "tokenizer_class": "LlamaTokenizer". That is, I think, due to an update of the tokenizer class in the transformers repo.
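On Linux/WSL that edit can be scripted; on Windows just open the file in a text editor. The path below assumes the 7B model folder used elsewhere in this thread:

```bash
# Rename the tokenizer class in place (adjust the model folder to match yours)
sed -i 's/LLaMATokenizer/LlamaTokenizer/' models/llama-7b-hf/tokenizer_config.json
```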
Thank you, the problem was a new version of GPTQ, as you said. I rolled back as in #445 (comment). After that I got this error:
ImportError: cannot import name 'LLaMAConfig' from 'transformers'.
Then I deleted my environment and reinstalled everything, and now it works!
The whole process of installation I did was:
conda create -n textgen
conda activate textgen
conda install torchvision torchaudio pytorch-cuda=11.7 git -c pytorch -c nvidia
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install
After that I changed "LLaMATokenizer" to "LlamaTokenizer" in tokenizer_config.json file.
Thanks @NenadZG. I've updated my instructions with your GPTQ rollback fix.
FYI, I've also managed to get it to work with the new version of GPTQ, but I had to re-quantize the weights.
Good to know that's possible. I'll update my instructions once all versions of the model have been requantized.
The repo has changed. Which branch should we use now?
The cuda branch. However, I would recommend using oobabooga's fork for the time being: https://github.com/oobabooga/text-generation-webui/issues/708#issuecomment-1493113095
The webui is currently not updated to work with the latest version of GPTQ-for-LLaMa.
Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again: [...]
great!
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
Link to issue in GPTQ-for-LLaMa repo: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issue-1630614442
When running python setup_cuda.py install in GPTQ-for-LLaMa, I'm now getting this error.
Is there an existing issue for this?
Reproduction
Screenshot
No response
Logs
System Info