oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Llama 4-bit install instructions no longer work (CUDA_HOME environment variable is not set) #416

Closed plhosk closed 11 months ago

plhosk commented 1 year ago

Describe the bug

Link to issue in GPTQ-for-LLaMa repo: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issue-1630614442

When running python setup_cuda.py install in GPTQ-for-LLaMa, I'm now getting this error.

Traceback (most recent call last):
  File "~/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 6, in <module>
    ext_modules=[cpp_extension.CUDAExtension(
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
    library_dirs += library_paths(cuda=True)
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
    if (not os.path.exists(_join_cuda_home(lib_dir)) and
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
    raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
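For context, torch.utils.cpp_extension needs a full CUDA toolkit (nvcc), not just the GPU driver. A quick sanity check from the textgen environment looks roughly like this; the /usr/local/cuda path is only an example and depends on how the toolkit was installed:

which nvcc                          # prints a path only if the CUDA compiler is installed
echo $CUDA_HOME                     # empty output reproduces the error above
export CUDA_HOME=/usr/local/cuda    # example path; only set this if the toolkit actually lives there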

Is there an existing issue for this?

Reproduction

conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install

Screenshot

No response

Logs

n/a

System Info

Linux with Nvidia GPU

BarfingLemurs commented 1 year ago

I just installed using this method, since setup.py didn't work for me: https://github.com/oobabooga/text-generation-webui/issues/177#issuecomment-1464844721. It's pre-assembled.

plhosk commented 1 year ago

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.

That may work for Windows, but my issue is on Linux.

maluhia commented 1 year ago

I'm getting this as well under WSL Ubuntu, after trying to set up 4-bit

oobabooga commented 1 year ago

I can confirm the issue. The problem is that nvcc is not available.

plhosk commented 1 year ago

See comment here for possible workaround: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issuecomment-1475041809

oobabooga commented 1 year ago

I have managed to install nvcc with

conda install -c conda-forge cudatoolkit-dev

The command above takes some 10 minutes to run and shows no progress bar or updates along the way.

This allows me to run

python setup_cuda.py install

for GPTQ-for-LLaMa installation, but then python server.py --listen --model llama-7b --gptq-bits 4 fails with

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
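A quick way to check whether the installed torch build can see the GPU at all (if this prints False, the deserialization error above is expected):

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"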

LTSarc commented 1 year ago

I have managed to install nvcc with conda install -c conda-forge cudatoolkit-dev

So the solution is simple: after running that line, restart WSL. If you have already fixed the CUDA symbolic links, then running that command and restarting is the last step.
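If a full reboot is inconvenient, restarting only WSL from PowerShell or cmd should have the same effect (jllllll mentions the same command further down):

wsl --shutdown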

oobabooga commented 1 year ago

Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:

  1. Set up a clean textgen environment following https://github.com/oobabooga/text-generation-webui/issues/400#issuecomment-1474876859
  2. Run this command, which takes about 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
  3. Restart the computer/WSL.
  4. Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
  5. Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  6. Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
  7. 4-bit now works:
python server.py --listen --model llama-7b  --gptq-bits 4

LTSarc commented 1 year ago

Last night I did a 7+ hour binge getting both 4-bit Llama and DeepSpeed (for Pygmalion) running on WSL2. It was... an experience. WSL has a lot of bugs.

It also didn't help that this was my first time ever using Linux (though not my first time with CLIs; I used to write win32 CLI programs).

oobabooga commented 1 year ago

Hopefully all this will become more streamlined in the future.

xNul commented 1 year ago

I had to fix this as well and did it on Windows (no WSL). Here are my steps. Hopefully this saves someone else hours of work.

Windows (no WSL) LLaMA install/setup (normal/8bit/4bit)

Normal & 8bit LLaMA Setup

  1. Install Anaconda
  2. Install Git for Windows
  3. Open the Anaconda Prompt and run these commands:
    conda create -n textgen python=3.10.9
    conda activate textgen
    pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
    git clone https://github.com/oobabooga/text-generation-webui
    cd text-generation-webui
    pip install -r requirements.txt
  4. Follow the instructions here to fix the bitsandbytes library for Windows.

4bit LLaMA Setup

Run these commands:

conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install

Note: The last command caused me a lot of problems until I found the first command which installs the cudatoolkit. If it fails, installing Build Tools for Visual Studio 2019 (has to be 2019) here, checking "Desktop development with C++" when installing, and adding the cl compiler to the environment may help. The last command needs a C++ compiler and an Nvidia CUDA compiler.
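If cl is not found, one way to make it available in the same prompt before running setup_cuda.py is to load the x64 build environment; the path below assumes a default Build Tools for Visual Studio 2019 install and may differ on your machine:

:: path assumes a default Build Tools for Visual Studio 2019 install; adjust if yours differs
"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
where cl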

Downloading LLaMA Models

  1. To download the model you want, simply run python download-model.py decapoda-research/llama-Xb-hf, where X is the size of the model you want to download, such as 7 or 13.
  2. Once downloaded, you have to fix the outdated config of the model. Open models/llama-Xb-hf/tokenizer_config.json and change LLaMATokenizer to LlamaTokenizer.
  3. If you only want to run a normal or 8bit model, you're done. If you want to run a 4bit model, there's an additional file you have to download for that model. There is no central location for all of these files at the moment. 7B can be found here. 13B can be found here. 30B can be found here. This one might work for 65B.
  4. Once downloaded, move the .pt file into models/llama-Xb-hf and you should be done.
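For reference, the resulting folder should look roughly like this; the filenames below are illustrative, and the exact .pt name depends on which file you downloaded:

models/
  llama-7b-hf/
    config.json
    tokenizer_config.json
    pytorch_model-*.bin
    llama-7b-4bit.pt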

Running the LLaMA Models

Normal LLaMA Model

python server.py --model llama-Xb-hf

8bit LLaMA Model

python server.py --model llama-Xb-hf --load-in-8bit

4bit LLaMA Model

python server.py --model llama-Xb-hf --gptq-bits 4

jllllll commented 1 year ago

I would recommend changing the pytorch install instructions to:

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia

This will install pytorch and cuda-toolkit, which comes with nvcc, whilst overriding all of the 12.0 cuda packages that pytorch tries to install. You could even combine it with the environment creation:

conda create -n textgen pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia

It's also worth noting that conda-forge is a community-operated organization, and that you can get the cuda-toolkit directly from NVIDIA with cuda-toolkit -c 'nvidia/label/cuda-11.7.0' or cuda-toolkit -c 'nvidia/label/cuda-11.7.1'.

I haven't tried it yet, but it is possible to install just nvcc with: cuda-nvcc -c 'nvidia/label/cuda-11.7.0'
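After the conda install finishes, a quick check that nvcc now comes from the environment rather than a system install (paths shown are typical for a miniconda setup):

conda activate textgen
which nvcc       # should point somewhere under .../envs/textgen/
nvcc --version   # should report CUDA 11.7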

cyperium commented 1 year ago

When doing python setup_cuda.py install I get:

(textgen) E:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] Det går inte att hitta filen
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] Det går inte att hitta filen

("Det går inte att hitta filen" is Swedish for "cannot find the file".) I have set the environment path to the directory where cl.exe is located and have followed all the steps to the letter.

I'm going to try manually installing CUDA instead, using jllllll's advice. If that fails, I'm probably done with trying to install the 4-bit functionality until an easier way is made available. I've tried for several days now and it's just not worth the frustration.

cyperium commented 1 year ago

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.

Got it to work using this method.

BarfingLemurs commented 1 year ago

@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.

jllllll commented 1 year ago

@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.

You can build the wheel yourself for future use with python setup_cuda.py bdist_wheel. This will place the wheel in a dist folder next to setup_cuda.py.
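The built wheel can then be installed into any matching environment with pip. The filename below assumes Python 3.10 on Linux and will differ on other setups:

pip install dist/quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl   # filename varies with Python version/platform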

BarfingLemurs commented 1 year ago

Thanks, but I am hoping to use other people's .whls, since it takes me a while to gather everything and follow the build process.

jllllll commented 1 year ago

Also, if anyone using wsl starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in wsl where Windows-level gpu drivers are not linked properly within wsl. The workaround is to run this before running server.py:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
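To avoid re-running that export in every new shell, it can be appended to ~/.bashrc (assuming bash is the default shell):

echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc   # takes effect in new shells
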
BarfingLemurs commented 1 year ago

@jllllll do you have a .whl file?

I'm stuck on certain issues which I'm unsure about.

I followed the regular installation process on WSL, hoping the GPU would be detected. When I ran the build process, no GPU was detected, so I ran conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia and restarted WSL.

I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

sorry about the weird paste in advance, I don't know what it's doing:

(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ python setup_cuda.py bdist_wheel
No CUDA runtime is found, using CUDA_HOME='/home/ubuntu/miniconda3/envs/textgen'
running bdist_wheel
/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_ext
Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
    setup(
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
    self.run_command("build")
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
    self.run_command(cmd_name)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 485, in build_extensions
    compiler_name, compiler_version = self._check_abi()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 869, in _check_abi
    _, version = get_compiler_abi_compatibility_and_version(compiler)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 337, in get_compiler_abi_compatibility_and_version
    if not check_compiler_ok_for_platform(compiler):
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 291, in check_compiler_ok_for_platform
    which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui/repositories/GPTQ-for-LLaMa$

Normal inference with just server.py won't run for me either, on 4bafe45a517bbe561e4a39a2582fa9af80487194:

(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui$ python server.py
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 11, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/server.py", line 10, in <module>
    import gradio as gr
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/__init__.py", line 3, in <module>
    import gradio.components as components
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/components.py", line 34, in <module>
    from gradio import media_data, processing_utils, utils
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/processing_utils.py", line 19, in <module>
    import requests
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/__init__.py", line 45, in <module>
    from .exceptions import RequestsDependencyWarning
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/exceptions.py", line 9, in <module>
    from .compat import JSONDecodeError as CompatJSONDecodeError
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 13, in <module>
    import charset_normalizer as chardet
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/__init__.py", line 23, in <module>
    from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/api.py", line 10, in <module>
    from charset_normalizer.md import mess_ratio
  File "charset_normalizer/md.py", line 5, in <module>
ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/constant.py)

jllllll commented 1 year ago

@jllllll do you have a .whl file?

I'm stuck on certain issues which I'm unsure about.

I followed through on aregular installation process on WSL, hoping the gpu could be detected. when I run the build process, no gpu was detected, so I followed conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia and restarted WSL.

I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Normal inference with just server.py won't run for me also, on 4bafe45a517bbe561e4a39a2582fa9af80487194

Here is a freshly compiled wheel: quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl.zip. Make sure that you performed both of the pip install -r requirements.txt steps. You may need to install CUDA into WSL using these commands:

wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run

Make sure not to use the driver installation option; that isn't for WSL. It also wouldn't hurt to try restarting WSL manually with wsl --shutdown in PowerShell or cmd.
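If you go the runfile route, the toolkit typically lands under /usr/local/cuda-11.7, and the build may still need it on the PATH. Something along these lines, assuming the default install location:

export PATH=/usr/local/cuda-11.7/bin:$PATH                       # default runfile install prefix; adjust if different
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH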

BarfingLemurs commented 1 year ago

@jllllll I really appreciate that, thanks.

NenadZG commented 1 year ago

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'
trrahul commented 1 year ago

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

I am also getting the same error

MarvinLong commented 1 year ago

Also, if anyone using wsl starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in wsl where Windows-level gpu drivers are not linked properly within wsl. The workaround is to run this before running server.py:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Thanks, this helped me a lot. I had been stuck on this problem for a day now.

ncoder commented 1 year ago

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

Got the same thing. I added a '-1' argument to the load_quant() function for the group size. I don't know what it does exactly.

But then you get this error:

Error(s) in loading state_dict for LlamaForCausalLM:
    Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros", 
...

Looks like we're running the wrong version of GPTQ for the data we have.

gianfra-t commented 1 year ago

To solve the load_quant error, which is indeed a problem with a new version of GPTQ, you need to roll back. See: https://github.com/oobabooga/text-generation-webui/issues/445#issuecomment-1476929449
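In practice the rollback amounts to pinning GPTQ-for-LLaMa to the commit already pinned elsewhere in this thread and rebuilding the extension, roughly:

cd repositories/GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4   # pre-groupsize commit used in xNul's instructions above
python setup_cuda.py install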

Also, in my case I had to change the tokenizer name in tokenizer_config.json to "tokenizer_class": "LlamaTokenizer". I think that reflects an update to the tokenizer class in the transformers repo.
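The tokenizer rename can also be done in one line; the path below is for the 7B model, so adjust it for other sizes:

sed -i 's/LLaMATokenizer/LlamaTokenizer/' models/llama-7b-hf/tokenizer_config.json   # example path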

NenadZG commented 1 year ago

Thank you, the problem was a new version of GPTQ, as you said. I rolled back as in #445 (comment). After that I got this error: ImportError: cannot import name 'LLaMAConfig' from 'transformers'. Then I deleted my environment and reinstalled everything, and now it works!

The whole process of installation I did was:

conda create -n textgen
conda activate textgen
conda install torchvision torchaudio pytorch-cuda=11.7 git -c pytorch -c nvidia
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install

After that I changed "LLaMATokenizer" to "LlamaTokenizer" in tokenizer_config.json file.

xNul commented 1 year ago

Thanks @NenadZG. I've updated my instructions with your GPTQ rollback fix.

ncoder commented 1 year ago

FYI, I've also managed to get it to work with the new version of GPTQ, but I had to re-quantize the weights.

xNul commented 1 year ago

Good to know that's possible. I'll update my instructions when all versions of the model have been requantized.

jllllll commented 1 year ago

The repo has changed. Which branch should we use now?

The cuda branch. However, I would recommend using oobabooga's fork for the time being: https://github.com/oobabooga/text-generation-webui/issues/708#issuecomment-1493113095

The webui is currently not updated to work with the latest version of GPTQ-for-LLaMa.

benkuku commented 1 year ago

Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:

  1. Set up a clean textgen environment following undefined symbol: cget_col_row_stats / 8-bit not working / libsbitsandbytes_cpu.so not found #400 (comment)
  2. Run this command, which takes about 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
  3. Restart the computer/WSL.
  4. Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
  5. Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  6. Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
  7. 4-bit now works:
python server.py --listen --model llama-7b  --gptq-bits 4

great!

github-actions[bot] commented 11 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.