oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

AMD thread #3759

Open oobabooga opened 1 year ago

oobabooga commented 1 year ago

This thread is dedicated to discussing the setup of the webui on AMD GPUs.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all AMD users.

containerblaq1 commented 12 months ago

@containerblaq1 There should be nothing from CUDA that you need as a dependency; that's what ROCm's HIP is for. Also, that screenshot shows you are NOT using your GPU. It should say something like this, as I showed earlier: screenshot-2023-09-10-12:58:06

You're absolutely right. I jumped the gun again. It is still broken :(

image

GhostNaN commented 12 months ago

https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1712875495

fbz0081 commented 12 months ago

Just started getting into oobabooga today with my 7900XTX; got the webui working after a few hours of troubleshooting.

However I currently cannot load any models. Have tried two models so far and neither loaded. Does anyone actually have it working perfectly, loading models and conversing with no issues?

And does anyone know how to resolve the error I have when loading a model? I've tried different model loaders and ticking the options on the right.

2023-09-12 03:29:44 INFO: Loading TheBloke_airoboros-13B-gpt4-1.4-GPTQ...
2023-09-12 03:29:44 ERROR: Failed to load the model.
Traceback (most recent call last):
  File "/home/john/text-generation-webui/modules/ui_model_menu.py", line 196, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/john/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/john/text-generation-webui/modules/models.py", line 318, in AutoGPTQ_loader
    import modules.AutoGPTQ_loader
  File "/home/john/text-generation-webui/modules/AutoGPTQ_loader.py", line 3, in <module>
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/__init__.py", line 2, in <module>
    from .modeling import BaseQuantizeConfig
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/__init__.py", line 1, in <module>
    from ._base import BaseGPTQForCausalLM, BaseQuantizeConfig
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 21, in <module>
    from ._const import *
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_const.py", line 5, in <module>
    from ..utils.import_utils import compare_transformers_version
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/utils/import_utils.py", line 6, in <module>
    import triton
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/triton/__init__.py", line 20, in <module>
    from .runtime import (
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/triton/runtime/__init__.py", line 1, in <module>
    from .autotuner import Config, Heuristics, autotune, heuristics
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 7, in <module>
    from ..compiler import OutOfResources
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/triton/compiler.py", line 1888, in <module>
    @static_vars(amdgcn_bitcode_paths = _get_amdgcn_bitcode_paths())
  File "/home/john/miniconda3/envs/textgen/lib/python3.10/site-packages/triton/compiler.py", line 1867, in _get_amdgcn_bitcode_paths
    gfx_arch = _get_amdgcn_bitcode_paths.discovered_gfx_arch_fulldetails[1]
TypeError: 'NoneType' object is not subscriptable

lufixSch commented 12 months ago

Does anyone actually have it working perfectly, loading models and conversing with no issues?

Yes, it works for me (I am running it on a 6750 XT).

And does anyone know how to resolve the error I have when loading a model?

How did you install AutoGPTQ? It seems to me that it is missing information about your GFX version. You may need to specify the environment variables HSA_OVERRIDE_GFX_VERSION and HCC_AMDGPU_TARGET (take a look at the guide I posted earlier in this thread; it describes how to define those variables).
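For illustration, a minimal sketch of defining those variables (the values shown assume an RDNA2/gfx103x card like a 6700/6750 XT; adjust them for your GPU and put them in your shell profile to make them persistent):

export HSA_OVERRIDE_GFX_VERSION=10.3.0   # example value for gfx103x cards
export HCC_AMDGPU_TARGET=gfx1030         # example value; use your card's gfx target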

I also read that with 7xxx-generation GPUs you need to install the nightly version of torch for ROCm 5.6.

You can load GPTQ models with GPTQ-for-LLaMa, AutoGPTQ, or ExLlama_HF; I would use AutoGPTQ or ExLlama_HF. Usually you can leave the default values, except for wbits and groupsize on AutoGPTQ. If you loaded the default model from TheBloke, these are usually 4 and 128.

To check if the rest of the installation is working, you could also try running an unquantized model. Download, for example, facebook_opt-1.3b and load it with default parameters using the Transformers loader.
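For example, assuming the download-model.py helper script that ships with the webui, the download would look roughly like this:

python download-model.py facebook/opt-1.3b   # then select the model in the Model tab and load it with Transformers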

Regardless, I would appreciate it if you would take a look at the guide and give some feedback. Did you do the same steps? Did you receive any errors? How did you solve them? It would help to create a complete setup guide for AMD.

containerblaq1 commented 12 months ago

Using a 7900XTX and a 7900XT

What works on this system:

ExLlama - Works but no multi-gpu; can't use LoRA

ExLlama_HF - Works but no multi-gpu; can use LoRA

Transformers - Works

llama.cpp - Does not work - invalid device function

llama.cpp_HF - Does not work - invalid device function

AutoGPTQ - Works with a single GPU and with 2 GPUs, but very slow

GPTQ_for_LLaMa - Does not work - using the rocm branch of this repo: https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm

fbz0081 commented 12 months ago

How did you install AutoGPTQ?

I'm not sure if I did, there is no repositories folder. I followed steps https://github.com/oobabooga/text-generation-webui/issues/3339#issuecomment-1672994071 with minor changes. (created python venv and activated in step 4, and step 5 did not work.) Maybe this is causing my issue but I am not sure.

I also read that with 7xxx-generation GPUs you need to install the nightly version of torch for ROCm 5.6.

Cool, I think I know how to install that. I see the command in another guide https://are-we-gfx1100-yet.github.io/post/text-gen-webui/ This guide seems to have worked for some 7900XTX users; however, it's not compiling for me at the ROCM_VERSION=5.6 pip3 install -e . step. The guide then says to apply a patch, but I am not sure how to use this git apply.

To check if the rest of the installation is working, you could also try running an unquantized model. Download, for example, facebook_opt-1.3b and load it with default parameters using the Transformers loader.

Will try this tomorrow, my installation may not be working. My webui loads but models cannot load. Maybe the installation is messed up.

Regardless, I would appreciate it if you would take a look at the guide and give some feedback. Did you do the same steps? Did you receive any errors? How did you solve them? It would help to create a complete setup guide for AMD.

I think there should be commands to create and activate the venv, but I could be wrong? Does it matter? I made a venv before step 1. I'm a Linux noob and don't understand everything.

Part 3 of Step 1 was also a bit confusing to me as a Linux beginner. I am using Ubuntu, and I had to search around and realise there is no .bash_profile or .zprofile, but we have a .profile I think. Anyway, I edited that into my .profile.

Question: does the program require bitsandbytes to run? The issue I referred to had me remove all lines from the requirements.txt related to bitsandbytes, which I did.

lufixSch commented 12 months ago

I'm not sure if I did, there is no repositories folder.

You have to create this yourself (that's what the mkdir command in my guide is for).
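For reference, a minimal sketch of those commands (the AutoGPTQ repository URL is the one used later in this thread):

# run from inside the text-generation-webui folder
mkdir repositories && cd repositories
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ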

I followed steps #3339 (comment) with minor changes. (created python venv and activated in step 4, and step 5 did not work.) Maybe this is causing my issue but I am not sure.

This guide might still work, but it's an older solution. In the newest version (at least if you cloned from the main branch) there is requirements_nocuda.txt, which only includes the packages for AMD.

The venv is not needed but recommended, as you can remove it easily if you mess up your dependencies (as you probably did). Just make sure that the venv is always active when you do anything related to the webui (installing, running it, ...).
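For anyone unsure about that step, a minimal sketch of creating and activating a venv (the folder name venv is arbitrary):

python3.10 -m venv venv      # create the environment
source venv/bin/activate     # activate it before installing or running anything
# if the dependencies get messed up: deactivate, delete the venv folder, and start over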

Cool, I think I know how to install that. I see the command in another guide https://are-we-gfx1100-yet.github.io/post/text-gen-webui/

As I said before, I would use the guide I provided in this thread, as it is the newest, but you can use the command for the torch installation from that guide (I will add it to my guide as well).
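For reference, a hedged example of that nightly torch install for ROCm 5.6 (only needed for 7xxx-series cards; index URL per the PyTorch nightly wheels):

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6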

Maybe the installation is messed up.

If this is the case it would probably be easier to remove the venv and create a clean one and start from the beginning.

I think there should be commands to create and activate the venv, but I could be wrong? Does it matter? I made a venv before step 1. I'm a Linux noob and don't understand everything.

Yes, I skipped that step, as it is also explained as the first step in the project's README. As I explained before, it doesn't really matter, but it is better to use one.

Part 3 of Step 1 was also a bit confusing to me as a Linux beginner. I am using Ubuntu, and I had to search around and realise there is no .bash_profile or .zprofile, but we have a .profile I think. Anyway, I edited that into my .profile.

Yes, I remember that Ubuntu likes to work with .profile, but it is basically the same. I assumed a certain basic knowledge when writing the guide, because explaining every command would make it way too long and complicated for most people, but feel free to ask questions if you don't understand something.

Edit: You can always ensure that the variables are defined by running echo $HSA_OVERRIDE_GFX_VERSION (or the other variable names). This will print the value of that variable, which should match the value defined in your .profile file.

Question: does the program require bitsandbytes to run? The issue I referred to had me remove all lines from the requirements.txt related to bitsandbytes, which I did.

It is not needed. You need to remove it because the version in the requirements.txt only supports CUDA. As you can see in my guide, there are some bitsandbytes forks which try to support ROCm, but currently we have not gotten any of them working.

lufixSch commented 12 months ago

however, it's not compiling for me at the ROCM_VERSION=5.6 pip3 install -e . step. The guide then says to apply a patch, but I am not sure how to use this git apply.

I just understood what you meant by patch. I haven't used git apply before, but I think you just run git apply <patch>, with the patch being the lines provided in the article. It should change something in line 46 of autogptq_cuda/exllama/hip_compat.cuh.

Make sure that you navigate into the AutoGPTQ folder before running the command.
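A hedged sketch of that, where rocm.patch is a hypothetical file you create by pasting the patch lines from the article:

cd repositories/AutoGPTQ
git apply rocm.patch                 # rocm.patch is a placeholder name for the pasted patch
ROCM_VERSION=5.6 pip3 install -e .   # then retry the build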

I added it to the guide; maybe it really helps.

@RBNXI You might also want to take a look at this. I am not really convinced it solves your problem because your installation failed before the build even started but you could give it a try.

RBNXI commented 12 months ago

@RBNXI You might also want to take a look at this. I am not really convinced it solves your problem because your installation failed before the build even started but you could give it a try.

Didn't work. Maybe it would be useful if you did another installation and wrote down in the guide every step you do, from creating the environment to running the GUI, to verify that it still works and isn't outdated at any step, maybe also covering the choice of models and loaders, since some people are having problems with that too. It would be ideal to try on a brand new Linux installation, because you may have gotten this working after applying some patches to your system that you don't remember from trying multiple methods. It won't be a complete guide if it doesn't work every time someone tries with a fresh installation. I tried a brand new Fedora installation the other day and got exactly the same error.

Btw the patch is not correct, there's no autogptq_cuda folder, it's now autogptq_extension

lufixSch commented 12 months ago

Maybe it would be useful if you did another installation and wrote down in the guide every step you do, from creating the environment to running the GUI, to verify that it still works and isn't outdated at any step, maybe also covering the choice of models and loaders, since some people are having problems with that too. It would be ideal to try on a brand new Linux installation, because you may have gotten this working after applying some patches to your system that you don't remember from trying multiple methods. It won't be a complete guide if it doesn't work every time someone tries with a fresh installation. I tried a brand new Fedora installation the other day and got exactly the same error.

That is exactly what I did. I had to reinstall Manjaro because I broke my ROCm installation. During setup I wrote down every step I had to do and from this I wrote the guide.

Btw the patch is not correct, there's no autogptq_cuda folder, it's now autogptq_extension

Thanks I changed that

RBNXI commented 12 months ago

I tried running in the torch/ROCm docker; it didn't work either. The versions are a mess (Python 3.8, ROCm 6.1, ...), sometimes you have to compile, sometimes pip... nothing works. I didn't try any further with that one, because even if AutoGPTQ compiles, oobabooga needs Python 3.10 anyway...

Finally I tried a Docker container with plain Ubuntu; that didn't work either. Here is what I did:

docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx ubuntu

apt install python3.10-full
apt install pip
apt install git
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui/
pip install -r requirements_nocuda.txt
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HCC_AMDGPU_TARGET=gfx1030 
export PATH=/opt/rocm/bin:$PATH
pip install bitsandbytes==0.38.1

mkdir repositories && cd repositories
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
apt install python-is-python3
ROCM_VERSION=5.4.2 pip install -v .

And here is the result of my effort:

 /usr/local/lib/python3.10/dist-packages/torch/include/c10/hip/HIPStream.h:7:10: fatal error: hip/hip_runtime_api.h: No such file or directory
      7 | #include <hip/hip_runtime_api.h>
        |          ^~~~~~~~~~~~~~~~~~~~~~~
  compilation terminated.

(amdgpu was installed)

In my next try I'll make a video tutorial on how to throw your AMD GPU into a trash can.

Enferlain commented 11 months ago

Ok, I figured I'd try this out, since I got SD working with like 2 clicks compared to the last time I tried. After using the one-click installer I get these errors on a 6800 XT:

Traceback (most recent call last):
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1184, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 27, in <module>
    from ..integrations.deepspeed import is_deepspeed_zero3_enabled
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/integrations/__init__.py", line 14, in <module>
    from .bitsandbytes import (
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 11, in <module>
    import bitsandbytes as bnb
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 120, in run_cuda_setup
    binary_name, cudart_path, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 341, in evaluate_cuda_setup
    cuda_version_string = get_cuda_version()
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/imi/oobabooga_linux/text-generation-webui/server.py", line 29, in <module>
    from modules import (
  File "/home/imi/oobabooga_linux/text-generation-webui/modules/chat.py", line 17, in <module>
    from modules.text_generation import (
  File "/home/imi/oobabooga_linux/text-generation-webui/modules/text_generation.py", line 23, in <module>
    from modules.models import clear_torch_cache, local_rank
  File "/home/imi/oobabooga_linux/text-generation-webui/modules/models.py", line 21, in <module>
    from modules import RoPE, llama_attn_hijack, sampler_hijack
  File "/home/imi/oobabooga_linux/text-generation-webui/modules/llama_attn_hijack.py", line 7, in <module>
    import transformers.models.llama.modeling_llama
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 39, in <module>
    from .generation import GenerationConfig, GenerationMixin
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1174, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1186, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
'NoneType' object has no attribute 'split'

I'll try the last step by step that was posted in here and see what it does

Since the error is related to bitsandbytes, I just changed it to 0.38.1 and that let me start it up. Now I just have to find out if it actually functions.

bin /home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "

^ Ok this looks like some cpu shit which is not what I had in mind

Tried to add this https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6 but it doesn't work.

which: no nvcc in (/home/imi/oobabooga_linux/installer_files/env/bin:/home/imi/oobabooga_linux/installer_files/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)
which: no hipcc in (/home/imi/oobabooga_linux/installer_files/env/bin:/home/imi/oobabooga_linux/installer_files/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)
/bin/hipcc -std=c++14 -c -fPIC --offload-arch=gfx1030 -I /include -I /home/imi/oobabooga_linux/installer_files/env/lib/bitsandbytes/csrc -I /home/imi/oobabooga_linux/installer_files/env/lib/bitsandbytes/include -o /home/imi/oobabooga_linux/installer_files/env/lib/bitsandbytes/build/ops.o -DNO_CUBLASLT -DBITS_AND_BYTES_USE_ROCM /home/imi/oobabooga_linux/installer_files/env/lib/bitsandbytes/csrc/ops.cu
make: /bin/hipcc: No such file or directory
make: *** [Makefile:132: hip] Error 127

I'll try next time. For now, koboldcpp just got a ROCm fork on Windows, so time to try that.

lufixSch commented 11 months ago

Since the error is related to bitsandbytes, I just changed it to 0.38.1 and that let me start it up. Now I just have to find out if it actually functions.

bin /home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/imi/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
 warn("The installed version of bitsandbytes was compiled without GPU support. "

@Enferlain you can ignore this as long as you do not want to use bitsandbytes.

Tried to add this https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6 but it doesn't work.

Got similar errors for most bitsandbytes forks for ROCm. All the forks I found are listed in my AMD setup guide above. You can check whether one of them works for you.

I would also recommend not installing ROCm 5.6-specific packages (e.g. bitsandbytes or torch for ROCm 5.6) if you do not use a 7xxx GPU. Most of them are pretty new or still in beta.

userbox020 commented 11 months ago

Hello, I think I almost got it, @lufixSch. Thanks for your step-by-step tutorial, it's very clean and clear. I'm getting this error:

(vamd) mruserbox@guru-X99:/media/10TB_HHD/_AMD/text-generation-webui$ python server.py
bin /home/mruserbox/miniconda3/envs/vamd/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
2023-09-14 04:57:16 INFO:Loading the extension "gallery"...
Traceback (most recent call last):
  File "/media/10TB_HHD/_AMD/text-generation-webui/server.py", line 226, in <module>
    create_interface()
  File "/media/10TB_HHD/_AMD/text-generation-webui/server.py", line 114, in create_interface
    ui_model_menu.create_ui()  # Model tab
  File "/media/10TB_HHD/_AMD/text-generation-webui/modules/ui_model_menu.py", line 29, in create_ui
    total_mem.append(math.floor(torch.cuda.get_device_properties(i).total_memory / (1024 * 1024)))
  File "/home/mruserbox/miniconda3/envs/vamd/lib/python3.10/site-packages/torch/cuda/__init__.py", line 399, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/hip/HIPContext.cpp":51, please report a bug to PyTorch. 

userbox020 commented 11 months ago

No worries bro @lufixSch, I just did pip install -r requirements_nocuda.txt again but with the force-reinstall flag and it fixed it, thanks:

pip install -r requirements_nocuda.txt --force-reinstall

Enferlain commented 11 months ago

The new koboldcpp ROCm fork is an immense speed-up. It's not even close. If you're on AMD with a decent card and struggle to get things working on Linux, or are only on Linux for text-gen reasons, just go back to Windows and use that. Prompt processing was cut from minutes to like 5-10 seconds for a full 4k tokens. There is zero wait time for pretty much any action at 4k context with a 13B, and a 30B is probably going to be the same if not better than a 13B was on CLBlast offloading (probably better).

Andyholm commented 11 months ago

Getting around 18 T/s on the kobold ROCm fork! Definitely worth checking out.

model: airoboros-33b-2.1.Q4_K_M.gguf (not 18 T/s when I hit around 4k tokens on a 33B model though, usually around 10 T/s)
context size: 4096
generation size: 512
GPU: 7900 XTX
RAM: 32 GB 3600 MHz

RBNXI commented 11 months ago

The new koboldcpp ROCm fork is an immense speed-up. It's not even close. If you're on AMD with a decent card and struggle to get things working on Linux, or are only on Linux for text-gen reasons, just go back to Windows and use that. Prompt processing was cut from minutes to like 5-10 seconds for a full 4k tokens. There is zero wait time for pretty much any action at 4k context with a 13B, and a 30B is probably going to be the same if not better than a 13B was on CLBlast offloading (probably better).

Thanks for sharing. I tried it and it works perfectly; I guess I'll use this one for a while instead of oobabooga's.

oobabooga commented 11 months ago

The purpose of this thread is to figure out how to set up the web UI on an AMD GPU. Please stay on topic.

containerblaq1 commented 11 months ago

Anyone encountering Clip_Rectangle at 100% after loading a model? Didn't do this on 6800XT but does on 7900XTX

image

Only thing not working now is llama.cpp

Is there a way to build it separately using the llama.cpp repo and provide that to text-generation-webui?

Also has anyone been able to build ctransformers?

userbox020 commented 11 months ago

Anyone encountering Clip_Rectangle at 100% after loading a model? Didn't do this on 6800XT but does on 7900XTX

image

Only thing not working now is llama.cpp

Is there a way to build it separately using the llama.cpp repo and provide that to text-generation-webui?

Also has anyone been able to build ctransformers?

I haven't been able to run ooba with AMD yet, but I know I'm very close.

Before following the steps above from @lufixSch, you must ensure ROCm is installed with all its packages, and then install hipBLAS: https://github.com/ROCmSoftwarePlatform/hipBLAS

You will then theoretically be able to compile llama.cpp and any other loader with ROCm support from source.
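As a rough sketch (not tested here), building standalone llama.cpp against ROCm/hipBLAS at the time looked like this, using the LLAMA_HIPBLAS flag from the llama.cpp build instructions:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_HIPBLAS=1   # requires ROCm and hipBLAS to be installed first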

If you want, you can join the TheBloke Discord server's hardware channel and we can figure it out and come back here with a complete step-by-step solution.

oobabooga commented 11 months ago

The updated one-click installer installs ROCm 5.4.2 wheels for AutoGPTQ, ExLlama, GPTQ-for-LLaMa, and llama.cpp. The first 3 were already present, but the last one is new.

Those wheels are in the following new requirements.txt files:

https://github.com/oobabooga/text-generation-webui/blob/main/requirements_amd.txt
https://github.com/oobabooga/text-generation-webui/blob/main/requirements_amd_noavx2.txt

It's Linux only for now. It should in principle work as long as one_click.py detects the environment correctly. It may be necessary to uncomment/edit one or more of those lines:

# Remove the '# ' from the following lines as needed for your AMD GPU on Linux
# os.environ["ROCM_PATH"] = '/opt/rocm'
# os.environ["HSA_OVERRIDE_GFX_VERSION"] = '10.3.0'
# os.environ["HCC_AMDGPU_TARGET"] = 'gfx1030'
ssbberggren commented 11 months ago

So I finally managed to get it working.

System info

The issue

Too many to count over the course of installing this, but I'll list the relevant bits down below. In general, I got stuck at the below error message for longer than I dare admit.

Error

Traceback (most recent call last):
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1184, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 27, in <module>
    from ..integrations.deepspeed import is_deepspeed_zero3_enabled
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/integrations/__init__.py", line 14, in <module>
    from .bitsandbytes import (
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 11, in <module>
    import bitsandbytes as bnb
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 120, in run_cuda_setup
    binary_name, cudart_path, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 341, in evaluate_cuda_setup
    cuda_version_string = get_cuda_version()
  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'

Of which the important part seems to be

  File "/home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'

The Solution

This is quite similar to what @Enferlain experienced earlier, i.e. related to bitsandbytes. I went ahead and applied the same fix, which is just to install bitsandbytes 0.38.1 by updating the relevant requirements.txt files. It returned the exact same "without GPU support" line.
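A minimal sketch of that requirements edit, assuming the AMD requirements files each contain a bitsandbytes line:

sed -i 's/^bitsandbytes.*/bitsandbytes==0.38.1/' requirements_amd.txt requirements_amd_noavx2.txt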

Alright, but the webui runs, so I went ahead and tried to load a model

rocBLAS error: Cannot read /home/localuser/NeuralNet/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

This turns out to be an issue already documented earlier; simply uncommenting the lines in the one_click.py file fixes the problem. Due to my own setup, I had to make some changes (specifying /rocm5.4.3 for my multi-install and gfx1031 for my 6700 XT):

# Remove the '# ' from the following lines as needed for your AMD GPU on Linux
os.environ["ROCM_PATH"] = '/opt/rocm5.4.3'
os.environ["HSA_OVERRIDE_GFX_VERSION"] = '10.3.0'
os.environ["HCC_AMDGPU_TARGET"] = 'gfx1031'

HSA_OVERRIDE_GFX_VERSION seems to be what fixed it, but how does it actually perform?

llama_print_timings:        load time =   467.39 ms
llama_print_timings:      sample time =   108.58 ms /   200 runs   (    0.54 ms per token,  1841.89 tokens per second)
llama_print_timings: prompt eval time =   467.32 ms /    73 tokens (    6.40 ms per token,   156.21 tokens per second)
llama_print_timings:        eval time =  4479.24 ms /   199 runs   (   22.51 ms per token,    44.43 tokens per second)
llama_print_timings:       total time =  5381.29 ms
Output generated in 5.65 seconds (35.43 tokens/s, 200 tokens, context 73, seed 1427747452)

Holy shit

Maybe you should give it another go, @Enferlain :slightly_smiling_face:

oobabooga commented 11 months ago

@ssbberggren thanks for the feedback, I have updated the AMD requirements to use bitsandbytes==0.38.1: https://github.com/oobabooga/text-generation-webui/commit/08c4fb12ae30fec3130ddfaad45bcbbb32789036

So now the only additional step should be to set up those environment variables. I wonder if that can be automated as well somehow.
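Not an official approach, just a hedged sketch of one way the detection could work: query rocminfo for the gfx target and export it before launch (assumes rocminfo is on PATH; HSA_OVERRIDE_GFX_VERSION would still need a separate mapping from the gfx name):

gfx=$(rocminfo | grep -o -m1 'gfx[0-9a-f]*')   # e.g. gfx1030
export HCC_AMDGPU_TARGET="$gfx"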

Enferlain commented 11 months ago

@ssbberggren I'll try this method soonish.

Maybe this will be useful for someone: I needed bitsandbytes for something else since last time (sd lora trainer scripts) and I got around to fixing it with a fork of bitsandbytes based on this comment

# bitsandbytes rocm
# video guide : https://www.youtube.com/watch?v=2cPsvwONnL8
# https://git.ecker.tech/mrq/bitsandbytes-rocm
## https://github.com/0cc4m/bitsandbytes-rocm <--- used this
git clone https://git.ecker.tech/mrq/bitsandbytes-rocm.git
cd bitsandbytes-rocm/
pip install -r requirements.txt
make hip
CUDA_VERSION=gfx1030 python setup.py install

I just had to edit the Makefile as described in the video (and with the help of ChatGPT, since I got a different error doing make hip), but it worked afterwards. Then I edited the requirements.txt of what I wanted to use with -e /home/"user"/bitsandbytes-rocm, did import bitsandbytes, and that seemed to do the trick, in my case at least. This is just for getting bitsandbytes to work. The version in setup.py says it's 0.37.0, but I don't see why this method wouldn't work for any other version.

Dasug commented 11 months ago

@containerblaq1

Using a 7900XTX and a 7900XT [...] llama.cpp - Does not work - invalid device function

I managed to get llama.cpp running on a 7900 XTX by installing llama_cpp_python the following way into the conda environment:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1100" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python

You might need to uninstall llama_cpp_python first if pip doesn't want to update it and you also might need to adjust the compiler paths if your ROCm folder is not /opt/rocm.

See also https://github.com/ggerganov/llama.cpp/issues/3320#issue-1909955502

containerblaq1 commented 11 months ago

@containerblaq1

Using a 7900XTX and a 7900XT [...] llama.cpp - Does not work - invalid device function

I managed to get llama.cpp running on a 7900 XTX by installing llama_cpp_python the following way into the conda environment:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1100" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python

You might need to uninstall llama_cpp_python first if pip doesn't want to update it and you also might need to adjust the compiler paths if your ROCm folder is not /opt/rocm.

See also ggerganov/llama.cpp#3320 (comment)

Edit: Thanks for updating with your findings!

I'll give this one a try again since I've rebuilt the environment. What versions of ROCm and PyTorch packages are you using for this?

Dasug commented 11 months ago

What versions of ROCm and PyTorch packages are you using for this?

I'm using ROCm 5.6.1 (5.7 is not available yet on my distribution's package repos) with pytorch nightly (specifically 2.2.0.dev20230922+rocm5.6, so the nightly version from a few days ago. The current one should probably also work though). I also have the following environment variables active:

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HCC_AMDGPU_TARGET=gfx1100

I don't think you need them during compilation of llama_cpp_python but I haven't specifically tested without them either.

containerblaq1 commented 11 months ago

@Dasug Your solution worked perfectly, thanks! I was using the below command.

CMAKE_ARGS="-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_CC_COMPILER=/opt/rocm/llvm/bin/clang -DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1100" pip install llama-cpp-python --force-reinstall --no-cache-dir

userbox020 commented 11 months ago

I'm sure this is going to be helpful:

https://github.com/nktice/AMD-AI

containerblaq1 commented 11 months ago

Not sure if it helps anyone, but inference on a 7900 XTX and a 7900 XT works. Quick enough for me.

llm_load_tensors: using ROCm for GPU acceleration
ggml_cuda_set_main_device: using device 0 (Radeon RX 7900 XTX) as main device
llm_load_tensors: mem required  =  140.76 MB (+  768.00 MB per state)
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 51/51 layers to GPU
llm_load_tensors: VRAM used: 19910 MB
...................................................................................................
llama_new_context_with_model: kv self size  =  768.00 MB
llama_new_context_with_model: compute buffer total size =  561.47 MB
llama_new_context_with_model: VRAM scratch buffer: 560.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
2023-09-30 17:03:27 INFO:Loaded the model in 21.62 seconds.

llama_print_timings:        load time =  1748.93 ms
llama_print_timings:      sample time =    33.09 ms /    84 runs   (    0.39 ms per token,  2538.68 tokens per second)
llama_print_timings: prompt eval time =  2171.88 ms /   552 tokens (    3.93 ms per token,   254.16 tokens per second)
llama_print_timings:        eval time =  2954.39 ms /    83 runs   (   35.60 ms per token,    28.09 tokens per second)
llama_print_timings:       total time =  5279.05 ms
Output generated in 5.53 seconds (15.02 tokens/s, 83 tokens, context 552, seed 1198831578)
Llama.generate: prefix-match hit
Output generated in 2.95 seconds (22.38 tokens/s, 66 tokens, context 649, seed 514118471)
Llama.generate: prefix-match hit

llama_print_timings:        load time =  1748.93 ms
llama_print_timings:      sample time =    59.64 ms /   145 runs   (    0.41 ms per token,  2431.34 tokens per second)
llama_print_timings: prompt eval time =   318.84 ms /    13 tokens (   24.53 ms per token,    40.77 tokens per second)
llama_print_timings:        eval time =  5247.92 ms /   144 runs   (   36.44 ms per token,    27.44 tokens per second)
llama_print_timings:       total time =  5844.58 ms
Output generated in 6.11 seconds (23.58 tokens/s, 144 tokens, context 728, seed 315484668)
Llama.generate: prefix-match hit
Output generated in 3.48 seconds (22.71 tokens/s, 79 tokens, context 884, seed 1660267709)

image

Edit: Should see this in the output as well.

ggml_init_cublas: found 2 ROCm devices:
  Device 0: Radeon RX 7900 XTX, compute capability 11.0
  Device 1: Radeon RX 7900 XT, compute capability 11.0

lufixSch commented 11 months ago

@containerblaq1 With which model did you get those results? And did you try any GPTQ models (AutoGPTQ/Exllama/ExllamaV2)? How did it perform? I am thinking about adding a 7900 XT or XTX to my system.

containerblaq1 commented 11 months ago

@containerblaq1 With which model did you get those results? And did you try any GPTQ models (AutoGPTQ/Exllama/ExllamaV2)? How did it perform? I am thinking about adding a 7900 XT or XTX to my system.

Sorry for the delay.

I was using: https://huggingface.co/TheBloke/Spicyboros-c34b-2.2-GGUF

Edit: The file was spicyboros-c34-2.2.Q4_K_M.gguf

Please post the models you would like me to test @lufixSch

lufixSch commented 11 months ago

I was using: https://huggingface.co/TheBloke/Spicyboros-c34b-2.2-GGUF

Around 22 t/s for a 34b Model sounds okay.

Please post the models you would like me to test @lufixSch

As I usually run GPTQ models, a 13B GPTQ model would be nice (every loader is fine, but ExLlamaV2 is the fastest and has the best AMD support). For example, TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ, if this is possible.

Additionally, it would be nice if you could try a large model to see how the performance scales. Maybe try TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ.

containerblaq1 commented 11 months ago

I was using: https://huggingface.co/TheBloke/Spicyboros-c34b-2.2-GGUF

Around 22 t/s for a 34b Model sounds okay.

Please post the models you would like me to test @lufixSch

As I usually run GPTQ models, a 13B GPTQ model would be nice (every loader is fine, but ExLlamaV2 is the fastest and has the best AMD support). For example, TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ, if this is possible.

Additionally, it would be nice if you could try a large model to see how the performance scales. Maybe try TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ.

I have that one and the SuperHOT version of it. Any configuration options?

lufixSch commented 11 months ago

I usually run the 4-bit 128g version, but I think I could get an idea of the performance even if you use another version.

Other than that I don't have any specific wishes for the configuration. I usually load with the default settings.

containerblaq1 commented 11 months ago

ExLlama results below. I need to rebuild AutoGPTQ for ROCm 5.7, and for some reason ExLlamaV2 does nothing now.

Wizard-Vicuna-30B Output generated in 10.27 seconds (12.85 tokens/s, 132 tokens, context 220, seed 1449319062)

"Hello world!"

In JavaScript that looks like:

function HelloWorld() {
    console.log("Hello World!");
}
HelloWorld();

What does that look like in bash?

In Bash, you could write a script to display the same message using echo:

#!/bin/bash
echo "Hello World!"
When executed, this script will output "Hello World!" to the terminal. Alternatively, you could use a function to achieve the same result:

#!/bin/bash
hello_world() {
    echo "Hello World!"
}
hello_world

This is just one example of how you can use functions in Bash to accomplish tasks. There are many other ways to achieve the same result, depending on your specific needs and preferences.

Wizard-Vicuna-13B Output generated in 10.48 seconds (19.57 tokens/s, 205 tokens, context 381, seed 279015373)

"Hello world!"

In JavaScript that looks like:

function HelloWorld() {
    console.log("Hello World!");
}
HelloWorld();

What does that look like in bash?

In Bash, you can create a similar function using the echo command to output the message. Here's an example:

echo "Hello World!" This will print the message "Hello World!" to the terminal. If you want to make the function more interactive, you could add user input prompts and perform different actions based on their response. For example:

function hello_world() {
 echo "What's your name?"
 read answer
 if [ "$answer" != "John" ] && [ "$answer" != "Jane" ]; then
 echo "Sorry, I can only greet John and Jane."
 else
 echo "Hello, $answer!"
 fi
}
hello_world

This function asks the user for their name, reads their response, and then outputs a personalized greeting based on their input. It also includes error handling to handle cases where the user enters an invalid name.

containerblaq1 commented 11 months ago

If you want, you can join the TheBloke Discord server's hardware channel and we can figure it out and come back here with a complete step-by-step solution.

@userbox020

joined the Discord. Does one exist for this project?

userbox020 commented 11 months ago

Sup bro, sorry, I don't understand the question.

lufixSch commented 11 months ago

joined the Discord. Does one exist for this project?

@containerblaq1 Yes, there is a Text Generation WebUI Discord: https://discord.gg/jwZCF2dPQN

Output generated in 10.48 seconds (19.57 tokens/s, 205 tokens, context 381, seed 279015373)

Thanks for testing the models. The results are pretty interesting, as I get around the same speed (for 13B models) on my 6750 XT with AutoGPTQ. I expected around 20 t/s for the 30B model and at least 30 t/s for the 13B version with AutoGPTQ.

PS: If you want to run GPTQ models, you should consider getting ExLlamaV2 running on your system. With the current version 0.0.2 I get a 50% (up to 60% in some cases) speed increase for GPTQ models, with way better VRAM management than AutoGPTQ (no crashes if the VRAM fills up, just slower inference). It should support the 7000 series of AMD GPUs out of the box if you use torch for ROCm 5.6 (which is the only version working for the 7900, if I'm not mistaken). If you still have problems, open an issue at the exllamav2 repo. The guy implementing the ROCm support is pretty nice and helpful.

Edit: I have to correct myself. The main guy (ardfork) adding ROCm support to ExLlamaV2 runs a 6000 series GPU, so I'm not sure it has been tested on 7000 series GPUs before. But I'm sure he will still help you.

lufixSch commented 11 months ago

@oobabooga I noticed that the new requirements_amd.txt installs llama-cpp-python without GPU support. Is that intended?

If so, I think it would be helpful to add the proper command to the documentation:

export ROCM_HOME=/opt/rocm # depends on your ROCm installation
export HCC_AMDGPU_TARGET=gfx1030 # depends on your GPU
CMAKE_ARGS="-DCMAKE_CXX_COMPILER=$ROCM_HOME/llvm/bin/clang++ -DCMAKE_CC_COMPILER=$ROCM_HOME/llvm/bin/clang -DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=$HCC_AMDGPU_TARGET" pip install llama-cpp-python --force-reinstall --no-cache-dir

I know that the README.md already links the llama-cpp-python documentation, but the command there does not work for some GPUs. The one above has worked for everyone in this thread so far.

PS: Thanks for adding the platform-specific requirements. It makes the setup much easier.

lufixSch commented 11 months ago

@GhostNaN I finally tried llama.cpp again and figured out why it did not work with all layers on the GPU, at least after I reinstalled it with the command above:

The default prompt context size (n_ctx) is huge (17000 or so), and therefore the VRAM fills up and the whole WebUI exits with a segmentation fault. Without the context it only fills up 60% of my VRAM; that's probably why I didn't think about it before. Reducing n_ctx (to 4096) made it work for me, but I am not sure if it was the same error we discussed before.

Edit: Got the out-of-memory crash again with a larger model during inference. @oobabooga, is it possible to catch the out-of-memory error before the whole program crashes? It is pretty annoying to restart the WebUI every time the VRAM fills up.

oobabooga commented 11 months ago

@lufixSch 2 llama.cpp wheels are installed, one with and one without GPU support. The latter is used by default.

About llama-cpp-python crashing, I don't know if an exception can be caught: https://github.com/abetlen/llama-cpp-python/issues/374

lufixSch commented 11 months ago

2 llama.cpp wheels are installed, one with and one without GPU support. The latter is used by default.

@oobabooga Thanks for clarifying. I tried it again and uninstalled all llama_cpp_python packages. After installing the requirements I see the two following llama.cpp packages:

llama_cpp_python          0.2.11
llama_cpp_python_cuda     0.2.11+rocm5.4.2

When loading a GGUF, I don't see the expected llm_load_tensors: offloaded x/y layers to GPU output, and when I run text generation, inference runs only on the CPU.

About llama-cpp-python crashing, I don't know if an exception can be caught: https://github.com/abetlen/llama-cpp-python/issues/374

Okay sounds like we have to wait until the error is properly handled in llama.cpp. Too bad.

containerblaq1 commented 11 months ago

Is anyone booting 5.15.0-86-generic?

EDIT: this seems to be an issue with that kernel, not any of these packages. Leaving this here to save someone some time.

https://forums.linuxmint.com/viewtopic.php?t=405349

LasonHistory commented 11 months ago

Hello, I have a problem using the GPU. After running python server.py, it warns:

/home/user/miniconda3/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "

In the webui I can only use CPU, not 8-bit or 4-bit. Does anyone have the same problem?

lufixSch commented 11 months ago

@LasonHistory this issue was discussed multiple times before. The warning is only related to bitsandbytes, as the original version does not support AMD. You can usually ignore the warning as long as you don't want to load models using bitsandbytes. Usually you would use GPTQ or GGUF models, which have loaders with better AMD support.

As described in my setup guide comment in this thread (which is otherwise outdated since the requirements were updated), there are multiple bitsandbytes forks which do support AMD, but I have not gotten any of them working so far.

Outdated Setup Guide

LasonHistory commented 11 months ago

@LasonHistory this issue was discussed multiple times before. The warning is only related to bitsandbytes, as the original version does not support AMD. You can usually ignore the warning as long as you don't want to load models using bitsandbytes. Usually you would use GPTQ or GGUF models, which have loaders with better AMD support.

As described in my setup guide comment in this thread (which is otherwise outdated since the requirements were updated), there are multiple bitsandbytes forks which do support AMD, but I have not gotten any of them working so far.

Outdated Setup Guide

Thankfully I ran GGUF successfully! BTW, GPTQ somehow died. But still, it's my first time using an AMD GPU device; really appreciate it~

LasonHistory commented 11 months ago

@lufixSch it seems that it runs with llama-cpp-python. But I cannot get it working within my own venv; can you tell me what packages to install? Details of my venv:

1. python3.10
2. torch-rocm5.6
3. requirements_amd.txt

Sadly it doesn't work.

lufixSch commented 11 months ago

BTW, GPTQ somehow died.

@LasonHistory Which loader did you use? AutoGPTQ is the default, but I would try ExLlamaV2 (or ExLlama); they are both much faster and I had fewer problems with them.

Thankfully I ran GGUF successfully!

Did you make sure it ran on the GPU? If not, set n-gpu-layers to the maximum and set the context size to a decent value (for example 2048 or 4096); otherwise it might crash because of full VRAM.
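For example, a hedged command line for a GGUF model (flag names as of this version of the webui; the model file name is a placeholder):

python server.py --model your-model.gguf --loader llama.cpp --n-gpu-layers 128 --n_ctx 4096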

But I cannot get it working within my own venv

I am not sure I understand what you mean. Your information about the venv also doesn't really help. You can get a list of all packages with pip list or print it directly to a file with pip freeze > filename.txt. This information would give a lot more insight.