oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

AMD thread #3759

Open oobabooga opened 10 months ago

oobabooga commented 10 months ago

This thread is dedicated to discussing the setup of the webui on AMD GPUs.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all AMD users.

MistakingManx commented 10 months ago

Why no AMD for Windows?

BarfingLemurs commented 10 months ago

@MistakingManx There is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux.

lufixSch commented 10 months ago

Does someone have a working AutoGPTQ setup?

Mine was really slow when I installed the wheel: https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl

When building from source, the text generation is much faster but the output is just gibberish.

I am running on an RX 6750 XT, if this is important.

MistakingManx commented 10 months ago

@MistakingManx There is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux.

Why exactly do models prefer a GPU instead of a CPU? Mine runs quickly on CPU, but OBS kills it off because OBS uses so much CPU.

BarfingLemurs commented 10 months ago

Why exactly do models prefer

Users prefer it: an AMD GPU comparable with a 3090 may work at ~20 t/s for a 34B model.

MistakingManx commented 10 months ago

I have an AMD Radeon RX 5500 XT; is that good? My CPU spits out fully completed responses within 6 seconds when it isn't stressed by OBS; otherwise it takes around 35 seconds. If I could speed that up with my GPU, I'd say it's worth the setup.

CNR0706 commented 10 months ago

I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSuSE Tumbleweed Linux with a 6700XT.

I used the one-click installer to set it up (and selected ROCm support), but after the installation finished it just threw an error:

cnr07@opensuse-linux-gpc:~/oobabooga_linux> ./start_linux.sh
Traceback (most recent call last):
  File "/home/cnr07/oobabooga_linux/text-generation-webui/server.py", line 28, in <module>
    from modules import (
  File "/home/cnr07/oobabooga_linux/text-generation-webui/modules/training.py", line 21, in <module>
    from peft import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/__init__.py", line 22, in <module>
    from .auto import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/auto.py", line 31, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/mapping.py", line 23, in <module>
    from .peft_model import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/peft_model.py", line 38, in <module>
    from .tuners import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in <module>
    from .lora import LoraConfig, LoraModel
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/tuners/lora.py", line 45, in <module>
    import bitsandbytes as bnb
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 120, in run_cuda_setup
    binary_name, cudart_path, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 341, in evaluate_cuda_setup
    cuda_version_string = get_cuda_version()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'

--- System ---
GPU: RX 6700XT
CPU: R5 3600
RAM: 16 GiB
OS: openSuSE Tumbleweed (up to date)
Kernel: Linux 6.4.11-1-default
GPU Driver: AMDGPU FOSS Kernel driver, full Mesa 23.1.6
ROCm: 6.1, from AMD's SuSE repo

henrittp commented 10 months ago

I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSuSE Tumbleweed Linux with a 6700XT.

I used the one-click installer to set it up (and selected ROCm support), but after the installation finished it just threw an error:

AttributeError: 'NoneType' object has no attribute 'split'

same issue here. Still no solution for me. Anyone can gimme some light here? ty in advance.

CNR0706 commented 10 months ago

Okay, so this is definitely not ideal, but I found that VERY carefully following the manual installation guide and then uninstalling bitsandbytes makes it work. I'm still figuring things out, but at least it works now.

henrittp commented 10 months ago

then uninstalling bitsandbytes makes it work

Then you installed that modified version of bitsandbytes for ROCm? Or..? What exactly did you do? Thanks in advance.

henrittp commented 10 months ago

@CNR0706 I managed to install a modified version of bitsandbytes for ROCm. Just follow this tutorial and you should be fine: YT Video. That way you can leverage everything (or almost everything) this lib offers.

lufixSch commented 10 months ago

@CNR0706 I managed to install a modified version of bitsandbytes for ROCm. Just follow this tutorial and you should be fine: YT Video. That way you can leverage everything (or almost everything) this lib offers.

I am not sure which version is newer, but I used https://github.com/agrocylo/bitsandbytes-rocm. You need to build it from source with the following commands:

git clone git@github.com:agrocylo/bitsandbytes-rocm.git
cd bitsandbytes-rocm/
export PATH=/opt/rocm/bin:$PATH #Add ROCm to $PATH
export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
make hip
python setup.py install

Make sure the environment variables are also set when you start the webui. Depending on your GPU, you might need to change the GPU target or GFX version.
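A quick sanity check (a minimal sketch, assuming the build above succeeded and the exports are still set in your current shell): the fork should import cleanly instead of failing with the 'NoneType' split error that the stock package throws on ROCm.

export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
# If this prints a path and no traceback, the ROCm build of bitsandbytes is the one being picked up.
python -c "import bitsandbytes as bnb; print(bnb.__file__)"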

lufixSch commented 10 months ago

I have an AMD Radeon RX 5500 XT; is that good? My CPU spits out fully completed responses within 6 seconds when it isn't stressed by OBS; otherwise it takes around 35 seconds. If I could speed that up with my GPU, I'd say it's worth the setup.

Saying it takes 6 seconds is not that helpful for getting an idea of the performance you have, because that depends on the length of the output. Take a look at the console: after every generation it prints the generation speed in t/s. It also depends on what model you are using.

With my RX 6750 XT I got about 35 t/s with a 7B GPTQ Model
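As a rough illustration of why t/s is the more useful number (the figures here are made up): a 210-token reply that finishes in 6 seconds corresponds to about 35 t/s, while the same reply at 6 t/s would take around 35 seconds.

python -c "tokens, seconds = 210, 6; print(f'{tokens/seconds:.0f} t/s')"  # prints 35 t/s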

lufixSch commented 10 months ago

@henrittp, @CNR0706 Did you try setting up AutoGPTQ? Did it work for you?

RBNXI commented 10 months ago

I have AttributeError: 'NoneType' object has no attribute 'split' error too... Has ANYONE managed to run this with ROCm at all? I'm starting to think that AMD is just useless for this stuff.

lufixSch commented 10 months ago

I have AttributeError: 'NoneType' object has no attribute 'split' error too...

@RBNXI This is caused by bitsandbytes. You need to install a specific Version. Take a look at my comment above.

Has ANYONE managed to run this with ROCm at all? I'm starting to think that AMD is just useless for this stuff

Yes, it worked really well on my PC until I broke my installation with an update of the repository. I am also running Stable Diffusion on my PC with AUTOMATIC1111 and it works great. The AUTOMATIC1111 setup is much easier, because the install script takes care of everything.

I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :)

RBNXI commented 10 months ago

I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :)

Cool, I'll be waiting for that then.

@RBNXI This is caused by bitsandbytes. You need to install a specific Version. Take a look at my comment above.

I saw it and tried to build it, but it gave an error and I got tired of trying stuff. I just thought "well, having to do so many steps and then having so many errors must mean it's just not ready yet...". But I could try again another day, when I have more time, if I can fix that error. Thanks.

lufixSch commented 10 months ago

@RBNXI What error did you get? Make sure the repo is located on a path without spaces; this seems to cause issues sometimes. And you need the rocm-hip-sdk package (at least on Arch Linux it is called that way).

"well, having to do so many steps and then having so many errors must mean it's just not ready yet..."

Yes, I can understand that. The setup with NVIDIA is definitely easier.

RBNXI commented 10 months ago

@RBNXI What error did you get? Make sure the repo is located on a path without spaces; this seems to cause issues sometimes. And you need the rocm-hip-sdk package (at least on Arch Linux it is called that way).

I don't remember the error, I'm sorry. But I have a question for when I try again: the command you used to clone (git clone git@github.com:agrocylo/bitsandbytes-rocm.git) gave me an error. Is it okay to just clone with the default link to the repo? It said the link you used is private or something like that.

lufixSch commented 10 months ago

Yes, you can of course use the link from the repo directly. You probably mean this one: https://github.com/agrocylo/bitsandbytes-rocm.git

RBNXI commented 10 months ago

I tried again and got the same result. I followed the installation tutorial, everything works fine, then I run it and get the split error. Then I compiled bitsandbytes from that repo (now it worked), tried to run again, and got the same split error again...

Edit: I managed to fix that error and now everything is apparently working, but when I try to load a model it says: assert self.model is not None. Errors are never ending...

containerblaq1 commented 10 months ago

Installing bitsandbytes-rocm is the only way I've been able to make this work. The new install doesn't seem to work for the 7900XTX.

lufixSch commented 10 months ago

AMD Setup Step-by-Step Guide (WIP)

I finally got my setup working again (by reinstalling everything). Here is a step by step guide on how I got it running:

I tested all steps on Manjaro, but they should work on other Linux distros. I have no idea how the steps can be transferred to Windows. Please leave a comment if you have a solution for Windows.

NOTE: At the start of each step I assume you have the terminal opened at the root of the project and that you have ROCm installed (on Arch/Manjaro you need to install the rocm-hip-sdk package). Furthermore, consider creating a virtual environment (for example with miniconda or venv) and activating it.

NOTE: If you have a 7xxx-generation AMD GPU, please read the notes at the end of this guide.

Step 1: Install dependencies (should be similar to the one click installer except the last step)

  1. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
  2. pip install -r requirements_nocuda.txt
  3. export HSA_OVERRIDE_GFX_VERSION=10.3.0, export HCC_AMDGPU_TARGET=gfx1030 and export PATH=/opt/rocm/bin:$PATH (consider adding those lines to your .bash_profile, .zprofile or .profile as you need to run them every time you start the webui) (the gfx version might change depending on your GPU -> https://www.llvm.org/docs/AMDGPUUsage.html#processors)

If you get an error installing torch, try running pip install -r requirements_nocuda.txt first. After this, run the torch install command with the --force-reinstall option.
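Before moving on, it is worth checking that the ROCm build of torch actually ended up in the environment (a minimal sketch; if torch.version.hip prints None or is_available() prints False, the CUDA wheel has taken over and you need the --force-reinstall step above):

python -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"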

Step 2: Fix bitsandbytes

This step did not work properly for me. If you only want to get it working and don't want to use bitsandbytes on your GPU, just run pip install bitsandbytes==0.38.1. I mostly run GPTQ models and this was fine for me. It seems like the official bitsandbytes project is working on supporting ROCm, but it will take a while until there is a working version.

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/broncotc/bitsandbytes-rocm.git (or another fork listed below)
  3. make hip
  4. python setup.py install

I found the following forks which should work for ROCm but got none of them working. If you find a working version please give some feedback.

Step 3: Install AutoGPTQ

This is only necessary if you want to run GPTQ models.

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
  3. ROCM_VERSION=5.4.2 pip install -v .

If the installation fails, try applying the patch provided by this article. Run git apply with the patch provided below as the argument (see the sketch after the diff).

diff --git a/autogptq_cuda/exllama/hip_compat.cuh b/autogptq_cuda/exllama/hip_compat.cuh
index 5cd2e85..79e0930 100644
--- a/autogptq_cuda/exllama/hip_compat.cuh
+++ b/autogptq_cuda/exllama/hip_compat.cuh
@@ -46,4 +46,6 @@ __host__ __forceinline__ hipblasStatus_t __compat_hipblasHgemm(hipblasHandle_t
 #define rocblas_set_stream hipblasSetStream
 #define rocblas_hgemm __compat_hipblasHgemm

+#define hipblasHgemm __compat_hipblasHgemm
+
 #endif
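Applying the patch could look roughly like this (a sketch; hip_compat.patch is just an example file name, and it assumes you saved the diff above inside the cloned AutoGPTQ directory):

cd repositories/AutoGPTQ
# save the diff above as hip_compat.patch, then:
git apply hip_compat.patch
ROCM_VERSION=5.4.2 pip install -v .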

Step 4: Exllama

This is only necessary if you want to use this model loader (faster for GPTQ models).

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/turboderp/exllama && cd exllama
  3. pip install -r requirements.txt

Step 4.5: ExllamaV2

ExllamaV2 works out of the box and will be installed automatically when installing requirements_nocuda.txt

If you get an error running ExllamaV2 try installing the nightly version of torch for ROCm5.6 (Should be released as stable version soon)

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6 --force-reinstall

Step 5: llama-cpp-python

Did not work for me today but it worked before (not sure what I did wrong today)

  1. CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama-cpp-python

You might need to add the --no-cache-dir and --force-reinstall options if you installed llama-cpp-python before.

I hope you can get it working with this guide :) I would appreciate some feedback on how this guide worked for you, so we can create a complete and robust setup guide for AMD devices (and maybe even update the one-click installer based on it).

Notes on 7xxx AMD GPUs

Remember that you have to change the GFX version in the environment variables: export HSA_OVERRIDE_GFX_VERSION=11.0.0, export HCC_AMDGPU_TARGET=gfx1100

As described by this article, you should make sure to install/set up ROCm without OpenCL, as this might cause problems with HIP.

You also need to install the nightly version of torch for ROCm 5.6 instead of ROCm 5.4.2 (should be released as a stable version soon):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6

lufixSch commented 10 months ago

when I try to load a model it says: assert self.model is not None. Errors are never ending...

@RBNXI What model are you using? Which loader are you using? Usually this error means the loader failed to load the model.

As explained in my guide above, you have to do extra steps for AutoGPTQ and Exllama/Exllama_HF.

Also note that with AutoGPTQ you often have to define the wbits and groupsize, otherwise it will fail.
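For example, a launch along these lines (a sketch only; the model folder name is a placeholder and the values have to match how the model was quantized, commonly 4-bit with group size 128 for TheBloke's GPTQ uploads):

python server.py --loader autogptq --wbits 4 --groupsize 128 --model <your-GPTQ-model-folder>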

RBNXI commented 10 months ago

Awesome guide, thanks, I'll try it when I can. You mentioned that llama-cpp-python didn't work today and you don't know why. The model I was using was one of those; I think there's currently a known bug that doesn't let us load llama models, could that be the problem? Also, I think my GPU doesn't appear here: https://www.llvm.org/docs/AMDGPUUsage.html#processors. I have an RX 6600, is that one also 1030?

Edit: I was able to load the model with llama.cpp, but it runs on CPU. Do I have to do anything special for it to run on GPU? I launch it with this: python server.py --chat --api --auto-devices --n-gpu-layers 1000000000 --n_ctx 4096 --mlock --verbose --model mythomax-l2-13b.Q5_K_M.gguf. Don't tell me my GPU doesn't support ROCm, please...

I tried with different --n-gpu-layers and same result.

Also, AutoGPTQ installation failed with

 Total number of replaced kernel launches: 4
  running clean
  removing 'build/temp.linux-x86_64-cpython-310' (and everything under it)
  removing 'build/lib.linux-x86_64-cpython-310' (and everything under it)
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.10' does not exist -- can't clean it
  removing 'build'
Failed to build auto-gptq
ERROR: Could not build wheels for auto-gptq, which is required to install pyproject.toml-based projects

Edit 2: I tried running a GPTQ model anyway, and it starts to load into VRAM so the GPU is detected, but it fails with:

Traceback (most recent call last):
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/ui_model_menu.py", line 196, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/models.py", line 320, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/AutoGPTQ_loader.py", line 57, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
    return quant_func(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 875, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1392, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([108, 640]) in "qzeros" (which has shape torch.Size([432, 640])), this look incorrect.

lufixSch commented 10 months ago

@RBNXI I found this issue in the ROCm repo discussing the RX 6600. According to it, the RX 6600 should work. Usually gfx1030 works for all 6xxx cards. You can check whether your GPU is working by running rocminfo and clinfo; both commands should mention your GPU.

llama.cpp probably runs on CPU because the prebuilt Python package is only built with CPU support. This is why you need to install it with the command from my guide.

Regarding AutoGPTQ: I think you just copied the last lines, not the real error that broke the installation, so I am not sure what the problem is. Maybe check your ROCm version and change the ROCM_VERSION variable accordingly. Did you install the rocm-hip-sdk package (or whatever it is called on your distro)? What Linux distro are you running, by the way?

I usually run the webui with python server.py and load the models using the GUI. This way the GUI usually chooses the default parameters by itself and it is easier to get it working. I should also note that I run the newest version from the main branch. If you are using the one-click installer v1.5, you're using the old requirements.txt, which might explain why llama.cpp with CPU support got installed and why AutoGPTQ kind of works even though you did not install it.

RBNXI commented 10 months ago

I don't have rocminfo installed, should I? But clinfo shows my GPU indeed.

I'll try to reinstall again and see if it works now.

I did install rocm-hip-sdk. And I'm using Arch.

Also I'm running it in a miniconda environment, is that a problem?

Also, the ROCm I have installed is from the Arch repository; I think it's 5.6.0. Is that a problem? If I change the version in the command (pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2 -> pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6.0) it says:

ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none) ERROR: No matching distribution found for torchvision
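That 'No matching distribution found' error may simply be the index URL: the nightly index in the guide above spells the version as rocm5.6, without the patch level, so (an untested guess based on the commands earlier in this thread) the install would be:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6 --force-reinstall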

RBNXI commented 10 months ago

I'm trying to install and there are still errors everywhere. First of all the bitsandbytes installation fails, so I have to use the pip one. Then I try to install AutoGPTQ and can't; it gives this error (tried with both ROCm versions):


(textgen) [ruben@ruben AutoGPTQ]$ ROCM_VERSION=5.6.0 pip install -v .
Using pip 23.2.1 from /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/pip (python 3.10)
Processing /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ
  Running command python setup.py egg_info
  Trying to compile auto-gptq for RoCm, but PyTorch 2.0.1+cu117 is installed without RoCm support.
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 255
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-spo0oczo
  cwd: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(textgen) [ruben@ruben AutoGPTQ]$ ROCM_VERSION=5.4.2 pip install -v .
Using pip 23.2.1 from /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/pip (python 3.10)
Processing /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ
  Running command python setup.py egg_info
  Trying to compile auto-gptq for RoCm, but PyTorch 2.0.1+cu117 is installed without RoCm support.
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 255
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-3151ooou
  cwd: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

What am I doing wrong? I'm following the guide... this is so frustrating... Could it be that I have to install ROCm 5.4.2 from some rare repository, or compile it myself, or something obscure like that? It says PyTorch is installed without ROCm support, even though I installed it with pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

Edit: The order of steps 1 and 2 in the install-dependencies section matters: if you run pip install -r requirements_nocuda.txt first, it will install PyTorch without ROCm support...

lufixSch commented 10 months ago

I don't have rocminfo installed, should I? But clinfo shows my GPU indeed.

Yes, it should be installed; AutoGPTQ needs it too. Nevertheless, your GPU seems to be detected.

I did install rocm-hip-sdk. And I'm using Arch.

Okay, that should be fine, but I am running Manjaro, which installs the core ROCm packages automatically when setting up the GPU. I only needed to add rocm-hip-sdk, but maybe on plain Arch you need to install them yourself. In some articles rocm-opencl-sdk was referenced, but I am not sure about this one. Last time I installed ROCm packages other than rocm-hip-sdk I broke everything xD. Did you do anything else to get your GPU running on Arch? (Could be helpful for extending the guide.)

Also I'm running it in a miniconda environment, is that a problem?

No, that is great. Running a virtual environment makes it easy to remove everything (Python-related) if you mess up your dependencies.

RBNXI commented 10 months ago

I installed the rocm-hip-sdk package on Arch too, along with a bunch of dependencies, and I think it works fine. I didn't do anything special for the GPU to work on Arch; maybe you installed the AMD Pro drivers? I use the open-source ones with Mesa. I'm still trying. Now I got past the error about PyTorch not being compiled with ROCm, but I still get the same error I mentioned before:


  Total number of replaced kernel launches: 4
  running clean
  removing 'build/temp.linux-x86_64-cpython-310' (and everything under it)
  removing 'build/lib.linux-x86_64-cpython-310' (and everything under it)
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.10' does not exist -- can't clean it
  removing 'build'
Failed to build auto-gptq
ERROR: Could not build wheels for auto-gptq, which is required to install pyproject.toml-based projects

lufixSch commented 10 months ago

try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

@RBNXI That error is straightforward: you seem to be missing setuptools. It should be installed automatically when installing the other requirements, but maybe try installing it manually: pip install setuptools

With the last error I am still unsure what the problem is.

No, I too used the open-source drivers. I just wasn't sure it would be the same on plain Arch. This means installing rocm-hip-sdk should be enough.

RBNXI commented 10 months ago

I have now rocminfo installed but the command rocminfo says it's not found... should I worry?

Edit: Apparently installing it doesn't set the path; solved with export PATH=/opt/rocm/bin:$PATH

lufixSch commented 10 months ago

Edit: The order of steps 1 and 2 in the install-dependencies section matters: if you run pip install -r requirements_nocuda.txt first, it will install PyTorch without ROCm support...

Yes, that could be the case. I did this because I had an error about installing setuptools, and installing requirements_nocuda.txt first fixed it. But I think I added --force-reinstall to the torch install command and forgot to mention it in the guide.

Edit: I updated the guide.

I have now rocminfo installed but the command rocminfo says it's not found... should I worry?

That should indeed not happen. It should be located at /opt/rocm/bin, which you added to $PATH if you followed my instructions (Edit: you did not xD). This is also important for other ROCm commands to work. Maybe that was the problem all along.

RBNXI commented 10 months ago

Edit: The order of steps 1 and 2 in the install-dependencies section matters: if you run pip install -r requirements_nocuda.txt first, it will install PyTorch without ROCm support...

Yes, that could be the case. I did this because I had an error about installing setuptools, and installing requirements_nocuda.txt first fixed it. But I think I added --force-reinstall to the torch install command and forgot to mention it in the guide.

Edit: I updated the guide.

I have now rocminfo installed but the command rocminfo says it's not found... should I worry?

That should indeed not happen. It should be located at /opt/rocm/bin, which you added to $PATH if you followed my instructions (Edit: you did not xD). This is also important for other ROCm commands to work. Maybe that was the problem all along.

Trust me, I'm following the instructions... The command wasn't working because I installed it and ran it in another terminal, since it's a system command, not related to the conda environment. For the rest of the guide I'm running the correct exports beforehand. I'll run the AutoGPTQ installation AGAIN and send you the complete log; maybe it's useful for figuring out what the hell happens... I think I've created more than 50 conda installations already... I'm going to give up soon.

RBNXI commented 10 months ago

The error I had was from not having gekko installed (it's not in the requirements file...). Now I installed it and still get the setuptools error (it's installed). Maybe it's incompatible with the Python version? I'm using python=3.10.9. Which one do you use? If you installed ROCm from the repositories you should also have 5.6; did you still set the 5.4.2 variable?

Edit: Tried Python 3.11 and same... setuptools error... but it's surely installed, I can import it manually. What's happening? (screenshot attached)

Edit: I saw some people using --no-cache-dir when pip install llama cpp, try it, maybe it works for you now

lufixSch commented 10 months ago

The error I had was from not having gekko installed (it's not in the requirements file...). Now I installed it and still get the setuptools error (it's installed).

It does not need to be part of the requirements file. If gekko or setuptools is a dependency of a package, it should usually be installed automatically. I have no idea why this is not the case (or why it is not found).

I'm sorry, I'm really out of Ideas here.

I'm using python=3.10.9. Which one do you use?

I am running Python 3.10.12. I also use venv for environments instead of conda, but this shouldn't make a difference.

If you installed ROCm from the repositories you should also have 5.6; did you still set the 5.4.2 variable?

I used 5.4.2 because torch is installed for version 5.4.2. There is a nightly version of torch for ROCm 5.6, which works better for newer GPUs, but as we are both using 6xxx models I don't think it will change anything.

I saw some people using --no-cache-dir when pip install llama cpp, try it, maybe it works for you now

I tried it. Sadly it did not work. I got it working before, but I don't remember what I did differently.

GhostNaN commented 10 months ago

I have a RX 6700XT and I can compile llama-cpp-python with this:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ PATH=/opt/rocm/bin:$PATH HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030 pip install --no-cache-dir --force-reinstall llama-cpp-python

But my issue is that I can't offload all the gpu layers because it crashes without errors when I send a prompt. Having all the layers minus 1 (ex. n-gpu-layers = 43/44) works for some reason.

lufixSch commented 10 months ago

Thanks for the hint @GhostNaN. This worked for me too. I will update the Guide

I have problems loading a model at all (with llama.cpp). I have never worked with llama.cpp before. Do I have to set some specific parameters or do the defaults work?

GhostNaN commented 10 months ago

@lufixSch I load the program with this:

HSA_OVERRIDE_GFX_VERSION=10.3.0 python ./server.py --mul_mat_q --threads 6

Otherwise, besides adjusting n-gpu-layers, everything else is default. Make sure you are using the new GGUF models as well. Something like this: https://huggingface.co/TheBloke/CodeLlama-7B-GGUF

lufixSch commented 10 months ago

Yes this model works but without defining --n-gpu-layers it runs on CPU (I am not sure if this is expected behavior)

When I define the --n-gpu-layers the program always exits with a segmentation fault, independent of the specific number I provide (43, 44, 1000000000).

GhostNaN commented 10 months ago

Yes this model works but without defining --n-gpu-layers it runs on CPU (I am not sure if this is expected behavior)

Yes, that's the case. The more you offload the less CPU and more GPU it will use.

When I define the --n-gpu-layers the program always exits with a segmentation fault.

Kinda the case for me, as I said. Look at the logs and see how many layers the model has and try to offload all the layers, minus 1. As seen in my terminal (screenshot).

Also make sure the whole model fits in VRAM, as it will also crash if it can't. If all else above fails, just try offloading 1 layer to see if it works at all.
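Putting that advice together, a launch could look something like this (a sketch; the model file and the layer count are examples, use the layer count your own logs report, minus one):

HSA_OVERRIDE_GFX_VERSION=10.3.0 python server.py --model mythomax-l2-13b.Q5_K_M.gguf --loader llama.cpp --n-gpu-layers 43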

lufixSch commented 10 months ago

Look at the logs and see how many layers the model has and try to offload all the layers, minus 1.

Tried it and with 34/35 it works.

RBNXI commented 10 months ago

If anyone finds what step is missing from the guide that I'm not doing, please tell me; I've tried everything already. I don't have anything else to try, so I'm probably missing a package or dependency that most people have installed as a dependency of other common packages and I don't. Or maybe there's something else to do in the conda environment before the other steps, or something to install with pip that was not mentioned.

containerblaq1 commented 10 months ago

I have a RX 6700XT and I can compile llama-cpp-python with this:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ PATH=/opt/rocm/bin:$PATH HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030 pip install --no-cache-dir --force-reinstall llama-cpp-python

But my issue is that I can't offload all the gpu layers because it crashes without errors when I send a prompt. Having all the layers minus 1 (ex. n-gpu-layers = 43/44) works for some reason.

Installing llama-cpp-python in this way and attempting to generate anything returns this error on my machine:

CUDA error 98 at /tmp/pip-install-7pnwqq34/llama-cpp-python_311ef1c7ce5446508ba8bfae88d1677c/vendor/llama.cpp/ggml-cuda.cu:6105: invalid device function
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ PATH=/opt/rocm/bin:$PATH HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1100 pip install --no-cache-dir --force-reinstall llama-cpp-python

The model does load into VRAM but breaks when trying to generate text.

EDIT for those who end up in a similar situation:

transformers was version 4.30.2 and I upgraded to 4.32.1 from requirements_nocuda.txt

numpy was 1.25.2 and I downgraded to 1.24.0

fastapi also has a dependency issue with chromadb but I ended up running ooga before I installed that. Works now.

GhostNaN commented 10 months ago

@containerblaq1 You are aware that gfx1030 is for RDNA2 compatible GPUs only? That command won't work on RDNA3 GPUs, you need to recompile. The part of the command you want to change: HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030 To just: HCC_AMDGPU_TARGET=gfx1100 See here: https://www.llvm.org/docs/AMDGPUUsage.html#processors

I had HSA_OVERRIDE_GFX_VERSION because gfx1031 support isn't official in rocBLAS.

containerblaq1 commented 10 months ago

@containerblaq1 You are aware that gfx1030 is for RDNA2 compatible GPUs only? That command won't work on RDNA3 GPUs, you need to recompile. The part of the command you want to change: HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030 To just: HCC_AMDGPU_TARGET=gfx1100 See here: https://www.llvm.org/docs/AMDGPUUsage.html#processors

I had HSA_OVERRIDE_GFX_VERSION because gfx1031 support isn't official in rocBLAS.

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ PATH=/opt/rocm/bin:$PATH HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1100 pip install --no-cache-dir --force-reinstall llama-cpp-python

The above is the command I used; I switched them around in my message, which I've now edited. Sorry about that!

containerblaq1 commented 10 months ago

Out of curiosity, could I get the output of pip list from some of y'all?

Please only post package lists you're comfortable with posting. I'm not sure what's available over pip.

GhostNaN commented 10 months ago

pip_list.txt (attached). Go nuts @containerblaq1

containerblaq1 commented 10 months ago

@GhostNaN another change. Turns out I didn't enable GPU layers when I ran it last time, thinking it was fixed. It actually started working once I added cuda12 to my env and installed nvidia-cudnn.

(screenshot)

GhostNaN commented 10 months ago

@containerblaq1 There should be nothing from CUDA you need as a dependency; that's what ROCm's HIP is for. Also, that screenshot shows you are NOT using your GPU. It should say something like this, as I showed earlier: (terminal screenshot)