oobabooga opened 1 year ago
@lufixSch
I would try exllamaV2 (or exllama)
exllama works fine.
Did you make sure it ran on the GPU?
Yes, I set n_gpu_layers and it works fine.
You can get a list of all packages with pip list
pip_list.txt
I don't understand: apart from the three steps in the last comment, what else has to be done?
Hello guys, I finally got my RX 6800 detected by ooba, so I'm going to share the steps I did before installing ooba. I'm using Ubuntu 22.04.
------------------------------ UNINSTALL PAST ROCM
dpkg -l | grep rocm
# check that all of them are covered by the following command
sudo apt purge rocm-clang-ocl rocm-cmake rocm-core rocm-dbgapi rocm-debug-agent rocm-dev rocm-device-libs rocm-dkms rocm-gdb rocm-libs rocm-llvm rocm-ocl-icd rocm-opencl rocm-opencl-dev rocm-smi-lib rocm-utils rocminfo
sudo apt autoremove
sudo apt update
--------------------------- INSTALL ROCM
sudo apt update -y && sudo apt upgrade -y
sudo apt-add-repository -y -s -s
sudo apt install -y "linux-headers-$(uname -r)" \
"linux-modules-extra-$(uname -r)"
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6/ ubuntu jammy main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update -y
sudo apt install -y amdgpu-dkms
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 jammy main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update -y
sudo apt install -y rocm-dev rocm-libs rocm-hip-sdk rocm-dkms
sudo apt install -y rocm-opencl rocm-opencl-dev
sudo apt install -y hipsparse hipblas hipblas-dev hipcub
sudo apt install -y rocblas rocblas-dev rccl rocthrust roctracer-dev
# COPY AND RUN THE FOLLOWING ALL TOGETHER
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
# update path
echo "PATH=/opt/rocm/bin:/opt/rocm/opencl/bin:$PATH" >> ~/.profile
sudo /opt/rocm/bin/rocminfo | grep gfx
sudo adduser `whoami` video
sudo adduser `whoami` render
# git and git-lfs (large file support)
sudo apt install -y git git-lfs
# development tools may be required later...
sudo apt install -y libstdc++-12-dev
# stable diffusion likes TCMalloc...
sudo apt install -y libtcmalloc-minimal4
sudo apt install -y nvtop
sudo apt install -y radeontop rovclock
sudo reboot
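After the reboot, a quick sanity check that both the driver and PyTorch can see the card (just a sketch; the Python line assumes the ROCm build of PyTorch is installed in whichever environment the WebUI uses):
rocminfo | grep gfx
rocm-smi --showproductname
python -c "import torch; print(torch.cuda.is_available())"   # should print True on a working ROCm build of PyTorch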
Now I can use my AMD GPU, but only with the llama.cpp loader and GGUF formats. When I try to use the GPTQ format with any supported loader, I get the following error:
2023-10-16 02:22:17 ERROR:Failed to load the model.
Traceback (most recent call last):
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/modules/ui_model_menu.py", line 201, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/modules/models.py", line 79, in load_model
output = load_func_map[loader](model_name)
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/modules/models.py", line 320, in AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/modules/AutoGPTQ_loader.py", line 57, in load_quantized
model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/installer_files/env/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
return quant_func(
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/installer_files/env/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 875, in from_quantized
accelerate.utils.modeling.load_checkpoint_in_model(
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/installer_files/env/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1414, in load_checkpoint_in_model
set_module_tensor_to_device(
File "/media/10TB_HHD/_OOBAGOOBA-AMD/text-generation-webui/installer_files/env/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 291, in set_module_tensor_to_device
value = value.to(old_value.dtype)
RuntimeError: HIP error: the operation cannot be performed in the present state
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
Any idea what I should do? Thanks in advance.
Can we make an AMD channel on the Discord server please? @oobabooga
I think this is why AutoGPTQ is not working on ROCm 5.6:
https://github.com/PanQiWei/AutoGPTQ/commit/3de7fbb0d53ccc4516910a7a4000d526c6289d2a
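If AutoGPTQ's ROCm support is the culprit, one thing worth trying (an assumption on my part, not something verified in this thread) is installing AutoGPTQ from its prebuilt ROCm wheel index inside the WebUI environment, with the rocm tag matched to your installed ROCm version:
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/rocm571/   # swap rocm571 for the tag matching your ROCm release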
Hello, was anybody successful at compiling https://github.com/ROCmSoftwarePlatform/flash-attention?
@fractal-fumbler I haven't tried because the last time I checked they did not yet support flash-attention 2.
There was an open PR for flash-attention 2 but I can't find it (maybe ROCmSoftwarePlatform/flash-attention#14).
That's the one. I use A1111 and that one does work, albeit at a slower speed. It is actively being developed; ROCm's PyTorch repo also has some branches in active development that add Flash Attention V2 support (they don't build).
Hey guys, following the guide on Linux (Manjaro Plasma 22, kernel 6.5), my AMD GPU (RX 480) worked without much trouble. I'm so happy! I've been struggling with sd-web-ui until now... I'll review my steps and post them later.
My system configuration:
# 1 step - kernel
command: `uname -a`
> Linux lu 6.5.5-1-MANJARO #1 SMP PREEMPT_DYNAMIC Sat Sep 23 12:48:15 UTC 2023 x86_64 GNU/Linux
# 2 step - AMD ROCm info
command: `rocminfo`
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 3600 6-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 3600 6-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
*******
Agent 2
*******
Name: gfx803
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 480 Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
# 3 step - export path to ROCm
command: `export ROCBLAS_TENSILE_LIBPATH=/opt/rocm/lib/rocblas/library/`
> to verify `echo $ROCBLAS_TENSILE_LIBPATH`
>> /opt/rocm/lib/rocblas/library/
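To avoid repeating that export in every new shell, it can be appended to your profile, mirroring the PATH step from the Ubuntu guide further up (just a sketch):
echo 'export ROCBLAS_TENSILE_LIBPATH=/opt/rocm/lib/rocblas/library/' >> ~/.profile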
the steps:
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
python -m venv venv
source venv/bin/activate
./start_linux.sh
Notes:
1 - In llama.cpp, enable NUMA and load layers to the GPU.
2 - Verify the GPU is being used with "amdgpu_top" in another terminal (it should already be installed if you followed the AMD guide).
Hey did anyone of you try TheBloke/deepseek-coder-6.7B-instruct-GGUF yet?
Every time I try to load it, it crashes with:
ERROR: byte not found in vocab: '
'
./webui.sh: line 33: 14936 Segmentation fault (core dumped) python server.py $STARTUP_OPTIONS
I'm trying to use Oobabooga on my RX 6750 XT, which means I have to use Linux for the first time in my life. After installing it, I've followed every tutorial I've seen about ROCm and Oobabooga on AMD, but in the end, I can't download a model, getting this error:
Traceback (most recent call last):
File "/home/lewis/text-generation-webui/modules/ui_model_menu.py", line 239, in download_model_wrapper
model, branch = downloader.sanitize_model_and_branch_names(repo_id, None) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lewis/text-generation-webui/download-model.py", line 39, in sanitize_model_and_branch_names
if model[-1] == '/': ~^^^^
IndexError: string index out of range
And even if I try to use a downloaded model, it doesn't load, giving this error on the terminal:
UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: llama or set allow_custom_value=True.
warnings.warn(
2023-11-19 12:57:17 INFO:Loading LLaMA2-13B-Tiefighter.Q8_0.gguf...
2023-11-19 12:57:17 INFO:llama.cpp weights detected: models/LLaMA2-13B-Tiefighter.Q8_0.gguf
2023-11-19 12:57:17 INFO:Cache capacity is 0 bytes
ggml_init_cublas: found 2 ROCm devices:
Device 0: AMD Radeon RX 6750 XT, compute capability 10.3
Device 1: AMD Radeon Graphics, compute capability 10.3
Additionally, when I open Oobabooga, it shows this warning in the terminal:
UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support.
I can't understand a single thing, so please explain as if I'm 5 years old.
THANKS IN ADVANCE!
"UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable."
The warning is normal; the provided bitsandbytes version is currently not compatible with ROCm. AFAIK there is a version for ROCm, but it's currently not used.
@lewis100 Hi, when you say you followed every tutorial, what did you do? Currently the WebUI should work on AMD without any special configuration. Did you use the start_linux.sh script?
Which Linux Distribution are you using?
I'm pretty sure the downloading problem is a formatting problem. What are you entering in the text fields?
The bitsandbytes warning doesn't need to concern you as long as you use GGUF, GPTQ or AWQ models.
The problem with loading the model is a bit harder. Is that everything you get as output? Does the WebUI still work, or does the process exit so that you have to start it again?
If we are lucky this is just an issue with full VRAM. The default configuration for llama.cpp is way too high for the 6750 XT. Look at the last line of the error in the terminal where you started the WebUI. Does it say something like "Segmentation Fault" and "Full Memory"? (A quick way to watch VRAM while loading follows below.)
P.S.: you can use Markdown formatting to make your message more readable: Cheat Sheet. For example, write error messages in code blocks.
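For watching VRAM live, rocm-smi (which ships with ROCm) is enough; nvtop or radeontop from the Ubuntu guide above also work. A minimal sketch:
watch -n 1 rocm-smi --showmeminfo vram   # refresh VRAM usage every second while the model loads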
I've been trying for more than 8 hours, so I've installed a lot of stuff in Linux's terminal out of frustration. I'm using start_linux.sh and Ubuntu 22.04.3 LTS. I've loaded codellama-7b.Q4_0.gguf using the CPU and it worked! So I think the problem is with using the GPU. It's recognizing it, because the last thing it says in the terminal is:
ggml_init_cublas: found 2 ROCm devices:
Device 0: AMD Radeon RX 6750 XT, compute capability 10.3
Device 1: AMD Radeon Graphics, compute capability 10.3
Does that mean that I have to somehow select which one I should use (Integrated vs dedicated)?
I didn't see [Segmentation Fault] or Full Memory anywhere, and I've been trying out different configs on llama.cpp, so I suppose it's not a VRAM problem.
Edit: I'm curious about the "Cache capacity is 0 bytes". Is that normal?
Okay. Yeah installing a lot of things without knowing what they do can lead to a lot of problems. I too had to reinstall my OS once because I broke the GPU driver and was unable to fix it.
I just checked my output when loading a GGUF model. I get the following output for my RX 6750 XT:
2023-11-19 20:22:14 INFO:Loading settings from settings.yaml...
2023-11-19 20:22:14 INFO:Loading openhermes-2.5-mistral-7b.Q6_K.gguf...
2023-11-19 20:22:14 INFO:llama.cpp weights detected: models/openhermes-2.5-mistral-7b.Q6_K.gguf
2023-11-19 20:22:14 INFO:Cache capacity is 0 bytes
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, compute capability 10.3
I think the problem lies with the second detected ROCm device (Device 0). Are you running an AMD CPU with an integrated GPU?
From my output, the 6750 XT should be recognized as "AMD Radeon Graphics". It seems like your PC is detecting two of the same GPU.
I am not really sure how to fix that. What does rocminfo give as output?
If you are running Ubuntu only for this, the easiest way might be to start again with a fresh install of Ubuntu. Usually you should only need to install ROCm (you can check whether an installation already exists with rocminfo) and run the start_linux.sh script.
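For example, a quick check before deciding on a reinstall (a sketch; "command not found" here simply means ROCm is not installed):
which rocminfo && rocminfo | grep -E 'Agent|gfx'
dpkg -l | grep rocm   # lists any ROCm packages currently installed on Ubuntu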
Yes, I think the problem might be with the integrated GPU on my Ryzen 5600G as well, but it could also be with the ROCm installation. Here is the info (I have no idea what Agent 3 is):
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 5600G with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 5600G with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3900
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 15710808(0xefba58) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 15710808(0xefba58) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 15710808(0xefba58) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1030
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6750 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 3072(0xc00) KB
L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2880
BDFID: 768
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 3
*******
Name: gfx1030
Uuid: GPU-XX
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5688(0x1638)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1900
BDFID: 2560
Internal Node ID: 2
Compute Unit: 7
SIMDs per CU: 4
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Yes, I am pretty sure the third one should not be there. But the only thing I can suggest is a clean reinstall.
Additionally: I just looked up how to install ROCm on Ubuntu and the official ROCm documentation has a banner on the top of its page which says the following:
ROCm currently doesn’t support integrated graphics. Should your system have an AMD IGP installed, disable it in the BIOS prior to using ROCm. If the driver can enumerate the IGP, the ROCm runtime may crash the system, even if told to omit it via HIP_VISIBLE_DEVICES.
This may be the original reason why you faced issues during installation.
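If disabling the IGP in the BIOS is not an option, hiding it from the runtime before launching is at least worth a try, although the banner above explicitly warns this is not guaranteed to help (the index below assumes the discrete card is device 0, as in the load output earlier):
export HIP_VISIBLE_DEVICES=0   # 0 = the RX 6750 XT in the output above; not a guaranteed fix per the ROCm docs
./start_linux.sh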
Hmm, I still can't run any AWQ or GPTQ models on my GPU, only GGUF versions. I'm using Ubuntu and the following ooba version:
git show ae8cd449ae3e0236ecb3775892bb1eea23f9ed68
git describe --tags
snapshot-2023-10-15-12-gae8cd44
AWQ is CUDA-only afaik, but AutoGPTQ, exllama (v1 at least), and as a last resort GPTQ-for-LLaMa should all work. Do you have ROCm 5.6 installed?
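A quick way to confirm which ROCm version is installed on Ubuntu (a sketch, using the same package check as earlier in the thread):
dpkg -l | grep rocm-core   # the version column reflects the ROCm release, e.g. 5.6.x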
Sup bro, I have ROCm 5.6 installed; check here: https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1763973978
This week I'm going to do a fresh install of ooba and check whether other models and the Transformers loader work with AMD.
ExllamaV2 has ROCm support out of the box. It should work too.
Hello!
Has anyone been able to run this project using Docker?
I'm trying to adapt the Dockerfile by using rocm/pytorch instead of the nvidia/cuda:12.1.0-devel-ubuntu22.04 base image, but I end up with the following error:
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
/app/venv/lib/python3.9/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
Traceback (most recent call last):
File "git submodule update --init --recursive
?
[end of output]
Cutlass is for NVIDIA, right?
Shouldn't pip install -r requirements_amd.txt be sufficient to dodge this issue?
Hey there, small thing to add. If you encounter llama.cpp using the wrong GPU as your main device, this environment variable worked for re-ordering the devices for me before running text-generation-webui:
export CUDA_VISIBLE_DEVICES=1,0
I'm not sure if it'll work to exclude a device from use but will test later.
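For example, to launch with the device order swapped (purely illustrative; the indices correspond to the order the devices are reported at load time):
export CUDA_VISIBLE_DEVICES=1,0   # make the second reported device the primary one
./start_linux.sh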
@smCloudInTheSky I never tried. You might need to adjust a lot of things because the Dockerfile looks quite old. You should in any case use requirements_amd.txt, otherwise it won't work at all.
I would also recommend removing the GPTQ-for-LLaMa part (for now) as it hasn't worked with AMD for some time.
Maybe you can open a pull request adding an AMD-compatible Dockerfile. I would be happy to help build (or at least test) it.
@containerblaq1 Thanks, that's really helpful. I will be getting a second GPU in the next few days and was already searching for something like that.
@lewis100 Maybe this also helps with your problem of multiple detected GPUs.
@lufixSch In the end I found something that built! Skipping GPTQ and using requirements_amd worked :+1: However, when trying to run the Docker container I end up with this:
docker run text-generation-webui-text-generation-webui
/app/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
2023-11-21 07:55:00 INFO:Loading the extension "gallery"...
bin /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
Traceback (most recent call last):
File "/app/server.py", line 236, in <module>
create_interface()
File "/app/server.py", line 117, in create_interface
ui_parameters.create_ui(shared.settings['preset']) # Parameters tab
File "/app/modules/ui_parameters.py", line 11, in create_ui
generate_params = presets.load_preset(default_preset)
File "/app/modules/presets.py", line 49, in load_preset
with open(Path(f'presets/{name}.yaml'), 'r') as infile:
FileNotFoundError: [Errno 2] No such file or directory: 'presets/simple-1.yaml'
Even though my docker-compose is the one from the project, it seems it didn't copy the folders properly.
cat docker-compose.yml
version: "3.3"
services:
text-generation-webui:
build:
context: .
args:
# specify which cuda version your card supports: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-7.5}
WEBUI_VERSION: ${WEBUI_VERSION:-HEAD}
env_file: .env
ports:
- "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
- "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
stdin_open: true
tty: true
volumes:
- ./characters:/app/characters
- ./extensions:/app/extensions
- ./loras:/app/loras
- ./models:/app/models
- ./presets:/app/presets
- ./prompts:/app/prompts
- ./softprompts:/app/softprompts
- ./training:/app/training
- ./cloudflared:/etc/cloudflared
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0']
capabilities: [gpu]
Here is my current Dockerfile, for curiosity. Once it works, I plan to write a script that chooses either mine or the original Dockerfile depending on the detected hardware (for Linux at least).
cat Dockerfile
FROM rocm/dev-ubuntu-22.04:latest
LABEL maintainer="Your Name <your.email@example.com>"
LABEL description="Docker image for GPTQ-for-LLaMa and Text Generation WebUI"
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw apt-get update && \
apt-get install --no-install-recommends -y python3-dev libportaudio2 libasound-dev git python3 python3-pip make g++ ffmpeg && \
rm -rf /var/lib/apt/lists/*
RUN --mount=type=cache,target=/root/.cache/pip,rw pip3 install virtualenv
RUN mkdir /app
WORKDIR /app
ARG WEBUI_VERSION
RUN test -n "${WEBUI_VERSION}" && git reset --hard ${WEBUI_VERSION} || echo "Using provided webui source"
# Create virtualenv
RUN virtualenv /app/venv
RUN --mount=type=cache,target=/root/.cache/pip,rw \
. /app/venv/bin/activate && \
python3 -m pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/rocm571/ && \
python3 -m pip install --upgrade pip setuptools wheel ninja && \
python3 -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
# Install main requirements
COPY requirements_amd.txt /app/requirements_amd.txt
RUN --mount=type=cache,target=/root/.cache/pip,rw \
. /app/venv/bin/activate && \
python3 -m pip install -r requirements_amd.txt
COPY . /app/
RUN cp /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
# Install extension requirements
RUN --mount=type=cache,target=/root/.cache/pip,rw \
. /app/venv/bin/activate && \
for ext in /app/extensions/*/requirements.txt; do \
cd "$(dirname "$ext")"; \
python3 -m pip install -r requirements.txt; \
done
ENV CLI_ARGS=""
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000} ${CONTAINER_API_STREAM_PORT:-5005}
CMD . /app/venv/bin/activate && python3 server.py ${CLI_ARGS}
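For reference, building and running an image like this typically looks as follows (a sketch with an illustrative tag; the /dev/kfd and /dev/dri device nodes plus the video group are what give a container access to an AMD GPU):
docker build -t textgen-rocm .
docker run --device=/dev/kfd --device=/dev/dri --group-add video \
  -p 7860:7860 -v "$(pwd)/models:/app/models" textgen-rocm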
How do I use "export CUDA_VISIBLE_DEVICES=1,0"? I'm a total newbie at this.
I've reinstalled Ubuntu 22.04.3 LTS and did some things differently. Installing Oobabooga through Pinokio helped a lot, and it's not replying with an error anymore, but it's not offloading to the GPU.
I've even tried Kobold, but it doesn't recognize the GPU either. I'm starting to feel that it's impossible to run a model on an RX 6750 XT.
Has anyone ever done that?
Never mind. I randomly opened one of the Kobolds I've got lying around (koboldcpp_nocuda) and it totally worked on Windows with CLBlast. I'm so tired that I'll give up on Oobabooga for now and stick with Kobold.
but it's not offloading to the GPU
This is the default behaviour of llama.cpp. You need to increase the GPU offload slider in the UI before loading the model.
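If you prefer starting from the command line, the same setting can be passed as a flag (a sketch; check the WebUI's --help for the exact flag name in your version, and treat 35 as an example value to tune against your VRAM):
./start_linux.sh --n-gpu-layers 35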
Has anyone ever done that?
Yes, I'm running an RX 6750 XT. But I run Manjaro as my OS, so I can't really help you with the driver setup on Ubuntu.
How do I use "export CUDA_VISIBLE_DEVICES=1,0"? I'm a total newbie at this.
You'll enter this in your terminal. From your output above, you would probably want to use export CUDA_VISIBLE_DEVICES=1
I've reinstalled Ubuntu 22.04.3 LTS and did some things differently. Installing Oobabooga through Pinokio helped a lot, and it's not replying with an error anymore, but it's not offloading to the GPU.
Try increasing the gpu_layers slider in the UI.
I've even tried Kobold, but it doesn't recognize the GPU either. I'm starting to feel that it's impossible to run a model on an RX 6750 XT.
It should work fine. If you're looking to give it another go, check out the Discord!
Has anyone ever done that?
I've used it on 6800XT, 7900XTX, 7900XT. Works well.
I've tried that in the terminal and it didn't work. I'm using the slider as well. I didn't know about the Discord; I'll check it out.
I've used it on 6800XT, 7900XTX, 7900XT. Works well.
@containerblaq1 Is there anything special I should know about setting up the 7900XTX or running the gui with multiple GPUs? I just got a 7900XT and want to add it to my system.
I've tried to make it work one last time, and it turns out it wasn't offloading to the GPU because the following command wasn't executed properly:
pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/rocm5.2
I had to use a previous version of Python (Python 3.8) for it to work. I hope this info helps someone. It's finally offloading, but now I have a new error to deal with:
CUDA error 98 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-oython-cuBLAS-wheels/vendor/llma.cpp/ggml-cuda.cu:6951: invalid device function current device: 0 /arrow/cpp/src/arrow/filesystem/s3fs.cc:2904: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
It happens after my first prompt. I've tried export CUDA_VISIBLE_DEVICES=1 and HIP_VISIBLE_DEVICES=1, but no luck.
I've used it on 6800XT, 7900XTX, 7900XT. Works well.
@containerblaq1 Is there anything special I should know about setting up the 7900XTX or running the gui with multiple GPUs? I just got a 7900XT and want to add it to my system.
I believe I just had to make hip again.
@lewis100
Try:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1032" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python --force-reinstall --no-cache-dir
Edit:
The above command came from this comment:
https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1735247373
I'm receiving the following error when I try that:
Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Building wheels for collected packages: llama_cpp_python
Building wheel for llama_cpp_python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama_cpp_python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [48 lines of output]
*** scikit-build-core 0.6.1 using CMake 3.27.7 (wheel)
*** Configuring CMake...
2023-11-23 11:01:01,922 - scikit_build_core - WARNING - libdir/ldlibrary: /home/lewis/miniconda3/lib/libpython3.11.a is not a real file!
2023-11-23 11:01:01,922 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/home/lewis/miniconda3/lib, ldlibrary=libpython3.11.a, multiarch=x86_64-linux-gnu, masd=None
loading initial cache file /tmp/tmp0rq5j8qe/build/CMakeInit.txt
-- The C compiler identification is Clang 17.0.0
-- The CXX compiler identification is Clang 17.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - broken
CMake Error at /tmp/pip-build-env-ag82x3sc/normal/lib/python3.11/site-packages/cmake/data/share/cmake-3.27/Modules/CMakeTestCXXCompiler.cmake:60 (message):
The C++ compiler
"/opt/rocm/llvm/bin/clang++"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: '/tmp/tmp0rq5j8qe/build/CMakeFiles/CMakeScratch/TryCompile-9Jqhb1'
Run Build Command(s): /tmp/pip-build-env-ag82x3sc/normal/lib/python3.11/site-packages/ninja/data/bin/ninja -v cmTC_665af
[1/2] /opt/rocm/llvm/bin/clang++ -MD -MT CMakeFiles/cmTC_665af.dir/testCXXCompiler.cxx.o -MF CMakeFiles/cmTC_665af.dir/testCXXCompiler.cxx.o.d -o CMakeFiles/cmTC_665af.dir/testCXXCompiler.cxx.o -c /tmp/tmp0rq5j8qe/build/CMakeFiles/CMakeScratch/TryCompile-9Jqhb1/testCXXCompiler.cxx
[2/2] : && /opt/rocm/llvm/bin/clang++ CMakeFiles/cmTC_665af.dir/testCXXCompiler.cxx.o -o cmTC_665af && :
FAILED: cmTC_665af
: && /opt/rocm/llvm/bin/clang++ CMakeFiles/cmTC_665af.dir/testCXXCompiler.cxx.o -o cmTC_665af && :
ld.lld: error: unable to find library -lstdc++
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:3 (project)
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama_cpp_python
Failed to build llama_cpp_python
ERROR: Could not build wheels for llama_cpp_python, which is required to install pyproject.toml-based projects
From the Wiki:
Requires ROCm SDK 5.4.2 or 5.4.3 to be installed. Some systems may also need:
sudo apt-get install libstdc++-12-dev
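Putting those two pieces together, a sketch of the fix would be to install the missing C++ standard library development package (the ld.lld error above is the missing -lstdc++) and then rerun the build command from earlier, with the gfx target for your card:
sudo apt-get install -y libstdc++-12-dev
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1030" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python --force-reinstall --no-cache-dir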
I believe I just had to make hip again
@containerblaq1 What do you mean by that?
Are you able to split models between (different) GPUs?
@lufixSch Yup! We briefly tested this a bit ago.
https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1741872811
Edit:
make hip ROCM_TARGET=gfx1100
@containerblaq1
Where should I use
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1032" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python --force-reinstall --no-cache-dir
Is that on the main terminal? It's my first time using Linux. If so, I've tried and I keep getting CUDA error 98. My GPU is a RX 6750 XT, so I think it's a gfx1031, so I've tried it as well, but I'm still getting the same error.
@containerblaq1
Where should I use
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1032" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python --force-reinstall --no-cache-dir
Is that on the main terminal? It's my first time using Linux. If so, I've tried and I keep getting CUDA error 98. My GPU is a RX 6750 XT, so I think it's a gfx1031, so I've tried it as well, but I'm still getting the same error.
The command is run in the terminal so that llama_cpp is built properly.
That GPU seems to be 1032. Please check here:
https://rocm.docs.amd.com/en/latest/release/windows_support.html
Please post the output of pip list in your terminal after making sure you are in the correct conda environment.
It may be better at this point to recreate your conda environment from scratch. Via Google, use the search "conda destroy environment"; freeCodeCamp has a great tutorial on how to remove a conda environment.
Here is documentation on how to remove ROCm:
https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/uninstall.html
Double edit: I didn't register the XT on your card. Your ROCm output states gfx1030:
Name: gfx1030
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6750 XT
So use:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1030" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama_cpp_python --force-reinstall --no-cache-dir
Please post llama.cpp's output before using the chat. Look for these lines:
llm_load_print_meta: general.name = oobabooga_codebooga-34b-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.14 MB
llm_load_tensors: using ROCm for GPU acceleration
ggml_cuda_set_main_device: using device 0 (Radeon RX 7900 XTX) as main device
llm_load_tensors: mem required = 22733.87 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/51 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
@lewis100 You said that you had tried export CUDA_VISIBLE_DEVICES=1 and HIP_VISIBLE_DEVICES=1.
Have you tried HIP_VISIBLE_DEVICES=0?
Further up in the thread you posted the following:
Device 0: AMD Radeon RX 6750 XT, compute capability 10.3
Device 1: AMD Radeon Graphics, compute capability 10.3
From this it looks like your iGPU is device 1 and 6750 XT is device 0. It could be worth a try at least if you haven't already.
I've tried HIP_VISIBLE_DEVICES=0 as well and even disabled the iGPU. It was not even loading the model, but then I duplicated TensileLibrary_lazy_gfx1030 and renamed it TensileLibrary_lazy_gfx1031, and it loaded, but with CUDA error 98 after my first prompt.
I wonder if I'm managing those environments properly, because I've never used Linux before.
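For what it's worth, that renaming trick works because gfx1031 can run the gfx1030 kernels. A commonly used alternative that avoids copying files, though it hasn't been tested in this thread, is to override the reported architecture before launching:
export HSA_OVERRIDE_GFX_VERSION=10.3.0   # present the card to ROCm as gfx1030
./start_linux.sh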
@lewis100
Might be better to troubleshoot this in real time using some messenger. CUDA error 98 is the same error I got when llama.cpp was not properly being built which was fixed in this comment: https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1735247373
Where is the Name: field read from to be presented in rocminfo? Does rocminfo show gfx1030 for you as well, @lufixSch?
I'm unstabletable0321 on Discord.
@lufixSch I read up earlier in the chat and was able to run CodeBooga. Still having issues with that one?
Does rocminfo show gfx1030 for you as well, @lufixSch ?
I'm not able to verify this right now, but I'm pretty sure it does.
Didn't register the XT on your card. Your ROCm output states gfx1030
1030, 1031 and 1032 are basically the same architecture; that's why you can just use 1030.
Can integrated graphics be used? In my case it's an AMD Ryzen 7 5800H with Radeon Graphics. I get output from rocminfo... http://ix.io/4MBL
@DocMAX No because ROCm does not support them. From the ROCm Documentation:
ROCm currently doesn’t support integrated graphics. Should your system have an AMD IGP installed, disable it in the BIOS prior to using ROCm. If the driver can enumerate the IGP, the ROCm runtime may crash the system, even if told to omit it via HIP_VISIBLE_DEVICES
Have you considered supporting AMD graphics cards on Windows through pytorch-directml? I am really looking forward to it.
Now that I've had some time to set up my new 7900 XTX, here are some of my findings and problems with running a dual-GPU setup (6750 XT + 7900 XTX). Maybe it helps some of you, and maybe some of you can help me.
Even though I already had a 6750 XT, I had to remove the GPU drivers and reinstall them. Otherwise the setup was easy; the text-generation-webui started directly without any need to reinstall anything.
llama.cpp: Works
llama.cpp worked out of the box, but I have the issue that after loading the model the GPU usage goes up to 100% at idle. This seems to be a ROCm bug and is already discussed in RadeonOpenCompute/ROCK-Kernel-Driver#153. There is a "workaround" of setting sched_policy=2 (an amdgpu driver parameter); a sketch for making that persistent follows the pip command below.
This mostly worked for me, but I still sometimes get the issue that the usage goes up to 100% and only a restart fixes it. Updating to the nightly version of torch for ROCm 5.7 reduced (and maybe solved) this issue.
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7 --no-cache-dir --force-reinstall
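The sched_policy workaround above can be made persistent by setting the module option for the amdgpu driver (a sketch; sched_policy=2 disables hardware scheduling and is intended for debugging, so treat it as a workaround only):
echo 'options amdgpu sched_policy=2' | sudo tee /etc/modprobe.d/amdgpu-sched.conf   # some distros may also require regenerating the initramfs
sudo reboot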
exllamaV2: Doesn't Work
When I click on load, nothing happens other than the output Loading <model name>. I need to restart the WebUI to load any other model.
exllama: Doesn't Work
It starts loading the model but crashes when it's nearly done. After updating torch (see llama.cpp above), the error disappeared and I get a similar behavior as with exllamaV2.
AutoGPTQ: Doesn't Work
Same as exllama
You cannot run a 6xxx and a 7xxx GPU and split the model between them (maybe AMD will add official support for 6xxx to ROCm in the future, which would probably make this possible). You need to select the right GPU with HIP_VISIBLE_DEVICES or CUDA_VISIBLE_DEVICES.
Interestingly, for me the IDs were different from the output of rocm-smi. I used DRI_PRIME=<gpu id> glxinfo | grep "OpenGL renderer" to find the right ID for each GPU, but this command might be exclusive to Arch (or even Manjaro).
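A small loop makes that mapping quick to enumerate (a sketch; glxinfo comes from your distro's mesa-utils/mesa-demos package):
for i in 0 1; do echo -n "GPU $i: "; DRI_PRIME=$i glxinfo | grep "OpenGL renderer"; done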
This thread is dedicated to discussing the setup of the webui on AMD GPUs.
You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all AMD users.