oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

How to get this working with an AMD GPU? #62

Closed robonxt-ai closed 9 months ago

robonxt-ai commented 1 year ago

EDIT: Whoops, accidentally hit enter without saying anything in the issue.

Is there a way to install this as a UI on an existing KoboldAI or TavernAI system?

With this webui installer, the backend fails on my AMD machine, but if I install stock KoboldAI, it works just fine.

oobabooga commented 1 year ago

The issue is installing pytorch on an AMD GPU then. I don't know because I don't have an AMD GPU, but maybe others can help.

Ph0rk0z commented 1 year ago

I just moved from an AMD card and had stable diffusion running on it. To get it working I had to install the AMD driver and ROCm. Then I installed the ROCm-specific versions of pytorch and everything else. Vega stuff is the most supported, so I hope you have that, because my RX 580 was a pain and needed patches. Look for guides on stable diffusion and AMD. There were a bunch on the subreddit.

robonxt-ai commented 1 year ago

The issue is installing pytorch on an AMD GPU then. I don't know because I don't have an AMD GPU, but maybe others can help.

@oobabooga Regarding that, since I'm able to get TavernAI and KoboldAI working in CPU mode only, is there a way I can just swap the UI to yours, or does this webUI also change the underlying system (if I'm understanding it properly)?

Look for guides on stable diffusion and AMD. There were a bunch on the subreddit.

@Ph0rk0z So in other words, if there are instructions for my 6600 card to run stable diffusion, I should be able to set up other pytorch projects the same way?

I'll give it a go though. Thank you both!

oobabooga commented 1 year ago

@robonxt-ai the backend can't be swapped. TavernAI is a frontend project that uses the kobold API, while this is a web UI + backend in a single script.

Ph0rk0z commented 1 year ago

Yes.. Just use the virtual environment that works for stable diffusion. It uses mostly the same things. There is some stuff you have to do like set the environment variable for your card to be detected properly. But that is all in the guides.
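For what it's worth, the environment variable those Stable Diffusion guides usually mean is HSA_OVERRIDE_GFX_VERSION; a rough sketch for an RX 6600-class card (the exact values are an assumption here, since the guides themselves aren't quoted in this thread):

    # Hypothetical values: present a gfx1032 card (e.g. RX 6600) to ROCm as gfx1030
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    # Older Polaris cards (e.g. RX 580 / gfx803) are often overridden with 8.0.3 instead,
    # though as noted above they may still need patched builds
    # export HSA_OVERRIDE_GFX_VERSION=8.0.3
    python server.py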

Spencer-Dawson commented 1 year ago

I was trying to install this with ROCm last night and got nowhere. Interestingly, I had little to no trouble getting Kobold and TavernAI up and running. I also have Stable Diffusion working, so obviously I have ROCm installed and working just fine outside of this particular environment. This is on Ubuntu 20.04 with a 6950 XT.

I bet if I spent enough time comparing this to Kobold and reading pytorch documentation I could get it working, but ROCm is such a pain to set up (at least for Ubuntu) that I'm not sure it's worth the headache of supporting (for other people). If I do bother to get it working for me, I'll at least let you guys know the workaround.

In general I'd argue AMD has better Linux drivers, but when it comes to AI nonsense, the further you get from the typical data scientist workstation (Ubuntu running CUDA on an nvidia card, for example), the harder it is to keep up with the latest and greatest AI tools. So, just my 2 cents: if you're thinking about buying AMD for AI use, just don't bother.

Just some notes for anyone else bored enough to work on this:

Looking at Kobold's source code briefly, I notice it lists the following in its rocm.yml file under pip dependencies: "--extra-index-url https://download.pytorch.org/whl/rocm5.1.1" and "torch==1.12.1+rocm5.1.1". Kobold's python server lists 21 references to torch.cuda, and Kobold is working fine for me regardless, so I suspect it's just an issue of getting the "right version" of pytorch installed and configured (maybe I need to set some envvars?) in conda, and then it should "just work".

Looking at automatic1111's stable-diffusion-webui source code, I don't see much other than that the install for torch ROCm includes the following argument: "pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2"

Both KoboldAI and stable-diffusion-webui have some pretty layered packaging (Kobold appears to use docker compose somehow and stable-diffusion-webui uses micromamba), so I may be missing something.

EDIT: I have it working. Since I already had ROCm working on my machine and I just needed to fix the pip install for pytorch, I just ran the following two commands and it's working now: 'pip3 uninstall torch torchvision torchaudio' then 'pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2'

Re: where the second command came from, it's from pytorch's website (https://pytorch.org/) when you select the options for installing pytorch ROCm via pip.

Summary for anyone troubleshooting this for their own use: Step 1: install and get ROCm working. Step 2: in the README where it says "If you have an AMD GPU, you should install the ROCm version of pytorch instead.", the command for that is 'pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2' (gathered into a single block below).
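For copy-paste convenience, the same two commands gathered into one block (nothing new here beyond what is written above):

    # remove whatever CUDA-only build of torch is currently installed
    pip3 uninstall torch torchvision torchaudio
    # install the ROCm 5.2 builds from the PyTorch wheel index
    pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2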

Spencer-Dawson commented 1 year ago

I made a PR for this for the README. https://github.com/oobabooga/text-generation-webui/pull/68

Spencer-Dawson commented 1 year ago

I found another AMD install issue. bitsandbytes, which is a python dependency used for loading models with 8-bit precision, doesn't support ROCm. There is a fork on github which claims to support ROCm, but no pip package for it. I haven't tried it yet, but if I do get it working I might update the installer script.

FatCache commented 1 year ago

@Spencer-Dawson I did the setup but on Windows WSL2 Ubuntu. I get the following error:

/home/sandbox_ml/text-generation-webui# python3 server.py
Loading opt-1.3b...
Traceback (most recent call last):
  File "/home/sandbox_ml/text-generation-webui/server.py", line 173, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/sandbox_ml/text-generation-webui/modules/models.py", line 47, in load_model
    model = AutoModelForCausalLM.from_pretrained(Path(f"models/{shared.model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16).cuda()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 749, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 749, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 229, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice

Spencer-Dawson commented 1 year ago

I know it's not common knowledge for some reason, but ROCm itself is not supported on Windows. I've seen claims of people getting it working, but even if you got it working via some hack, I suspect it would be such a niche, poorly supported environment that you would have a hard time keeping it that way.

This is from the pytorch installation instructions page, for example; notice it says ROCm doesn't support Windows. (screenshot of the PyTorch install selector)

And if you look at the AMD ROCm install instructions (which are pretty terrible imo), they only show RHEL, SLES, and Ubuntu as supported: https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4/page/Introduction_to_ROCm_Installation_Guide_for_Linux.html

So your options as best as I can tell are the following.

FatCache commented 1 year ago

@Spencer-Dawson

Although ROCm is not officially supported on Windows, there is an option to run an Ubuntu subsystem within Windows. I am not familiar with the technical details, but WSL2 allows you to run Ubuntu within Windows with near-native performance.

I don't know why this did not work out of the box, although, just thinking about it, it should. What I did in my attempt was clone text-generation-webui inside the Ubuntu subsystem and build it there. Because I have an AMD GPU, I followed the instructions provided in the README.

server.py does not work, I believe, in GPU mode. However, it works fine if I add the --cpu flag.

Any suggestions or ideas why it did not work in GPU mode? Does the log I shared above provide any insight?

Spencer-Dawson commented 1 year ago

For your first (implied) question: for GPU support you need to use a library which supports the feature (in this case pytorch), and that library depends on a lower-level high-performance computing library, which in this case is CUDA for nvidia GPUs or ROCm (a library maintained by AMD that can run code written for CUDA). AMD's Windows driver does not support AMD's ROCm library. I have no idea how WSL interacts with AMD's GPU drivers, so I can't elaborate more intelligently, except to say that ROCm is not officially supported in Windows or in WSL.

CPU mode works because it isn't using CUDA (AFAIK); it's effectively using a different backend for pytorch that doesn't rely on GPU library or driver support.

Your log tells me that pytorch is trying to initialize a CUDA-compatible GPU and isn't finding one. That's about all I know regarding that error. I have seen that error on linux with an AMD card without the ROCm version of torch installed, but I don't think my solution for that (uninstalling torch and then installing torch from the ROCm index) will help you in this case.
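A quick sanity check for telling a CUDA build of torch from a ROCm build (generic torch attributes, nothing specific to this repo):

    # On a ROCm build, torch.version.hip is a version string and torch.cuda.is_available()
    # reports the HIP device; on a CUDA-only build, torch.version.hip is None.
    python3 -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"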

FatCache commented 1 year ago

For your first (implied) question: for GPU support you need to use a library which supports the feature (in this case pytorch), and that library depends on a lower-level high-performance computing library, which in this case is CUDA for nvidia GPUs or ROCm (a library maintained by AMD that can run code written for CUDA). AMD's Windows driver does not support AMD's ROCm library. I have no idea how WSL interacts with AMD's GPU drivers, so I can't elaborate more intelligently, except to say that ROCm is not officially supported in Windows or in WSL.

Noted. Thank you for the great response. The behavior that you describe is what I am seeing: although ROCm is installed inside WSL2, it does not get picked up. After briefly diving into the literature, the issue is exactly this. In WSL2, an AMD GPU at the end of the day needs to interact with the Windows graphics stack, and that does not work. I guess this is why Microsoft implemented DirectML, which abstracts access to the GPU and allows PyTorch to work on any GPU.

Although, according to the docs, porting existing PyTorch code to work with DirectML is straightforward, it is still sketchy, because what if text_generation_webui has a dependency on a library that requires CUDA and is not supported on DirectML?
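For what it's worth, a minimal check that the torch-directml package can actually hand out a device (assuming it installed cleanly; this says nothing about whether a given model will run on it):

    # torch_directml.device() returns a DirectML device object if the backend is usable
    python3 -c "import torch_directml; print(torch_directml.device())"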

oobabooga commented 1 year ago

One idea is to try looking for tutorials on setting up Stable Diffusion for AMD GPUs on Windows. I think that the dependencies are kind of the same.

Enferlain commented 1 year ago

You can use SD on AMD on Windows now with DirectML; maybe the same could be done here.

Ph0rk0z commented 1 year ago

FYI, ROCm also fails if you don't have PCIe 3.0 atomics. So I am screwed on linux for my old box.

If you want bitsandbytes for AMD, it got set up here: https://git.ecker.tech/mrq/ai-voice-cloning/issues/25 Just follow the thread and do like they do.

Spencer-Dawson commented 1 year ago

I was working on integrating compiling/installing bitsandbytes-rocm based on @Ph0rk0z's thread link, and while I succeeded at that, it is failing at runtime for me. I'll probably take another crack at it later, but here are some notes in case anyone wants to try to install it manually.

NOTE: Using Ubuntu 22.04 with AMD ROCm already installed.

First problem: bitsandbytes-rocm isn't a pip package, so you have to compile it yourself using the HIP compiler plugin installed with ROCm. I had to re-run amdgpu-install with the following arguments to get that part to work:

    sudo amdgpu-install --usecase=hiplibsdk,rocm

I also had to install the libstdc++-12-dev package for a cmath.h dependency:

    sudo apt install libstdc++-12-dev

I already had build-essential installed, but I assume from experience it is also needed:

    sudo apt install build-essential

Second issue: actually compiling/installing. It follows these steps from within the conda runtime environment:

    echo "Installing ROCm compatible version of bitsandbytes..."
    #uninstall bitsandbytes if it's already installed(assuming wrong version)
    echo "Uninstalling old version of bitsandbytes if installed..."
    pip3 uninstall bitsandbytes
    pip3 uninstall bitsandbytes-rocm
    echo "Installing ROCm compatible version of bitsandbytes..."
    #there are other versions of this on github, but this one doesn't throw linker errors for me
    git clone https://git.ecker.tech/mrq/bitsandbytes-rocm
    cd bitsandbytes-rocm || exit
    # set CUDA_VERSION to gfx1030 if not set. This is the default for 6XXX series cards
    CUDA_VERSION=${CUDA_VERSION:-gfx1030}
    export CUDA_VERSION
    make hip
    python setup.py install
    cd ..
    rm -rf bitsandbytes-rocm

Third issue: text generation is giving me this error:

Loading pygmalion-350m...
Loaded the model in 3.37 seconds.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
  0%|                                                                                                                                                                                     | 0/26 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/gradio/routes.py", line 374, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1017, in process_api
    result = await self.call_function(
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/gradio/blocks.py", line 849, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/gradio/utils.py", line 453, in async_iteration
    return next(iterator)
  File "/home/spencer/Downloads/oobabooga-linux-tst/oobabooga/text-generation-webui/modules/chat.py", line 163, in cai_chatbot_wrapper
    for _history in chatbot_wrapper(text, max_new_tokens, do_sample, temperature, top_p, typical_p, repetition_penalty, top_k, min_length, no_repeat_ngram_size, num_beams, penalty_alpha, length_penalty, early_stopping, name1, name2, context, check, chat_prompt_size, chat_generation_attempts):
  File "/home/spencer/Downloads/oobabooga-linux-tst/oobabooga/text-generation-webui/modules/chat.py", line 119, in chatbot_wrapper
    for reply in generate_reply(f"{prompt}{' ' if len(reply) > 0 else ''}{reply}", max_new_tokens, do_sample, temperature, top_p, typical_p, repetition_penalty, top_k, min_length, no_repeat_ngram_size, num_beams, penalty_alpha, length_penalty, early_stopping, eos_token=eos_token, stopping_string=f"\n{name1}:"):
  File "/home/spencer/Downloads/oobabooga-linux-tst/oobabooga/text-generation-webui/modules/text_generation.py", line 187, in generate_reply
    output = eval(f"shared.model.generate({', '.join(generate_params)}){cuda}")[0]
  File "<string>", line 1, in <module>
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/home/spencer/anaconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Given the specific additional compilation requirements, it's probably a bad idea to integrate compiling bitsandbytes into the installer at this time, but I wrote a separate installer if someone wants to try installing/troubleshooting this on Ubuntu 22.04: https://gist.github.com/Spencer-Dawson/198a1938cc44405d82727971bba76bc6

Spencer-Dawson commented 1 year ago

For anyone wondering how to install ROCm on Ubuntu 22.04 running natively (not WSL or in a VM or anything weird like that), I wrote a quick installer script. Don't assume it's bug free; I hacked this together in about an hour just using the ROCm installation readme and GitHub Copilot. https://gist.github.com/Spencer-Dawson/7fdb5dd09461b6cece8a99537f381e44

Note: This doesn't install text-generation-webui; it's just an install script for the ROCm dependency, which isn't accounted for by oobabooga's installer as it's a system-specific dependency.

ttio2tech commented 1 year ago

I made a detailed video guide for AMD GPU on Ubuntu for llama and also Lora 7B at: https://youtu.be/UtcaO3zTCKQ hope it helps!

Spencer-Dawson commented 1 year ago

Two FYI things worth adding to this thread.

  1. I made my own installer wrapper for this project and stable-diffusion-webui on my github that I'm maintaining really for my own use.

  2. Tom's Hardware posted an article a few hours ago claiming AMD ROCm support for Windows is coming back, but it doesn't give a timeline. Until you can go to pytorch's website and see official pytorch ROCm support for Windows, I'm just going to assume it's not worth the effort, but it certainly sounds like a good sign. https://www.tomshardware.com/news/amd-rocm-comes-to-windows-on-consumer-gpus

ScvWebFire commented 1 year ago

I got it somewhat working with torch-directml, with only small changes: I moved the cpu install line to the amd case and added torch-directml with a pip install:

    run_cmd("python -m pip install torch-directml", assert_success=True, environment=True)

In script.py I replaced the cpu with dml:

    def _get_device(self, setting_name):
        if params[setting_name] is None:
            return torch.device("cuda:0") if torch.cuda.is_available() else torch_directml.device()
        return torch.device(params[setting_name])

In models.py I added the import of torch_directml and changed the cuda line to:

    if torch.has_mps:
        device = torch.device('mps')
        model = model.to(device)
    else:
        device = torch_directml.device()
        model = model.to(device)

But I ran out of 16GB of ram really quick.

Traceback (most recent call last):
  File "F:\Home\ai\oobabooga_windows\text-generation-webui\server.py", line 916, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "F:\Home\ai\oobabooga_windows\text-generation-webui\modules\models.py", line 92, in load_model
    model = model.to(device)
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "F:\Home\ai\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: Could not allocate tensor with 301989888 bytes. There is not enough GPU video memory available!

Now to figure out how to lower memory usage, and then make some correct code changes instead of hacks.

robonxt commented 1 year ago

I was wondering, since stable diffusion runs quite stably using directml with the --medvram option, is there a flag to prevent the AI from exceeding the vram?

maxime-fleury commented 1 year ago

The next version of ROCm, 5.6.0, will be released soon ON WINDOWS, which should allow you to install pytorch.

robonxt commented 1 year ago

In script.py I replaced the cpu with dml:

    def _get_device(self, setting_name):
        if params[setting_name] is None:
            return torch.device("cuda:0") if torch.cuda.is_available() else torch_directml.device()
        return torch.device(params[setting_name])

@ScvWebFire I've found and changed webui.py and models.py according to your instructions, but where is script.py? I'm scratching my head because I can't figure out which file and which lines I'm supposed to change.

keyxmakerx commented 1 year ago

Gonna be that guy: any way of getting a simple script, or easy-to-read instructions? So many changes/issues have left me quite confused about the status of AMD support for this tool. I know the one-click installer says AMD is not supported, but I'm not sure if that is true given the amount of work that seems to have gone into the AMD discussions.

Enferlain commented 1 year ago

Not on Windows, since there is no ROCm on Windows. The one-click installer works on Linux, and there are other guides to set it up: https://rentry.org/eq3hg

Nicopara commented 1 year ago

The next version of ROCm, 5.6.0, will be released soon ON WINDOWS, which should allow you to install pytorch.

how soon?

robonxt commented 1 year ago

Gonna be that guy: any way of getting a simple script, or easy-to-read instructions? So many changes/issues have left me quite confused about the status of AMD support for this tool. I know the one-click installer says AMD is not supported, but I'm not sure if that is true given the amount of work that seems to have gone into the AMD discussions.

@keyxmakerx it works, just very slow and buggy. In my tests, it's better to run GPT4All in the meantime.

keyxmakerx commented 1 year ago

Not on Windows, since there is no ROCm on Windows. The one-click installer works on Linux, and there are other guides to set it up: https://rentry.org/eq3hg

Hm, I was using the script on Linux. But upon selecting AMD it says "not supported" and then the script ends.

jllllll commented 1 year ago

I am attempting to add AMD GPU support for Linux to the one-click-installer. I don't have an AMD GPU, so any help testing is appreciated.

https://github.com/oobabooga/one-click-installers/pull/98 https://github.com/jllllll/one-click-installers/tree/cross-platform-amd

Cop13r commented 1 year ago

I am attempting to add AMD GPU support for Linux to the one-click-installer. I don't have an AMD GPU, so any help testing is appreciated.

oobabooga/one-click-installers#98 https://github.com/jllllll/one-click-installers/tree/cross-platform-amd

I just started today and can do some testing if needed.

Spencer-Dawson commented 1 year ago

I am attempting to add AMD GPU support for Linux to the one-click-installer. I don't have an AMD GPU, so any help testing is appreciated.

oobabooga/one-click-installers#98 https://github.com/jllllll/one-click-installers/tree/cross-platform-amd

Looks promising, but supporting AMD cards is probably more difficult than you'd first expect. I'll try to help you as best as I can.

jllllll commented 1 year ago

Looks promising, but supporting AMD cards is probably more difficult than you'd first expect. I'll try to help you as best as I can.

Oh I know. There is a reason I've been putting this off till now. AMD really doesn't make it easy. I'm only hoping to get some AMD support. Manual work will always be necessary due to AMD's lack of effort in making ROCm accessible. Some GPUs simply aren't going to work and AMD adds and removes GPUs from that list constantly.

The only reason I'm even trying is because there is enough community support in place to make some automated setup worthwhile. At minimum, handling exllama AMD support in the installer is needed due to the NVIDIA-only exllama module in the webui's requirements.txt.
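For anyone unsure where their card falls, one way to see which gfx target ROCm reports for it (and therefore whether it is on AMD's supported list or needs the HSA_OVERRIDE_GFX_VERSION workaround) is the rocminfo tool that ships with ROCm:

    # Print the unique gfx ISA names ROCm sees, e.g. gfx1030 for RX 6800/6900-class cards
    rocminfo | grep -o "gfx[0-9a-f]*" | sort -u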

evanrodgers commented 1 year ago

Thank you for your work. The 7900XTX is effectively half the price of a 4090 (and not everyone loves buying used GPUs on eBay), so even though ROCm is a bummer, the need is definitely there.

ehartford commented 11 months ago

hello @oobabooga, I'm trying to run this on my server

$ rocm-smi --showproductname

========================= ROCm System Management Interface =========================
=================================== Product Info ===================================
GPU[0]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[0]          : Card model:           0x0c34
GPU[0]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]          : Card SKU:             D3431401
GPU[1]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[1]          : Card model:           0x0c34
GPU[1]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1]          : Card SKU:             D3431401
GPU[2]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[2]          : Card model:           0x0c34
GPU[2]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2]          : Card SKU:             D3431401
GPU[3]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[3]          : Card model:           0x0c34
GPU[3]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3]          : Card SKU:             D3431401
GPU[4]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[4]          : Card model:           0x0c34
GPU[4]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[4]          : Card SKU:             D3431401
GPU[5]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[5]          : Card model:           0x0c34
GPU[5]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[5]          : Card SKU:             D3431401
GPU[6]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[6]          : Card model:           0x0c34
GPU[6]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[6]          : Card SKU:             D3431401
GPU[7]          : Card series:          Arcturus GL-XL [Instinct MI100]
GPU[7]          : Card model:           0x0c34
GPU[7]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[7]          : Card SKU:             D3431401
====================================================================================
=============================== End of ROCm SMI Log ================================

I installed requirements_amd.txt

I try to run like this:

LLAMA_HIPBLAS=on python ./server.py --listen --gpu-memory 32 32 32 32 32 32 32 32

then I got this message

bin /home/eric/miniconda3/envs/textgen/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/eric/miniconda3/envs/textgen/lib/python3.11/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "

And I don't see any cards listed here in the UI

(screenshot of the UI with no cards listed)

Any idea what I did wrong?

oobabooga commented 11 months ago

@ehartford I have never tested the web UI on an AMD GPU; apparently most quantized backends work (ExLlama, AutoGPTQ, llama.cpp), but I don't know about transformers. Based on your screenshot, it is possible that the following 2 parts of the code need to be updated for the AMD case:

https://github.com/oobabooga/text-generation-webui/blob/c0655475ae289e7e1eba51abfe02288171d1c58c/modules/ui.py#L90

https://github.com/oobabooga/text-generation-webui/blob/c0655475ae289e7e1eba51abfe02288171d1c58c/modules/ui_model_menu.py#L31

This issue is the most active currently on AMD support:

https://github.com/oobabooga/text-generation-webui/issues/3759
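One quick way to narrow down whether the problem is in those code paths or in torch itself (generic torch calls, nothing repo-specific): on a working ROCm install this should list all eight MI100s, since ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.

    # Lists every GPU the ROCm build of torch can see
    python3 -c "import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])"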

userbox020 commented 11 months ago

@ehartford sup bro, one question: why are you not trying the ooba one-click install?

ehartford commented 11 months ago

You suppose that works on an AMD server?

ehartford commented 11 months ago

I got it working (not with one click install). I needed to install HIP (in addition to ROCm)
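For anyone else landing here: the HIP piece is what the hiplibsdk usecase of amdgpu-install pulls in (the same flag Spencer-Dawson mentioned earlier in the thread); roughly, assuming the amdgpu-install tooling is already set up:

    # Install the HIP SDK components alongside the ROCm runtime
    sudo amdgpu-install --usecase=hiplibsdk,rocm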

userbox020 commented 11 months ago

I got it working (not with one click install). I needed to install HIP (in addition to ROCm)

Last week I did a fresh ooba setup with the one-click installer and only gguf models (and older versions of them) worked; I have confirmed it with a few ooba guys. Nowadays I don't know if it already works with other models too. Can you share your steps on how you compiled HIP from source? I remember trying to do it like two months ago and I didn't succeed.

dnalbach commented 11 months ago

@ehartford - how is performance on your MI100 setup? The theoretical specs on the MI100 are higher than an A100, and they are dirt cheap in comparison to buy used on ebay. Even one of those cards working seems like it should be smoking fast with oobabooga?

DocMAX commented 10 months ago

Can it work on AMD APUs like 5800U? Doesn't look good to me...

root@pve:~# rocm-smi

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
Exception caught: map::at
ERROR: GPU[0]           : sclk clock is unsupported
================================================================================
ERROR: 2 GPU[0]:RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
GPU  Temp   AvgPwr  SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%
0    46.0c  19.0W   None  1600Mhz  0%   auto  Unsupported    4%   0%
================================================================================
============================= End of ROCm SMI Log ==============================

github-actions[bot] commented 9 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

TeaCult commented 2 months ago

This is easy. I am using a 7600 XT with ROCm 6.1; just adjust your environment with pyenv, do not use the default one. I am giving you the list:

OS (arch) : hip-runtime-amd 6.0.2-4 hipblas 6.0.2-1 hipblaslt 6.0.2-1 hipcub 6.0.2-1 hipfft 6.0.2-1 hiprand 6.0.2-1 hipsolver 6.0.2-1 hipsparse 6.0.2-1 hsa-rocr 6.0.2-2 miopen-hip 6.0.2-1 rccl 6.0.2-1 rocalution 6.0.2-2 rocblas 6.0.2-1 rocfft 6.0.2-1 rocm-clang-ocl 6.0.2-1 rocm-cmake 6.0.2-1 rocm-core 6.0.2-2 rocm-device-libs 6.0.2-1 rocm-hip-libraries 6.0.2-1 rocm-hip-runtime 6.0.2-1 rocm-hip-sdk 6.0.2-1 rocm-language-runtime 6.0.2-1 rocm-llvm 6.0.2-1 rocm-opencl-runtime 6.0.2-1 rocm-smi-lib 6.0.2-1 rocminfo 6.0.2-1 rocprim 6.0.2-1 rocrand 6.0.2-1 rocsolver 6.0.2-1 rocsparse 6.0.2-2 rocthrust 6.0.2-1 roctracer 6.0.2-1

python:

pytorch-triton-rocm 3.0.0 torch 2.4.0+rocm6.1 torchaudio 2.4.0+rocm6.1 torchvision 0.19.0+rocm6.1

pip install accelerate datasets einops gradio hqq jinja2 lm_eval markdown numba numpy optimum pandas peft Pillow psutil pyyaml requests rich safetensors scipy sentencepiece tensorboard transformers tqdm wandb
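For reference, the usual way to get those torch 2.4.0+rocm6.1 wheels into the pyenv environment is PyTorch's ROCm wheel index, following the same pattern as the rocm5.2 instructions earlier in this thread (the exact index URL is an assumption on my part):

    # ROCm 6.1 builds of torch, torchvision and torchaudio
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1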

I have installed llama_cpp_python for ROCm and then linked it as llama_cpp_python_cuda.
Then I modified the text-generation-webui start script for hijacking so it won't give an error that a library is already loaded. I have it in the discussions on the text-generation-webui website ...
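The ROCm build of llama_cpp_python isn't spelled out above; a rough sketch of how such a build is typically done (treat the CMake flag as an assumption, since its name has changed between llama-cpp-python releases):

    # Build llama-cpp-python against hipBLAS/ROCm instead of CUDA.
    # Older releases use -DLLAMA_HIPBLAS=on, newer ones -DGGML_HIPBLAS=on.
    CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python --no-cache-dir --force-reinstall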

It works very well with exllamav2, llama.cpp, and transformers.

(This section is out of scope for this issue.) However, I think ROCm is incomplete, or I should search for a transformers library for ROCm or something. Quantization via transformers or bitsandbytes does not work on ROCm:

NotImplementedError: Could not run 'quantized::linear_dynamic' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::linear_dynamic' is only available for these backends: [CPU, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at ../aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp:782 [kernel]
Meta: registered at ../aten/src/ATen/core/MetaFallbackKernel.cpp:23 [backend fallback]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:497 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:349 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:297 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:209 [backend fallback]
AutocastXPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:351 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:165 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:731 [backend fallback]
BatchedNestedTensor: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:758 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:27 [backend fallback]
Batched: registered at ../aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:207 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:493 [backend fallback]
PreDispatch: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]