turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

EndeavourOS + ROCm crashes computer #340

Closed NinjaPerson24119 closed 2 weeks ago

NinjaPerson24119 commented 4 months ago

The CUDA build seems to work fine; the ROCm build, however, gets stuck while loading a model and then crashes the whole machine.

turboderp commented 4 months ago

Can you give me some details about the GPU model, Torch version etc.?

NinjaPerson24119 commented 4 months ago

Oh yeah totally.

steps

  1. clone
  2. python3 -m virtualenv venv && source ./venv/bin/activate
  3. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
  4. pip3 install -r requirements.txt
  5. Then it seems to be missing a dependency, so pip install tokenizers
  6. pip install .
  7. Then try to run a model. It doesn't seem to matter which Python script I run; it gets stuck loading and then crashes. (A minimal sketch of what I'm running is below.)
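
A minimal sketch of the kind of script involved (the model path is a placeholder, and the calls follow the example scripts shipped with the repo, so treat the exact API as an assumption):

# Minimal load-and-generate sketch; the model directory is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-model"   # placeholder
config.prepare()

model = ExLlamaV2(config)
model.load()                               # gets stuck here, then the machine crashes
cache = ExLlamaV2Cache(model)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
print(generator.generate_simple("Hello,", settings, 32))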

pip list

Package             Version
------------------- --------------
certifi             2022.12.7
charset-normalizer  2.1.1
cramjam             2.8.1
exllamav2           0.0.13.post2
fastparquet         2024.2.0
filelock            3.9.0
fsspec              2024.2.0
huggingface-hub     0.20.3
idna                3.4
Jinja2              3.1.2
MarkupSafe          2.1.3
mpmath              1.3.0
networkx            3.2.1
ninja               1.11.1.1
numpy               1.26.3
packaging           23.2
pandas              2.2.0
pillow              10.2.0
pip                 23.3.1
Pygments            2.17.2
python-dateutil     2.8.2
pytorch-triton-rocm 2.2.0
pytz                2024.1
PyYAML              6.0.1
regex               2023.12.25
requests            2.28.1
safetensors         0.4.2
sentencepiece       0.1.99
setuptools          69.0.2
six                 1.16.0
sympy               1.12
tokenizers          0.15.2
torch               2.2.0+rocm5.7
torchaudio          2.2.0+rocm5.7
torchvision         0.17.0+rocm5.7
tqdm                4.66.2
typing_extensions   4.8.0
tzdata              2024.1
urllib3             1.26.13
websockets          12.0
wheel               0.42.0

However, do note that I'm not sure whether the PyTorch wheel bundles its own ROCm shared libraries. The latest ROCm packages on Arch are 6.0, which isn't backwards compatible with 5.7.

sudo pacman -Q | grep rocm
python-pytorch-opt-rocm 2.2.0-1
python-torchvision-rocm 0.16.2-1
python-torchvision-rocm-debug 0.16.2-1
rocm-clang-ocl 6.0.0-1
rocm-cmake 6.0.0-1
rocm-core 6.0.0-2
rocm-device-libs 6.0.0-1
rocm-hip-libraries 6.0.0-1
rocm-hip-runtime 6.0.0-1
rocm-hip-sdk 6.0.0-1
rocm-language-runtime 6.0.0-1
rocm-llvm 6.0.0-2
rocm-opencl-runtime 6.0.0-1
rocm-opencl-sdk 6.0.0-1
rocm-smi-lib 6.0.0-1
rocminfo 6.0.0-1
torchvision-rocm 0.16.2-1

I don't think the system libraries should be in use inside a venv, though.
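
A quick way to check which ROCm the venv's torch actually carries (a sketch; the exact bundled library names vary between wheel releases):

# Print the HIP version the torch wheel was built against and list which
# ROCm/HIP shared libraries it bundles under torch/lib (sketch).
import os
import torch

print(torch.__version__)    # e.g. 2.2.0+rocm5.7
print(torch.version.hip)    # HIP/ROCm version baked into the wheel

lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
print([f for f in os.listdir(lib_dir) if "hip" in f or "roc" in f])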

I tried the CUDA build on a 3070 Ti; it works fine on the same machine.

lufixSch commented 4 months ago

ExLlamaV2 is working for me with ROCm 6.0 on my 7900 XTX, but I installed the nightly PyTorch build for ROCm 6.0. Maybe that will fix it:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0

Otherwise, try setting GPU_MAX_HW_QUEUES=1 before running exllama. There is a ROCm bug which causes 100% GPU usage and sometimes even a system crash. I observed that this issue is more severe with multiple GPUs connected to the system, or when the GPU is connected to PCIe via the chipset instead of directly to the CPU.
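
For example (a sketch; the variable just has to be in the environment before the HIP runtime initializes, so exporting it in the shell before launching works just as well):

# Sketch: set GPU_MAX_HW_QUEUES before torch/HIP is initialized.
import os
os.environ["GPU_MAX_HW_QUEUES"] = "1"

import torch  # imported only after the variable is set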

NinjaPerson24119 commented 3 months ago

So I got this to work by using the nightly wheel, good idea. But I also got a second 7900 XTX, and now it crashes as soon as I try to use GPU splits.
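
For reference, the split is per-GPU VRAM allocations passed at load time, along these lines (placeholder values; passing gpu_split to load() is my reading of the API, so treat it as an assumption):

# Same loading sketch as earlier, but with an explicit two-GPU split.
model = ExLlamaV2(config)
model.load(gpu_split=[20, 20])   # roughly GB per 7900 XTX (placeholders); crashes here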

NinjaPerson24119 commented 3 months ago

That GPU_MAX_HW_QUEUES=1 is magic. It makes the two GPUs work without the full system crash. Oddly, it's not necessary for llama.cpp.