Can you give me some details about the GPU model, Torch version, etc.?
Oh yeah totally.
Steps:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
pip3 install -r requirements.txt
pip3 install tokenizers
pip list
Package Version
------------------- --------------
certifi 2022.12.7
charset-normalizer 2.1.1
cramjam 2.8.1
exllamav2 0.0.13.post2
fastparquet 2024.2.0
filelock 3.9.0
fsspec 2024.2.0
huggingface-hub 0.20.3
idna 3.4
Jinja2 3.1.2
MarkupSafe 2.1.3
mpmath 1.3.0
networkx 3.2.1
ninja 1.11.1.1
numpy 1.26.3
packaging 23.2
pandas 2.2.0
pillow 10.2.0
pip 23.3.1
Pygments 2.17.2
python-dateutil 2.8.2
pytorch-triton-rocm 2.2.0
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.28.1
safetensors 0.4.2
sentencepiece 0.1.99
setuptools 69.0.2
six 1.16.0
sympy 1.12
tokenizers 0.15.2
torch 2.2.0+rocm5.7
torchaudio 2.2.0+rocm5.7
torchvision 0.17.0+rocm5.7
tqdm 4.66.2
typing_extensions 4.8.0
tzdata 2024.1
urllib3 1.26.13
websockets 12.0
wheel 0.42.0
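For what it's worth, a quick sanity check that the ROCm wheel actually sees the GPU (on ROCm builds, the device is exposed through the torch.cuda API and torch.version.hip is set):

import torch
print(torch.__version__)             # should end in +rocm5.7 for this wheel
print(torch.version.hip)             # HIP version string on ROCm builds, None on CUDA builds
print(torch.cuda.is_available())     # ROCm GPUs report through the cuda API
print(torch.cuda.get_device_name(0))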
However, do note that I'm not sure whether the PyTorch wheel bundles its own ROCm shared libraries. The latest ROCm package on Arch is 6.0, which isn't backwards compatible.
sudo pacman -Q | grep rocm
python-pytorch-opt-rocm 2.2.0-1
python-torchvision-rocm 0.16.2-1
python-torchvision-rocm-debug 0.16.2-1
rocm-clang-ocl 6.0.0-1
rocm-cmake 6.0.0-1
rocm-core 6.0.0-2
rocm-device-libs 6.0.0-1
rocm-hip-libraries 6.0.0-1
rocm-hip-runtime 6.0.0-1
rocm-hip-sdk 6.0.0-1
rocm-language-runtime 6.0.0-1
rocm-llvm 6.0.0-2
rocm-opencl-runtime 6.0.0-1
rocm-opencl-sdk 6.0.0-1
rocm-smi-lib 6.0.0-1
rocminfo 6.0.0-1
torchvision-rocm 0.16.2-1
I don't think the system libraries should be in use inside a venv, though.
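If you want to verify, one way is to check which ROCm libraries the wheel's HIP backend actually resolves at load time (the path below is an assumption, adjust for your venv and Python version):

ldd .venv/lib/python3.11/site-packages/torch/lib/libtorch_hip.so | grep -i -e rocm -e hip

If everything resolves to files bundled under torch/lib, the system ROCm 6.0 packages shouldn't come into play.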
I tried the CUDA build on a 3070 Ti; it works fine on the same machine.
ExLlamaV2 is working for me with ROCm 6.0 on my 7900 XTX, but I installed the nightly PyTorch build for ROCm 6.0. Maybe that will fix it:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0
Otherwise, try setting GPU_MAX_HW_QUEUES=1 before running exllama. There is a ROCm bug that causes 100% GPU usage and sometimes even a system crash. I observed that this issue is more severe with multiple GPUs connected to the system, or when the GPU is connected to PCIe via the chipset instead of directly to the CPU.
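For example (the script name and arguments are just an illustration, substitute your own launch command):

GPU_MAX_HW_QUEUES=1 python3 test_inference.py -m /path/to/model -p "Once upon a time"

Or export GPU_MAX_HW_QUEUES=1 once for the whole session.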
So I got this to work by using the nightly wheel, good idea. But I also got a second 7900 XTX, and now it crashes as soon as I try to use GPU splits.
That GPU_MAX_HW_QUEUES=1 is magic. It makes the two GPUs work without the full system crash. Oddly, it's not necessary for llama.cpp.
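For reference, a sketch of the dual-GPU invocation this applies to, assuming exllamav2's test_inference.py and its -gs/--gpu_split flag (per-GPU VRAM allocation in GB, placeholder values):

GPU_MAX_HW_QUEUES=1 python3 test_inference.py -m /path/to/model -gs 20,20 -p "Once upon a time"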
The CUDA build seems to work fine, however.