lshqqytiger / ZLUDA

CUDA on AMD GPUs
Apache License 2.0

ZLUDA with Pytorch not working on Ubuntu 22.04 #20

Closed radna0 closed 2 months ago

radna0 commented 4 months ago

I'm trying to set up PyTorch and ZLUDA to run CUDA on AMD GPUs, but to no avail.

(base) r4-0@r40-desktop:~/pytorch$ LD_LIBRARY_PATH="$HOME/zluda:$LD_LIBRARY_PATH" python3 
Python 3.12.3 | packaged by Anaconda, Inc. | (main, May  6 2024, 19:46:43) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
False
>>> torch.cuda.device_count()
0
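For diagnosis, a minimal sketch (an addition, not part of the original report), assuming the ZLUDA tarball was extracted to ~/zluda as in the scripts below: it lists what the tarball actually shipped and checks whether this torch was built with CUDA at all, since a CPU-only build reports False regardless of what LD_LIBRARY_PATH points to.

#!/bin/bash

# Assumed location of the extracted ZLUDA release
ZLUDA_DIR="$HOME/zluda"

# 1. What did the tarball actually ship? (file names can differ per release)
ls -l "$ZLUDA_DIR"

# 2. Was this torch built with CUDA at all? torch.version.cuda is None for a
#    CPU-only build, and in that case is_available() stays False no matter
#    what LD_LIBRARY_PATH points to. A ROCm build sets torch.version.hip instead.
LD_LIBRARY_PATH="$ZLUDA_DIR:$LD_LIBRARY_PATH" python3 - <<'EOF'
import torch
print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
EOF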

These are the setup scripts I use for setting up the OS and building PyTorch.

OS

#!/bin/bash

# Update system and install essential packages
sudo apt-get update -y && sudo apt-get upgrade -y
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    nano \
    wget \
    curl \
    gnupg \
    ripgrep \
    ltrace \
    file \
    python3-minimal \
    build-essential \
    git \
    cmake \
    ninja-build \
    python3-pip

# Set environment variables
export PATH="${PATH}:/opt/rocm/bin:/opt/rocm/llvm/bin:/usr/local/cuda/bin/"
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Install CUDA
CUDA_VERSION="11-8"
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    nvidia-headless-no-dkms-515 \
    nvidia-utils-515 \
    cuda-cudart-${CUDA_VERSION} \
    cuda-compiler-${CUDA_VERSION} \
    libcufft-dev-${CUDA_VERSION} \
    libcusparse-dev-${CUDA_VERSION} \
    libcublas-dev-${CUDA_VERSION} \
    cuda-nvml-dev-${CUDA_VERSION} \
    libcudnn8-dev \
    cuda-toolkit-${CUDA_VERSION} \
    cudnn9-cuda-${CUDA_VERSION}

export CUDA_HOME=/usr/local/cuda-11.8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source $HOME/.cargo/env

# Install ROCm

sudo apt update
wget http://repo.radeon.com/amdgpu-install/23.40.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install -y ./amdgpu-install_6.0.60002-1_all.deb
sudo amdgpu-install -y --usecase=graphics,rocm,hip,hiplibsdk
sudo usermod -a -G render,video $LOGNAME

# Install Miniconda
cd $HOME
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> $HOME/.bashrc
source $HOME/miniconda3/etc/profile.d/conda.sh
source $HOME/.bashrc
conda init
conda --version
source $HOME/.bashrc

# Cleanup
cd $HOME
sudo rm -rf cuda-keyring_1.0-1_all.deb Miniconda3-latest-Linux-x86_64.sh
sudo apt-get autoclean -y
sudo apt-get autoremove -y

# Default to a login shell
source $HOME/.bashrc
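A short verification sketch (an addition, assuming ROCm lands under /opt/rocm and CUDA 11.8 under /usr/local/cuda-11.8 as in the script above) to confirm the toolchains before moving on to the build; run it after logging back in so the render/video group membership is active.

#!/bin/bash

# The GPU should show up as a gfx* agent; if it does not, the kernel driver or
# group permissions are the problem, not ZLUDA or PyTorch.
/opt/rocm/bin/rocminfo | grep -i gfx

# HIP compiler sanity check
/opt/rocm/bin/hipcc --version

# CUDA toolkit that the PyTorch build will use
/usr/local/cuda-11.8/bin/nvcc --version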

ZLUDA and PyTorch

#!/bin/bash

cd $HOME

# Set destination directory
destination="$HOME"
wget https://github.com/lshqqytiger/ZLUDA/releases/download/rel.11cc5844514f93161e0e74387f04e2c537705a82/ZLUDA-linux-amd64.tar.gz -P "$destination"
tar -xzf "$destination/ZLUDA-linux-amd64.tar.gz" -C "$destination"

#git clone --recurse-submodules https://github.com/vosen/ZLUDA.git $HOME/ZLUDA
#cd $HOME/ZLUDA
#cargo xtask --release

# Install PyTorch
git clone --recursive https://github.com/pytorch/pytorch $HOME/pytorch
cd $HOME/pytorch
git submodule sync
git submodule update --init --recursive
conda install -y cmake ninja
pip install -r requirements.txt
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export TORCH_CUDA_ARCH_LIST="6.1+PTX"
export CUDAARCHS=61
export CMAKE_CUDA_ARCHITECTURES=61
export USE_SYSTEM_NCCL=1
export USE_NCCL=0
export USE_EXPERIMENTAL_CUDNN_V8_API=OFF
export DISABLE_ADDMM_CUDA_LT=1
export USE_ROCM=OFF
LD_LIBRARY_PATH="$HOME/zluda:$LD_LIBRARY_PATH" python3 setup.py develop

# Cleanup
cd $HOME
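After the build, ~/zluda still has to be on LD_LIBRARY_PATH for every run, not only for the setup.py develop step; a smoke test along these lines (a sketch, using the same paths assumed in the scripts above) shows whether the freshly built torch picks ZLUDA up.

#!/bin/bash

# Smoke test for the locally built torch (paths as assumed in the scripts above)
LD_LIBRARY_PATH="$HOME/zluda:$LD_LIBRARY_PATH" python3 - <<'EOF'
import torch
print(torch.__version__, "built for CUDA", torch.version.cuda)
print("available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
EOF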
Yama-K commented 4 months ago

Just out of curiosity, why would you run ZLUDA on Linux?

radna0 commented 4 months ago

Running on Linux would allow me to spin up an instance using a script like the one above and also control it via SSH. I'm trying to build out GPU clusters for deep learning, so I believe going with Linux is the wisest choice here.

lshqqytiger commented 4 months ago

They provide PyTorch packages built with ROCm. You can reach the same goals with them. ZLUDA is meaningful when you are trying to run CUDA-only software.
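For reference, the ROCm route would look roughly like this (a sketch, assuming the rocm6.0 wheel index matches the ROCm 6.0 install from the amdgpu-install step; pick the rocmX.Y tag that matches the installed release).

#!/bin/bash

# Install the official ROCm build of PyTorch instead of building from source
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# The ROCm build still exposes the torch.cuda API, so existing CUDA-style
# scripts keep working; torch.version.hip is set instead of torch.version.cuda.
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"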

radna0 commented 4 months ago

ROCm is not as widely adopted as CUDA; there are many libraries that do not officially support ROCm the way they support CUDA. Flash Attention might work, for example, but not Flash Attention 2.

unclemusclez commented 3 months ago

They provide PyTorch packages built with ROCm. You can reach the same goals with them. ZLUDA is meaningful when you are trying to run CUDA-only software.

This is only for Linux. PyTorch does not currently work with ROCm on Windows. ZLUDA is particularly useful for Windows users who have AMD hardware. HIP/ROCm is only available up to 5.7.1 on Windows (6.1 on Linux), and PyTorch started supporting ROCm with 6.

ComfyUI-Zluda https://github.com/patientx/ComfyUI-Zluda makes use of an older ZLUDA, but I am currently using your fork, which generally works quite well.

Is ZLUDA 3.8 compatible with CUDA 12.4 or 12.1? I am running CUDA 11.8 PyTorch with ComfyUI via ZLUDA on a 7900 XT.
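As a quick check (an illustrative sketch, not from the thread), the CUDA runtime version the installed wheel targets and the device ZLUDA reports can be printed like this; the 12.x question depends on that value as much as on the ZLUDA release.

# Prints the CUDA runtime version the installed torch wheel was built against
# (e.g. 11.8, 12.1) and the device name reported through ZLUDA, if any
python3 -c "import torch; print(torch.version.cuda); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no device')"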

lshqqytiger commented 2 months ago

I made it work by building PyTorch myself. The official CUDA release of PyTorch won't work.
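A simple way to confirm which build is actually being imported (a sketch; a source build via setup.py develop normally reports a version with a git suffix such as 2.x.0a0+git<hash> and a path inside the checkout, while an official wheel reports a plain release number under site-packages).

# Shows the torch version string and the file it was imported from
python3 -c "import torch; print(torch.__version__); print(torch.__file__)"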