leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

When trying to build with SD_HIPBLAS I get CUDA compilation errors saying that gfx1100 is not a supported architecture #464

Open lcarsos opened 1 day ago

lcarsos commented 1 day ago

Following the build instructions in the readme,

cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
cmake --build . --config Release -j 32

I get several failures that look just like this.

[1/74] Building HIP object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/arange.cu.o
FAILED: ggml/src/CMakeFiles/ggml.dir/ggml-cuda/arange.cu.o 
ccache /opt/rocm/lib/llvm/bin/clang++  -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_MAX_NAME=128 -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_HIPBLAS -DGGML_USE_OPENMP -DK_QUANTS_PER_ITERATION=2 -DSD_USE_HIPBLAS -DUSE_PROF_API=1 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -I/home/user/p/stable-diffusion.cpp/ggml/src/../include -I/home/user/p/stable-diffusion.cpp/ggml/src/. -I/opt/rocm/include --offload-arch=gfx1100 -o ggml/src/CMakeFiles/ggml.dir/ggml-cuda/arange.cu.o  -c /home/user/p/stable-diffusion.cpp/ggml/src/ggml-cuda/arange.cu
clang++: error: unsupported CUDA gpu architecture: gfx1100
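One detail visible in the failing command above: it contains no `-x hip` flag, and the error wording ("unsupported CUDA gpu architecture") suggests clang is treating the `.cu` file as CUDA rather than HIP. The following is a minimal sketch for checking a compile line for this, not a fix. `compile_lang` is a hypothetical helper name, and it assumes that clang, when given no explicit `-x <lang>`, infers the language from the file extension.

```shell
# Hypothetical helper: guess which language clang will assume for a compile line.
# Assumption: without an explicit "-x <lang>", a .cu input is treated as CUDA.
# Only the spaced form "-x hip" is recognised; "-xhip" is not handled here.
compile_lang() {
  case "$1" in
    *"-x hip"*)    echo "hip" ;;
    *"-x cuda"*)   echo "cuda" ;;
    *.cu|*".cu "*) echo "cuda" ;;   # language inferred from the .cu extension
    *)             echo "unknown" ;;
  esac
}

# Shortened version of the failing command from the log above.
compile_lang "clang++ --offload-arch=gfx1100 -o arange.cu.o -c ggml/src/ggml-cuda/arange.cu"
# prints: cuda
```

If the real command (for example from `ninja -v` output) reports `cuda` here, the build system is probably not passing the HIP language flag for these sources, which would point at the CMake configuration rather than at the GPU target itself.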

I pulled ggml separately at the same commit pinned in the repo, and it also fails to build with the same errors. Switching to ggml master builds completely, but there must have been some breaking changes, because when I try to use it with the latest stable-diffusion.cpp there are other compile errors.

Any tips or pointers for how I might fix this? I'm willing, but I've poked at this for a few hours now and haven't made any progress.


hipconfig:

% hipconfig
HIP version: 6.2.41134-0

==hipconfig
HIP_PATH           :/opt/rocm
ROCM_PATH          :/opt/rocm
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm/lib/llvm/bin
clang version 18.0.0git (/srcdest/rocm-llvm 77cf9ad00e298ed06e06aec0f81009510f545714)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/lib/llvm/bin
AOMP-18.0-12 (http://github.com/ROCm-Developer-Tools/aomp):
 Source ID:18.0-12-ce1873ac686bb90ddec72bb99889a4e80e2de382
  LLVM version 18.0.0git
  Optimized build with assertions.
  Default target: x86_64-pc-linux-gnu
  Host CPU: znver3

  Registered Targets:
    amdgcn  - AMD GCN GPUs
    nvptx   - NVIDIA PTX 32-bit
    nvptx64 - NVIDIA PTX 64-bit
    r600    - AMD GPUs HD2XXX-HD6XXX
    x86     - 32-bit X86: Pentium-Pro and above
    x86-64  - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -isystem "/opt/rocm/include" -O3
hip-clang-ldflags :
--driver-mode=g++ -L "/opt/rocm/lib" -lamdhip64 -O3 -L/opt/rocm/lib --hip-link

== Environment Variables
PATH =/home/user/p/ggml/.venv/bin:/home/user/.pyenv/shims:/home/user/.cargo/bin:/home/user/.local/bin:/home/user/.cargo/bin:/home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/brlcad/bin:/opt/cuda/bin:/opt/cuda/nsight_compute:/opt/cuda/nsight_systems/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/rocm/bin:/usr/lib/rustup/bin

egrep: warning: egrep is obsolescent; using grep -E
CUDA_PATH=/opt/cuda

== Linux Kernel
Hostname      :
jamesmonroe
Linux jamesmonroe 6.11.8-arch1-2 #1 SMP PREEMPT_DYNAMIC Fri, 15 Nov 2024 15:35:07 +0000 x86_64 GNU/Linux
LSB Version:    n/a
Distributor ID: Arch
Description:    Arch Linux
Release:    rolling
Codename:   n/a

softcookiepp commented 18 hours ago

Sadly, I can't offer any advice, other than that ROCm ended up being too much of a headache for me to work with as well. I ended up using the Vulkan backend instead, and I recommend that you do the same. Unfortunately, AMD doesn't put a high priority on fixing ROCm issues specific to consumer-grade GPUs. Until that changes, it is better to avoid it.
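For anyone wanting to try the Vulkan route suggested above, here is a dry-run sketch that only prints the commands. It assumes that `SD_VULKAN` is the project's CMake option for the Vulkan backend; that option is not mentioned anywhere in this thread, so check the readme of your checkout. `print_vulkan_build` is a hypothetical helper name.

```shell
# Dry run: print the configure and build commands instead of running them.
# Assumption: SD_VULKAN is the CMake option that enables the Vulkan backend
# (not confirmed in this thread; verify against the readme).
print_vulkan_build() {
  echo 'cmake .. -DSD_VULKAN=ON -DCMAKE_BUILD_TYPE=Release'
  echo 'cmake --build . --config Release'
}

print_vulkan_build
```

Run the printed commands from a fresh build directory so that no cached HIP settings from the earlier configure carry over.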

fszontagh commented 5 hours ago

@lcarsos I don't know much about AMD architectures, but if you check the sdcpp workflow, there is a ROCm build (Windows only). It uses ROCm 5.5 with gfx1100 too:

defines: '-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS="gfx1100;gfx1102;gfx1030" -DSD_BUILD_SHARED_LIBS=ON'
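Note that the multi-target list above is quoted. In a POSIX shell an unquoted `;` ends the command, so the list has to reach CMake as a single argument. A small sketch of building that argument, where `amdgpu_targets_flag` is a hypothetical helper and the targets are the ones from the workflow line:

```shell
# Hypothetical helper: join GPU targets into one -DAMDGPU_TARGETS argument.
# The function body runs in a subshell, so changing IFS does not leak out.
amdgpu_targets_flag() (
  IFS=';'
  printf '%s\n' "-DAMDGPU_TARGETS=$*"
)

flag=$(amdgpu_targets_flag gfx1100 gfx1102 gfx1030)
echo "$flag"
# prints: -DAMDGPU_TARGETS=gfx1100;gfx1102;gfx1030
```

When passing it on, keep the double quotes (`cmake .. "$flag" ...`) so the shell does not split the value at the semicolons.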

There is also the CMake file in my project, where the sdcpp versions are built successfully with the workflow.

I hope this helps.