liuliu / swift-diffusion

BSD 3-Clause "New" or "Revised" License

Issues in CUDA version of swift-diffusion #57

Open ghost opened 1 month ago

ghost commented 1 month ago

I tried to set up swift-diffusion with CUDA, and the build seems to run without any error. But when I generate an image with txt2img, I get a gray image. The diffusion steps run very fast, there is no GPU usage, and no error is thrown.

Could you help me figure out what I might be missing?

I tried an old version of swift-diffusion and got this error:

UndefinedBehaviorSanitizer:DEADLYSIGNAL
==16880==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55d225c9251c bp 0x7ffe7351afb0 sp 0x7ffe7351af10 T16880)
==16880==The signal is caused by a WRITE memory access.
==16880==Hint: address points to the zero page.
    #0 0x55d225c9251c in ccv_cnnp_model_new (/root/.cache/bazel/_bazel_root/583b35200f4b85c52a829767b6536391/execroot/__main__/bazel-out/k8-opt/bin/examples/txt2img+0x28f51c) (BuildId: 4b8e3fbf7c150ca5)

I am on Ubuntu 22 with an RTX 3090.

liuliu commented 1 month ago

There isn't a meaningful difference in code between macOS and CUDA (that's the point: we usually do model conversion on NVIDIA hardware and then integrate directly into the macOS app).

However, CUDA doesn't work with UBSan, so this error looks strange.

Your WORKSPACE should at least have these turned on:

ccv_setting(
    name = "local_config_ccv",
    have_cblas = True,
    have_cudnn = True,
    have_pthread = True,
    use_openmp = True,
)

and your .bazelrc.local should have these:

build --action_env TF_NEED_CUDA="1"
build --action_env TF_NEED_OPENCL="1"
build --action_env TF_CUDA_CLANG="0"
build --action_env HOST_CXX_COMPILER="/usr/local/bin/clang"
build --action_env HOST_C_COMPILER="/usr/local/bin/clang"
build --action_env CLANG_CUDA_COMPILER_PATH="/usr/local/bin/clang"
build --action_env GCC_HOST_COMPILER_PATH="/usr/local/bin/clang"

build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
build --action_env TF_CUDA_VERSION="12.4"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="8.0"
build --action_env COMPUTECPP_TOOLKIT_PATH="/usr/local/computecpp"
build --action_env TMP="/tmp"
build --action_env TF_CUDNN_VERSION="8"
build --action_env CUDNN_INSTALL_PATH="/usr"
build --action_env TF_NCCL_VERSION="2"
build --action_env NCCL_INSTALL_PATH="/usr"

build --config=clang
build --config=cuda

build --linkopt="-z nostart-stop-gc"
build --host_linkopt="-z nostart-stop-gc"

build --define=enable_sm80=true

I am not sure if the 3090 needs enable_sm80. You also need to check your CUDA version (we support 11 through 12, I believe, but you need to specify the exact version there). For CUDNN, we recommend 7.x to 8.x.
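For reference, the RTX 3090 is compute capability 8.6 (Ampere), and sm_80 binaries should run on sm_86 cards, which is presumably why enable_sm80 is relevant here. A minimal sketch of how the capability maps onto the flags above (the table and function name are my own illustration, not part of swift-diffusion's build):

```python
# Hypothetical helper: map a GPU to the TF_CUDA_COMPUTE_CAPABILITIES value
# and --define flag used in the .bazelrc.local above. Capabilities are from
# NVIDIA's published CUDA GPU list.
COMPUTE_CAP = {
    "A100": "8.0",      # Ampere GA100
    "RTX 3090": "8.6",  # Ampere GA102
    "RTX 4090": "8.9",  # Ada Lovelace
}

def bazel_cuda_flags(gpu: str) -> list:
    """Return the bazelrc lines relevant to this GPU's architecture."""
    cap = COMPUTE_CAP[gpu]
    major = cap.split(".")[0]
    return [
        'build --action_env TF_CUDA_COMPUTE_CAPABILITIES="%s"' % cap,
        # enable_smXX gates kernels by architecture major version;
        # sm_80 binaries also run on sm_86 cards like the 3090.
        "build --define=enable_sm%s0=true" % major,
    ]

print(bazel_cuda_flags("RTX 3090"))
```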

CUDNN is required.
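Since a missing CUDA or cuDNN install can silently produce a build with the gray-image symptom above, a quick pre-build path check is worth doing. A minimal sketch, assuming the default install locations implied by the .bazelrc.local above (the function name and exact header paths are my assumptions; adjust for your system):

```python
import os

# Paths mirror the action_env values above: CUDA_TOOLKIT_PATH=/usr/local/cuda,
# CUDNN_INSTALL_PATH=/usr (cuDNN 8 keeps its version macros in cudnn_version.h).
EXPECTED = {
    "CUDA_TOOLKIT_PATH": "/usr/local/cuda",
    "cuDNN header": "/usr/include/cudnn_version.h",
    "NCCL header": "/usr/include/nccl.h",
}

def missing_paths(expected):
    """Return the labels whose path does not exist on this machine."""
    return [name for name, path in expected.items() if not os.path.exists(path)]

for name in missing_paths(EXPECTED):
    print("missing:", name, "-- check the matching action_env in .bazelrc.local")
```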