Open notdanilo opened 1 year ago
Are you using cudnn and flash-attention? If not, these are likely to speed up the generation massively. You can turn them on via `--features cudnn,flash-attn` and use the `--use-flash-attn` argument (note that this is pretty slow to compile, and you may want to set the `CANDLE_FLASH_ATTN_BUILD_DIR` environment variable to ensure that it's not recompiled too often).
I need some help here. I am failing to build `flash-attn`. I just installed `cudnn-windows-x86_64-8.9.4.25_cuda12-archive` into the `NVIDIA GPU Computing Toolkit\CUDA\v12.2` folder (i.e. copied `bin`, `include` and `lib`). But I am facing lots of errors like these:
[... many other rerun-if-changed defs above. Just showing the relevant info below]
cargo:rerun-if-changed=kernels/static_switch.h
cargo:rustc-env=CUDA_INCLUDE_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=sm_86
cutlass/include\cute/numeric/math.hpp(299): error: identifier "not" is undefined
typename std::enable_if<(not std::is_unsigned<T>::value)>::type* = nullptr>
^
cutlass/include\cute/numeric/math.hpp(299): error: expected a ")"
typename std::enable_if<(not std::is_unsigned<T>::value)>::type* = nullptr>
^
cutlass/include\cute/numeric/math.hpp(299): error: expected a "," or ">"
typename std::enable_if<(not std::is_unsigned<T>::value)>::type* = nullptr>
^
cutlass/include\cute/numeric/math.hpp(299): error: the global scope has no "type"
typename std::enable_if<(not std::is_unsigned<T>::value)>::type* = nullptr>
^
Googling this exact error, I came across this issue. In a nutshell, flash-attn-v2 doesn't seem to support building on Windows at the moment because of cutlass.
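For context, a likely reason for `identifier "not" is undefined`: on Windows, nvcc uses MSVC as its host compiler, and MSVC does not treat the alternative operator tokens (`not`, `and`, `or`) as keywords unless it runs in a conformance mode such as `/permissive-` or `<iso646.h>` is included. The sketch below is not the cutlass source. It is a hypothetical stand-in (the name `magnitude` is made up for illustration) that uses the same `enable_if` pattern as the failing line, with the negation spelled `!` so that it compiles regardless of alternative-token support.

```cpp
#include <type_traits>

// Hypothetical stand-in for the kind of template that fails at
// cute/numeric/math.hpp(299). The failing line spells the negation as `not`;
// here it is written as `!`, which does not depend on alternative tokens.
template <class T,
          typename std::enable_if<(!std::is_unsigned<T>::value)>::type* = nullptr>
constexpr T magnitude(T t) {
  // Signed (or floating-point) types: flip the sign of negative values.
  return t < T(0) ? -t : t;
}

// Overload selected for unsigned types, where the value is already non-negative.
template <class T,
          typename std::enable_if<std::is_unsigned<T>::value>::type* = nullptr>
constexpr T magnitude(T t) {
  return t;
}
```

Whether patching the vendored cutlass headers this way, or passing a conformance flag through to the host compiler, is enough to get flash-attn building on Windows is not something this thread confirms. The sketch only illustrates what the compiler is objecting to.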
Stable Diffusion is super slow. It takes more than 30 seconds to generate an example image with the default configuration on an RTX 4090 with CUDA enabled, while it would take less than 5 seconds with diffusers.