Open Gadersd opened 1 week ago
Compiling the CUDA kernels for flash-attn can take a very long time (and can easily run out of memory too). More than 10 minutes wouldn't be surprising. Do you see anything in top / ps (nvcc, cicc, ...)?
Also, you probably want to set the CANDLE_FLASH_ATTN_BUILD_DIR environment variable to something like $HOME/.candle so that the kernel compilation doesn't trigger too often.
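For reference, a minimal shell sketch of the two suggestions above. The directory is the example path from the comment. Creating it up front is an assumption on my part that the build script wants the directory to already exist. The process check is just one way to see whether nvcc or cicc is still working in the background.

```shell
# Cache location for the compiled kernels (example path from the comment above;
# any writable directory should do).
export CANDLE_FLASH_ATTN_BUILD_DIR="$HOME/.candle"

# Assumption: the directory should exist before the build starts.
mkdir -p "$CANDLE_FLASH_ATTN_BUILD_DIR"

# Run this in a second terminal while `cargo build` appears stuck.
# The [n]/[c] brackets stop grep from matching its own command line.
ps aux | grep -E '[n]vcc|[c]icc' || echo "no nvcc/cicc processes found"
```

If nvcc or cicc shows up in the output, the build is most likely still compiling kernels rather than hanging. The export only lasts for the current shell session, so it would need to go in your shell profile to persist.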
When I added candle-flash-attn to my .toml file, the build process seems to hang on
Building [=======================> ] 114/118: candle-flash-attn(build)
and the compilation doesn't proceed. My .toml file is