huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

candle-flash-attn infinite compile time #2275

Open Gadersd opened 1 week ago

Gadersd commented 1 week ago

When I added candle-flash-attn to my Cargo.toml, the build process seems to hang at `Building [=======================> ] 114/118: candle-flash-attn(build)` and the compilation doesn't proceed.

My Cargo.toml is:

```toml
[package]
name = "occam"
version = "0.1.0"
edition = "2021"

[dependencies]
candle-core = { git = "https://github.com/huggingface/candle.git", version = "0.6", features = ["cuda"] }
candle-flash-attn = { git = "https://github.com/huggingface/candle.git", version = "0.6", features = [] }
candle-nn = { git = "https://github.com/huggingface/candle.git", version = "0.6", features = ["cuda"]}
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
glob = "0.3"
rand = "0.8"
anyhow = "1.0"
```

LaurentMazare commented 1 week ago

Compiling the flash-attn CUDA kernels can take a very long time (and can easily run out of memory too); more than 10 minutes wouldn't be surprising. Do you see anything running in `top` / `ps` (`nvcc`, `cicc`, ...)? Also, you probably want to set the `CANDLE_FLASH_ATTN_BUILD_DIR` environment variable to something like `$HOME/.candle` so that the compiled kernels are kept in one place and the kernel compilation isn't re-triggered as often.
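To illustrate why a persistent build directory helps, here is a minimal sketch of the general caching idea in a build script: prefer a user-supplied directory (read from `CANDLE_FLASH_ATTN_BUILD_DIR`) over Cargo's per-build `OUT_DIR`, and only run the slow nvcc step when the cached library is missing or older than its sources. This is a hypothetical illustration, not candle-flash-attn's actual `build.rs`; the helper names `kernel_build_dir` and `is_stale`, the library file name and the source path are all assumptions.

```rust
use std::ffi::OsString;
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: choose where compiled kernels are stored.
/// A user-supplied directory (e.g. from CANDLE_FLASH_ATTN_BUILD_DIR) wins over
/// Cargo's per-build OUT_DIR, so the artifacts can outlive `cargo clean`.
fn kernel_build_dir(override_dir: Option<OsString>, out_dir: &Path) -> PathBuf {
    match override_dir {
        Some(dir) => PathBuf::from(dir),
        None => out_dir.to_path_buf(),
    }
}

/// Hypothetical staleness check: rebuild if the cached library is missing,
/// or if any kernel source was modified after the library was written.
fn is_stale(lib: &Path, sources: &[PathBuf]) -> bool {
    let lib_time = match fs::metadata(lib).and_then(|m| m.modified()) {
        Ok(t) => t,
        Err(_) => return true, // no cached library yet
    };
    sources.iter().any(|src| {
        fs::metadata(src)
            .and_then(|m| m.modified())
            // If a source cannot be inspected, rebuild to be safe.
            .map_or(true, |t| t > lib_time)
    })
}

fn main() {
    // In a real build script, OUT_DIR is set by Cargo; fall back for a standalone run.
    let out_dir = PathBuf::from(std::env::var_os("OUT_DIR").unwrap_or_else(|| "target/out".into()));
    let dir = kernel_build_dir(std::env::var_os("CANDLE_FLASH_ATTN_BUILD_DIR"), &out_dir);
    let lib = dir.join("libflashattention.a"); // hypothetical artifact name
    let sources = vec![PathBuf::from("kernels/flash_api.cu")]; // hypothetical source list
    if is_stale(&lib, &sources) {
        println!("would run the slow nvcc step, writing {}", lib.display());
    } else {
        println!("reusing cached {}", lib.display());
    }
}
```

Under these assumptions, exporting `CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle` before `cargo build` would let later builds reuse the compiled library as long as the kernel sources are unchanged, instead of paying the long nvcc/cicc cost again.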