Closed — Pox-here closed this 6 months ago
Could you provide more details about what you ran? The following works well for me:
cargo run --profile=release-with-debug --features cuda --example mistral -- --quantized --prompt "Hello "
The main thing that could trigger the error you're seeing is the candle-nn crate not having the cuda feature enabled, but that should be the case if you're enabling the cuda feature flag for anything in candle-examples.
I am not running the example; it's modified code using a Mistral-based model for inference.
However, I located the solution based on your feedback. In my Cargo.toml:
- candle-nn = { git = "https://github.com/huggingface/candle.git" }
+ candle-nn = { git = "https://github.com/huggingface/candle.git", features = ["cuda"] }
I didn't need to add any flag, and the quantized Mistral-based model now loads and runs successfully using candle. Thanks,
Closing
I fixed this by ensuring the cuda feature was enabled for both candle-nn and candle-core, like:
candle-core = { git = "https://github.com/huggingface/candle.git", version = "0.6.0", features = ["cuda"] }
candle-nn = { git = "https://github.com/huggingface/candle.git", version = "0.6.0", features = ["cuda"] }
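Why the feature has to be enabled on each dependency: Cargo features are per-crate and are not forwarded to a git dependency unless you request them there. A minimal sketch of the pattern (illustrative only, not candle's actual source) — a crate gates its CUDA code path behind a `cuda` feature, so if that feature is missing for that specific crate, the CUDA branch simply does not exist at compile time:

```rust
// Hypothetical illustration of feature-gated backends, in the style a
// crate like candle-nn might use. Without `--features cuda`, only the
// fallback function is compiled in.

#[cfg(feature = "cuda")]
fn rms_norm_backend() -> &'static str {
    "cuda" // compiled only when the crate itself has the cuda feature
}

#[cfg(not(feature = "cuda"))]
fn rms_norm_backend() -> &'static str {
    "cpu-fallback" // what you silently get if the feature was not enabled
}

fn main() {
    // Built without the cuda feature, this reports the fallback path.
    println!("rms-norm backend: {}", rms_norm_backend());
}
```

This is why enabling cuda on candle-examples alone is not enough: each crate in the dependency list needs the feature listed in its own entry.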
@sidharthrajaram
Yes, that's correct. I stated this in my previous comment too, though in a slightly unclear manner; fair criticism on my part. Thanks
Running quantized mistral:
avx: false, neon: false, simd128: false, f16c: false
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
loaded 291 tensors (3.08GB) in 0.03s
Current device: Cuda(CudaDevice(DeviceId(1)))
model built successfully
When attempting inference, I hit the following issue:
Error: Cuda("no cuda implementation for rms-norm")
Is this expected and something that will be introduced later, or is there an issue here? I pulled the latest main, including the fix for the "not a f64 F32(1e-5)" error I first encountered: https://github.com/huggingface/candle/pull/1913
Any suggestions or information regarding the missing rms-norm implementation would be appreciated, thanks.
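For context on what the missing CUDA kernel computes: rms-norm normalizes each vector by its root-mean-square and scales by a learned weight, y_i = x_i / sqrt(mean(x^2) + eps) * w_i. A plain-Rust CPU reference (a hypothetical standalone helper, not candle's API) looks like this:

```rust
// Illustrative CPU reference for rms-norm; `rms_norm` is a hypothetical
// helper written for this sketch, not a candle function.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    // Mean of squared elements over the normalized dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Reciprocal root-mean-square, with eps for numerical stability.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    // Normalize and apply the per-element learned scale.
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let w = [1.0f32; 4];
    let y = rms_norm(&x, &w, 1e-5);
    println!("{y:?}");
}
```

Errors like `Cuda("no cuda implementation for ...")` generally mean the op has a CPU path but no CUDA kernel wired up for that dtype/backend yet, so a CPU reference like the above is the semantics the kernel would need to match.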