huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

M1 process stuck while trying to run mixtral example #2371

Open danielclough opened 3 months ago

danielclough commented 3 months ago

I tried Mixtral on my 128GB M1 Mac. The process state flips between running and stuck.

The CLI output from the example does not progress past this point:

Running `target/release/examples/mixtral --prompt 'def print_prime(n): '`
avx: false, neon: true, simd128: false, f16c: false
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 83.206834ms
loaded the model in 152.927567834s
def print_prime(n

I started the process with this command, from an up-to-date checkout of the repo:

cargo run --features metal --example mixtral --release  -- --prompt "def print_prime(n): "

LaurentMazare commented 2 months ago

Actually, mixtral on metal was falling back to f32, as we didn't have bf16 support for matmul on gemm. So this would require > 200GB of memory to work. @ivarflakstad just added support for bf16 matmul in #2364, so you can give it another try with the changes in #2378 (I cannot really try it myself, as my Mac is not beefy enough). If that works well, we'll generalize it to other bf16-based models.
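
For a rough sense of why the f32 fallback does not fit on a 128GB machine, here is a back-of-the-envelope sketch of the weight memory alone. The parameter count (about 46.7 billion for Mixtral 8x7B) is an assumption and is not stated in this thread, and the helper names `weight_bytes` and `to_gb` are made up for illustration. Activations, the KV cache, and temporary buffers come on top of these numbers.

```rust
/// Bytes needed to hold `params` weights at `bytes_per_param` bytes each.
fn weight_bytes(params: u64, bytes_per_param: u64) -> u64 {
    params * bytes_per_param
}

/// Convert bytes to decimal gigabytes (1 GB = 1e9 bytes).
fn to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}

fn main() {
    // Assumption: roughly 46.7 billion total parameters for Mixtral 8x7B.
    let params: u64 = 46_700_000_000;

    // f32 stores 4 bytes per weight, bf16 stores 2 bytes per weight.
    let f32_gb = to_gb(weight_bytes(params, 4));
    let bf16_gb = to_gb(weight_bytes(params, 2));

    println!("f32 weights:  {:.1} GB", f32_gb);
    println!("bf16 weights: {:.1} GB", bf16_gb);
}
```

Under that assumption, the weights alone already exceed 128GB in f32, while in bf16 they come to about half of that and could plausibly fit.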

danielclough commented 2 months ago

My system is using DType::F32 from device.bf16_default_to_f32().

Can I provide any useful info for debugging?

ivarflakstad commented 2 months ago

Had to revert because while it was correct on M3 it was not on M1/M2. I’m looking into it :)