"Illegal Instruction" on Older CPUs

jett06 commented 2 weeks ago

Hey there! I'm reaching out from a machine running the Intel i3-2120 processor.

The llm-rs library (https://github.com/rustformers/llm) handles inference just fine on this processor (it previously failed on older CPUs but works fine nowadays), but candle fails. I tried:

$ cargo run --example quantized-phi --release
    Finished release [optimized] target(s) in 3.59s
     Running `/home/jett/devel/candle/target/release/examples/quantized-phi`
avx: true, neon: false, simd128: false, f16c: false
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 325 tensors (1.79GB) in 0.16s
model built
[1]    13404 illegal hardware instruction  cargo run --example quantized-phi --release

You can see the failure output. An ML/AI library throwing "Illegal instruction" usually means your program needs features that your CPU doesn't offer, so I tried recompiling without AVX2 instructions enabled (https://github.com/huggingface/candle/issues/622#issuecomment-1694726330), but it still failed:

$ RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-avx2" cargo run --example quantized-phi
    Finished dev [unoptimized + debuginfo] target(s) in 9.35s
     Running `/home/jett/devel/candle/target/debug/examples/quantized-phi`
avx: true, neon: false, simd128: false, f16c: false
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 325 tensors (1.79GB) in 0.21s
model built
[1]    18202 illegal hardware instruction  RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-avx2" cargo run --example

Any ideas? I realize it might be a code-specific issue with the way candle is written, but I'm hoping it would have a simple fix (with something like a feature flag to check for CPU feature presence that enables certain code so this doesn't throw "Illegal instruction"). Thanks in advance for any reply, and thanks to all the maintainers for maintaining such an awesome ML/AI library!
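
For reference, the kind of runtime check asked for here can be sketched with the standard library's `is_x86_feature_detected!` macro. This is only an illustration of the general technique, not candle's code: `dot_scalar`, `dot_avx`, and `dot` are hypothetical names, and the kernel is a plain dot product rather than a quantized one. The AVX version is compiled in unconditionally but only called when the CPU reports AVX at run time; otherwise the portable fallback is used.

```rust
// Hypothetical example: a dot product with run-time CPU-feature dispatch.
// None of these functions exist in candle.

/// Portable fallback that works on any CPU.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX version. `#[target_feature]` lets the compiler emit AVX instructions
/// for this one function even if the rest of the crate is built without AVX.
/// It is `unsafe` because calling it on a CPU without AVX is undefined.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut acc = _mm256_setzero_ps();
    for i in 0..chunks {
        // Unaligned loads of 8 floats from each slice.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        // Only AVX1 instructions here (no FMA, no AVX2).
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Horizontal sum of the 8 lanes, then the scalar tail.
    let mut lanes = [0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Picks an implementation at run time instead of at compile time.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // AVX support was just verified, so the call is sound.
            return unsafe { dot_avx(a, b) };
        }
    }
    dot_scalar(a, b)
}

fn main() {
    let a: Vec<f32> = (0..19).map(|i| i as f32).collect();
    let b = vec![1.0f32; 19];
    println!("dot = {}", dot(&a, &b));
}
```

With this pattern a single binary built without `-Ctarget-cpu=native` can still use the faster path where it is available, at the cost of one feature check per call (or per dispatch, if the chosen function is cached).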

TL;DR: Candle's quantized LLM inference uses CPU features that older CPUs, such as the Intel i3-2120, do not support. As a result, the prewritten examples crash with "Illegal instruction" on those CPUs.

LaurentMazare commented 2 weeks ago

Could you try removing the target-cpu=native flag entirely, and also remove the .cargo/config.toml file if you're running within the main candle repo? It would be interesting to see whether you still get illegal instructions there.

jett06 commented 2 weeks ago

Sure! I can't believe I didn't think of removing .cargo/config.toml. It runs fine now:

$ cargo run --example quantized-phi --release
# [...]
Finished release [optimized] target(s) in 5m 36s
     Running `target/release/examples/quantized-phi`
avx: false, neon: false, simd128: false, f16c: false
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 325 tensors (1.79GB) in 0.20s
model built
Write a function to count prime numbers up to N. 
# Here, N = 20
# Output: 8
'''

def prime_numbers(n):
    flag = [True for i in range(n+1)]
    primes = []
    for i in range(2, n + 1):
        if flag[i^C

I'm confused about why this would imply that avx doesn't work on my machine, though. The only noticeable difference between the output of the -Ctarget-cpu=native build and the build without it is that avx: true becomes avx: false. It looks like my machine supports avx:

$ grep "avx" /proc/cpuinfo
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi

Unless avx: true in candle's output means that both avx and avx2 instructions get used, which would explain the Illegal instruction, since my CPU does not support avx2?
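
One way to investigate a question like this is to compare what the build was told it may use with what the CPU actually reports. The sketch below assumes (this is not confirmed anywhere in the thread) that a line such as `avx: true` reflects a compile-time setting, which Rust exposes through `cfg!(target_feature = "...")`, whereas `/proc/cpuinfo` reflects the hardware, which Rust exposes through `is_x86_feature_detected!`. The function names are hypothetical. Note that code calling AVX2, FMA, or F16C intrinsics behind a check that only covers `avx` would still compile, and would then fault on a CPU that has AVX alone.

```rust
// Hypothetical diagnostic: compile-time target features vs. run-time CPU support.

/// Features the compiler was allowed to assume for the whole crate.
/// These are fixed at build time (for example by -Ctarget-cpu=native).
fn compiled_with() -> [(&'static str, bool); 4] {
    [
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("fma", cfg!(target_feature = "fma")),
        ("f16c", cfg!(target_feature = "f16c")),
    ]
}

/// Features the CPU running this binary actually reports.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_at_runtime() -> [(&'static str, bool); 4] {
    [
        ("avx", is_x86_feature_detected!("avx")),
        ("avx2", is_x86_feature_detected!("avx2")),
        ("fma", is_x86_feature_detected!("fma")),
        ("f16c", is_x86_feature_detected!("f16c")),
    ]
}

/// On non-x86 targets none of these features exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_at_runtime() -> [(&'static str, bool); 4] {
    [("avx", false), ("avx2", false), ("fma", false), ("f16c", false)]
}

/// Features that were compiled in but are missing on this CPU.
/// Anything listed here could lead to an illegal-instruction fault.
fn missing_features() -> Vec<&'static str> {
    compiled_with()
        .iter()
        .zip(detected_at_runtime().iter())
        .filter(|((_, compiled), (_, detected))| *compiled && !*detected)
        .map(|((name, _), _)| *name)
        .collect()
}

fn main() {
    for ((name, compiled), (_, detected)) in
        compiled_with().iter().zip(detected_at_runtime().iter())
    {
        println!("{name}: compiled-in = {compiled}, detected = {detected}");
    }
    println!("compiled-in but unsupported: {:?}", missing_features());
}
```

Building this once with and once without `RUSTFLAGS="-Ctarget-cpu=native"` shows how the compile-time column changes while the detected column stays the same.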

huggingface / candle

"Illegal Instruction" on Older CPUs #2140