Open jett06 opened 2 weeks ago
Could you try removing the `target-cpu=native` flag entirely and remove the `.cargo/config.toml` file if you're running within the main candle repo? It would be interesting to see whether you still get illegal instructions there.
Sure! I can't believe I didn't think of removing .cargo/config.toml.
It runs fine now:
$ cargo run --example quantized-phi --release
# [...]
Finished release [optimized] target(s) in 5m 36s
Running `target/release/examples/quantized-phi`
avx: false, neon: false, simd128: false, f16c: false
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 325 tensors (1.79GB) in 0.20s
model built
Write a function to count prime numbers up to N.
# Here, N = 20
# Output: 8
'''
def prime_numbers(n):
    flag = [True for i in range(n+1)]
    primes = []
    for i in range(2, n + 1):
        if flag[i^C
I'm confused about why this implies that `avx` won't work on my machine, though. The only noticeable difference between the outputs of the builds with and without `-Ctarget-cpu=native` is that `avx: true` becomes `avx: false`. It looks like my machine supports `avx`:
$ grep "avx" /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow flexpriority ept vpid xsaveopt dtherm arat pln pts vnmi
Unless `avx: true` in candle's output means that both `avx` and `avx2` features are used, which would explain why it throws Illegal instruction, since my CPU does not support `avx2`?
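A minimal Rust sketch of one possible explanation, assuming (this is not confirmed in the thread) that the `avx:` line reports the target features the binary was compiled with rather than what the CPU supports. `cfg!(target_feature = ...)` is fixed at build time by flags such as `-C target-cpu=native`, while `is_x86_feature_detected!` queries the CPU at run time. The helper names `compiled_in` and `detected` are hypothetical and are not part of candle.

```rust
/// Features the compiler was told it may assume (decided at build time).
fn compiled_in() -> [(&'static str, bool); 4] {
    [
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("f16c", cfg!(target_feature = "f16c")),
        ("fma", cfg!(target_feature = "fma")),
    ]
}

/// Features the CPU reports at run time (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected() -> [(&'static str, bool); 4] {
    [
        ("avx", std::arch::is_x86_feature_detected!("avx")),
        ("avx2", std::arch::is_x86_feature_detected!("avx2")),
        ("f16c", std::arch::is_x86_feature_detected!("f16c")),
        ("fma", std::arch::is_x86_feature_detected!("fma")),
    ]
}

/// On other architectures none of these x86 features exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected() -> [(&'static str, bool); 4] {
    [("avx", false), ("avx2", false), ("f16c", false), ("fma", false)]
}

fn main() {
    let compiled = compiled_in();
    let runtime = detected();
    for ((name, c), (_, d)) in compiled.iter().zip(runtime.iter()) {
        println!("{name}: compiled-in={c}, detected={d}");
    }
}
```

Built for a generic x86-64 target without `-C target-cpu=native`, the `compiled-in` column would normally show `false` for `avx` even on a CPU whose `/proc/cpuinfo` lists it, which would be consistent with `avx: false` appearing in the working build.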
Hey there! I'm reaching out from a machine running the Intel i3-2120 processor.
The `llm-rs` library (https://github.com/rustformers/llm) can handle inference just fine on this processor (it previously failed on older CPUs, but works fine nowadays), but `candle` fails. I tried:
You can see the failure output. An ML/AI library throwing "Illegal instruction" usually means your program needs features that your CPU doesn't offer, so I tried recompiling without AVX2 instructions enabled (https://github.com/huggingface/candle/issues/622#issuecomment-1694726330), but it still failed:
Any ideas? I realize it might be a code-specific issue with the way `candle` is written, but I'm hoping it would have a simple fix (with something like a feature flag to check for CPU feature presence that enables certain code so this doesn't throw "Illegal instruction"). Thanks in advance for any reply, and thanks to all the maintainers for maintaining such an awesome ML/AI library!
TL;DR: Candle uses unsupported CPU features in its quantized LLM inference, which are encountered when running the prewritten examples. This causes execution to fail on older CPUs, such as the Intel i3-2120.
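The kind of CPU feature check suggested above could look roughly like the following Rust sketch. It is not how candle is implemented; `dot`, `dot_avx`, and `dot_scalar` are hypothetical names chosen for illustration. The idea is to compile an AVX code path with `#[target_feature(enable = "avx")]`, call it only when `is_x86_feature_detected!("avx")` returns true, and fall back to scalar code otherwise, so the same binary runs on CPUs without the feature.

```rust
/// AVX path: only AVX (not AVX2) intrinsics are used here.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut acc = _mm256_setzero_ps();
    for i in 0..chunks {
        // Unaligned loads of 8 floats from each input.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Reduce the 8 lanes, then handle the remaining tail elements.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Portable fallback that needs no special CPU features.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Picks an implementation at run time based on what the CPU reports.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just verified at run time.
            return unsafe { dot_avx(a, b) };
        }
    }
    dot_scalar(a, b)
}

fn main() {
    let a: Vec<f32> = (0..19).map(|i| i as f32).collect();
    let b = vec![2.0f32; 19];
    println!("dot = {}", dot(&a, &b));
}
```

With this pattern the check happens per call site rather than per build, so no `-C target-cpu` flag is needed to use the faster path where it is available.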