AutonomicPerfectionist closed this issue 6 months ago
Thank you for the issue! I'm a bit hesitant to set `LLAMA_NATIVE` to `OFF`. I think this disables both AVX and AVX2, which would be a significant hit to performance, even though both are widely supported. In `build-args.cmake` the CMake arguments are processed (copied from llama.cpp). `LLAMA_AVX512` should be off by default and should not be affected by `LLAMA_NATIVE`. It smells to me a bit like AVX512 is not the root cause of the problem, but I'll try disabling it and see whether the problem still occurs.
There are options to enable AVX and AVX2 separately. While `LLAMA_AVX512` is indeed disabled, enabling `LLAMA_NATIVE` passes the flag `-march=native`. When the build runs on a CPU with AVX512, this generates code that uses the newer EVEX encoding scheme for the legacy AVX instructions, which is not supported on non-AVX512 CPUs. At least that's how I understand it; it's very confusing.
I just released version 3.0, which upgraded to the newest llama.cpp version. There was a huge amount of change, so I'm not sure whether this issue still applies. To reduce the number of old issues, I'll close this one for now, but feel free to re-open it if the problem still occurs.
Attempting to use the pre-packaged `libjllama.so` on x86_64 Linux on CPUs that don't support AVX512 causes a SIGILL crash. I dumped the object code and found that an AVX2 instruction was encoded using the EVEX encoding scheme, which was added with the AVX512 extension. I don't have any CPUs that support AVX512, so I can't truly verify that this is the cause of the SIGILL crash, but it seems pretty likely. I think I've pinpointed the cause of this change to https://github.com/ggerganov/llama.cpp/pull/3273, so we probably need to add `LLAMA_NATIVE=OFF` for Linux x86 now.
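For reference, a hedged sketch of how one might inspect a prebuilt library for AVX512-only code and rebuild without native tuning. The `objdump` heuristic only catches obvious AVX512 state (zmm registers, opmask registers); EVEX-encoded forms of legacy AVX instructions on xmm/ymm are harder to grep for. The CMake option names (`LLAMA_NATIVE`, `LLAMA_AVX`, `LLAMA_AVX2`, `LLAMA_AVX512`) are assumed to match llama.cpp's options at the time of this issue; check the current `build-args.cmake` before relying on them:

```shell
# Heuristic check: disassemble and look for AVX-512 state.
# Use of %zmm or opmask registers %k1-%k7 is a strong signal.
objdump -d libjllama.so | grep -E '%zmm|%k[1-7]' | head

# Rebuild with -march=native disabled but AVX/AVX2 explicitly kept on,
# so performance on common x86_64 CPUs is preserved.
cmake -B build \
  -DLLAMA_NATIVE=OFF \
  -DLLAMA_AVX=ON \
  -DLLAMA_AVX2=ON \
  -DLLAMA_AVX512=OFF
cmake --build build --config Release
```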