turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.2k stars, 235 forks

Illegal instruction crash due to AVX2 compile time opts #391

Closed: AndrewRyanChama closed 2 months ago

AndrewRyanChama commented 3 months ago

This project added AVX2 usage along with a way to detect CPU support and fall back if it's missing. But the package still crashes anyway, because the compiler options are configured to generate AVX2 instructions via the -mavx2 option.

When this option is selected, the compiler may emit AVX2 instructions for any of the C code at its discretion, so the result is no longer safe to run on a CPU that does not support these instructions. This happens in the ahead-of-time compilation here: https://github.com/turboderp/exllamav2/blob/master/setup.py#L14C63-L14C69 and in the JIT build here: https://github.com/turboderp/exllamav2/blob/f6b7faa429080cd7c7e394ec301442fbf137658f/exllamav2/ext.py#L82
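To make the failure mode concrete, here is a minimal sketch of why a runtime check alone does not help. It assumes GCC or Clang on x86; `cpu_has_avx2` and `sum_i32` are hypothetical names for illustration, not functions from exllamav2. The check only protects call sites that are written to consult it, while -mavx2 applies to every function in the translation unit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from exllamav2): report whether the CPU we are
   running on supports AVX2. Uses a GCC/Clang builtin on x86; any other
   compiler or architecture conservatively reports false. */
bool cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

/* Ordinary scalar code with no intrinsics and no feature check. If this
   file is built with -mavx2, the compiler is still allowed to vectorize
   this loop with AVX2 instructions, and nothing in the source guards
   against that on an older CPU. */
long sum_i32(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

In other words, `cpu_has_avx2()` can return false and the process can still hit an illegal instruction inside `sum_i32`, because the flag was applied globally.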

If we want to build this for non-AVX2 CPUs, we need to build the binary with GCC without these options as well. Or, because the main C code parts aren't on the hot path, it may be reasonable to remove these compiler options entirely.

turboderp commented 3 months ago

I've investigated this a bit, and it doesn't appear to be -mavx2 that's the issue but rather the combination with -O3. I'm not keen on getting rid of either flag since the AVX2 optimizations do make a substantial difference. I'll have to dig a bit more to see if Ninja has some easy option for allowing intrinsics without the automatic AVX2 vectorization. Though it's a bit tricky to test because I don't have a CPU handy that's old enough to not have AVX2 support.

turboderp commented 3 months ago

So it turns out there's a very easy solution with GCC. __attribute__((target_clones("avx2", "default"))) will compile any function with optional AVX2 support and dispatch to the right version at runtime.
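A minimal sketch of what that attribute looks like in use, under stated assumptions: the `MULTIVERSION` macro and the `scale_add` function are hypothetical and not taken from exllamav2, and the attribute is only enabled here for GCC on x86-64 Linux, where the required ifunc support is normally available. On any other toolchain the macro expands to nothing and only the baseline version is built.

```c
#include <stddef.h>

/* target_clones relies on ifunc support in the toolchain and C library.
   This sketch enables it only for GCC on x86-64 Linux; everywhere else the
   macro is empty and the function is compiled once for the baseline ISA. */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__linux__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION
#endif

/* With the attribute active, GCC emits two copies of this function (one
   allowed to use AVX2, one for the default target) plus a resolver that
   picks one at load time based on the CPU. The file itself is compiled
   without -mavx2. */
MULTIVERSION
void scale_add(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Callers just call `scale_add(...)` as usual; no explicit CPU check appears in the source.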

MSVC seems to be a few years behind though? Not sure what the equivalent would be, or how to set up Torch/Ninja to build both an AVX2 and non-AVX2 version of the module.
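Where `target_clones` is not available, one possible fallback is manual dispatch: keep the AVX2 code in its own function, check the CPU once, and call through a function pointer. The sketch below is an illustration under assumptions, not necessarily what exllamav2 ended up doing. All names (`add_u32`, `pick_add`, `cpu_has_avx2`, `AVX2_FN`) are hypothetical. It relies on MSVC accepting AVX2 intrinsics without /arch:AVX2, and on GCC/Clang accepting them inside a function marked `target("avx2")`.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER) && !defined(__clang__) && \
    (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>
  #include <immintrin.h>
  #define HAVE_AVX2_KERNEL 1
  #define AVX2_FN
  static int cpu_has_avx2(void)
  {
      int r[4];
      __cpuid(r, 0);
      if (r[0] < 7) return 0;                  /* leaf 7 not available */
      __cpuid(r, 1);
      if ((r[2] & (1 << 27)) == 0) return 0;   /* OSXSAVE */
      if ((r[2] & (1 << 28)) == 0) return 0;   /* AVX */
      if ((_xgetbv(0) & 0x6) != 0x6) return 0; /* OS saves XMM/YMM state */
      __cpuidex(r, 7, 0);
      return (r[1] & (1 << 5)) != 0;           /* AVX2 */
  }
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  #include <immintrin.h>
  #define HAVE_AVX2_KERNEL 1
  #define AVX2_FN __attribute__((target("avx2")))
  static int cpu_has_avx2(void)
  {
      __builtin_cpu_init();
      return __builtin_cpu_supports("avx2") != 0;
  }
#else
  #define HAVE_AVX2_KERNEL 0                   /* scalar path only */
#endif

/* Baseline version, safe on every CPU the file is compiled for. */
static void add_u32_scalar(uint32_t *dst, const uint32_t *a,
                           const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

#if HAVE_AVX2_KERNEL
/* AVX2 version: 8 lanes per iteration, then a tail loop for the rest. */
AVX2_FN
static void add_u32_avx2(uint32_t *dst, const uint32_t *a,
                         const uint32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_fn)(uint32_t *, const uint32_t *, const uint32_t *, size_t);

static add_fn pick_add(void)
{
#if HAVE_AVX2_KERNEL
    if (cpu_has_avx2())
        return add_u32_avx2;
#endif
    return add_u32_scalar;
}

/* Public entry point. The implementation is chosen on the first call; this
   lazy initialization is not synchronized, so a multithreaded program
   should resolve it once before starting worker threads. */
void add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    static add_fn impl = NULL;
    if (!impl)
        impl = pick_add();
    impl(dst, a, b, n);
}
```

The cost compared with `target_clones` is that each kernel has to be written or marked by hand, which is roughly the messiness that portable dispatch tends to bring.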

turboderp commented 3 months ago

Should be fixed now for both Linux and Windows. Really messy, but I guess that's how portable code goes. :|

(In the dev branch.)

turboderp commented 2 months ago

This should be fixed now. Feel free to reopen otherwise.