Initial implementation of AVX2-vectorized distance functions (L1, L2, L2SQ, cosine) for linux-x86_64. It should be fairly easy to extend to BSD and Intel Macs; Windows will require a separate implementation. I have tested the code and am reasonably confident it works. I have not profiled it yet, but I wanted to get eyes on it before going much further. Switching to the vectorized path at dim=8 was convenient for testing, but that threshold is almost certainly premature. My hope is that this will provide a significant performance increase on higher-dimensional vectors. I also added a CMake file that detects AVX so that distfunc.c can be built with AVX2 enabled; that file makes up the bulk of this PR.
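For reviewers who want the shape of the approach without opening the diff, here is a minimal sketch of a squared-L2 distance with a dim-based switch between a scalar and an AVX2 path. It is illustrative only and is not the code in this PR: the names `l2sq`, `l2sq_scalar`, `l2sq_avx2`, and `AVX2_MIN_DIM` are hypothetical. It assumes GCC or Clang on x86-64, and it uses a per-function target attribute plus a runtime CPU check instead of the CMake-driven compile flags described above.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical threshold mirroring the dim=8 switch mentioned above. */
#define AVX2_MIN_DIM 8

/* Plain scalar fallback: sum of squared differences. */
static float l2sq_scalar(const float *a, const float *b, size_t dim)
{
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* AVX2 path: processes 8 floats per iteration, then a scalar tail.
 * The target attribute (GCC/Clang) lets this compile without -mavx2. */
__attribute__((target("avx2")))
static float l2sq_avx2(const float *a, const float *b, size_t dim)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;

    for (; i + 8 <= dim; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned loads */
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 d  = _mm256_sub_ps(va, vb);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
    }

    /* Horizontal sum of the 8 accumulator lanes. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) {
        sum += lanes[k];
    }

    /* Remaining dim % 8 elements. */
    for (; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Dispatch: vectorized path only for large enough vectors on AVX2 CPUs. */
float l2sq(const float *a, const float *b, size_t dim)
{
    if (dim >= AVX2_MIN_DIM && __builtin_cpu_supports("avx2")) {
        return l2sq_avx2(a, b, dim);
    }
    return l2sq_scalar(a, b, dim);
}
```

Calling `l2sq(a, b, dim)` takes the scalar path for small vectors or on CPUs without AVX2. The two paths accumulate in a different order, so on general floating-point input their results can differ in the last few bits.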