Closed ghost closed 5 years ago
The libdivide.h
SSE2 code can be considered legacy by now because Intel/AMD have released AVX, AVX2 and AVX512 in the meantime.
Personally I favor not to turn on SSE2 by default in order to increase the portability of the code and to prevent future compiler warnings. I am very confident that libdivide.h
will compile fine without any code changes using all major C/C++ compilers for the 5 years, if we turn on SSE2 by default then I am much less confident...
Question: are using using the libdivide SSE2 code?
@kimwalisch If you have SSE2 why not use it(esp. for highly vectored code)? The idea is that check should be done within libdivide.h instead of Cmake script.
Another reason to turn off SSE2 by default is to reduce compile time. Because libdivide.h
uses lots of C++ templates it increases compile time considerably for projects that include libdivide.h
. If we turn on SSE2 by default compile time will be even worse.
@kimwalisch C++ compile times are irrelevant one-time cost. Final code performance is what matters. Maybe a check for older/slower compilers should be done instead?
The __SSE2__
, etc. macros reflect whether the compiler may generate SSE2 instructions. But LIBDIVIDE_USE_SSE2
is meant to track whether the SSE2 intrinsics header emmintrin.h
is available. I'm not sure how these relate exactly, but it doesn't feel right to say that __SSE2__
determines emmintrin.h
header availability.
I think the right way to fix this is to have CMake detect the presence of emmintrin.h
or its VC equivalent.
I think the right way to fix this is to have CMake detect the presence of emmintrin.h or its VC equivalent.
I assume that most people that use libdivide.h
simply put the header into their source tree (I am doing this as well). For this reason the CMake approach is not ideal.
The ideal solution would be to use the __has_include()
macro supported by recent GCC and Clang versions. This macro is part of C++17 so newer versions of MSVC should support it as well.
I am just not sure if it is worth implementing this because I don't think there many people using the SSE2 code. If I get more feedback from users using the libdivide SSE2 code, I will implement it. For the time being I leave the SSE2 code as is.
Now that libdivide supports SSE2, AVX2 and AVX512 it is clear that we only want to enable one of these vector instruction sets. And the user has to specify which instruction set he wants to use by defining one of the macros below:
LIBDIVIDE_SSE2
LIBDIVIDE_AVX2
LIBDIVIDE_AVX512
Hence we leave everything as is (disable vector division by default).
you can use -march=native and following predefined macros inside libdivide.h:
gcc,clang and most compilers: https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile
__SSE2_MATH__==1 __SSE2__ ==1
MSVC: https://msdn.microsoft.com/en-us/library/b0084kay.aspx_M_IX86_FP ==2
I've added a pull request to demonstrate the idea: https://github.com/ridiculousfish/libdivide/pull/40