ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com
Other
1.09k stars 77 forks source link

Unnecessary SSE2 check in Cmake #39

Closed ghost closed 5 years ago

ghost commented 6 years ago

you can use -march=native and following predefined macros inside libdivide.h:

gcc,clang and most compilers: https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile __SSE2_MATH__==1 __SSE2__ ==1 MSVC: https://msdn.microsoft.com/en-us/library/b0084kay.aspx

_M_IX86_FP ==2

I've added a pull request to demonstrate the idea: https://github.com/ridiculousfish/libdivide/pull/40

kimwalisch commented 6 years ago

The libdivide.h SSE2 code can be considered legacy by now because Intel/AMD have released AVX, AVX2 and AVX512 in the meantime.

Personally I favor not to turn on SSE2 by default in order to increase the portability of the code and to prevent future compiler warnings. I am very confident that libdivide.h will compile fine without any code changes using all major C/C++ compilers for the 5 years, if we turn on SSE2 by default then I am much less confident...

Question: are using using the libdivide SSE2 code?

ghost commented 6 years ago

@kimwalisch If you have SSE2 why not use it(esp. for highly vectored code)? The idea is that check should be done within libdivide.h instead of Cmake script.

kimwalisch commented 6 years ago

Another reason to turn off SSE2 by default is to reduce compile time. Because libdivide.h uses lots of C++ templates it increases compile time considerably for projects that include libdivide.h. If we turn on SSE2 by default compile time will be even worse.

ghost commented 6 years ago

@kimwalisch C++ compile times are irrelevant one-time cost. Final code performance is what matters. Maybe a check for older/slower compilers should be done instead?

ridiculousfish commented 6 years ago

The __SSE2__, etc. macros reflect whether the compiler may generate SSE2 instructions. But LIBDIVIDE_USE_SSE2 is meant to track whether the SSE2 intrinsics header emmintrin.h is available. I'm not sure how these relate exactly, but it doesn't feel right to say that __SSE2__ determines emmintrin.h header availability.

ridiculousfish commented 6 years ago

I think the right way to fix this is to have CMake detect the presence of emmintrin.h or its VC equivalent.

kimwalisch commented 6 years ago

I think the right way to fix this is to have CMake detect the presence of emmintrin.h or its VC equivalent.

I assume that most people that use libdivide.h simply put the header into their source tree (I am doing this as well). For this reason the CMake approach is not ideal.

The ideal solution would be to use the __has_include() macro supported by recent GCC and Clang versions. This macro is part of C++17 so newer versions of MSVC should support it as well.

I am just not sure if it is worth implementing this because I don't think there many people using the SSE2 code. If I get more feedback from users using the libdivide SSE2 code, I will implement it. For the time being I leave the SSE2 code as is.

kimwalisch commented 5 years ago

Now that libdivide supports SSE2, AVX2 and AVX512 it is clear that we only want to enable one of these vector instruction sets. And the user has to specify which instruction set he wants to use by defining one of the macros below:

Hence we leave everything as is (disable vector division by default).