Closed norbertwenzel closed 6 years ago
> My question is now if this SSE dependency is actually unnecessary and if there is any interest in a PR that enables libdivide to build on ARM devices (without any SIMD support)?
Yes, the SSE2 dependency is optional. By default SSE2 is not enabled even on x86 CPUs (if you simply include libdivide.h in your program). SSE is by now considered legacy, the newest vector instruction set for x86 CPUs being AVX512. I guess very few people still use the SSE2 libdivide feature, but it is kept for backwards compatibility.
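To illustrate what "optional" means here, the following is a minimal sketch of a code path that only touches SSE2 when LIBDIVIDE_USE_SSE2 is defined and otherwise falls back to plain scalar code. Only the macro name comes from this thread; the helper div4_pow2() is hypothetical and is not part of libdivide. It divides four values by a power of two using a shift, simply because that is easy to express in both variants.

```cpp
#include <cassert>
#include <cstdint>

#if defined(LIBDIVIDE_USE_SSE2)
#include <emmintrin.h>  // SSE2 intrinsics, only available for x86 targets
#endif

// Hypothetical helper (not libdivide API): divides four unsigned 32-bit
// values by 2^shift. The vector path is compiled only on request.
inline void div4_pow2(const std::uint32_t* in, unsigned shift, std::uint32_t* out) {
    assert(in != nullptr && out != nullptr && shift < 32);
#if defined(LIBDIVIDE_USE_SSE2)
    // SSE2 path: one logical right shift over all four lanes.
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i count = _mm_cvtsi32_si128(static_cast<int>(shift));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_srl_epi32(v, count));
#else
    // Scalar fallback: builds on any architecture, including ARM.
    for (int i = 0; i < 4; ++i) {
        out[i] = in[i] >> shift;
    }
#endif
}
```

Built without -DLIBDIVIDE_USE_SSE2=1, the translation unit never includes emmintrin.h, which is why a build for a target without SSE can succeed as long as the macro stays undefined.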
The Makefile will work fine for most people, as most developers have an x86 CPU. The problem I see is that adding functionality to the Makefile to detect whether the CPU is an x86 CPU will probably require some kind of dirty hack. I already thought about using CMake instead of the current Makefile, where CPU detection should be much simpler.
What's your suggestion for fixing the build system on ARM?
> What's your suggestion for fixing the build system on ARM?
You are right, I was thinking about some detection inside the Makefile to be minimally invasive. But since you brought it up, I'd prefer CMake. I'd need that anyway for another project, so I'd be willing to give it a try if you don't mind.
Yes let's allow building on ARM by default. The build system is up to whoever wants to put in the work :)
> But since you brought it up, I'd prefer CMake.
Great choice :-) The good thing about using CMake instead of a plain Makefile is that we can also add support for Microsoft's Visual C++ compiler.
As a starting point you can re-use the CMakeLists.txt I wrote for my libpopcnt project.
Then you actually don't need to check the CPU architecture; instead you can check whether the compiler supports -msse2 on the current CPU architecture. If the compiler supports -msse2, you add -msse2 -DLIBDIVIDE_USE_SSE2=1 to the compiler flags.
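    include(CheckCXXCompilerFlag)
    include(CMakePushCheckState)

    # Run the check with -Werror so that a compiler which only warns
    # about an unknown flag is not counted as supporting it.
    cmake_push_check_state()
    set(CMAKE_REQUIRED_FLAGS -Werror)
    check_cxx_compiler_flag(-msse2 msse2)
    cmake_pop_check_state()

    # Enable the SSE2 code path only if the compiler accepted -msse2.
    if(msse2)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse2")
        add_definitions(-DLIBDIVIDE_USE_SSE2=1)
    endif()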
This is more portable than checking the CPU architecture because e.g. Microsoft's Visual C++ compiler does not support -msse2 on x86 CPUs. The other option is to search for a CMake module that detects CPU instruction sets (e.g. SSE, AVX, AVX2, NEON, ...). Personally I would try to keep the build system as simple as possible (only one CMakeLists.txt with no other modules), hence I favour the first option.
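For comparison, the same question ("does this build target SSE2?") can also be answered from inside the source code via predefined compiler macros. The sketch below assumes the usual conventions: GCC and Clang define __SSE2__ when SSE2 code generation is enabled, and MSVC defines _M_X64 for x64 targets or sets _M_IX86_FP to 2 for 32-bit builds with SSE2 enabled. The names EXAMPLE_TARGET_HAS_SSE2, target_has_sse2(), simd_backend() and check_requested_backend() are made up for this illustration and are not part of libdivide or of the CMake script above.

```cpp
#include <cassert>
#include <string>

// Hypothetical detection macro built from predefined compiler macros.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define EXAMPLE_TARGET_HAS_SSE2 1
#else
#define EXAMPLE_TARGET_HAS_SSE2 0
#endif

// True if the compiler is generating code for a target with SSE2.
constexpr bool target_has_sse2() {
    return EXAMPLE_TARGET_HAS_SSE2 != 0;
}

// Human-readable name of the code path this build would pick.
inline std::string simd_backend() {
    return target_has_sse2() ? "sse2" : "scalar";
}

// Guards against requesting the SSE2 path on a target that lacks it,
// e.g. an ARM build with a stray -DLIBDIVIDE_USE_SSE2=1.
inline void check_requested_backend(bool want_sse2) {
    assert(!want_sse2 || target_has_sse2());
}
```

This does not replace the compiler flag check in the build system, since on 32-bit x86 with GCC or Clang __SSE2__ is typically only defined once -msse2 (or a suitable -march) has been passed.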
> As a starting point you can re-use the CMakeLists.txt I wrote for my libpopcnt project.
@kimwalisch Thanks for your hint, and sorry I did not read it earlier. Detecting SSE2 was the only thing I was still struggling with. I was thinking about CMake's try_compile() with the -msse2 option enabled, but since you already have a working script I'll gladly look into that. Thanks.
But it does not detect SSE2 support for MSVC
We don't need that for now, it is just important that we don't use -msse when compiling using MSVC ;-)
Fixed by switching build system to CMake, see CMakeLists.txt#L29.
I was testing libdivide on ARM where no SSE is available. Nevertheless I was still able to achieve a measurable speedup by using libdivide without any SIMD in my specific case.
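As background on why a speedup is possible without any SIMD: division by a value that stays fixed over many operations can be replaced by a multiplication and a few shifts using a precomputed "magic" number. The sketch below follows the round-up method for unsigned 32-bit integers described by Granlund and Montgomery. It only illustrates the general technique; libdivide's own implementation and API differ, and the names Divider32, make_divider() and fast_div() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed data for dividing by one fixed unsigned 32-bit divisor.
struct Divider32 {
    std::uint32_t magic;   // multiplier m' = floor(2^32 * (2^l - d) / d) + 1
    std::uint8_t shift1;   // min(l, 1)
    std::uint8_t shift2;   // max(l - 1, 0)
};

// Done once per divisor; this is the expensive part.
inline Divider32 make_divider(std::uint32_t d) {
    assert(d != 0);
    unsigned l = 0;  // l = ceil(log2(d))
    while ((std::uint64_t{1} << l) < d) {
        ++l;
    }
    const std::uint64_t pow2 = std::uint64_t{1} << l;
    const std::uint64_t magic = ((pow2 - d) << 32) / d + 1;  // fits in 32 bits
    Divider32 div;
    div.magic = static_cast<std::uint32_t>(magic);
    div.shift1 = static_cast<std::uint8_t>(l < 1 ? l : 1);
    div.shift2 = static_cast<std::uint8_t>(l > 0 ? l - 1 : 0);
    return div;
}

// Done per dividend: one widening multiply, one add, two shifts.
inline std::uint32_t fast_div(std::uint32_t n, const Divider32& div) {
    const std::uint32_t t =
        static_cast<std::uint32_t>((std::uint64_t{div.magic} * n) >> 32);
    return (t + ((n - t) >> div.shift1)) >> div.shift2;
}
```

The gain comes from calling make_divider() once and then fast_div() many times with the same divisor, which only needs ordinary integer instructions available on ARM as well as on x86.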
I had to make the following changes to make the code compile:
- -msse2 -DLIBDIVIDE_USE_SSE2=1 in Makefile
- test_four() function in libdivide_test.cpp
All tests run fine afterwards.
My question is now if this SSE dependency is actually unnecessary and if there is any interest in a PR that enables libdivide to build on ARM devices (without any SIMD support)?