jrouwe / JoltPhysics

A multi core friendly rigid body physics and collision detection library. Written in C++. Suitable for games and VR applications. Used by Horizon Forbidden West.
MIT License
6k stars 374 forks source link

-mfpmath=sse is not added in 32bits build #1133

Closed SirLynix closed 1 month ago

SirLynix commented 1 month ago

Hi,

-mfpmath=sse is not added in 32bits build, and from a comment in Jolt.cmake this seems to be a mistake

https://github.com/jrouwe/JoltPhysics/blob/4cd52055e09160affcafa557b39520331bf0d034/Jolt/Jolt.cmake#L647

This block is still part of the 64bits check, same for EMIT_X86_INSTRUCTION_SET_DEFINITIONS.

Also, shouldn't -msse2 be added too in 32bits build? -mfpmath=sse is not enough

jrouwe commented 1 month ago

Thanks for the report, I'll investigate.

jrouwe commented 1 month ago

Hello,

The errors in the referenced ticket don't seem to be related to enabling sse:

/opt/homebrew/Cellar/mingw-w64/12.0.0/toolchain-i686/lib/gcc/i686-w64-mingw32/14.1.0/include/xmmintrin.h: In member function 'float JPH::Vec3::GetX() const':
/opt/homebrew/Cellar/mingw-w64/12.0.0/toolchain-i686/lib/gcc/i686-w64-mingw32/14.1.0/include/xmmintrin.h:973:1: error: inlining failed in call to 'always_inline' 'float _mm_cvtss_f32(__m128)': target specific option mismatch
  973 | _mm_cvtss_f32 (__m128 __A)
      | ^~~~~~~~~~~~~

It's a warning about inlining which I already fixed in 4a08d8d4e1ce7e87fdb16b0708a54a15f9a5379e (which is not in an official release yet).

The -mfpmath=sse flag seems to be sufficient for GCC to start creating SSE instructions. I've created a sample in [godbolt](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,selection:(endColumn:6,endLineNumber:1,positionColumn:6,positionLineNumber:1,selectionStartColumn:6,selectionStartLineNumber:1,startColumn:6,startLineNumber:1),source:'float+function()%0A%7B%0A++++float+x+%3D+1.0f%3B%0A++++float+y+%3D+2.0f%3B%0A++++return+x+*+y%3B%0A%7D'),l:'5',n:'1',o:'C%2B%2B+source+%231',t:'0')),k:37.184204071621295,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g141,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-m32+-mfpmath%3Dsse',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+gcc+14.1+(Editor+%231)',t:'0')),k:29.482462595045376,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+(trunk)',editorid:1,fontScale:14,fontUsePx:'0',j:1,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+gcc+14.1+(Compiler+%231)',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4). You can see that the generated assembly uses SSE instructions. If you remove the -mfpmath=sse flag, or replace it with -msse, it starts creating old style x87 instructions.

If this is the 64-bit section you're referring to:

https://github.com/jrouwe/JoltPhysics/blob/d3c4e3058a98a4b6ca03076e8555ff4fb17899a6/Jolt/Jolt.cmake#L619

then what it is checking is CMAKE_SYSTEM_PROCESSOR, which in my experience for x86 is the processor that is compiling the code, not what architecture it is targetting. I'm hoping you're not compiling on a 32-bit OS, but if you are, or if I'm wrong about the value, we can add an extra check for that.

SirLynix commented 1 month ago

Thanks for all the details!

I will use your inline fix to try and see if it fixes the build on MinGW on macOS.

If this is the 64-bit section you're referring to:

https://github.com/jrouwe/JoltPhysics/blob/d3c4e3058a98a4b6ca03076e8555ff4fb17899a6/Jolt/Jolt.cmake#L619

then what it is checking is CMAKE_SYSTEM_PROCESSOR, which in my experience for x86 is the processor that is compiling the code, not what architecture it is targetting. I'm hoping you're not compiling on a 32-bit OS, but if you are, or if I'm wrong about the value, we can add an extra check for that.

However I find this confusing, why is the processor compiling the code any relevant? Compilers running on 32bits system can produce 64bits binaries (and compilers running on 64bits system can produce 32bits binaries as well). Only the target architecture should matter.

I find the CMake doc a bit confusing on that as well

jrouwe commented 1 month ago

Yes, some of the cmake variables are unintuitive. How are you enabling the 32-bit build from cmake? If I simply add -m32 to the CMAKE_CXX_FLAGS then CMAKE_SYSTEM_PROCESSOR remains x86_64, which to me is logical since otherwise it would somehow need to parse the compiler options and change the value of CMAKE_SYSTEM_PROCESSOR on the fly based on this option.

SirLynix commented 1 month ago

yes from what I see xmake passes -m32 and set the compiler path.

/Users/runner/.xmake/packages/c/cmake/3.29.2/c83e9763d2e54cf3bc634700a848c33b/bin/cmake -DENABLE_ALL_WARNINGS=OFF -DINTERPROCEDURAL_OPTIMIZATION=OFF -DTARGET_UNIT_TESTS=OFF -DTARGET_HELLO_WORLD=OFF -DTARGET_PERFORMANCE_TEST=OFF -DTARGET_SAMPLES=OFF -DTARGET_VIEWER=OFF -DUSE_STATIC_MSVC_RUNTIME_LIBRARY=OFF -DOVERRIDE_CXX_FLAGS=OFF -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCROSS_PLATFORM_DETERMINISTIC=OFF -DDOUBLE_PRECISION=OFF -DGENERATE_DEBUG_SYMBOLS=OFF -DOBJECT_LAYER_BITS=16 -DUSE_AVX=OFF -DUSE_AVX2=OFF -DUSE_AVX512=OFF -DUSE_F16C=OFF -DUSE_FMADD=OFF -DUSE_LZCNT=OFF -DUSE_SSE4_1=OFF -DUSE_SSE4_2=OFF -DUSE_TZCNT=OFF -DCMAKE_INSTALL_PREFIX=/Users/runner/.xmake/packages/j/joltphysics/v5.0.0/d18c2625a84a4ebe80128018d7fb713c -DCMAKE_INSTALL_LIBDIR:PATH=lib -G "Unix Makefiles" -DCMAKE_C_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-gcc -DCMAKE_OSX_SYSROOT= -DCMAKE_ASM_FLAGS=-m32 -DCMAKE_RC_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-windres -DCMAKE_STATIC_LINKER_FLAGS= -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=BOTH -DCMAKE_SHARED_LINKER_FLAGS=-m32 -DCMAKE_SYSTEM_NAME=Windows -DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH -DCMAKE_ASM_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-gcc -DCMAKE_CXX_FLAGS=-m32 -DCMAKE_RANLIB=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-ranlib -DHAVE_FLAG_SEARCH_PATHS_FIRST=0 -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER -DCMAKE_FIND_ROOT_PATH=/opt/homebrew/opt/mingw-w64 -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=BOTH -DCMAKE_CXX_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-g++ -DCMAKE_C_FLAGS=-m32 -DCMAKE_AR=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-ar -DCMAKE_EXE_LINKER_FLAGS=-m32 "-DCMAKE_CXX_FLAGS_RELEASE=-m32 -O3 -DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=-m32 -O3 -DNDEBUG" -DCMAKE_EXE_LINKER_FLAGS_RELEASE=-m32 -DCMAKE_SHARED_LINKER_FLAGS_RELEASE=-m32 -DCMAKE_STATIC_LINKER_FLAGS_RELEASE= /Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build

however, from the failed build log, there are also warnings about sse not being enabled

/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build/../Jolt/Core/Profiler.inl:71:50: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]
   71 |                 __m128i val = _mm_loadu_si128(src);
      |                                                  ^

notice also that no SSE option is passed to the compiler:

/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-g++ -DJPH_BUILD_SHARED_LIBRARY -DJPH_DEBUG_RENDERER -DJPH_OBJECT_LAYER_BITS=16 -DJPH_PROFILE_ENABLED -DJPH_SHARED_LIBRARY -DJolt_EXPORTS -DNDEBUG @CMakeFiles/Jolt.dir/includes_CXX.rsp -m32 -Wno-stringop-overflow -ffp-contract=off -m32 -O3 -DNDEBUG -std=c++17 -Winvalid-pch -include /Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build/build_d18c2625/CMakeFiles/Jolt.dir/cmake_pch.hxx -MD -MT CMakeFiles/Jolt.dir/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp.obj -MF CMakeFiles/Jolt.dir/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp.obj.d -o CMakeFiles/Jolt.dir/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp.obj -c /Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp

Probably because in that case, CMAKE_SYSTEM_PROCESSOR is aarch64 (Apple M1), but the target arch is x86.

it looks like detecting the target arch is not easy with cmake: https://stackoverflow.com/questions/11944060/how-to-detect-target-architecture-using-cmake

Edit : I see your fix targets only debug compilation, but here it's compiling using release

jrouwe commented 1 month ago

Cross compiling is indeed difficult using cmake.

Can you place:

MESSAGE(${CMAKE_SYSTEM_PROCESSOR})

at the end of CMakeLists.txt to see what it is reporting? I'm hoping it's not aarch64, that would be really bad.

SirLynix commented 1 month ago

Sure.

I added

MESSAGE(CMAKE_SYSTEM_PROCESSOR="${CMAKE_SYSTEM_PROCESSOR}")

and the output is:

CMAKE_SYSTEM_PROCESSOR=""

😓

https://github.com/xmake-io/xmake-repo/actions/runs/9489680619/job/26151521672?pr=4330#step:5:478

jrouwe commented 1 month ago

Ok, that was unexpected! I'm not really sure how to proceed now.

I can probably blanket disable the always_inline warning on gcc, but I can't really add -msse if I don't know what processor we're compiling for.

Do we really need to cross compile from Apple M1 to a 32-bit x86 platform? (seems like a super fringe case)

SirLynix commented 1 month ago

Do we really need to cross compile from Apple M1 to a 32-bit x86 platform? (seems like a super fringe case)

The problem is not 32-bit but target architecture detection, as CMAKE_SYSTEM_PROCESSOR is empty on MinGW 64bits on macOS too, and on MinGW 64bits on Windows too (I'm not sure why the error doesn't trigger).

Notice that MinGW has no SSE option even on Windows when targeting 64bits

/D/a/_temp/msys64/mingw64/bin/x86_64-w64-mingw32-g++.exe -DJPH_BUILD_SHARED_LIBRARY -DJPH_DEBUG_RENDERER -DJPH_OBJECT_LAYER_BITS=16 -DJPH_PROFILE_ENABLED -DJPH_SHARED_LIBRARY -DJolt_EXPORTS -DNDEBUG @CMakeFiles/Jolt.dir/includes_CXX.rsp -m64 -Wno-stringop-overflow -ffp-contract=off -m64 -O3 -DNDEBUG -std=c++17 -Winvalid-pch -include C:/Users/runneradmin/AppData/Local/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build/build_07cbf58e/CMakeFiles/Jolt.dir/cmake_pch.hxx -MD -MT CMakeFiles/Jolt.dir/4560404b44facb1d4119b9286f48a459/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp.obj -MF CMakeFiles/Jolt.dir/4560404b44facb1d4119b9286f48a459/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp.obj.d -o CMakeFiles/Jolt.dir/4560404b44facb1d4119b9286f48a459/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp.obj -c /C/Users/runneradmin/AppData/Local/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp

I suppose the best option would be to detect arch from compiler, or to expect the user to set CMAKE_SYSTEM_PROCESSOR from the command-line.

I can submit a pull request to xmake to set CMAKE_SYSTEM_PROCESSOR quite easily, but that won't fix 32bits compilation as sse options are only added in the x86_64 check

SirLynix commented 1 month ago

After some more research, it appears CMAKE_SYSTEM_PROCESSOR is empty because we set CMAKE_SYSTEM_NAME, which enables cross-compilation. It's our duty to set CMAKE_SYSTEM_PROCESSOR. (edit: interesting read)

I will fix that on xmake side, and I think a fix is needed here:

https://github.com/jrouwe/JoltPhysics/blob/4cd52055e09160affcafa557b39520331bf0d034/Jolt/Jolt.cmake#L619-L655

maybe just check if CMAKE_SYSTEM_PROCESSOR equals x86 too (and update the 32bits comment), as I don't think any of this is specific for 64bits.

What do you think?