jrouwe / JoltPhysics

A multi core friendly rigid body physics and collision detection library. Written in C++. Suitable for games and VR applications. Used by Horizon Forbidden West.
MIT License
6.77k stars 452 forks source link

`-mfpmath=sse` is not supported by Emscripten #1173

Closed tetious closed 4 months ago

tetious commented 4 months ago

Hi!

The change in this commit to enable sse with 32 bit builds seems to have broken building with Emscripten: https://github.com/godot-jolt/jolt/commit/ebccdcbfae6b60e37480aa203d0781d26e437fbc

It throws: error: unknown FP unit 'sse'

I'm not sure if this is the correct path, but I can workaround by excluding it like this:

if (NOT MSVC AND NOT EMSCRIPTEN)
    target_compile_options(Jolt PUBLIC -mfpmath=sse)
endif()

Perhaps there is a better way? I could file a PR to make this change, if you like.

jrouwe commented 4 months ago

Sorry about that. I get an error on all the other instruction set flags too, so I opted to bypass the entire code block.

tetious commented 4 months ago

Awesome! Thank you!

mihe commented 4 months ago

I'm a bit confused by this.

Shouldn't Jolt be adding the -msimd128 flag itself instead, as well as any -msse* flags, based on the relevant CMake options? From what I can tell from Emscripten's documentation it still supports all these flags just fine, albeit with some gaps/emulation in the instructions themselves by the looks of it.

Clearly mfpmath=sse isn't supported though, judging by the error here, but seeing as how the user is forced to append the rest of them through something like CMAKE_CXX_FLAGS anyway I don't quite see the point in omitting them from Jolt altogether.

I'm also curious about the specific mention of -msse4.2 here:

https://github.com/jrouwe/JoltPhysics/blob/6e0b867a246731d69e7535440aff436e54dc99bf/Jolt/Jolt.cmake#L623

Was SSE 4.2 just brought up as an example, or is there some limitation with Jolt and Emscripten that prevents targeting SSE2?

(I realize SSE 4.2 has pretty much 100% coverage these days, but Godot is still hanging on to SSE2 for now.)

EDIT: I see now that the browser coverage isn't great for WebAssembly SIMD in general, but that's beside the point I guess.

jrouwe commented 4 months ago

Shouldn't Jolt be adding the -msimd128 flag itself instead, as well as any -msse* flags, based on the relevant CMake options?

Yes it should. I've just submitted a change that adds a cmake option USE_WASM_SIMD which will add the needed flags.

Was SSE 4.2 just brought up as an example, or is there some limitation with Jolt and Emscripten that prevents targeting SSE2?

Emscripten just translates the SSE/AVX instructions to WASM SIMD instructions. The reason why I didn't pick AVX is that Jolt uses 256 bit SIMD instructions when AVX is enabled, which is not supported by emscripten. I picked the next highest which is SSE 4.2. I don't think it matters which SSE version you pick as it all gets converted to WASM instructions and the browser is responsible for translating them to native (SSE / NEON) instructions based on the CPU architecture.

It may be that rewriting Jolt to use the WASM intrinsics gives better performance, but having the SSE instructions automatically convert to WASM instructions was a lot less work.

mihe commented 4 months ago

Yes it should. I've just submitted a change that adds a cmake option USE_WASM_SIMD which will add the needed flags.

Great, thank you! Now there's just the dilemma of whether to actually enable it or not I guess. 😅

I don't think it matters which SSE version you pick as it all gets converted to WASM instructions and the browser is responsible for translating them to native (SSE / NEON) instructions based on the CPU architecture.

I'm a bit surprised that the minimum hardware requirements for WASM SIMD isn't explicitly mentioned anywhere for any of the major browsers (that I could find at least). I had to dig through the V8 source code to find that it's SSE4.1 for Chromium-based browsers. The one SSE4.2 intrinsic that's mappable to WASM SIMD (_mm_cmpgt_epi64) seems to have an SSE3 fallback in V8, as far as I can tell, so I assume they aim for everything to run on the minimum instruction set (i.e. SSE4.1).