simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License

Problem with simde_mm_loadh_pi on MSVC x64 #1016

Closed ole2410 closed 1 year ago

ole2410 commented 1 year ago

With MSVC targeting x64, simde_mm_loadh_pi is not handled correctly: it is not replaced by the native _mm_loadh_pi intrinsic, and performance suffers as a result.

Only the MSVC x64 compiler seems to be affected. It works properly with MSVC x86 and the other compilers I tested.

Here is a short example for Compiler Explorer:

/* SIMDe's SSE header, which provides simde_mm_loadh_pi */
#include <simde/x86/sse.h>

simde__m128 foo(simde__m128 vec1, const simde__m64 *vec2)
{
  return simde_mm_loadh_pi(vec1, vec2);
}

The expected behaviour is that simde_mm_loadh_pi compiles to a movhps instruction, which is what happens with every compiler I tested except MSVC x64.

The related simde_mm_loadl_pi function seems to work fine.
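For readers unfamiliar with the two functions named in this report, here is a minimal scalar sketch of their lane semantics. It is only a model: the `model_m128` and `model_m64` types and the `model_*` functions are hypothetical stand-ins, not SIMDe's real types or its actual fallback code.

```c
#include <string.h>

/* Hypothetical stand-ins for simde__m128 and simde__m64 (not SIMDe's real types). */
typedef struct { float f32[4]; } model_m128;
typedef struct { float f32[2]; } model_m64;

/* Model of loadh_pi: lanes 0-1 come from a, lanes 2-3 are loaded from mem. */
static model_m128 model_loadh_pi(model_m128 a, const model_m64 *mem) {
  model_m128 r = a;
  memcpy(&r.f32[2], mem->f32, sizeof(mem->f32));
  return r;
}

/* Model of loadl_pi: lanes 0-1 are loaded from mem, lanes 2-3 come from a. */
static model_m128 model_loadl_pi(model_m128 a, const model_m64 *mem) {
  model_m128 r = a;
  memcpy(&r.f32[0], mem->f32, sizeof(mem->f32));
  return r;
}
```

When the native intrinsic is used, the high-half load is what movhps performs in a single instruction; a compiler that does not see the native intrinsic has to recognise a pattern like the one above on its own.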

mr-c commented 1 year ago

Hello @ole2410, and thanks for the report.

According to our notes in https://github.com/simd-everywhere/simde/commit/b471fcfd5c1602d17d59038e59b1e005f3dc4fd9:

MMX is not available on MSVC in 64-bit mode

_mm_loadh_pi takes a __m64 * argument, and __m64 is an MMX type, which is why the native call is gated on defined(SIMDE_X86_MMX_NATIVE).

Maybe things have changed, so I've kicked off a test that drops the defined(SIMDE_X86_MMX_NATIVE) requirement for the SSE* intrinsics: https://ci.appveyor.com/project/mr-c/simde/builds/46987490

That failed, so I then ungated only simde_mm_loadh_pi, since _mm_loadh_pi does appear on https://learn.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list?view=msvc-170, unlike every other intrinsic that uses MMX types. That build is at https://ci.appveyor.com/project/mr-c/simde/builds/46987731 and so far it appears to be working.

If all goes well, I'll include this fix in the upcoming SIMDe 0.7.6 release later this week.
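The ungating described in this reply can be illustrated with a small sketch. This is not SIMDe's source: `SKETCH_SSE_NATIVE`, `sketch_m128`, `sketch_m64` and `sketch_loadh_pi` are hypothetical names, and the detection condition is a simplified assumption. The point it shows is that the native path is selected on SSE availability alone, with no additional MMX check, and that a portable fallback is used otherwise.

```c
#include <string.h>

/* Hypothetical feature macro for this sketch only (not SIMDe's own detection logic). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #define SKETCH_SSE_NATIVE 1
  #include <xmmintrin.h>
#endif

/* Hypothetical stand-ins for simde__m128 and simde__m64. */
typedef struct { float f32[4]; } sketch_m128;
typedef struct { float f32[2]; } sketch_m64;

static sketch_m128 sketch_loadh_pi(sketch_m128 a, const sketch_m64 *mem) {
#if defined(SKETCH_SSE_NATIVE)
  /* Native path: gated on SSE only, with no extra MMX requirement. */
  sketch_m128 r;
  __m128 va = _mm_loadu_ps(a.f32);
  __m128 vr = _mm_loadh_pi(va, (const __m64 *) mem);
  _mm_storeu_ps(r.f32, vr);
  return r;
#else
  /* Portable fallback: copy two floats into the high half. */
  sketch_m128 r = a;
  memcpy(&r.f32[2], mem->f32, sizeof(mem->f32));
  return r;
#endif
}
```

Both branches should return the same lanes, so which one was taken is only visible in the generated assembly, which is where the original report noticed the difference.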