simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License
2.28k stars 237 forks source link

SIMDE is swapping the definition of SIMDE_MM_HINT_T0 and T2 #1186

Open galanhir opened 2 months ago

galanhir commented 2 months ago

On one hand, the prefetch macros are originally defined _MM_HINT_T0 (resp, T2) by Intel as:

xmmintrin.h:#define _MM_HINT_T0  3
...
xmmintrin.h:#define _MM_HINT_T2  1

On the other hand, SIMDE is defining

  #define SIMDE_MM_HINT_T0   1
...
  #define SIMDE_MM_HINT_T2   3

eventually identifying one for the other in case of SIMDE_X86_SSE_ENABLE_NATIVE_ALIASES:

  #undef  _MM_HINT_T0
  #define _MM_HINT_T0   SIMDE_MM_HINT_T0
...
  #undef  _MM_HINT_T2
  #define _MM_HINT_T2   SIMDE_MM_HINT_T2

This is probably wrong, as it ends up swapping _MM_HINT_T0 and _MM_HINT_T2 when built nativelly for Intel x86 targets.

mr-c commented 2 months ago

Thank you for your report @galanhir ; can you send us a pull request with a fix?

aqrit commented 2 months ago

MSVC and ICC define _MM_HINT_T0 as 1. GCC, CLANG, and ICX define it as 3.

I will confirm that SIMDE_MM_HINT_T0 produces the wrong prefetch (PREFETCHT2) under gcc. But the correct PREFETCHT0 under msvc.

galanhir commented 1 month ago

I hereby confirm that MSVC and ICC header files (xmmintrin.h) are defining _MM_HINT_T0 as 1, whereas both GCC, CLANG, and XCode header files (also xmmintrin.h) are defining _MM_HINT_T0 as 3. I also confirm that all those compilers generate PREFETCHT0 when compiling _mm_prefetch(ptr, _MM_HINT_T0).

I have also noted that Intel documentation (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_prefetch) states:

    _MM_HINT_T0 // 3, move data using the T0 hint. The PREFETCHT0 instruction will be generated.
...
    _MM_HINT_T2 // 1, move data using the T2 hint. The PREFETCHT2 instruction will be generated.

which seems to indicate at least a mismatch btw MSVC header and Intel documentation.

At the end of the day, that sounds like a great deal of confusion between supported compilers.

Maybe, this calls for a delicate compiler filtering, with different definitions for SIMDE_MM_HINT_T0 depending on the compiler. But not clear what "default" behaviour shall be chosen and which compilers to filter out. A simple approach would be to keep the default as is, and revert the definittion when CLANG or GCC are detected, but, such simple approach may miss other x86_64 compilers detected as native by SIMDE and still behaving like CLANG or GCC.

I am not clear about the situation of MSVN, when CLANG-CL is used as back-end (I guess the same than CLANG, but i have not verified).

A session of GODBOLT to test usual configurations: https://godbolt.org/z/hM7c6E6zj.

I have verified that the mapping on __builtin_prefetch() directive follows (at least) CLANG translation usage, i.e. _mm_prefetch(ptr, _MM_HINT_T0) maps __builtin_prefetch(p, 0, 3) as expected and seems therefore correct on ARM-CLANG. I have not verified on other non x86_64 targets (but, maybe, it matters less).

aqrit commented 1 month ago

Maybe, this calls for a delicate compiler filtering

maybe, something like:

#if defined(SIMDE_X86_SSE_NATIVE) && defined(_MM_HINT_T0)
#  define SIMDE_MM_HINT_T0 _MM_HINT_T0
#else 
#  define SIMDE_MM_HINT_T0 3
#endif

Might get tricky though, some of those defines are extensions, or otherwise not always defined by every compiler... ?

Intel documentation

There are Intel produced pdf's out there that also give the currently used enums.

I think there is some copy & paste going with the online guide... For example, they swapped the arguments of _mm_test_mix_ones_zeros to reflex a typo in llvm/gcc headers. But in the headers, despite the incorrect naming, the args still mapped to testc in the correct order... so the guide is now incorrect.