Open galanhir opened 2 months ago
Thank you for your report @galanhir ; can you send us a pull request with a fix?
MSVC
and ICC
define _MM_HINT_T0
as 1
.
GCC
, CLANG
, and ICX
define it as 3
.
I will confirm that SIMDE_MM_HINT_T0
produces the wrong prefetch (PREFETCHT2
) under gcc.
But the correct PREFETCHT0
under msvc.
I hereby confirm that MSVC
and ICC
header files (xmmintrin.h
) are defining _MM_HINT_T0
as 1
, whereas both GCC
, CLANG
, and XCode
header files (also xmmintrin.h
) are defining _MM_HINT_T0
as 3
. I also confirm that all those compilers generate PREFETCHT0
when compiling _mm_prefetch(ptr, _MM_HINT_T0)
.
I have also noted that Intel documentation (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_prefetch) states:
_MM_HINT_T0 // 3, move data using the T0 hint. The PREFETCHT0 instruction will be generated.
...
_MM_HINT_T2 // 1, move data using the T2 hint. The PREFETCHT2 instruction will be generated.
which seems to indicate at least a mismatch btw MSVC
header and Intel documentation.
At the end of the day, that sounds like a great deal of confusion between supported compilers.
Maybe, this calls for a delicate compiler filtering, with different definitions for SIMDE_MM_HINT_T0
depending on the compiler. But not clear what "default" behaviour shall be chosen and which compilers to filter out. A simple approach would be to keep the default as is, and revert the definittion when CLANG
or GCC
are detected, but, such simple approach may miss other x86_64 compilers detected as native by SIMDE and still behaving like CLANG
or GCC
.
I am not clear about the situation of MSVN
, when CLANG-CL
is used as back-end (I guess the same than CLANG
, but i have not verified).
A session of GODBOLT
to test usual configurations: https://godbolt.org/z/hM7c6E6zj.
I have verified that the mapping on __builtin_prefetch()
directive follows (at least) CLANG
translation usage, i.e. _mm_prefetch(ptr, _MM_HINT_T0)
maps __builtin_prefetch(p, 0, 3)
as expected and seems therefore correct on ARM-CLANG
. I have not verified on other non x86_64 targets (but, maybe, it matters less).
Maybe, this calls for a delicate compiler filtering
maybe, something like:
#if defined(SIMDE_X86_SSE_NATIVE) && defined(_MM_HINT_T0)
# define SIMDE_MM_HINT_T0 _MM_HINT_T0
#else
# define SIMDE_MM_HINT_T0 3
#endif
Might get tricky though, some of those defines are extensions, or otherwise not always defined by every compiler... ?
Intel documentation
There are Intel produced pdf's out there that also give the currently used enums.
I think there is some copy & paste going with the online guide...
For example, they swapped the arguments of _mm_test_mix_ones_zeros
to reflex a typo in llvm/gcc headers.
But in the headers, despite the incorrect naming, the args still mapped to testc
in the correct order...
so the guide is now incorrect.
On one hand, the prefetch macros are originally defined
_MM_HINT_T0
(resp,T2
) by Intel as:On the other hand, SIMDE is defining
eventually identifying one for the other in case of
SIMDE_X86_SSE_ENABLE_NATIVE_ALIASES
:This is probably wrong, as it ends up swapping
_MM_HINT_T0
and_MM_HINT_T2
when built nativelly for Intel x86 targets.