Closed xinsun4atlas closed 1 year ago
Although I can enable SVML on VS2019 by predefine SIMDE_ARCH_X86_SVML, however, it will automatically enable AVX512 intrinsics in svml.h
. The compile will fail if AVX512 are not available.
It will be great if SVML intrinsics can be partially enabled according to the SSE/AVX settings, that only supported intrinsics are used.
I ran into this too. I don't know why SIMDE_X86_SVML_NATIVE
is hardcoded to imply SIMDE_X86_AVX512F_NATIVE
, since there are plenty of library functions in SVML that don't depend on any AVX code.
Additionally, even if you comment out the implicit AVX512F_NATIVE
and you are targeting something without AVX, then you'll need to do this as well, because the _NATIVE
feature tests for these two functions are wrong:
index 81509e96..d10dcf3f 100644
--- a/simde/x86/svml.h
+++ b/simde/x86/svml.h
@@ -8825,7 +8825,7 @@ simde_mm_clog_ps (simde__m128 a) {
SIMDE_FUNCTION_ATTRIBUTES
simde__m256
simde_mm256_clog_ps (simde__m256 a) {
- #if defined(SIMDE_X86_SVML_NATIVE) && defined(SIMDE_X86_SSE_NATIVE)
+ #if defined(SIMDE_X86_SVML_NATIVE) && defined(SIMDE_X86_AVX_NATIVE)
return _mm256_clog_ps(a);
#else
simde__m256_private
@@ -8880,7 +8880,7 @@ simde_mm_csqrt_ps (simde__m128 a) {
SIMDE_FUNCTION_ATTRIBUTES
simde__m256
simde_mm256_csqrt_ps (simde__m256 a) {
- #if defined(SIMDE_X86_SVML_NATIVE) && defined(SIMDE_X86_SSE_NATIVE)
+ #if defined(SIMDE_X86_SVML_NATIVE) && defined(SIMDE_X86_AVX_NATIVE)
return _mm256_csqrt_ps(a);
#else
simde__m256_private
I would happily receive a PR to fix this
FYI, I tried this in https://github.com/simd-everywhere/simde/commit/7720b1de544c2900c4266027bb3b927ec177d4f6 and 208 tests failed: https://ci.appveyor.com/project/nemequ/simde/builds/46916083/job/82oh98b4w8ae98fm#L6413
FYI, I tried this in 7720b1d and 208 tests failed: https://ci.appveyor.com/project/nemequ/simde/builds/46916083/job/82oh98b4w8ae98fm#L6413
@tycho @xinsun4atlas It feels like the reasons the tests fail is due to implicit enabling of AVX512 (which is not supported by the appveyor runners); can you test my branch https://github.com/simd-everywhere/simde/tree/ci/appveyor/svml on your systems?
Surely there must be a way to mark these as "skipped" tests on hardware that doesn't support AVX512? The test runner does have counters for those types of results, after all:
Ok: 1552
Expected Fail: 0
Fail: 222
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Anyway, I don't have any desktop machines with AVX512 support. I had to resort to a recent AMD Zen 4-based laptop which did support it instead:
Ok: 1774
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 0
Timeout: 0
That felt weird, even a recent 13th Gen Intel laptop doesn't have AVX512...
@tycho Which /arch:
flags did you build with in each scenario?
If you didn't use any, then that would confirm that there seems to be AVX512 opcodes emitted by the MSVC compiler when SVML intrinsics are being used. (Or the SVML library it links against is using AVX512, see the discussion in the comments at https://stackoverflow.com/a/69418832 )
MSVC is a bit dumb and allows you to use intrinsics from any x86 feature set without any special /arch: flags.
Surely there must be a way to mark these as "skipped" tests on hardware that doesn't support AVX512?
Correct, we skip running the tests completely in CI when we use /arch:AVX
or higher:
But if a user of SIMDe is not setting /arch:AVX512
then SIMDe shouldn't be inadvertently causing AVX512 opcodes to be used
It will be great if SVML intrinsics can be partially enabled according to the SSE/AVX settings, that only supported intrinsics are used.
It looks like this is not possible for MSVC
Dear @xinsun4atlas
Hello, SIMDe only enables SVML when using intel compiler. Recent VS2019 comes with SVML as well, so it will be good to enable on that too. Thanks!
Well, we enable it unconditionally for the Intel Compiler, as it used to be unique to them
If your system can support the MSVC bundled SVML library, you can define SIMDE_ARCH_X86_SVML
to indicate this. However it seems to use some advanced AVX intrinsics, so the resulting application/library might not be very portable.
I can't find any documentation and as you can see from the conversation here we aren't having much luck with it.
MSVC is a bit dumb and allows you to use intrinsics from any x86 feature set without any special /arch: flags.
True, but SIMDe won't use native intrinsics just because immintrin.h
was included. The test errors come from outside the SVML code, like x86/avx512/shuffle
even though we don't set /arch:AVX512
so something strange is really going on
The faulting instruction in e.g. shuffle-native-c.exe is:
SIMDE_FUNCTION_ATTRIBUTES
simde__m512i
simde__m512i_from_private(simde__m512i_private v) {
00007FF76A1224F0 mov qword ptr [rsp+8],rcx
00007FF76A1224F5 push rbp
00007FF76A1224F6 sub rsp,0A0h
00007FF76A1224FD lea rbp,[rsp+60h]
00007FF76A122502 and rbp,0FFFFFFFFFFFFFFC0h
simde__m512i r;
simde_memcpy(&r, &v, sizeof(r));
00007FF76A122506 mov r8d,40h
00007FF76A12250C mov rdx,qword ptr [&v]
00007FF76A122514 lea rcx,[rbp]
00007FF76A122518 call memcpy (07FF76A16EE56h)
return r;
00007FF76A12251D vmovdqu32 zmm0,zmmword ptr [rbp] <----
}
00007FF76A122524 add rsp,0A0h
00007FF76A12252B pop rbp
00007FF76A12252C ret
So it looks like it's generating AVX512 instructions because of the simde__m512i
return type. Kind of annoying.
The last comment before this typedef is informative: https://github.com/simd-everywhere/simde/blob/6ce6030ae3b8decadac030b88a8a1868e809c9e0/simde/x86/avx512/types.h#L506-L537
It seems like the !defined(HEDLEY_INTEL_VERSION)
check is rather suspect. Since MSVC != Intel compiler, it enables it regardless of SIMDE_X86_AVX512F_NATIVE. I think that condition should just be:
#if (defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)) && defined(SIMDE_X86_AVX512F_NATIVE)
Or something like that?
To add on to that, this change does indeed make the tests pass on non-AVX512 hardware (and without additional /arch: flags) with MSVC 2019:
diff --git a/simde/x86/avx512/types.h b/simde/x86/avx512/types.h
index d4e5618a..09e8b21e 100644
--- a/simde/x86/avx512/types.h
+++ b/simde/x86/avx512/types.h
@@ -527,7 +527,7 @@ typedef union {
*
* As for the ICC check, unlike other compilers, merely using the
* AVX-512 types causes ICC to generate AVX-512 instructions. */
-#if (defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)) && (defined(SIMDE_X86_AVX512F_NATIVE) || !defined(HEDLEY_INTEL_VERSION))
+#if (defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)) && defined(SIMDE_X86_AVX512F_NATIVE)
typedef __m512 simde__m512;
typedef __m512i simde__m512i;
typedef __m512d simde__m512d;
@tycho Can you write a test program to use `#include "simde/x86/avx512/types.h" from https://github.com/simd-everywhere/simde/tree/ci/appveyor/svml on MSVC without /arch and print the following information?
value of _MM_CMPINT_GE
value of _MM_CMPINT_NLT
value of SIMDE_X86_AVX512F_NATIVE
value of HEDLEY_INTEL_VERSION
type of simde__m512i
value of _AVX512BF16INTRIN_H_INCLUDED
value of __AVX512BF16INTRIN_H
value of SIMDE_X86_AVX512BF16_NATIVE
type of simde__m256bh
value of __AVX__
value of __AVX2__
value of _MSC_VER
value of __AVX512F__
value of SIMDE_ARCH_X86_AVX512F
value of SIMDE_X86_AVX512F_NATIVE
The last comment before this typedef is informative:
It seems like the
!defined(HEDLEY_INTEL_VERSION)
check is rather suspect. Since MSVC != Intel compiler, it enables it regardless of SIMDE_X86_AVX512F_NATIVE. I think that condition should just be:#if (defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)) && defined(SIMDE_X86_AVX512F_NATIVE)
Or something like that?
Ah, I think it was meant to be && !defined(HEDLEY_INTEL_VERSION))
, AND not OR.
hmm.. maybe not; then ICC would never get the AVX512 types..
Yeah I don't know what the condition was trying to do. I think it should just be the condition I put in the diff, because the Windows version of the Intel compiler matches MSVC behavior.
Here's the original commit: https://github.com/simd-everywhere/simde/commit/4637a303d
So the original thinking was that checking for defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)
was enough to detect AVX512F being available.
But Intel would still define those and emit AVX512 opcodes when any AVX512 types were used. It seems that MSVC compilers have the same problem. So maybe the answer is
diff --git a/simde/x86/avx512/types.h b/simde/x86/avx512/types.h
index d4e5618a..2df43909 100644
--- a/simde/x86/avx512/types.h
+++ b/simde/x86/avx512/types.h
@@ -527,7 +527,7 @@ typedef union {
*
* As for the ICC check, unlike other compilers, merely using the
* AVX-512 types causes ICC to generate AVX-512 instructions. */
-#if (defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)) && (defined(SIMDE_X86_AVX512F_NATIVE) || !defined(HEDLEY_INTEL_VERSION))
+#if (defined(_MM_CMPINT_GE) || defined(_MM_CMPINT_NLT)) && (defined(SIMDE_X86_AVX512F_NATIVE) || !(defined(HEDLEY_INTEL_VERSION) || defined(HEDLEY_MSVC_VERSION)))
typedef __m512 simde__m512;
typedef __m512i simde__m512i;
typedef __m512d simde__m512d;
I think that would be wrong with clang-cl though, because that one behaves more sanely with regards to code generation for instruction sets not enabled in compiler flags.
(clang-cl calls itself MSVC, so you'd need something like !(defined(INTEL) || ((defined(MSVC) && !defined(CLANG))
or whatever it would be with the HEDLEY macros.)
Success!
@mr-c By the way, if you're curious, here's what clang (and clang-cl) would do on Windows with __m512 arguments and return type:
https://gcc.godbolt.org/z/qv4eMqWWa
Trying to use an AVX512 intrinsic is only allowed by clang/clang-cl if the appropriate feature is enabled too:
https://gcc.godbolt.org/z/a37vzjPa3
If only this behavior was standard for MSVC as well...
Great to know, thank you!
If you have a bit of time, it would be helpful to add ClangCL to our testing setup. Your choice of CI provider 😊🙏
Thanks for the fix! For some reasons we deprecated the project which was using this feature that I cannot test on my side. Appreciate the work and definitely will take a try if we get a chance in our new projects.
@xinsun4atlas @tycho Due to issues with the SVML pow
functions on 32-bit systems, we are thinking about not automatically enabling SVML on MSVC for x86
(but still enabling native SVML automatically on all x64
systems)
https://github.com/simd-everywhere/simde/issues/1111#issuecomment-2227349342
Do you have any comments on this proposal?
Seems reasonable. Has the issue been reported on devcom? Could just disable it until a version is released with the fix and limit it to specific compiler versions (or if Microsoft doesn't fix it, just leave it off for 32-bit x86).
Seems reasonable. Has the issue been reported on devcom?
Ping @Epixu :-)
Could just disable it until a version is released with the fix and limit it to specific compiler versions (or if Microsoft doesn't fix it, just leave it off for 32-bit x86).
I agree with this plan. I don't know when I'll get to it personally, but a PR from anyone would be welcome!
I will make a minimal repro and post an issue on devcom in a couple of days
Hello, SIMDe only enables SVML when using intel compiler. Recent VS2019 comes with SVML as well, so it will be good to enable on that too. Thanks!