Closed maxim-masterov closed 8 months ago
Hi @maxim-masterov ,
on which AMD CPU are you trying to compile ELPA and which compiler do you use? We do not see this issue on AMD EPYC 9654 with the Intel compiler.
Hi @terminationshock ,
The CPU is AMD EPYC 9654 96-Core Processor and the compiler I use is GCC-11.3.0
.
The error occurs because GCC11 doesn't support zen4
architecture. As a result, the evaluation of existence of AVX512 instructions by the configure.ac
script results in the following output:
checking whether we compile for Xeon... no
checking whether we compile for Xeon PHI... yes
This leads to execution of this branch in the configure.ac
script:
https://github.com/marekandreas/elpa/blob/c394aedd3adaa998a88dd4a09eabe53913fb5dea/configure.ac#L2430-L2435
which defines the HAVE_AVX512_XEON_PHI
variable. As a result, the following part of the source code is getting executed:
https://github.com/marekandreas/elpa/blob/c394aedd3adaa998a88dd4a09eabe53913fb5dea/src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c#L2574-L2575
The _XOR_EPI
macro is undefined (or I haven't found a place with its definition). So, the above snippet of code is getting converted to a casting of an integer into a packed double instead of invocation of the XOR operation.
I think something happened between the two releases 2020.05.001 and 2020.11.001
In 2020.05.001 the XOR operation looks correct:
https://github.com/marekandreas/elpa/blob/907b0333414272b4d99d3c136055a4e458542dda/src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c#L1729-L1732
whereas in 2020.11.001 the _SIMD
prefix is removed:
https://github.com/marekandreas/elpa/blob/e676706751a10e2748c37a81f9711bc0a2a06c8c/src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c#L2573-L2576
I tested compiling with different march
settings with the GCC 13.1 compiler (the one I had easily available on that AMD system):
I can indeed reproduce the problem with -march=knl
and -march=knm
. However, I do not see it with -march=skylake-avx512
or (of course) -march=znver4
.
@maxim-masterov could you please tell us what you set for -march
in your case? Could you try with -march=skylake-avx512
with your GCC 11.3?
I used -march=znver2
. I can try -march=skylake-avx512
, but I don't have access to Genoa CPUs at the moment. Should get it back next week. I will report as soon as I get some results.
Although, -march=skylake-avx512
(I'm sure) will help building the code, it won't resolve the problem with an undefined macro, right?
Yes, we will look at the macro anyway.
However, I am a bit puzzled that the configure detects AVX-512 instructions anyhow with -march=znver2
. In the gcc 11 manpage, it does not list AVX-512 for this flag:
znver2
AMD Family 17h core based CPUs with x86-64 instruction set support. (This supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX, SHA, CLZERO, AES, PCLMUL, CX16,
MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, RDPID, WBNOINVD, and 64-bit instruction set extensions.)
@terminationshock Just to confirm, the -march=skylake-avx512
indeed allowed the build to finish
Hi, I found this issue when I was building ELPA on AMD zen4 architecture with AVX512. The build broke with multiple errors like:
According to 'configure.ac', either 'HAVE_AVX512_XEON' or 'HAVE_AVX512_XEON_PHI' macro should automatically be defined if CPU has an AVX512 support. This works on Intel CPUs, however, on AMD CPUs the configure script ill-defines
HAVE_AVX512_XEON_PHI
and leavesHAVE_AVX512_XEON
undefined. This leads to execution of the code path guarded by the#ifdef HAVE_AVX512_XEON_PHI
directive, in which an undefined macro_XOR_EPI
is used. I'm sure that the same error will also appear on Xeon-Phi as the logic similar.If I understood correctly, the
_XOR_EPI
macro should actually be replaced with_SIMD_XOR_EPI
that correctly defines_mm512_xor_epi64
for AVX512 registers.I'm attaching the patch with the fix to this issue (created from
ELPA-2023.05.001
): ELPA-2023.05.001_fix_AVX512_support.patch