Open z-ninja opened 5 years ago
Hi there, I am able to compile the ref directory with make, but with avx2 I am getting the following error.
In file included from /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:43,
from fips202x4.c:1:
fips202x4.c: In function ‘shake128x4’:
/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline ‘_mm256_xor_si256’: target specific option mismatch
_mm256_xor_si256 (__m256i __A, __m256i __B)
^~~~~~~~~~~~~~~~
fips202x4.c:155:12: note: called from here
s[i] = _mm256_xor_si256(s[i], s[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [PQCgenKAT_cpakem512] Error 1
I would be thankful for any help.
Just to double-check, is the compilation using the following flags?
/usr/bin/gcc -no-pie -Wall -Wextra -g -O3 -fomit-frame-pointer -msse2avx -mavx2 -march=native
A year and a half later: yes, these exact flags, and with this exact result.
What CPU are you building on?
One machine (4 CPUs), CentOS 8:
. . . . .
$ make
/usr/bin/gcc -no-pie -Wall -Wextra -g -O3 -fomit-frame-pointer -msse2avx -mavx2 -march=native -c keccak4x/KeccakP-1600-times4-SIMD256.c -o keccak4x/KeccakP-1600-times4-SIMD256.o
ln -sf cpakem.h api.h
/usr/bin/gcc -O3 -fomit-frame-pointer -march=native -fPIC -no-pie -o PQCgenKAT_cpakem512 -DNEWHOPE_N=512 poly.c reduce.c fips202.c verify.c cpapke.c ntt_double.s ntt.c precomp.c fips202x4.c keccak4x/KeccakP-1600-times4-SIMD256.o cpakem.c -I. rng.c PQCgenKAT_kem.c -lcrypto
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:43,
from fips202x4.c:1:
fips202x4.c: In function ‘shake128x4’:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline ‘_mm256_xor_si256’: target specific option mismatch
_mm256_xor_si256 (__m256i __A, __m256i __B)
^~~~~~~~~~~~~~~~
fips202x4.c:155:12: note: called from here
s[i] = _mm256_xor_si256(s[i], s[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:26: PQCgenKAT_cpakem512] Error 1
$ botan cpuid
CPUID flags: sse2 ssse3 sse41 sse42 rdtsc aes_ni clmul
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
stepping : 2
microcode : 0x43
cpu MHz : 2299.998
cache size : 25600 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ssbd ibrs ibpb stibp tsc_adjust arat md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4599.99
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management:
Another machine (2 CPUs), Ubuntu-20.04.1:
ur20980@uri-u20:~/src/newhope/avx2$ make
/usr/bin/gcc -no-pie -Wall -Wextra -g -O3 -fomit-frame-pointer -msse2avx -mavx2 -march=native -c keccak4x/KeccakP-1600-times4-SIMD256.c -o keccak4x/KeccakP-1600-times4-SIMD256.o
ln -sf cpakem.h api.h
/usr/bin/gcc -O3 -fomit-frame-pointer -march=native -fPIC -no-pie -o PQCgenKAT_cpakem512 -DNEWHOPE_N=512 poly.c reduce.c fips202.c verify.c cpapke.c ntt_double.s ntt.c precomp.c fips202x4.c keccak4x/KeccakP-1600-times4-SIMD256.o cpakem.c -I. rng.c PQCgenKAT_kem.c -lcrypto
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:53,
from fips202x4.c:1:
fips202x4.c: In function ‘shake128x4’:
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline ‘_mm256_xor_si256’: target specific option mismatch
913 | _mm256_xor_si256 (__m256i __A, __m256i __B)
| ^~~~~~~~~~~~~~~~
fips202x4.c:155:12: note: called from here
155 | s[i] = _mm256_xor_si256(s[i], s[i]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
PQCgenKAT_kem.c: In function ‘main’:
PQCgenKAT_kem.c:75:13: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
75 | fscanf(fp_req, "%d", &count);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:26: PQCgenKAT_cpakem512] Error 1
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
stepping : 2
microcode : 0x43
cpu MHz : 2299.998
cache size : 25600 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ssbd ibrs ibpb stibp tsc_adjust arat md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4599.99
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management:
It looks like GCC (and Clang) on those machines do not do what we'd expect from -march=native. Adding -mavx2 to CFLAGS and NISTFLAGS alleviates my compilation problem, and the resulting binaries run.
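For anyone hitting the same error, a quick way to see what a given set of flags actually enables is to compare the compiler's view with the CPU's view. In the build logs above, the second gcc invocation (the one that compiles fips202x4.c) passes -march=native but not -mavx2, so whether AVX2 gets enabled depends entirely on what the compiler detects. The following is only a minimal sketch: the function names are hypothetical, and it assumes a GCC or Clang toolchain that provides `__builtin_cpu_init` and `__builtin_cpu_supports` on x86.

```c
#include <stdio.h>

/* 1 if the compiler was asked to generate AVX2 code (for example via -mavx2,
 * or a -march value that implies it). GCC and Clang define __AVX2__ then. */
int compiled_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the CPU this runs on reports AVX2, 0 if it does not, and -1 if this
 * sketch cannot tell (non-x86 target or a compiler without the builtins). */
int cpu_reports_avx2(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}

/* Hypothetical helper: print both answers side by side. */
void report_avx2(void)
{
    printf("compiler enabled AVX2: %d\n", compiled_with_avx2());
    printf("CPU reports AVX2:      %d\n", cpu_reports_avx2());
}
```

Calling `report_avx2()` from a small `main`, built with exactly the flags used for fips202x4.c, should show whether the two disagree. If the compiler side is 0, the "target specific option mismatch" error is expected for any AVX2 intrinsic in that translation unit.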
On the E5-2650 I would expect -march=native not to enable AVX2. If I'm not mistaken, that's a Sandy Bridge, which supports only AVX, not AVX2.
Allow me to disagree. I have the same problem on MacOS with this CPU:
$ botan cpuid
CPUID flags: sse2 ssse3 sse41 sse42 avx2 avx512f avx512dq avx512bw rdtsc bmi1 bmi2 adx aes_ni clmul rdrand rdseed
Just checking, on OSX you're compiling with clang, right? What flags do you use there? The -msse2avx flag doesn't exist for clang as far as I know.
On MacOS I'm adding -mavx2 -mavx -msse2.
I'll try to take a look, soon, but at the moment I don't have access to a machine with OSX.
The problem seems to affect all of my machines: regardless of the actual architecture, clang (and sometimes GCC) doesn't enable extended instructions unless the corresponding explicit flag (like -mavx2) is given.
I think there's no problem in adding that flag explicitly in the Makefile in the avx2 subdir.
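As a side note, the error message itself hints at an alternative to a file-wide flag: the intrinsic is declared for the AVX2 target, and the calling function is not. This is not what the Makefile in this repository does; it is only a sketch of a different technique, using the GCC/Clang `target` function attribute plus a runtime check. All function names here are hypothetical.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_INTRINSICS 1

/* Compiled for AVX2 regardless of the command-line flags, so the
 * always_inline intrinsics can be inlined without a target mismatch. */
__attribute__((target("avx2")))
static void xor_lanes_avx2(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_xor_si256(va, vb));
}
#endif

/* Plain C fallback for CPUs (or compilers) without AVX2. */
static void xor_lanes_portable(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] ^ b[i];
}

/* Picks the AVX2 path only when the running CPU reports AVX2. */
void xor_lanes(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
#ifdef HAVE_X86_INTRINSICS
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        xor_lanes_avx2(out, a, b);
        return;
    }
#endif
    xor_lanes_portable(out, a, b);
}
```

For code like the avx2 directory here, which is AVX2 throughout, passing -mavx2 in the Makefile is the simpler choice. The attribute approach mainly matters when one binary has to run on machines both with and without AVX2.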
I fully agree. It was already included in CFLAGS; I have now also added it to NISTFLAGS (used to build PQCgenKAT). Does this solve the issue?