newhopecrypto / newhope

Software of the NIST Post-Quantum submission NewHope
https://newhopecrypto.org

Unable to compile, AVX2 error on centos 7 #4

Open z-ninja opened 5 years ago

z-ninja commented 5 years ago

Hi there, I am able to compile the ref directory with make, but in the avx2 directory I get the following error:

In file included from /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:43,
                 from fips202x4.c:1:
fips202x4.c: In function ‘shake128x4’:
/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline ‘_mm256_xor_si256’: target specific option mismatch
 _mm256_xor_si256 (__m256i __A, __m256i __B)
 ^~~~~~~~~~~~~~~~
fips202x4.c:155:12: note: called from here
     s[i] = _mm256_xor_si256(s[i], s[i]);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [PQCgenKAT_cpakem512] Error 1

I would be grateful for any help.

cryptojedi commented 5 years ago

Just to double-check, compilation is using the following flags?:

/usr/bin/gcc -no-pie -Wall -Wextra -g -O3 -fomit-frame-pointer -msse2avx -mavx2 -march=native

mouse07410 commented 3 years ago

A year and a half later: yes, these exact flags, and with this exact result.

cryptojedi commented 3 years ago

What CPU are you building on?

mouse07410 commented 3 years ago

One machine (4 CPUs), CentOS 8:

.  .  .  .  .
$ make
/usr/bin/gcc -no-pie -Wall -Wextra -g -O3 -fomit-frame-pointer -msse2avx -mavx2 -march=native -c keccak4x/KeccakP-1600-times4-SIMD256.c -o keccak4x/KeccakP-1600-times4-SIMD256.o
ln -sf cpakem.h api.h
/usr/bin/gcc -O3 -fomit-frame-pointer -march=native -fPIC -no-pie -o PQCgenKAT_cpakem512 -DNEWHOPE_N=512 poly.c reduce.c fips202.c  verify.c cpapke.c ntt_double.s ntt.c precomp.c fips202x4.c  keccak4x/KeccakP-1600-times4-SIMD256.o cpakem.c -I. rng.c PQCgenKAT_kem.c -lcrypto
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:43,
                 from fips202x4.c:1:
fips202x4.c: In function ‘shake128x4’:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline ‘_mm256_xor_si256’: target specific option mismatch
 _mm256_xor_si256 (__m256i __A, __m256i __B)
 ^~~~~~~~~~~~~~~~
fips202x4.c:155:12: note: called from here
     s[i] = _mm256_xor_si256(s[i], s[i]);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:26: PQCgenKAT_cpakem512] Error 1
$ botan cpuid
CPUID flags: sse2 ssse3 sse41 sse42 rdtsc aes_ni clmul
$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 45
model name  : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
stepping    : 2
microcode   : 0x43
cpu MHz     : 2299.998
cache size  : 25600 KB
physical id : 0
siblings    : 1
core id     : 0
cpu cores   : 1
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ssbd ibrs ibpb stibp tsc_adjust arat md_clear flush_l1d arch_capabilities
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips    : 4599.99
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:

Another machine (2 CPUs), Ubuntu-20.04.1:

ur20980@uri-u20:~/src/newhope/avx2$ make
/usr/bin/gcc -no-pie -Wall -Wextra -g -O3 -fomit-frame-pointer -msse2avx -mavx2 -march=native -c keccak4x/KeccakP-1600-times4-SIMD256.c -o keccak4x/KeccakP-1600-times4-SIMD256.o
ln -sf cpakem.h api.h
/usr/bin/gcc -O3 -fomit-frame-pointer -march=native -fPIC -no-pie -o PQCgenKAT_cpakem512 -DNEWHOPE_N=512 poly.c reduce.c fips202.c  verify.c cpapke.c ntt_double.s ntt.c precomp.c fips202x4.c  keccak4x/KeccakP-1600-times4-SIMD256.o cpakem.c -I. rng.c PQCgenKAT_kem.c -lcrypto
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/include/immintrin.h:53,
                 from fips202x4.c:1:
fips202x4.c: In function ‘shake128x4’:
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx2intrin.h:913:1: error: inlining failed in call to always_inline ‘_mm256_xor_si256’: target specific option mismatch
  913 | _mm256_xor_si256 (__m256i __A, __m256i __B)
      | ^~~~~~~~~~~~~~~~
fips202x4.c:155:12: note: called from here
  155 |     s[i] = _mm256_xor_si256(s[i], s[i]);
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
PQCgenKAT_kem.c: In function ‘main’:
PQCgenKAT_kem.c:75:13: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
   75 |             fscanf(fp_req, "%d", &count);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:26: PQCgenKAT_cpakem512] Error 1
$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 45
model name  : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
stepping    : 2
microcode   : 0x43
cpu MHz     : 2299.998
cache size  : 25600 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ssbd ibrs ibpb stibp tsc_adjust arat md_clear flush_l1d arch_capabilities
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips    : 4599.99
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:

It looks like GCC (and Clang) on those machines do not do what we'd expect from -march=native. Adding -mavx2 to CFLAGS and NISTFLAGS alleviates my compilation problem, and the resulting binaries run.

cryptojedi commented 3 years ago

On the E5-2650 I would expect -march=native to not work: if I'm not mistaken, that's a Sandy Bridge, which supports only AVX, not AVX2.
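The flags lines in the /proc/cpuinfo output above list avx but not avx2, which can be checked mechanically rather than by eye. The following is a minimal sketch, assuming a Linux-style flags line of space-separated tokens; the function name has_cpu_flag is hypothetical and not part of the NewHope code.

```c
#include <string.h>

/* Hypothetical helper (not from the NewHope sources).
 * Returns 1 if `name` appears as a whole whitespace-separated token in
 * `flags`, 0 otherwise. Matching whole tokens matters here: a plain
 * strstr() search for "avx" would also hit "avx2" or "avx512f". */
int has_cpu_flag(const char *flags, const char *name)
{
    size_t n = strlen(name);
    const char *p = flags;

    if (n == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must begin at the start of the line or after a separator. */
        int starts_token = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the line or before a separator. */
        int ends_token = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts_token && ends_token)
            return 1;
        p += 1; /* keep searching after a partial match */
    }
    return 0;
}
```

Passing the text after "flags :" from either machine above together with "avx2" should return 0, while "avx" should return 1.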

mouse07410 commented 3 years ago

Allow me to disagree. I have the same problem on MacOS with this CPU:

$ botan cpuid
CPUID flags: sse2 ssse3 sse41 sse42 avx2 avx512f avx512dq avx512bw rdtsc bmi1 bmi2 adx aes_ni clmul rdrand rdseed

cryptojedi commented 3 years ago

Just checking, on OSX you're compiling with clang, right? What flags do you use there? The -msse2avx flag doesn't exist for clang as far as I know.

mouse07410 commented 3 years ago

On MacOS I'm adding -mavx2 -mavx -msse2.

cryptojedi commented 3 years ago

I'll try to take a look soon, but at the moment I don't have access to a machine with OSX.

mouse07410 commented 3 years ago

The problem seems to affect all of my machines: regardless of the actual architecture, clang (and sometimes GCC) doesn't enable extended instructions unless the corresponding explicit flag (like -mavx2) is given.

I think there's no problem in adding that flag explicitly to the Makefile in the avx2 subdirectory.
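To see whether a given set of flags actually turns on AVX2 code generation, one option is to test the __AVX2__ macro, which GCC and Clang predefine when AVX2 is enabled for the translation unit. This is a minimal sketch; the function names are hypothetical and not part of the NewHope code.

```c
#include <stdio.h>

/* Hypothetical helper (not from the NewHope sources).
 * Returns 1 if this file was compiled with AVX2 code generation enabled,
 * 0 otherwise. GCC and Clang predefine __AVX2__ when -mavx2 is given or
 * when -march=... selects a target that includes AVX2. */
int avx2_enabled_at_compile_time(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print a one-line report for the current build. */
void report_avx2_build(void)
{
    printf("AVX2 code generation: %s\n",
           avx2_enabled_at_compile_time() ? "enabled" : "disabled");
}
```

Calling report_avx2_build() from a small main and compiling once with only -march=native and once with -mavx2 added should show whether -march=native alone enables AVX2 on a particular machine.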

cryptojedi commented 3 years ago

I fully agree. It was already included in CFLAGS; I have now also added it to NISTFLAGS (used to build PQCgenKAT). Does this solve the issue?