Assembler messages raised after running make

ittpmichael commented 5 years ago

I got an error after running make below this line: gcc -DDOUBLE_PREC -DVERSION=\"2.2.0\" -DUSE_UNICODE -std=c99 -m64 -g -Wsign-compare -Wall -Wextra -Wshadow -Wunused -fPIC -D_POSIX_SOURCE=200809L -D_GNU_SOURCE -D_DARWIN_C_SOURCE -O3 -ftree-vectorize -funroll-loops -fprefetch-loop-arrays --param simultaneous-prefetches=4 -fopenmp -funroll-loops -march=native -fno-strict-aliasing -Wformat=2 -Wpacked -Wnested-externs -Wpointer-arith -Wredundant-decls -Wfloat-equal -Wcast-qual -Wcast-align -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wstrict-prototypes -Wno-unused-local-typedefs -I../../io -I../../utils -c countpairs_impl_double.c -o countpairs_impl_double.o

The error messages are

/tmp/cc4t5XZ2.s: Assembler messages:
/tmp/cc4t5XZ2.s:155: Error: no such instruction: `shrx %rcx,%rsp,%rdi'
/tmp/cc4t5XZ2.s:1581: Error: no such instruction: `vinserti128 $0x1,16(%r12),%ymm5,%ymm6'
/tmp/cc4t5XZ2.s:1582: Error: suffix or operands invalid for `vpaddq'
/tmp/cc4t5XZ2.s:1602: Error: no such instruction: `vinserti128 $0x1,48(%r12),%ymm3,%ymm7'
/tmp/cc4t5XZ2.s:1603: Error: suffix or operands invalid for `vpaddq'
/tmp/cc4t5XZ2.s:1608: Error: no such instruction: `vinserti128 $0x1,16(%r12,%r13),%ymm10,%ymm11'
/tmp/cc4t5XZ2.s:1609: Error: suffix or operands invalid for `vpaddq'

and so on ... then ended up with

make[2]: *** [countpairs_impl_double.o] Error 1
make[2]: Leaving directory `/home/ittipat/src/Corrfunc/theory/DD'
make[1]: *** [DD] Error 2
make[1]: Leaving directory `/home/ittipat/src/Corrfunc/theory'
make: *** [theory] Error 2
 #

The codes ware compiled on CentOS 6.5 with gcc-4.8.0 and gcc-7.2.0. Compiling processes work fine on

Ubuntu 18.04 with GCC 7.3.0
CentOS 6.5 with intel compiler 2013 (icc)

manodeep commented 5 years ago

While I investigate, you can solve the issue temporarily by changing this line to CFLAGS :=-mno-avx2. This will disable AVX2 instructions and should resolve the assembly failure.

ittpmichael commented 5 years ago

Thank you for the answer, but it didn't work. Instead, I got the following message: gcc -DPERIODIC -DUSE_OMP -mno-axv2 -DVERSION=\"2.2.0\" -DUSE_UNICODE -std=c99 -m64 -g -Wsign-compare -Wall -Wextra -Wshadow -Wunused -fPIC -D_POSIX_SOURCE=200809L -D_GNU_SOURCE -D_DARWIN_C_SOURCE -O3 -ftree-vectorize -funroll-loops -fprefetch-loop-arrays --param simultaneous-prefetches=4 -fopenmp -funroll-loops -march=native -fno-strict-aliasing -Wformat=2 -Wpacked -Wnested-externs -Wpointer-arith -Wredundant-decls -Wfloat-equal -Wcast-qual -Wcast-align -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wstrict-prototypes -Wno-unused-local-typedefs -I../../io -I../../utils -c DD.c -o DD.o

gcc: error: unrecognized command line option ‘-mno-axv2’; did you mean ‘-mno-avx’?

The gcc also do not recognize the flag -mno-avx. gcc: error: unrecognized command line option ‘-mno-axv’; did you mean ‘-mno-abm’?

lgarrison commented 5 years ago

So gcc is suggesting a flag that it doesn't actually know about? Weird!

You wrote above that the codes were compiled with gcc-4.8.0 and gcc-7.2.0. Do you mean that both of those versions are failing?

The CentOS platform is x86, right? Not PowerPC or another architecture? uname -a should give enough info.

manodeep commented 5 years ago

@ittpmichael There is a typo. The flag should be -mno-avx2, currently it is -mno-axv2.

Will you also please add the output of these two commands:

cat /proc/cpuinfo | grep flags | tail -n 1
gcc -march=native -dM -E - < /dev/null | grep -i avx

ittpmichael commented 5 years ago

@manodeep @lgarrison I was wrong with the typo. However. after fixing this flag, make still failed with another assembler message after this line: gcc -DDOUBLE_PREC -mno-avx2 -DVERSION=\"2.2.0\" -DUSE_UNICODE -std=c99 -m64 -g -Wsign-compare -Wall -Wextra -Wshadow -Wunused -fPIC -D_POSIX_SOURCE=200809L -D_GNU_SOURCE -D_DARWIN_C_SOURCE -O3 -ftree-vectorize -funroll-loops -fprefetch-loop-arrays --param simultaneous-prefetches=4 -fopenmp -funroll-loops -march=native -fno-strict-aliasing -Wformat=2 -Wpacked -Wnested-externs -Wpointer-arith -Wredundant-decls -Wfloat-equal -Wcast-qual -Wcast-align -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wstrict-prototypes -Wno-unused-local-typedefs -I../../io -I../../utils -c countpairs_impl_double.c -o countpairs_impl_double.o

/tmp/ccIlPdx5.s: Assembler messages:
/tmp/ccIlPdx5.s:160: Error: no such instruction: 'shrx %rdx,%rsp,%rdi'
make[2]: *** [countpairs_impl_double.o] Error 1
make[2]: Leaving directory `/home/ittipat/src/Corrfunc/theory/DD'
make[1]: *** [DD] Error 2
make[1]: Leaving directory `/home/ittipat/src/Corrfunc/theory'
make: *** [theory] Error 2

For the command: cat /proc/cpuinfo | grep flags | tail -n 1. It prints flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid

and for gcc, gcc -march=native -dM -E - < /dev/null | grep -i avx. It prints

#define __core_avx2__ 1
#define __AVX__ 1
#define __AVX2__ 1
#define __core_avx2 1

manodeep commented 5 years ago

Thanks for the response!

That shrx instruction looks like a bit shift opcode. Might need to disable all relevant bit manipulation codes. Will you post the entire output of gcc -march=native -dM -E - < /dev/null. Might require some variant of -mno-bmi[1|2] to fix the assembler error

ittpmichael commented 5 years ago

Sure. The full output of gcc -march=native -dM -E - < /dev/null are

#define __DBL_MIN_EXP__ (-1021)
#define __UINT_LEAST16_MAX__ 65535
#define __ATOMIC_ACQUIRE 2
#define __FLT_MIN__ 1.17549435082228750797e-38F
#define __UINT_LEAST8_TYPE__ unsigned char
#define __INTMAX_C(c) c ## L
#define __CHAR_BIT__ 8
#define __UINT8_MAX__ 255
#define __WINT_MAX__ 4294967295U
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __SIZE_MAX__ 18446744073709551615UL
#define __SSE4_1__ 1
#define __WCHAR_MAX__ 2147483647
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __DBL_DENORM_MIN__ ((double)4.94065645841246544177e-324L)
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __FLT_EVAL_METHOD__ 0
#define __unix__ 1
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __x86_64 1
#define __core_avx2__ 1
#define __UINT_FAST64_MAX__ 18446744073709551615UL
#define __SIG_ATOMIC_TYPE__ int
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __GNUC_PATCHLEVEL__ 2
#define __UINT_FAST8_MAX__ 255
#define __DEC64_MAX_EXP__ 385
#define __INT8_C(c) c
#define __UINT_LEAST64_MAX__ 18446744073709551615UL
#define __SHRT_MAX__ 32767
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __POPCNT__ 1
#define __UINT_LEAST8_MAX__ 255
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __UINTMAX_TYPE__ long unsigned int
#define __linux 1
#define __DEC32_EPSILON__ 1E-6DF
#define __unix 1
#define __UINT32_MAX__ 4294967295U
#define __LDBL_MAX_EXP__ 16384
#define __WINT_MIN__ 0U
#define __linux__ 1
#define __SCHAR_MAX__ 127
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __INT64_C(c) c ## L
#define __DBL_DIG__ 15
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __SIZEOF_INT__ 4
#define __SIZEOF_POINTER__ 8
#define __USER_LABEL_PREFIX__ 
#define __STDC_HOSTED__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __FLT_EPSILON__ 1.19209289550781250000e-7F
#define __ABM__ 1
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __DEC32_MAX__ 9.999999E96DF
#define __F16C__ 1
#define __INT32_MAX__ 2147483647
#define __SIZEOF_LONG__ 8
#define __UINT16_C(c) c
#define __DECIMAL_DIG__ 21
#define __gnu_linux__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __GNUC__ 4
#define __MMX__ 1
#define __FLT_HAS_DENORM__ 1
#define __SIZEOF_LONG_DOUBLE__ 16
#define __XSAVEOPT__ 1
#define __BIGGEST_ALIGNMENT__ 32
#define __DBL_MAX__ ((double)1.79769313486231570815e+308L)
#define __INT_FAST32_MAX__ 9223372036854775807L
#define __DBL_HAS_INFINITY__ 1
#define __SSE4_2__ 1
#define __DEC32_MIN_EXP__ (-94)
#define __INT_FAST16_TYPE__ long int
#define __LDBL_HAS_DENORM__ 1
#define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL
#define __INT_LEAST32_MAX__ 2147483647
#define __DEC32_MIN__ 1E-95DF
#define __DBL_MAX_EXP__ 1024
#define __DEC128_EPSILON__ 1E-33DL
#define __SSE2_MATH__ 1
#define __ATOMIC_HLE_RELEASE 131072
#define __PTRDIFF_MAX__ 9223372036854775807L
#define __amd64 1
#define __AVX__ 1
#define __ATOMIC_HLE_ACQUIRE 65536
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __SIZEOF_SIZE_T__ 8
#define __LZCNT__ 1
#define __SIZEOF_WINT_T__ 4
#define __GCC_HAVE_DWARF2_CFI_ASM 1
#define __GXX_ABI_VERSION 1002
#define __FLT_MIN_EXP__ (-125)
#define __INT_FAST64_TYPE__ long int
#define __FP_FAST_FMAF 1
#define __DBL_MIN__ ((double)2.22507385850720138309e-308L)
#define __PCLMUL__ 1
#define __LP64__ 1
#define __DECIMAL_BID_FORMAT__ 1
#define __DEC128_MIN__ 1E-6143DL
#define __REGISTER_PREFIX__ 
#define __UINT16_MAX__ 65535
#define __DBL_HAS_DENORM__ 1
#define __UINT8_TYPE__ unsigned char
#define __XSAVE__ 1
#define __NO_INLINE__ 1
#define __FLT_MANT_DIG__ 24
#define __VERSION__ "4.8.2"
#define __UINT64_C(c) c ## UL
#define __FMA__ 1
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __AVX2__ 1
#define __INT32_C(c) c
#define __DEC64_EPSILON__ 1E-15DD
#define __ORDER_PDP_ENDIAN__ 3412
#define __DEC128_MIN_EXP__ (-6142)
#define __INT_FAST32_TYPE__ long int
#define __UINT_LEAST16_TYPE__ short unsigned int
#define unix 1
#define __INT16_MAX__ 32767
#define __SIZE_TYPE__ long unsigned int
#define __UINT64_MAX__ 18446744073709551615UL
#define __INT8_TYPE__ signed char
#define __ELF__ 1
#define __FLT_RADIX__ 2
#define __INT_LEAST16_TYPE__ short int
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __UINTMAX_C(c) c ## UL
#define __SSE_MATH__ 1
#define __SIG_ATOMIC_MAX__ 2147483647
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_PTRDIFF_T__ 8
#define __x86_64__ 1
#define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF
#define __INT_FAST16_MAX__ 9223372036854775807L
#define __UINT_FAST32_MAX__ 18446744073709551615UL
#define __UINT_LEAST64_TYPE__ long unsigned int
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MAX_10_EXP__ 38
#define __LONG_MAX__ 9223372036854775807L
#define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL
#define __FLT_HAS_INFINITY__ 1
#define __UINT_FAST16_TYPE__ long unsigned int
#define __DEC64_MAX__ 9.999999999999999E384DD
#define __CHAR16_TYPE__ short unsigned int
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __INT_LEAST16_MAX__ 32767
#define __DEC64_MANT_DIG__ 16
#define __INT64_MAX__ 9223372036854775807L
#define __UINT_LEAST32_MAX__ 4294967295U
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __INT_LEAST64_TYPE__ long int
#define __INT16_TYPE__ short int
#define __INT_LEAST8_TYPE__ signed char
#define __DEC32_MAX_EXP__ 97
#define __INT_FAST8_MAX__ 127
#define __INTPTR_MAX__ 9223372036854775807L
#define linux 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __RDRND__ 1
#define __LDBL_MANT_DIG__ 64
#define __DBL_HAS_QUIET_NAN__ 1
#define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1)
#define __code_model_small__ 1
#define __INTPTR_TYPE__ long int
#define __UINT16_TYPE__ short unsigned int
#define __WCHAR_TYPE__ int
#define __SIZEOF_FLOAT__ 4
#define __UINTPTR_MAX__ 18446744073709551615UL
#define __DEC64_MIN_EXP__ (-382)
#define __INT_FAST64_MAX__ 9223372036854775807L
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __FLT_DIG__ 6
#define __UINT_FAST64_TYPE__ long unsigned int
#define __INT_MAX__ 2147483647
#define __amd64__ 1
#define __INT64_TYPE__ long int
#define __FLT_MAX_EXP__ 128
#define __core_avx2 1
#define __ORDER_BIG_ENDIAN__ 4321
#define __DBL_MANT_DIG__ 53
#define __INT_LEAST64_MAX__ 9223372036854775807L
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __DEC64_MIN__ 1E-383DD
#define __WINT_TYPE__ unsigned int
#define __UINT_LEAST32_TYPE__ unsigned int
#define __SIZEOF_SHORT__ 2
#define __SSE__ 1
#define __LDBL_MIN_EXP__ (-16381)
#define __INT_LEAST8_MAX__ 127
#define __SIZEOF_INT128__ 16
#define __LDBL_MAX_10_EXP__ 4932
#define __ATOMIC_RELAXED 0
#define __DBL_EPSILON__ ((double)2.22044604925031308085e-16L)
#define _LP64 1
#define __UINT8_C(c) c
#define __INT_LEAST32_TYPE__ int
#define __SIZEOF_WCHAR_T__ 4
#define __UINT64_TYPE__ long unsigned int
#define __INT_FAST8_TYPE__ signed char
#define __DBL_DECIMAL_DIG__ 17
#define __FXSR__ 1
#define __DEC_EVAL_METHOD__ 2
#define __UINT32_C(c) c ## U
#define __INTMAX_MAX__ 9223372036854775807L
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __FLT_DENORM_MIN__ 1.40129846432481707092e-45F
#define __INT8_MAX__ 127
#define __UINT_FAST32_TYPE__ long unsigned int
#define __CHAR32_TYPE__ unsigned int
#define __FLT_MAX__ 3.40282346638528859812e+38F
#define __FP_FAST_FMA 1
#define __INT32_TYPE__ int
#define __SIZEOF_DOUBLE__ 8
#define __FLT_MIN_10_EXP__ (-37)
#define __INTMAX_TYPE__ long int
#define __DEC128_MAX_EXP__ 6145
#define __FSGSBASE__ 1
#define __ATOMIC_CONSUME 1
#define __GNUC_MINOR__ 8
#define __UINTMAX_MAX__ 18446744073709551615UL
#define __DEC32_MANT_DIG__ 7
#define __DBL_MAX_10_EXP__ 308
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __BMI2__ 1
#define __INT16_C(c) c
#define __STDC__ 1
#define __AES__ 1
#define __PTRDIFF_TYPE__ long int
#define __ATOMIC_SEQ_CST 5
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1
#define __UINT32_TYPE__ unsigned int
#define __UINTPTR_TYPE__ long unsigned int
#define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD
#define __DEC128_MANT_DIG__ 34
#define __LDBL_MIN_10_EXP__ (-4931)
#define __SIZEOF_LONG_LONG__ 8
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __LDBL_DIG__ 18
#define __FLT_DECIMAL_DIG__ 9
#define __UINT_FAST16_MAX__ 18446744073709551615UL
#define __GNUC_GNU_INLINE__ 1
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __BMI__ 1
#define __SSE3__ 1
#define __UINT_FAST8_TYPE__ unsigned char
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_RELEASE 3

manodeep commented 5 years ago

Try with -mno-bmi2 added to CFLAGS first. If that does not fix the compile failure, then try with -mno-bmi2 -mno-bmi1

ittpmichael commented 5 years ago

Thank you very much. By adding -mno-bmi2 to CFLAGS, it was compiled successfully for both GCC-4.8 and GCC-7.2.

I would like to know what is actually a cause for this problem? CPU architecture, GCC, or a bug in your code? I presume that is GCC, because when I compiled the code with icc, it works fine.

manodeep commented 5 years ago

@ittpmichael Of course there are no bugs in Corrfunc :)

Jokes apart, there are probably no bugs in Corrfunc that show up with assembler failures. If anything, the generous use of intrinsics (think portable assembly instructions) within exposes compiler bugs. For this issue, my money is on at least gcc, and possibly a combination of gcc and OS. We have had to work around one or two compiler bugs in the software; but unfortunately I don't think we can work around an assembler issue in the software.

I am also not certain how to fix this issue for the entire repo. Seems like the combination of gcc and OS requires adding -mno-avx2 -mno-bmi2 to the compilation line, but this installation issue might occur in other compiler+OS combinations as well.

manodeep / Corrfunc

Assembler messages raised after running make #177