manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Compiling on Apple M1 #241

Closed manodeep closed 11 months ago

manodeep commented 3 years ago

Is your feature request related to a problem? Please describe. Corrfunc does not currently compile on the (late-2020) Apple M1 laptops; the request is for the ability to compile and run it there.

Describe the solution you'd like: Run Corrfunc on the new M1 laptops, preferably with optimised kernels (which would require kernels written for the NEON ISA).

Describe alternatives you've considered: N/A

Additional context: The new M1 chip supports the ARMv8 instruction set (128-bit SIMD). Most of the codebase was written assuming that __DARWIN__ implies x86_64, but that is no longer the case (it is unclear whether the new platform defines __aarch64__, __arm64__, or both).

@karlglazebrook has very kindly debugged the install from a source tarball (pip install failed). The current error occurs when compiling cpu_features.c

/usr/local/bin/gcc  -DVERSION=\"2.3.4\" -DUSE_UNICODE -std=c99 -g -Wsign-compare -Wall -Wextra -Wshadow -Wunused -fPIC -D_POSIX_SOURCE=200809L -D_GNU_SOURCE -D_DARWIN_C_SOURCE -O3  -ftree-vectorize -funroll-loops -fprefetch-loop-arrays --param simultaneous-prefetches=4  -Wa,-q -fopenmp -funroll-loops -march=native -fno-strict-aliasing -Wformat=2  -Wpacked  -Wnested-externs -Wpointer-arith  -Wredundant-decls  -Wfloat-equal -Wcast-qual -Wcast-align -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs -Wstrict-prototypes   -Wno-unused-local-typedefs  -I../../io -I../../utils  -c ../../utils/cpu_features.c -o ../../utils/cpu_features.o
In file included from ../../utils/cpu_features.c:13:
../../utils/cpu_features.c: In function ‘runtime_instrset_detect’:
../../utils/cpu_features.h:41:4: error: impossible constraint in ‘asm’
   41 |    __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) );
      |    ^~~~~
../../utils/cpu_features.h:41:4: error: impossible constraint in ‘asm’
   41 |    __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) );

One solution could be to change this guard:

#if defined(__GNUC__) || defined(__clang__)              // use inline assembly, Gnu/AT&T syntax

to something like

#if (defined(__GNUC__) || defined(__clang__)) && !defined(__arm64__)             // use inline assembly, Gnu/AT&T syntax

karlglazebrook commented 3 years ago

I've tried making that line both explicitly true and explicitly false. If I set it to '#if 0' I get a different error:

../../utils/cpu_features.h:49:11: error: expected ‘(’ before ‘{’ token
   49 |     __asm {
      |           ^
      |           (
../../utils/cpu_features.h:50:9: error: unknown type name ‘mov’
   50 |         mov eax, functionnumber
      |         ^~~
../../utils/cpu_features.h:51:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘xor’
   51 |         xor ecx, ecx
      |         ^~~
../../utils/cpu_features.h:53:9: error: unknown type name ‘mov’
   53 |         mov esi, output
      |         ^~~
../../utils/cpu_features.h:54:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mov’
   54 |         mov [esi],    eax
      |         ^~~
../../utils/cpu_features.h:53:13: warning: unused variable ‘esi’ [-Wunused-variable]
   53 |         mov esi, output
      |             ^~~
../../utils/cpu_features.h:50:13: warning: unused variable ‘eax’ [-Wunused-variable]
   50 |         mov eax, functionnumber
      |             ^~~
../../utils/cpu_features.h:64:23: error: invalid storage class for function ‘xgetbv’
   64 | static inline int64_t xgetbv (int ctr) {
      |                       ^~~~~~
../../utils/cpu_features.h:86:12: warning: nested extern declaration of ‘runtime_instrset_detect’ [-Wnested-externs]
   86 | extern int runtime_instrset_detect(void);
      |            ^~~~~~~~~~~~~~~~~~~~~~~
../../utils/cpu_features.h:87:12: warning: nested extern declaration of ‘get_max_usable_isa’ [-Wnested-externs]
   87 | extern int get_max_usable_isa(void);

Is it just wrong C syntax that has not been seen due to not being invoked in many years?

manodeep commented 3 years ago

@karlglazebrook Huh - now this is erroring in the next function. I wonder if the compilation works for only these lines?

karlglazebrook commented 3 years ago

It is also erroring at line 49, which is in the #else branch of the first #ifdef. Yes, it is also erroring in the next function, but one thing at a time?

For lines 38..60: judging from the comments, the #else branch is clearly Intel-only, while the first branch gives the error initially reported. That branch says it is 'inline assembly, Gnu/AT&T syntax'. I am guessing the problem here is that neither type of assembly is correct for ARM.

manodeep commented 3 years ago

Ahh yes - thanks for clarifying! I got thrown off by the line numbers. You are quite right - the assembly syntax is different for ARM. Let me try to come up with a solution...

Documenting what I have found so far. One solution that works for clang is to add the following lines everywhere:

#if defined (__ARM_NEON__)
    return FALLBACK;  /* or 0; only compiles the "FALLBACK" kernels */
#elif ...

Looks like these registers might have the appropriate values. Relevant SO entry. ARM docs for writing inline assembly

manodeep commented 3 years ago

@karlglazebrook Do you mind copy-pasting the output of:

#!/bin/bash                                                                                                                                                            
declare -a compilers=("/usr/bin/clang" "/usr/local/bin/gcc -fopenmp")
for cc in "${compilers[@]}"
do
    echo "*** $cc ***"
    $cc -std=c99 -march=native -O3 -dM -E - < /dev/null
    echo "*** $cc done ***"
done

This will show which preprocessor macros each compiler predefines for this OS and instruction set.

karlglazebrook commented 3 years ago

Sure, noting I had to change the line to

eval "$cc -std=c99 -march=native -O3 -dM -E - < /dev/null"

otherwise I got the error

test.sh:6: no such file or directory: /usr/local/bin/gcc -fopenmp

1) as is

*** /usr/bin/clang ***
clang: error: the clang compiler does not support '-march=native'
*** /usr/bin/clang done ***
*** /usr/local/bin/gcc -fopenmp ***
#define __DBL_MIN_EXP__ (-1021)
#define __UINT_LEAST16_MAX__ 0xffff
#define __ARM_SIZEOF_WCHAR_T 4
#define __DBL_DECIMAL_DIG__ 17
#define __ATOMIC_ACQUIRE 2
#define __FLT_MIN__ 1.1754943508222875e-38F
#define __GCC_IEC_559_COMPLEX 2
#define __UINT_LEAST8_TYPE__ unsigned char
#define __INTMAX_C(c) c ## L
#define __UINT8_MAX__ 0xff
#define __WCHAR_MAX__ 0x7fffffff
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __GCC_IEC_559 2
#define __FLT32X_DECIMAL_DIG__ 17
#define __FLT_EVAL_METHOD__ 0
#define __FLT64_DECIMAL_DIG__ 17
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __UINT_FAST32_TYPE__ unsigned int
#define __UINT_FAST64_MAX__ 0xffffffffffffffffULL
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __FLT32X_MAX_EXP__ 1024
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GNUC_PATCHLEVEL__ 0
#define __FLT32_HAS_DENORM__ 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __UINT_FAST8_MAX__ 0xff
#define __INT8_C(c) c
#define __ARM_64BIT_STATE 1
#define __INT_LEAST8_WIDTH__ 8
#define __INTMAX_TYPE__ long int
#define __UINT_LEAST64_MAX__ 0xffffffffffffffffULL
#define __SHRT_MAX__ 0x7fff
#define __LDBL_MAX__ 1.7976931348623157e+308L
#define __ARM_FEATURE_IDIV 1
#define __LDBL_IS_IEC_60559__ 2
#define __ARM_FP 14
#define __DYNAMIC__ 1
#define __UINT_LEAST8_MAX__ 0xff
#define __APPLE_CC__ 1
#define __UINTMAX_TYPE__ long unsigned int
#define __FLT_EVAL_METHOD_TS_18661_3__ 0
#define __UINT32_MAX__ 0xffffffffU
#define __DBL_DENORM_MIN__ ((double)4.9406564584124654e-324L)
#define __AARCH64_CMODEL_SMALL__ 1
#define __LDBL_MAX_EXP__ 1024
#define __CHAR_BIT__ 8
#define __FLT32X_IS_IEC_60559__ 2
#define __INT_LEAST16_WIDTH__ 16
#define __ARM_ALIGN_MAX_STACK_PWR 16
#define __SCHAR_MAX__ 0x7f
#define __DBL_MAX__ ((double)1.7976931348623157e+308L)
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __INT64_C(c) c ## LL
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __SIZEOF_INT__ 4
#define __INT_FAST64_WIDTH__ 64
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __FLT32X_MANT_DIG__ 53
#define __USER_LABEL_PREFIX__ _
#define __FLT32_MAX_10_EXP__ 38
#define __STDC_HOSTED__ 1
#define __DBL_DIG__ 15
#define __FLT32_DIG__ 6
#define __FLT_EPSILON__ 1.1920928955078125e-7F
#define __SHRT_WIDTH__ 16
#define __FLT32_IS_IEC_60559__ 2
#define __LDBL_MIN__ 2.2250738585072014e-308L
#define __WINT_TYPE__ int
#define __FLT16_HAS_QUIET_NAN__ 1
#define __strong
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __FP_FAST_FMA 1
#define __FLT32X_HAS_INFINITY__ 1
#define __INT32_MAX__ 0x7fffffff
#define __INT_WIDTH__ 32
#define __SIZEOF_LONG__ 8
#define __APPLE__ 1
#define __UINT16_C(c) c
#define __DECIMAL_DIG__ 17
#define __FLT64_EPSILON__ 2.2204460492503131e-16F64
#define __INT16_MAX__ 0x7fff
#define __LDBL_HAS_QUIET_NAN__ 1
#define __FLT16_MIN_EXP__ (-13)
#define __FLT64_MANT_DIG__ 53
#define __LDBL_MANT_DIG__ 53
#define __GNUC__ 11
#define __FLT_HAS_DENORM__ 1
#define __SIZEOF_LONG_DOUBLE__ 8
#define __LDBL_MIN_EXP__ (-1021)
#define __FLT64_MAX_10_EXP__ 308
#define __FLT16_MAX_10_EXP__ 4
#define __DBL_IS_IEC_60559__ 2
#define __FLT32_HAS_INFINITY__ 1
#define __LDBL_HAS_DENORM__ 1
#define __DBL_HAS_INFINITY__ 1
#define __HAVE_SPECULATION_SAFE_VALUE 1
#define __INTPTR_WIDTH__ 64
#define __FLT32X_HAS_DENORM__ 1
#define __INT_FAST16_TYPE__ short int
#define __STRICT_ANSI__ 1
#define __FLT32_DECIMAL_DIG__ 9
#define __INT_LEAST32_MAX__ 0x7fffffff
#define __weak
#define __DBL_MAX_EXP__ 1024
#define __WCHAR_WIDTH__ 32
#define __FLT32_MAX__ 3.4028234663852886e+38F32
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __FLT16_DECIMAL_DIG__ 5
#define __FLT_IS_IEC_60559__ 2
#define __FLT32_HAS_QUIET_NAN__ 1
#define __LONG_LONG_MAX__ 0x7fffffffffffffffLL
#define __SIZEOF_SIZE_T__ 8
#define __SIG_ATOMIC_WIDTH__ 32
#define __ARM_ALIGN_MAX_PWR 28
#define __SIZEOF_WINT_T__ 4
#define __LONG_LONG_WIDTH__ 64
#define __FLT32_MAX_EXP__ 128
#define __ARM_FP16_FORMAT_IEEE 1
#define __FLT_MIN_EXP__ (-125)
#define __FLT64_NORM_MAX__ 1.7976931348623157e+308F64
#define __FLT32X_MIN_EXP__ (-1021)
#define __INT_FAST64_TYPE__ long long int
#define __ARM_FP16_ARGS 1
#define __FP_FAST_FMAF 1
#define __FP_FAST_FMAL 1
#define __FLT64_DENORM_MIN__ 4.9406564584124654e-324F64
#define __DBL_MIN__ ((double)2.2250738585072014e-308L)
#define __ARM_FEATURE_CLZ 1
#define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16
#define __SIZEOF_POINTER__ 8
#define __GXX_ABI_VERSION 1015
#define __SIZE_TYPE__ long unsigned int
#define __LP64__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __FLT_EVAL_METHOD_C99__ 0
#define __FLT32X_EPSILON__ 2.2204460492503131e-16F32x
#define __FLT64_MIN_EXP__ (-1021)
#define __UINT64_MAX__ 0xffffffffffffffffULL
#define __LDBL_DECIMAL_DIG__ 17
#define __FLT_MAX__ 3.4028234663852886e+38F
#define __aarch64__ 1
#define __FLT64_MIN_10_EXP__ (-307)
#define __REGISTER_PREFIX__
#define __UINT16_MAX__ 0xffff
#define __LDBL_HAS_INFINITY__ 1
#define __FLT_DIG__ 6
#define __DEC_EVAL_METHOD__ 2
#define __FLT_MANT_DIG__ 24
#define __FLT16_MIN_10_EXP__ (-4)
#define __VERSION__ "11.0.0 20201128 (experimental)"
#define __UINT64_C(c) c ## ULL
#define __WINT_MAX__ 0x7fffffff
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __FLT32X_MIN__ 2.2250738585072014e-308F32x
#define __FLT32_MANT_DIG__ 24
#define __AARCH64EL__ 1
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __FLT16_MAX_EXP__ 16
#define __BIGGEST_ALIGNMENT__ 16
#define __INT32_C(c) c
#define __FLT16_DIG__ 3
#define __SCHAR_WIDTH__ 8
#define __ORDER_PDP_ENDIAN__ 3412
#define __INT_FAST32_TYPE__ int
#define __UINT_LEAST16_TYPE__ short unsigned int
#define __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ 110000
#define __ARM_FEATURE_FMA 1
#define __INT8_TYPE__ signed char
#define __SIG_ATOMIC_TYPE__ int
#define __GCC_ASM_FLAG_OUTPUTS__ 1
#define __arm64__ 1
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __FLT_RADIX__ 2
#define __INT_LEAST16_TYPE__ short int
#define __ARM_ARCH_PROFILE 65
#define __LDBL_EPSILON__ 2.2204460492503131e-16L
#define __UINTMAX_C(c) c ## UL
#define __ARM_PCS_AAPCS64 1
#define __SIG_ATOMIC_MAX__ 0x7fffffff
#define __OPTIMIZE__ 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_PTRDIFF_T__ 8
#define __arm64 1
#define __ATOMIC_RELAXED 0
#define __INT_FAST32_WIDTH__ 32
#define __LDBL_DIG__ 15
#define __FLT64_IS_IEC_60559__ 2
#define __FLT16_IS_IEC_60559__ 2
#define __FLT64_DIG__ 15
#define __UINT_FAST32_MAX__ 0xffffffffU
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __FLT16_EPSILON__ 9.7656250000000000e-4F16
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MAX_10_EXP__ 38
#define __LONG_MAX__ 0x7fffffffffffffffL
#define __FLT_HAS_INFINITY__ 1
#define __DBL_HAS_DENORM__ 1
#define __UINT_FAST16_TYPE__ short unsigned int
#define __FLT32X_HAS_QUIET_NAN__ 1
#define __CHAR16_TYPE__ short unsigned int
#define __SIZE_WIDTH__ 64
#define __INTMAX_WIDTH__ 64
#define __INT_LEAST16_MAX__ 0x7fff
#define __FLT16_NORM_MAX__ 6.5504000000000000e+4F16
#define __INT64_MAX__ 0x7fffffffffffffffLL
#define __FLT32_DENORM_MIN__ 1.4012984643248171e-45F32
#define __INT_LEAST64_TYPE__ long long int
#define __INT16_TYPE__ short int
#define __INT_LEAST8_TYPE__ signed char
#define __FLT16_MAX__ 6.5504000000000000e+4F16
#define __STDC_VERSION__ 199901L
#define __INT_FAST8_MAX__ 0x7f
#define __ARM_ARCH 8
#define __INTPTR_MAX__ 0x7fffffffffffffffL
#define __ARM_FEATURE_UNALIGNED 1
#define __FLT64_HAS_QUIET_NAN__ 1
#define __FLT32X_DIG__ 15
#define __UINT8_TYPE__ unsigned char
#define __PTRDIFF_WIDTH__ 64
#define __CONSTANT_CFSTRINGS__ 1
#define __FLT64_HAS_INFINITY__ 1
#define __FLT16_HAS_INFINITY__ 1
#define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1)
#define __PTRDIFF_MAX__ 0x7fffffffffffffffL
#define __FLT16_MANT_DIG__ 11
#define __INTPTR_TYPE__ long int
#define __UINT16_TYPE__ short unsigned int
#define __WCHAR_TYPE__ int
#define __pic__ 2
#define __UINTPTR_MAX__ 0xffffffffffffffffUL
#define __ARM_ARCH_8A 1
#define __INT_FAST64_MAX__ 0x7fffffffffffffffLL
#define __FLT_NORM_MAX__ 3.4028234663852886e+38F
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __INT_MAX__ 0x7fffffff
#define __INT64_TYPE__ long long int
#define __FLT_MAX_EXP__ 128
#define __ORDER_BIG_ENDIAN__ 4321
#define __DBL_MANT_DIG__ 53
#define __INT_LEAST64_MAX__ 0x7fffffffffffffffLL
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __FP_FAST_FMAF32 1
#define __UINT_LEAST32_TYPE__ unsigned int
#define __SIZEOF_SHORT__ 2
#define __FLT32_NORM_MAX__ 3.4028234663852886e+38F32
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __FLT64_MAX__ 1.7976931348623157e+308F64
#define __MACH__ 1
#define __LITTLE_ENDIAN__ 1
#define __WINT_WIDTH__ 32
#define __FP_FAST_FMAF64 1
#define __INT_LEAST8_MAX__ 0x7f
#define __INT_LEAST64_WIDTH__ 64
#define __FLT32X_MAX_10_EXP__ 308
#define __INT_FAST16_MAX__ 0x7fff
#define __SIZEOF_INT128__ 16
#define __FLT16_MIN__ 6.1035156250000000e-5F16
#define __LDBL_MAX_10_EXP__ 308
#define __DBL_EPSILON__ ((double)2.2204460492503131e-16L)
#define __FLT32_MIN_EXP__ (-125)
#define _LP64 1
#define __UINT8_C(c) c
#define __FLT64_MAX_EXP__ 1024
#define __INT_LEAST32_TYPE__ int
#define __UINT64_TYPE__ long long unsigned int
#define __ARM_NEON 1
#define __INT_FAST32_MAX__ 0x7fffffff
#define __INTMAX_MAX__ 0x7fffffffffffffffL
#define __UINT_FAST8_TYPE__ unsigned char
#define __INT_FAST8_TYPE__ signed char
#define __GNUC_STDC_INLINE__ 1
#define __FLT64_HAS_DENORM__ 1
#define _OPENMP 201511
#define __FLT32_EPSILON__ 1.1920928955078125e-7F32
#define __FP_FAST_FMAF32x 1
#define __FLT16_HAS_DENORM__ 1
#define __INT_FAST8_WIDTH__ 8
#define __FLT32X_MAX__ 1.7976931348623157e+308F32x
#define __DBL_NORM_MAX__ ((double)1.7976931348623157e+308L)
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __LDBL_DENORM_MIN__ 4.9406564584124654e-324L
#define __SIZEOF_WCHAR_T__ 4
#define __UINT32_C(c) c ## U
#define __FLT_DENORM_MIN__ 1.4012984643248171e-45F
#define __WINT_MIN__ (-__WINT_MAX__ - 1)
#define __INT8_MAX__ 0x7f
#define __LONG_WIDTH__ 64
#define __PIC__ 2
#define __FLT32X_NORM_MAX__ 1.7976931348623157e+308F32x
#define __CHAR32_TYPE__ unsigned int
#define __FLT32_MIN_10_EXP__ (-37)
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __INT32_TYPE__ int
#define __SIZEOF_DOUBLE__ 8
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT64_MIN__ 2.2250738585072014e-308F64
#define __INT_LEAST32_WIDTH__ 32
#define __SIZEOF_FLOAT__ 4
#define __ATOMIC_CONSUME 1
#define __GNUC_MINOR__ 0
#define __INT_FAST16_WIDTH__ 16
#define __UINTMAX_MAX__ 0xffffffffffffffffUL
#define __FLT32X_DENORM_MIN__ 4.9406564584124654e-324F32x
#define __DBL_MAX_10_EXP__ 308
#define __INT16_C(c) c
#define __ARM_ARCH_ISA_A64 1
#define __STDC__ 1
#define __PTRDIFF_TYPE__ long int
#define __FLT32_MIN__ 1.1754943508222875e-38F32
#define __ATOMIC_SEQ_CST 5
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1
#define __UINT32_TYPE__ unsigned int
#define __FLT32X_MIN_10_EXP__ (-307)
#define __UINTPTR_TYPE__ long unsigned int
#define __LDBL_MIN_10_EXP__ (-307)
#define __SIZEOF_LONG_LONG__ 8
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __FLT_DECIMAL_DIG__ 9
#define __UINT_FAST16_MAX__ 0xffff
#define __LDBL_NORM_MAX__ 1.7976931348623157e+308L
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __SIZE_MAX__ 0xffffffffffffffffUL
#define __UINT_LEAST32_MAX__ 0xffffffffU
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_RELEASE 3
*** /usr/local/bin/gcc -fopenmp done ***

2) removing -march=native :

*** /usr/bin/clang ***
#define _LP64 1
#define __AARCH64EL__ 1
#define __AARCH64_SIMD__ 1
#define __APPLE_CC__ 6000
#define __APPLE__ 1
#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_8_3__ 1
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_COMPLEX 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_JCVT 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_NEON__ 1
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4
#define __ATOMIC_ACQUIRE 2
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_CONSUME 1
#define __ATOMIC_RELAXED 0
#define __ATOMIC_RELEASE 3
#define __ATOMIC_SEQ_CST 5
#define __BIGGEST_ALIGNMENT__ 8
#define __BLOCKS__ 1
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __CHAR16_TYPE__ unsigned short
#define __CHAR32_TYPE__ unsigned int
#define __CHAR_BIT__ 8
#define __CLANG_ATOMIC_BOOL_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR_LOCK_FREE 2
#define __CLANG_ATOMIC_INT_LOCK_FREE 2
#define __CLANG_ATOMIC_LLONG_LOCK_FREE 2
#define __CLANG_ATOMIC_LONG_LOCK_FREE 2
#define __CLANG_ATOMIC_POINTER_LOCK_FREE 2
#define __CLANG_ATOMIC_SHORT_LOCK_FREE 2
#define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __CONSTANT_CFSTRINGS__ 1
#define __DBL_DECIMAL_DIG__ 17
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __DBL_DIG__ 15
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __DBL_HAS_DENORM__ 1
#define __DBL_HAS_INFINITY__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __DBL_MANT_DIG__ 53
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MAX_EXP__ 1024
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_MIN_10_EXP__ (-307)
#define __DBL_MIN_EXP__ (-1021)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__
#define __DYNAMIC__ 1
#define __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ 110000
#define __FINITE_MATH_ONLY__ 0
#define __FLT16_DECIMAL_DIG__ 5
#define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16
#define __FLT16_DIG__ 3
#define __FLT16_EPSILON__ 9.765625e-4F16
#define __FLT16_HAS_DENORM__ 1
#define __FLT16_HAS_INFINITY__ 1
#define __FLT16_HAS_QUIET_NAN__ 1
#define __FLT16_MANT_DIG__ 11
#define __FLT16_MAX_10_EXP__ 4
#define __FLT16_MAX_EXP__ 16
#define __FLT16_MAX__ 6.5504e+4F16
#define __FLT16_MIN_10_EXP__ (-4)
#define __FLT16_MIN_EXP__ (-13)
#define __FLT16_MIN__ 6.103515625e-5F16
#define __FLT_DECIMAL_DIG__ 9
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_DIG__ 6
#define __FLT_EPSILON__ 1.19209290e-7F
#define __FLT_EVAL_METHOD__ 0
#define __FLT_HAS_DENORM__ 1
#define __FLT_HAS_INFINITY__ 1
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MANT_DIG__ 24
#define __FLT_MAX_10_EXP__ 38
#define __FLT_MAX_EXP__ 128
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT_MIN_EXP__ (-125)
#define __FLT_MIN__ 1.17549435e-38F
#define __FLT_RADIX__ 2
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GNUC_MINOR__ 2
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_STDC_INLINE__ 1
#define __GNUC__ 4
#define __GXX_ABI_VERSION 1002
#define __INT16_C_SUFFIX__
#define __INT16_FMTd__ "hd"
#define __INT16_FMTi__ "hi"
#define __INT16_MAX__ 32767
#define __INT16_TYPE__ short
#define __INT32_C_SUFFIX__
#define __INT32_FMTd__ "d"
#define __INT32_FMTi__ "i"
#define __INT32_MAX__ 2147483647
#define __INT32_TYPE__ int
#define __INT64_C_SUFFIX__ LL
#define __INT64_FMTd__ "lld"
#define __INT64_FMTi__ "lli"
#define __INT64_MAX__ 9223372036854775807LL
#define __INT64_TYPE__ long long int
#define __INT8_C_SUFFIX__
#define __INT8_FMTd__ "hhd"
#define __INT8_FMTi__ "hhi"
#define __INT8_MAX__ 127
#define __INT8_TYPE__ signed char
#define __INTMAX_C_SUFFIX__ L
#define __INTMAX_FMTd__ "ld"
#define __INTMAX_FMTi__ "li"
#define __INTMAX_MAX__ 9223372036854775807L
#define __INTMAX_TYPE__ long int
#define __INTMAX_WIDTH__ 64
#define __INTPTR_FMTd__ "ld"
#define __INTPTR_FMTi__ "li"
#define __INTPTR_MAX__ 9223372036854775807L
#define __INTPTR_TYPE__ long int
#define __INTPTR_WIDTH__ 64
#define __INT_FAST16_FMTd__ "hd"
#define __INT_FAST16_FMTi__ "hi"
#define __INT_FAST16_MAX__ 32767
#define __INT_FAST16_TYPE__ short
#define __INT_FAST32_FMTd__ "d"
#define __INT_FAST32_FMTi__ "i"
#define __INT_FAST32_MAX__ 2147483647
#define __INT_FAST32_TYPE__ int
#define __INT_FAST64_FMTd__ "lld"
#define __INT_FAST64_FMTi__ "lli"
#define __INT_FAST64_MAX__ 9223372036854775807LL
#define __INT_FAST64_TYPE__ long long int
#define __INT_FAST8_FMTd__ "hhd"
#define __INT_FAST8_FMTi__ "hhi"
#define __INT_FAST8_MAX__ 127
#define __INT_FAST8_TYPE__ signed char
#define __INT_LEAST16_FMTd__ "hd"
#define __INT_LEAST16_FMTi__ "hi"
#define __INT_LEAST16_MAX__ 32767
#define __INT_LEAST16_TYPE__ short
#define __INT_LEAST32_FMTd__ "d"
#define __INT_LEAST32_FMTi__ "i"
#define __INT_LEAST32_MAX__ 2147483647
#define __INT_LEAST32_TYPE__ int
#define __INT_LEAST64_FMTd__ "lld"
#define __INT_LEAST64_FMTi__ "lli"
#define __INT_LEAST64_MAX__ 9223372036854775807LL
#define __INT_LEAST64_TYPE__ long long int
#define __INT_LEAST8_FMTd__ "hhd"
#define __INT_LEAST8_FMTi__ "hhi"
#define __INT_LEAST8_MAX__ 127
#define __INT_LEAST8_TYPE__ signed char
#define __INT_MAX__ 2147483647
#define __LDBL_DECIMAL_DIG__ 17
#define __LDBL_DENORM_MIN__ 4.9406564584124654e-324L
#define __LDBL_DIG__ 15
#define __LDBL_EPSILON__ 2.2204460492503131e-16L
#define __LDBL_HAS_DENORM__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __LDBL_MANT_DIG__ 53
#define __LDBL_MAX_10_EXP__ 308
#define __LDBL_MAX_EXP__ 1024
#define __LDBL_MAX__ 1.7976931348623157e+308L
#define __LDBL_MIN_10_EXP__ (-307)
#define __LDBL_MIN_EXP__ (-1021)
#define __LDBL_MIN__ 2.2250738585072014e-308L
#define __LITTLE_ENDIAN__ 1
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __LONG_MAX__ 9223372036854775807L
#define __LP64__ 1
#define __MACH__ 1
#define __OBJC_BOOL_IS_BOOL 1
#define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3
#define __OPENCL_MEMORY_SCOPE_DEVICE 2
#define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4
#define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1
#define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0
#define __OPTIMIZE__ 1
#define __ORDER_BIG_ENDIAN__ 4321
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_PDP_ENDIAN__ 3412
#define __PIC__ 2
#define __POINTER_WIDTH__ 64
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __PTRDIFF_FMTd__ "ld"
#define __PTRDIFF_FMTi__ "li"
#define __PTRDIFF_MAX__ 9223372036854775807L
#define __PTRDIFF_TYPE__ long int
#define __PTRDIFF_WIDTH__ 64
#define __REGISTER_PREFIX__
#define __SCHAR_MAX__ 127
#define __SHRT_MAX__ 32767
#define __SIG_ATOMIC_MAX__ 2147483647
#define __SIG_ATOMIC_WIDTH__ 32
#define __SIZEOF_DOUBLE__ 8
#define __SIZEOF_FLOAT__ 4
#define __SIZEOF_INT128__ 16
#define __SIZEOF_INT__ 4
#define __SIZEOF_LONG_DOUBLE__ 8
#define __SIZEOF_LONG_LONG__ 8
#define __SIZEOF_LONG__ 8
#define __SIZEOF_POINTER__ 8
#define __SIZEOF_PTRDIFF_T__ 8
#define __SIZEOF_SHORT__ 2
#define __SIZEOF_SIZE_T__ 8
#define __SIZEOF_WCHAR_T__ 4
#define __SIZEOF_WINT_T__ 4
#define __SIZE_FMTX__ "lX"
#define __SIZE_FMTo__ "lo"
#define __SIZE_FMTu__ "lu"
#define __SIZE_FMTx__ "lx"
#define __SIZE_MAX__ 18446744073709551615UL
#define __SIZE_TYPE__ long unsigned int
#define __SIZE_WIDTH__ 64
#define __SSP__ 1
#define __STDC_HOSTED__ 1
#define __STDC_NO_THREADS__ 1
#define __STDC_UTF_16__ 1
#define __STDC_UTF_32__ 1
#define __STDC_VERSION__ 199901L
#define __STDC__ 1
#define __STRICT_ANSI__ 1
#define __UINT16_C_SUFFIX__
#define __UINT16_FMTX__ "hX"
#define __UINT16_FMTo__ "ho"
#define __UINT16_FMTu__ "hu"
#define __UINT16_FMTx__ "hx"
#define __UINT16_MAX__ 65535
#define __UINT16_TYPE__ unsigned short
#define __UINT32_C_SUFFIX__ U
#define __UINT32_FMTX__ "X"
#define __UINT32_FMTo__ "o"
#define __UINT32_FMTu__ "u"
#define __UINT32_FMTx__ "x"
#define __UINT32_MAX__ 4294967295U
#define __UINT32_TYPE__ unsigned int
#define __UINT64_C_SUFFIX__ ULL
#define __UINT64_FMTX__ "llX"
#define __UINT64_FMTo__ "llo"
#define __UINT64_FMTu__ "llu"
#define __UINT64_FMTx__ "llx"
#define __UINT64_MAX__ 18446744073709551615ULL
#define __UINT64_TYPE__ long long unsigned int
#define __UINT8_C_SUFFIX__
#define __UINT8_FMTX__ "hhX"
#define __UINT8_FMTo__ "hho"
#define __UINT8_FMTu__ "hhu"
#define __UINT8_FMTx__ "hhx"
#define __UINT8_MAX__ 255
#define __UINT8_TYPE__ unsigned char
#define __UINTMAX_C_SUFFIX__ UL
#define __UINTMAX_FMTX__ "lX"
#define __UINTMAX_FMTo__ "lo"
#define __UINTMAX_FMTu__ "lu"
#define __UINTMAX_FMTx__ "lx"
#define __UINTMAX_MAX__ 18446744073709551615UL
#define __UINTMAX_TYPE__ long unsigned int
#define __UINTMAX_WIDTH__ 64
#define __UINTPTR_FMTX__ "lX"
#define __UINTPTR_FMTo__ "lo"
#define __UINTPTR_FMTu__ "lu"
#define __UINTPTR_FMTx__ "lx"
#define __UINTPTR_MAX__ 18446744073709551615UL
#define __UINTPTR_TYPE__ long unsigned int
#define __UINTPTR_WIDTH__ 64
#define __UINT_FAST16_FMTX__ "hX"
#define __UINT_FAST16_FMTo__ "ho"
#define __UINT_FAST16_FMTu__ "hu"
#define __UINT_FAST16_FMTx__ "hx"
#define __UINT_FAST16_MAX__ 65535
#define __UINT_FAST16_TYPE__ unsigned short
#define __UINT_FAST32_FMTX__ "X"
#define __UINT_FAST32_FMTo__ "o"
#define __UINT_FAST32_FMTu__ "u"
#define __UINT_FAST32_FMTx__ "x"
#define __UINT_FAST32_MAX__ 4294967295U
#define __UINT_FAST32_TYPE__ unsigned int
#define __UINT_FAST64_FMTX__ "llX"
#define __UINT_FAST64_FMTo__ "llo"
#define __UINT_FAST64_FMTu__ "llu"
#define __UINT_FAST64_FMTx__ "llx"
#define __UINT_FAST64_MAX__ 18446744073709551615ULL
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __UINT_FAST8_FMTX__ "hhX"
#define __UINT_FAST8_FMTo__ "hho"
#define __UINT_FAST8_FMTu__ "hhu"
#define __UINT_FAST8_FMTx__ "hhx"
#define __UINT_FAST8_MAX__ 255
#define __UINT_FAST8_TYPE__ unsigned char
#define __UINT_LEAST16_FMTX__ "hX"
#define __UINT_LEAST16_FMTo__ "ho"
#define __UINT_LEAST16_FMTu__ "hu"
#define __UINT_LEAST16_FMTx__ "hx"
#define __UINT_LEAST16_MAX__ 65535
#define __UINT_LEAST16_TYPE__ unsigned short
#define __UINT_LEAST32_FMTX__ "X"
#define __UINT_LEAST32_FMTo__ "o"
#define __UINT_LEAST32_FMTu__ "u"
#define __UINT_LEAST32_FMTx__ "x"
#define __UINT_LEAST32_MAX__ 4294967295U
#define __UINT_LEAST32_TYPE__ unsigned int
#define __UINT_LEAST64_FMTX__ "llX"
#define __UINT_LEAST64_FMTo__ "llo"
#define __UINT_LEAST64_FMTu__ "llu"
#define __UINT_LEAST64_FMTx__ "llx"
#define __UINT_LEAST64_MAX__ 18446744073709551615ULL
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __UINT_LEAST8_FMTX__ "hhX"
#define __UINT_LEAST8_FMTo__ "hho"
#define __UINT_LEAST8_FMTu__ "hhu"
#define __UINT_LEAST8_FMTx__ "hhx"
#define __UINT_LEAST8_MAX__ 255
#define __UINT_LEAST8_TYPE__ unsigned char
#define __USER_LABEL_PREFIX__ _
#define __VERSION__ "Apple LLVM 12.0.0 (clang-1200.0.32.28)"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ int
#define __WCHAR_WIDTH__ 32
#define __WINT_MAX__ 2147483647
#define __WINT_TYPE__ int
#define __WINT_WIDTH__ 32
#define __aarch64__ 1
#define __apple_build_version__ 12000032
#define __arm64 1
#define __arm64__ 1
#define __block __attribute__((__blocks__(byref)))
#define __clang__ 1
#define __clang_major__ 12
#define __clang_minor__ 0
#define __clang_patchlevel__ 0
#define __clang_version__ "12.0.0 (clang-1200.0.32.28)"
#define __llvm__ 1
#define __nonnull _Nonnull
#define __null_unspecified _Null_unspecified
#define __nullable _Nullable
#define __pic__ 2
#define __strong
#define __unsafe_unretained
#define __weak __attribute__((objc_gc(weak)))
*** /usr/bin/clang done ***
*** /usr/local/bin/gcc -fopenmp ***

define DBL_MIN_EXP (-1021)

define UINT_LEAST16_MAX 0xffff

define __ARM_SIZEOF_WCHAR_T 4

define DBL_DECIMAL_DIG 17

define __ATOMIC_ACQUIRE 2

define __FLT_MIN__ 1.1754943508222875e-38F

define __GCC_IEC_559_COMPLEX 2

define UINT_LEAST8_TYPE unsigned char

define __INTMAX_C(c) c ## L

define __UINT8_MAX__ 0xff

define __WCHAR_MAX__ 0x7fffffff

define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1

define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1

define __GCC_ATOMIC_CHAR_LOCK_FREE 2

define __GCC_IEC_559 2

define FLT32X_DECIMAL_DIG 17

define FLT_EVAL_METHOD 0

define FLT64_DECIMAL_DIG 17

define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2

define UINT_FAST32_TYPE unsigned int

define UINT_FAST64_MAX 0xffffffffffffffffULL

define __DBL_MIN_10_EXP__ (-307)

define FINITE_MATH_ONLY 0

define FLT32X_MAX_EXP 1024

define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1

define __GNUC_PATCHLEVEL__ 0

define FLT32_HAS_DENORM 1

define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1

define UINT_FAST8_MAX 0xff

define __INT8_C(c) c

define __ARM_64BIT_STATE 1

define INT_LEAST8_WIDTH 8

define __INTMAX_TYPE__ long int

define UINT_LEAST64_MAX 0xffffffffffffffffULL

define __SHRT_MAX__ 0x7fff

define __LDBL_MAX__ 1.7976931348623157e+308L

define __ARM_FEATURE_IDIV 1

define __LDBL_IS_IEC_60559__ 2

define __ARM_FP 14

define DYNAMIC 1

define UINT_LEAST8_MAX 0xff

define __APPLE_CC__ 1

define __UINTMAX_TYPE__ long unsigned int

define __FLT_EVAL_METHOD_TS_18661_3__ 0

define __UINT32_MAX__ 0xffffffffU

define DBL_DENORM_MIN ((double)4.9406564584124654e-324L)

define AARCH64_CMODEL_SMALL 1

define LDBL_MAX_EXP 1024

define __CHAR_BIT__ 8

define __FLT32X_IS_IEC_60559__ 2

define INT_LEAST16_WIDTH 16

define __ARM_ALIGN_MAX_STACK_PWR 16

define __SCHAR_MAX__ 0x7f

define __DBL_MAX__ ((double)1.7976931348623157e+308L)

define WCHAR_MIN (-WCHAR_MAX - 1)

define __INT64_C(c) c ## LL

define __GCC_ATOMIC_POINTER_LOCK_FREE 2

define __SIZEOF_INT__ 4

define INT_FAST64_WIDTH 64

define __PRAGMA_REDEFINE_EXTNAME 1

define FLT32X_MANT_DIG 53

define USER_LABEL_PREFIX _

define __FLT32_MAX_10_EXP__ 38

define __STDC_HOSTED__ 1

define __DBL_DIG__ 15

define __FLT32_DIG__ 6

define __FLT_EPSILON__ 1.1920928955078125e-7F

define __SHRT_WIDTH__ 16

define __FLT32_IS_IEC_60559__ 2

define __LDBL_MIN__ 2.2250738585072014e-308L

define __WINT_TYPE__ int

define __FLT16_HAS_QUIET_NAN__ 1

define __strong

define __ARM_SIZEOF_MINIMAL_ENUM 4

define __FP_FAST_FMA 1

define FLT32X_HAS_INFINITY 1

define __INT32_MAX__ 0x7fffffff

define __INT_WIDTH__ 32

define __SIZEOF_LONG__ 8

define APPLE 1

define __UINT16_C(c) c

define __DECIMAL_DIG__ 17

define __FLT64_EPSILON__ 2.2204460492503131e-16F64

define __INT16_MAX__ 0x7fff

define __LDBL_HAS_QUIET_NAN__ 1

define FLT16_MIN_EXP (-13)

define FLT64_MANT_DIG 53

define LDBL_MANT_DIG 53

define __GNUC__ 11

define FLT_HAS_DENORM 1

define SIZEOF_LONG_DOUBLE 8

define LDBL_MIN_EXP (-1021)

define __FLT64_MAX_10_EXP__ 308

define __FLT16_MAX_10_EXP__ 4

define __DBL_IS_IEC_60559__ 2

define FLT32_HAS_INFINITY 1

define LDBL_HAS_DENORM 1

define DBL_HAS_INFINITY 1

define __HAVE_SPECULATION_SAFE_VALUE 1

define __INTPTR_WIDTH__ 64

define FLT32X_HAS_DENORM 1

define INT_FAST16_TYPE short int

define __STRICT_ANSI__ 1

define FLT32_DECIMAL_DIG 9

define INT_LEAST32_MAX 0x7fffffff

define __weak

define DBL_MAX_EXP 1024

define __WCHAR_WIDTH__ 32

define __FLT32_MAX__ 3.4028234663852886e+38F32

define __GCC_ATOMIC_LONG_LOCK_FREE 2

define FLT16_DECIMAL_DIG 5

define __FLT_IS_IEC_60559__ 2

define __FLT32_HAS_QUIET_NAN__ 1

define LONG_LONG_MAX 0x7fffffffffffffffLL

define SIZEOF_SIZE_T 8

define SIG_ATOMIC_WIDTH 32

define __ARM_ALIGN_MAX_PWR 28

define SIZEOF_WINT_T 4

define LONG_LONG_WIDTH 64

define FLT32_MAX_EXP 128

define __ARM_FP16_FORMAT_IEEE 1

define FLT_MIN_EXP (-125)

define FLT64_NORM_MAX 1.7976931348623157e+308F64

define FLT32X_MIN_EXP (-1021)

define INT_FAST64_TYPE long long int

define __ARM_FP16_ARGS 1

define __FP_FAST_FMAF 1

define __FP_FAST_FMAL 1

define FLT64_DENORM_MIN 4.9406564584124654e-324F64

define __DBL_MIN__ ((double)2.2250738585072014e-308L)

define __ARM_FEATURE_CLZ 1

define FLT16_DENORM_MIN 5.9604644775390625e-8F16

define __SIZEOF_POINTER__ 8

define __GXX_ABI_VERSION 1015

define __SIZE_TYPE__ long unsigned int

define LP64 1

define __DBL_HAS_QUIET_NAN__ 1

define __FLT_EVAL_METHOD_C99__ 0

define __FLT32X_EPSILON__ 2.2204460492503131e-16F32x

define FLT64_MIN_EXP (-1021)

define __UINT64_MAX__ 0xffffffffffffffffULL

define LDBL_DECIMAL_DIG 17

define __FLT_MAX__ 3.4028234663852886e+38F

define __aarch64__ 1

define __FLT64_MIN_10_EXP__ (-307)

define __REGISTER_PREFIX__

define __UINT16_MAX__ 0xffff

define LDBL_HAS_INFINITY 1

define __FLT_DIG__ 6

define DEC_EVAL_METHOD 2

define FLT_MANT_DIG 24

define __FLT16_MIN_10_EXP__ (-4)

define __VERSION__ "11.0.0 20201128 (experimental)"

define __UINT64_C(c) c ## ULL

define __WINT_MAX__ 0x7fffffff

define __GCC_ATOMIC_INT_LOCK_FREE 2

define __FLT32X_MIN__ 2.2250738585072014e-308F32x

define FLT32_MANT_DIG 24

define AARCH64EL 1

define FLOAT_WORD_ORDER ORDER_LITTLE_ENDIAN

define FLT16_MAX_EXP 16

define __BIGGEST_ALIGNMENT__ 16

define __INT32_C(c) c

define __FLT16_DIG__ 3

define __SCHAR_WIDTH__ 8

define ORDER_PDP_ENDIAN 3412

define INT_FAST32_TYPE int

define UINT_LEAST16_TYPE short unsigned int

define ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED 110000

define __ARM_FEATURE_FMA 1

define __INT8_TYPE__ signed char

define SIG_ATOMIC_TYPE int

define __GCC_ASM_FLAG_OUTPUTS__ 1

define __arm64__ 1

define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1

define __FLT_RADIX__ 2

define INT_LEAST16_TYPE short int

define __ARM_ARCH_PROFILE 65

define __LDBL_EPSILON__ 2.2204460492503131e-16L

define __UINTMAX_C(c) c ## UL

define __ARM_PCS_AAPCS64 1

define SIG_ATOMIC_MAX 0x7fffffff

define OPTIMIZE 1

define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2

define SIZEOF_PTRDIFF_T 8

define __arm64 1

define __ATOMIC_RELAXED 0

define INT_FAST32_WIDTH 32

define __LDBL_DIG__ 15

define __FLT64_IS_IEC_60559__ 2

define __FLT16_IS_IEC_60559__ 2

define __FLT64_DIG__ 15

define UINT_FAST32_MAX 0xffffffffU

define UINT_LEAST64_TYPE long long unsigned int

define __FLT16_EPSILON__ 9.7656250000000000e-4F16

define __FLT_HAS_QUIET_NAN__ 1

define __FLT_MAX_10_EXP__ 38

define __LONG_MAX__ 0x7fffffffffffffffL

define FLT_HAS_INFINITY 1

define DBL_HAS_DENORM 1

define UINT_FAST16_TYPE short unsigned int

define __FLT32X_HAS_QUIET_NAN__ 1

define __CHAR16_TYPE__ short unsigned int

define __SIZE_WIDTH__ 64

define __INTMAX_WIDTH__ 64

define INT_LEAST16_MAX 0x7fff

define FLT16_NORM_MAX 6.5504000000000000e+4F16

define __INT64_MAX__ 0x7fffffffffffffffLL

define FLT32_DENORM_MIN 1.4012984643248171e-45F32

define INT_LEAST64_TYPE long long int

define __INT16_TYPE__ short int

define INT_LEAST8_TYPE signed char

define __FLT16_MAX__ 6.5504000000000000e+4F16

define __STDC_VERSION__ 199901L

define INT_FAST8_MAX 0x7f

define __ARM_ARCH 8

define __INTPTR_MAX__ 0x7fffffffffffffffL

define __ARM_FEATURE_UNALIGNED 1

define __FLT64_HAS_QUIET_NAN__ 1

define __FLT32X_DIG__ 15

define __UINT8_TYPE__ unsigned char

define __PTRDIFF_WIDTH__ 64

define __CONSTANT_CFSTRINGS__ 1

define FLT64_HAS_INFINITY 1

define FLT16_HAS_INFINITY 1

define SIG_ATOMIC_MIN (-SIG_ATOMIC_MAX - 1)

define __PTRDIFF_MAX__ 0x7fffffffffffffffL

define FLT16_MANT_DIG 11

define __INTPTR_TYPE__ long int

define __UINT16_TYPE__ short unsigned int

define __WCHAR_TYPE__ int

define pic 2

define __UINTPTR_MAX__ 0xffffffffffffffffUL

define __ARM_ARCH_8A 1

define INT_FAST64_MAX 0x7fffffffffffffffLL

define FLT_NORM_MAX 3.4028234663852886e+38F

define UINT_FAST64_TYPE long long unsigned int

define __INT_MAX__ 0x7fffffff

define __INT64_TYPE__ long long int

define FLT_MAX_EXP 128

define ORDER_BIG_ENDIAN 4321

define DBL_MANT_DIG 53

define INT_LEAST64_MAX 0x7fffffffffffffffLL

define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2

define __FP_FAST_FMAF32 1

define UINT_LEAST32_TYPE unsigned int

define __SIZEOF_SHORT__ 2

define FLT32_NORM_MAX 3.4028234663852886e+38F32

define __GCC_ATOMIC_BOOL_LOCK_FREE 2

define __FLT64_MAX__ 1.7976931348623157e+308F64

define __MACH__ 1

define __LITTLE_ENDIAN__ 1

define __WINT_WIDTH__ 32

define __FP_FAST_FMAF64 1

define INT_LEAST8_MAX 0x7f

define INT_LEAST64_WIDTH 64

define __FLT32X_MAX_10_EXP__ 308

define INT_FAST16_MAX 0x7fff

define __SIZEOF_INT128__ 16

define __FLT16_MIN__ 6.1035156250000000e-5F16

define __LDBL_MAX_10_EXP__ 308

define __DBL_EPSILON__ ((double)2.2204460492503131e-16L)

define FLT32_MIN_EXP (-125)

define _LP64 1

define __UINT8_C(c) c

define FLT64_MAX_EXP 1024

define INT_LEAST32_TYPE int

define __UINT64_TYPE__ long long unsigned int

define __ARM_NEON 1

define INT_FAST32_MAX 0x7fffffff

define __INTMAX_MAX__ 0x7fffffffffffffffL

define UINT_FAST8_TYPE unsigned char

define INT_FAST8_TYPE signed char

define GNUC_STDC_INLINE 1

define FLT64_HAS_DENORM 1

define _OPENMP 201511

define __FLT32_EPSILON__ 1.1920928955078125e-7F32

define __FP_FAST_FMAF32x 1

define FLT16_HAS_DENORM 1

define INT_FAST8_WIDTH 8

define __FLT32X_MAX__ 1.7976931348623157e+308F32x

define DBL_NORM_MAX ((double)1.7976931348623157e+308L)

define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__

define LDBL_DENORM_MIN 4.9406564584124654e-324L

define SIZEOF_WCHAR_T 4

define __UINT32_C(c) c ## U

define FLT_DENORM_MIN 1.4012984643248171e-45F

define WINT_MIN (-WINT_MAX - 1)

define __INT8_MAX__ 0x7f

define __LONG_WIDTH__ 64

define PIC 2

define FLT32X_NORM_MAX 1.7976931348623157e+308F32x

define __CHAR32_TYPE__ unsigned int

define __FLT32_MIN_10_EXP__ (-37)

define __ARM_FEATURE_NUMERIC_MAXMIN 1

define __INT32_TYPE__ int

define __SIZEOF_DOUBLE__ 8

define __FLT_MIN_10_EXP__ (-37)

define __FLT64_MIN__ 2.2250738585072014e-308F64

define INT_LEAST32_WIDTH 32

define __SIZEOF_FLOAT__ 4

define __ATOMIC_CONSUME 1

define __GNUC_MINOR__ 0

define INT_FAST16_WIDTH 16

define __UINTMAX_MAX__ 0xffffffffffffffffUL

define FLT32X_DENORM_MIN 4.9406564584124654e-324F32x

define __DBL_MAX_10_EXP__ 308

define __INT16_C(c) c

define __ARM_ARCH_ISA_A64 1

define STDC 1

define __PTRDIFF_TYPE__ long int

define __FLT32_MIN__ 1.1754943508222875e-38F32

define __ATOMIC_SEQ_CST 5

define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1

define __UINT32_TYPE__ unsigned int

define __FLT32X_MIN_10_EXP__ (-307)

define __UINTPTR_TYPE__ long unsigned int

define __LDBL_MIN_10_EXP__ (-307)

define SIZEOF_LONG_LONG 8

define __GCC_ATOMIC_LLONG_LOCK_FREE 2

define FLT_DECIMAL_DIG 9

define UINT_FAST16_MAX 0xffff

define LDBL_NORM_MAX 1.7976931348623157e+308L

define __GCC_ATOMIC_SHORT_LOCK_FREE 2

define ORDER_LITTLE_ENDIAN 1234

define __SIZE_MAX__ 0xffffffffffffffffUL

define UINT_LEAST32_MAX 0xffffffffU

define __ATOMIC_ACQ_REL 4

define __ATOMIC_RELEASE 3

/usr/local/bin/gcc -fopenmp done

On 16 Jan 2021, at 9:32 am, Manodeep Sinha notifications@github.com wrote:

@karlglazebrook Do you mind copy-pasting the output of:

    #!/bin/bash

    declare -a compilers=("/usr/bin/clang" "/usr/local/bin/gcc -fopenmp")
    for cc in "${compilers[@]}"
    do
        echo " $cc "
        $cc -std=c99 -march=native -O3 -dM -E - < /dev/null
        echo " $cc done "
    done

This will give a hint as to which preprocessor macros each compiler defines for the OS + instruction set.


manodeep commented 3 years ago

This is great - thanks @karlglazebrook!

Looks like __aarch64__, __arm64, and __arm64__ are all defined (equal to 1) in all three compiler configurations. For the ISA, gcc defines __ARM_NEON (=1) (with or without -march=native), while clang defines __ARM_NEON (=1), __ARM_NEON__ (=1), and __ARM_NEON_FP (=0xE).

So the platform can be detected with any of __aarch64__, __arm64, or __arm64__ (or, to be on the safe side, an || between all three), and the code can then return the FALLBACK ISA before running the cpuid call.
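As a minimal sketch of that idea (the `sketch_` names and the `used_cpuid` out-parameter are hypothetical and exist only for this illustration; they are not Corrfunc identifiers), the guard could look something like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the FALLBACK value of the isa enum. */
enum { SKETCH_FALLBACK = 0 };

/* Treat the build as ARM64 if any of the three macros seen in the
   compiler dumps is defined. */
#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64)
#define SKETCH_IS_ARM64 1
#else
#define SKETCH_IS_ARM64 0
#endif

/* Returns the detected ISA level. On ARM64 it returns FALLBACK before
   any x86-only cpuid code is reached; *used_cpuid records which path
   was taken so the behaviour can be checked. */
static int sketch_runtime_isa(int *used_cpuid)
{
    assert(used_cpuid != NULL);
#if SKETCH_IS_ARM64
    *used_cpuid = 0;
    return SKETCH_FALLBACK;
#else
    /* Placeholder: the real x86 cpuid probing would go here. */
    *used_cpuid = 1;
    return SKETCH_FALLBACK;
#endif
}
```

The point of the sketch is only that the ARM64 branch is selected by the preprocessor, so the cpuid inline assembly never reaches the ARM compiler.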

If we add any NEON kernels in the future, then those will have to be protected by #ifdef __ARM_NEON conditions, and the corresponding cpuid check will have to be updated to the actual assembly call necessary. (The compile-time check is necessary, but the runtime CPU may be different.)
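To illustrate what such a guard might look like, here is a hedged sketch of a toy kernel (a sum of squares, which is only a stand-in and not one of Corrfunc's pair-counting kernels). The `sketch_` names are hypothetical. The NEON path uses the AArch64 double-precision intrinsics from `arm_neon.h`, and the scalar loop serves as both the fallback and the tail handler:

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical toy kernel: returns x[0]^2 + ... + x[n-1]^2. */
static double sketch_sum_sq(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double total = 0.0;
    size_t i = 0;

#if SKETCH_HAVE_NEON
    /* 128-bit NEON registers hold two doubles each. */
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2) {
        float64x2_t v = vld1q_f64(x + i);
        acc = vfmaq_f64(acc, v, v);      /* acc += v * v, lane by lane */
    }
    total = vaddvq_f64(acc);             /* horizontal add of both lanes */
#endif

    /* Scalar path: the whole array without NEON, or the leftover tail. */
    for (; i < n; i++) {
        total += x[i] * x[i];
    }
    return total;
}
```

On a compiler that does not define the NEON macros, the vector branch is compiled out entirely and the function reduces to the scalar loop.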

@lgarrison What do you think?

manodeep commented 3 years ago

I will also note that there might be "undocumented" vectorised calls - someone dug these instructions out. Here's my fork of their secret gist. Found the gist through here

lgarrison commented 3 years ago

That's if any of __aarch64__, __arm64, and __arm64__ are detected? Sounds good to me!

Would be a fun project to try to get those undocumented vector calls to work!

manodeep commented 3 years ago

Here are the (untested) updates to the cpu_features.[ch] files.

cpu_features.h

```c
/* File: cpu_features.h */
/*
  This file is a part of the Corrfunc package
  Copyright (C) 2015-- Manodeep Sinha (manodeep@gmail.com)
  License: MIT LICENSE. See LICENSE file under the top-level
  directory at https://github.com/manodeep/Corrfunc/

  Adapted from Agner Fog's vectorclass: http://agner.org/
*/

#pragma once

#include <stdlib.h>
#include <stdint.h> /* int64_t, uint32_t, uint64_t */

#ifdef __cplusplus
extern "C" {
#endif

typedef enum {
    DEFAULT=-42, /* present simply to make the enum a signed int*/
    FALLBACK=0,  /* No special options */
    SSE=1,       /* 64 bit vectors */
    SSE2=2,      /* 128 bit vectors */
    SSE3=3,      /* 128 bit vectors */
    SSSE3=4,     /* 128 bit vectors */
    SSE4=5,      /* 128bit vectors */
    SSE42=6,     /* 128bit vectors with blend operations */
    AVX=7,       /* 256bit vector width */
    AVX2=8,      /* AVX2 (integer operations)*/
    AVX512F=9,   /* AVX 512 Foundation */
    NUM_ISA      /*NUM_ISA will be the next integer after the last declared enum. AVX512F:=9 (so, NUM_ISA==10)*/
} isa; //name for instruction sets -> corresponds to the return values for functions in cpu_features.c

static inline void cpuid (int output[4], int functionnumber)
{
#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    /* Assuming ARM64( and hopefully also ARM32) */
    return;
#else
    /* Assuming x86_64 arch */
#if defined(__GNUC__) || defined(__clang__)
    // use inline assembly, Gnu/AT&T syntax
    int a, b, c, d;
    __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) );
    output[0] = a;
    output[1] = b;
    output[2] = c;
    output[3] = d;
#else
    // unknown platform. try inline assembly with masm/intel syntax
    __asm {
        mov eax, functionnumber
        xor ecx, ecx
        cpuid;
        mov esi, output
        mov [esi],    eax
        mov [esi+4],  ebx
        mov [esi+8],  ecx
        mov [esi+12], edx
    }
#endif
#endif /* end of x86_64 arch */
}

// Define interface to xgetbv instruction
static inline int64_t xgetbv (int ctr)
{
#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    /* Assuming ARM64 (and hopefully also ARM32) */
    return 0;
#else
    /* Assuming x86_64 */
#if (defined (__INTEL_COMPILER) && __INTEL_COMPILER >= 1200)
    //Intel compiler supporting _xgetbv intrinsic
    return _xgetbv(ctr); // intrinsic function for XGETBV
#elif defined(__GNUC__)
    // use inline assembly, Gnu/AT&T syntax
    uint32_t a, d;
    __asm("xgetbv" : "=a"(a),"=d"(d) : "c"(ctr) : );
    return a | (((uint64_t) d) << 32);
#else
    uint32_t a, d;
    __asm {
        mov ecx, ctr
        _emit 0x0f
        _emit 0x01
        _emit 0xd0 ; // xgetbv
        mov a, eax
        mov d, edx
    }
    return a | (((uint64_t) d) << 32);
#endif
#endif /* end of x86_64 arch */
}

extern int runtime_instrset_detect(void);
extern int get_max_usable_isa(void);

#ifdef __cplusplus
}
#endif
```

cpu_features.c

```c
/* File: cpu_features.c */
/*
  This file is a part of the Corrfunc package
  Copyright (C) 2015-- Manodeep Sinha (manodeep@gmail.com)
  License: MIT LICENSE. See LICENSE file under the top-level
  directory at https://github.com/manodeep/Corrfunc/

  Adapted from Agner Fog's vectorclass: http://agner.org/
*/

#include <stdio.h>
#include "cpu_features.h"

// Use CPUID to detect what instruction sets the CPU supports
// The compiler may not support all these features though!
// Use get_max_usable_isa() to find the max ISA supported
// by both the compiler and CPU
int runtime_instrset_detect(void)
{
    static int iset = -1; // remember value for next call
    if (iset >= 0) {
        return iset; // called before
    }
    iset = FALLBACK; // default value

#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    /* assuming ARM (aarch64, and hopefully aarch32) */
    return iset; /* should always be FALLBACK*/
#else
    /* Assuming x86_64 architecture */
    int abcd[4] = {0,0,0,0}; // cpuid results
    cpuid(abcd, 0); // call cpuid function 0
    if (abcd[0] == 0) return iset; // no further cpuid function supported
    cpuid(abcd, 1); // call cpuid function 1 for feature flags
    if ((abcd[3] & (1 <<  0)) == 0) return iset; // no floating point
    if ((abcd[3] & (1 << 23)) == 0) return iset; // no MMX
    if ((abcd[3] & (1 << 15)) == 0) return iset; // no conditional move
    if ((abcd[3] & (1 << 24)) == 0) return iset; // no FXSAVE
    if ((abcd[3] & (1 << 25)) == 0) return iset; // no SSE
    iset = SSE; // 1: SSE supported
    if ((abcd[3] & (1 << 26)) == 0) return iset; // no SSE2
    iset = SSE2; // 2: SSE2 supported
    if ((abcd[2] & (1 <<  0)) == 0) return iset; // no SSE3
    iset = SSE3; // 3: SSE3 supported
    if ((abcd[2] & (1 <<  9)) == 0) return iset; // no SSSE3
    iset = SSSE3; // 4: SSSE3 supported
    if ((abcd[2] & (1 << 19)) == 0) return iset; // no SSE4.1
    iset = SSE4; // 5: SSE4.1 supported
    if ((abcd[2] & (1 << 23)) == 0) return iset; // no POPCNT
    if ((abcd[2] & (1 << 20)) == 0) return iset; // no SSE4.2
    iset = SSE42; // 6: SSE4.2 supported
    if ((abcd[2] & (1 << 27)) == 0) return iset; // no OSXSAVE
    if ((xgetbv(0) & 6) != 6) return iset; // AVX not enabled in O.S.
    if ((abcd[2] & (1 << 28)) == 0) return iset; // no AVX
    iset = AVX; // 7: AVX supported
    cpuid(abcd, 7); // call cpuid leaf 7 for feature flags
    if ((abcd[1] & (1 <<  5)) == 0) return iset; // no AVX2
    iset = AVX2; // 8: AVX2 supported
    cpuid(abcd, 0xD); // call cpuid leaf 0xD for feature flags
    if ((abcd[0] & 0x60) != 0x60) return iset; // no AVX512
    iset = AVX512F; // 9: AVX512F supported
    return iset;
#endif /* end of x86_64 architecture specific code*/
}

// Report the max ISA supported by both the CPU and compiler
int get_max_usable_isa(void)
{
    static int iset = -1; // remember value for next call
    if (iset >= 0) {
        return iset; // called before
    }

#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    iset = FALLBACK;
    return iset;
#endif

    iset = runtime_instrset_detect();

    switch(iset){
    case AVX512F:
#ifdef __AVX512F__
        iset = AVX512F;
        break;
#elif defined(GAS_BUG_DISABLE_AVX512)
        fprintf(stderr, "[Warning] AVX512F is disabled due to a GNU Assembler bug. Upgrade to binutils >= 2.32 to fix this.\n");
#else
        fprintf(stderr, "[Warning] The CPU supports AVX512F but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case AVX2:
#ifdef __AVX2__
        iset = AVX2;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports AVX2 but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case AVX:
#ifdef __AVX__
        iset = AVX;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports AVX but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case SSE42:
#ifdef __SSE4_2__
        iset = SSE42;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports SSE4.2 but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case SSE4:
#ifdef __SSE4_1__
        iset = SSE4;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports SSE4.1 but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case SSSE3:
#ifdef __SSSE3__
        iset = SSSE3;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports SSSE3 but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case SSE3:
#ifdef __SSE3__
        iset = SSE3;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports SSE3 but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case SSE2:
#ifdef __SSE2__
        iset = SSE2;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports SSE2 but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case SSE:
#ifdef __SSE__
        iset = SSE;
        break;
#else
        fprintf(stderr, "[Warning] The CPU supports SSE but the compiler does not. Can you try another compiler?\n");
#endif
        // fall through
    case FALLBACK:
    default:
        iset = FALLBACK;
        break;
    }

    return iset;
}
```
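For context on how a value such as the one returned by get_max_usable_isa() might be consumed, here is a rough sketch of a kernel-selection step. Everything in it is an assumption made for illustration: the `sk_` names, the `have_kernel` table, and the rule that a negative or too-high request means "use the best available" are hypothetical and are not taken from the Corrfunc sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of a few ISA levels from the enum above. */
enum { SK_FALLBACK = 0, SK_SSE42 = 6, SK_AVX = 7, SK_AVX2 = 8,
       SK_AVX512F = 9, SK_NUM_ISA = 10 };

/* Pick the ISA level whose kernel should run.
   requested   : level asked for by the caller
   max_usable  : level supported by both the CPU and the compiler
   have_kernel : have_kernel[i] != 0 if a kernel exists for level i */
static int sk_pick_isa(int requested, int max_usable,
                       const unsigned char have_kernel[SK_NUM_ISA])
{
    assert(have_kernel != NULL && have_kernel[SK_FALLBACK]);
    assert(max_usable >= SK_FALLBACK && max_usable < SK_NUM_ISA);

    /* Assumption: out-of-range requests are clamped to the best usable level. */
    int isa = (requested < SK_FALLBACK || requested > max_usable)
                  ? max_usable : requested;

    /* Walk down until a level with a compiled kernel is found. */
    while (isa > SK_FALLBACK && !have_kernel[isa]) {
        isa--;
    }
    return isa;
}
```

With the patch above, an ARM64 build always reports FALLBACK as the maximum usable level, so a selection step like this one would end up at the fallback kernel regardless of what was requested.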

@lgarrison Will you please see if the updates and returns make sense?

misharash commented 1 year ago

The updates from the previous comment do help to get the code built and installed. I wonder if a bit more systematic fix will be possible?

manodeep commented 1 year ago

@misharash You might be interested in the initial implementation for the M1 architecture within the arm64 branch on this repo

lgarrison commented 1 year ago

Right, the arm64 branch (PR here) works on the M1 with both fallback and NEON kernels. So you can just use that branch instead of patching your source with the previous code.

@manodeep can confirm, but the fallback kernel from that branch should definitely be safe to use. It sounds like the NEON kernel is working too, but is less tested.

misharash commented 1 year ago

Thank you! Initially it wasn't clear that the NEON pull request was related to Apple Silicon support.

manodeep commented 11 months ago

Solved by #295. The Corrfunc master branch should now compile and run fine on Apple laptops with M1/M2 CPUs.

Sidenote: Optimised kernels are being (slowly) implemented under the arm64 branch.