yaneurao / YaneuraOu

YaneuraOu is the world's strongest shogi engine (AI player), the WCSC29 winner, and an educational, USI-compliant engine.
GNU General Public License v3.0
525 stars 140 forks

Fix missing ARM branch in the macro definitions in extra/bitop.h #199

Closed quiver closed 3 years ago

quiver commented 3 years ago

AWS offers Graviton2, a processor based on ARM Neoverse N1 cores.

https://www.arm.com/why-arm/partner-ecosystem/aws

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Model:               1
BogoMIPS:            243.75
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
L3 cache:            32768K
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

When I built on an instance type that uses this processor (c6g.xlarge), the build failed with errors caused by undefined macros.

$ git diff Makefile
diff --git a/source/Makefile b/source/Makefile
index 640c1ad..84ef95d 100644
--- a/source/Makefile
+++ b/source/Makefile
@@ -46,13 +46,13 @@ YANEURAOU_EDITION = YANEURAOU_ENGINE_NNUE

 #TARGET_CPU = AVX512VNNI
 #TARGET_CPU = AVX512
-TARGET_CPU = AVX2
+#TARGET_CPU = AVX2
 #TARGET_CPU = SSE42
 #TARGET_CPU = SSE41
 #TARGET_CPU = SSSE3
 #TARGET_CPU = SSE2
 #TARGET_CPU = NO_SSE
-#TARGET_CPU = OTHER
+TARGET_CPU = OTHER
 #TARGET_CPU = ZEN1
 #TARGET_CPU = ZEN2

$ make
...
./extra/bitop.h:309:49: error: use of undeclared identifier 'LSB64'
FORCE_INLINE int pop_lsb(u64 & b) { int index = LSB64(b);  b = BLSR(b); return index; }
                                                ^
In file included from main.cpp:4:
In file included from ./search.h:7:
In file included from ./position.h:6:
./bitboard.h:103:90: error: use of undeclared identifier 'LSB64'
        FORCE_INLINE Square pop_c() const { u64 q0 = extract64<0>();  return (q0 != 0) ? Square(LSB64(q0)) : Square(LSB64(extract64<1>()) + 63); }
                                                                                                ^
./bitboard.h:103:110: error: use of undeclared identifier 'LSB64'
        FORCE_INLINE Square pop_c() const { u64 q0 = extract64<0>();  return (q0 != 0) ? Square(LSB64(q0)) : Square(LSB64(extract64<1>()) + 63); }
                                                                                                                    ^
In file included from main.cpp:4:
In file included from ./search.h:5:
In file included from ./misc.h:12:
In file included from ./extra/../types.h:17:
./extra/bitop.h:308:69: error: use of undeclared identifier 'LSB32'
template <typename T> FORCE_INLINE int pop_lsb(T& b) {  int index = LSB32(b);  b = T(BLSR(b)); return index; }
                                                                    ^
./extra/../types.h:297:63: note: in instantiation of function template specialization 'pop_lsb<Effect8::Directions>' requested here
        static Direct pop_directions(Directions& d) { return (Direct)pop_lsb(d); }
                                                                     ^
4 errors generated.
make: *** [../obj/main.o] Error 1

It looked like extra/bitop.h was missing a branch for ARM, so I added one.
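For readers who want a picture of what such a branch might contain, here is a minimal sketch, not the actual patch. It assumes GCC or Clang and defines the two missing macros, `LSB32` and `LSB64`, with the `__builtin_ctz` family of compiler builtins. The `BLSR` definition and the type aliases are illustrative stand-ins for whatever bitop.h already provides.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;
using u64 = std::uint64_t;

// Hypothetical non-x86 branch (assumption: GCC or Clang is the compiler).
// __builtin_ctz / __builtin_ctzll return the number of trailing zero bits,
// which is the index of the least significant set bit.
// The result is undefined when the argument is 0, so callers must check first.
#define LSB32(v) (__builtin_ctz(static_cast<u32>(v)))
#define LSB64(v) (__builtin_ctzll(static_cast<u64>(v)))

// Illustrative stand-in: clear the lowest set bit without the BMI1 instruction.
#define BLSR(x) ((x) & ((x) - 1))

// Same shape as the function named in the error log: return the index of the
// lowest set bit and remove that bit from b.
inline int pop_lsb(u64& b) {
    int index = LSB64(b);
    b = BLSR(b);
    return index;
}
```

With definitions along these lines, `pop_lsb` on a value with bits 2 and 4 set returns 2 and then 4, leaving the value at zero. The builtins compile to `rbit` plus `clz` on AArch64, so no x86 intrinsic header is needed.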

I have verified that the build succeeds with clang++ and that the bench command runs.

Reference

$ clang -E -dM - < /dev/null | grep ARM
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4

$ gcc -E -dM - < /dev/null | grep ARM
#define __ARM_SIZEOF_WCHAR_T 4
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FP 14
#define __ARM_ALIGN_MAX_STACK_PWR 16
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_ALIGN_MAX_PWR 28
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_FP16_ARGS 1
#define __ARM_FEATURE_FMA 1
#define __ARM_64BIT_STATE 1
#define __ARM_ARCH_PROFILE 65
#define __ARM_PCS_AAPCS64 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_ARCH 8
#define __ARM_ARCH_8A 1
#define __ARM_NEON 1
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_ARCH_ISA_A64 1
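
The two listings share several predefined macros, among them `__ARM_ARCH`, `__ARM_ARCH_ISA_A64`, and `__ARM_NEON`, so a preprocessor branch can key on those with either compiler. The following is a sketch of such a check. The function names `target_arch` and `has_neon` are made up for illustration and are not part of YaneuraOu.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: report the target architecture at compile time,
// using macros that appear in both the clang and gcc listings above.
constexpr const char* target_arch() {
#if defined(__ARM_ARCH_ISA_A64) || defined(__aarch64__)
    return "arm64";   // 64-bit ARM (A64 instruction set), e.g. Graviton2
#elif defined(__ARM_ARCH)
    return "arm32";   // some other ARM target
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#else
    return "other";
#endif
}

// Hypothetical helper: true when the compiler advertises NEON (Advanced SIMD).
constexpr bool has_neon() {
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}
```

On the c6g.xlarge instance described above, both compilers define `__ARM_ARCH_ISA_A64` and `__ARM_NEON`, so the first branch would be taken and `has_neon()` would be true.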

yaneurao commented 3 years ago

So it will run on AWS ARM instances now?! That's wonderful!!! Thank you.