official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

Solve compile errors on Raspbian buster #2641

Closed: mayweed closed this issue 4 years ago

mayweed commented 4 years ago

Hello, in case anyone else runs into the same build errors when building SF on Raspbian Buster:

/usr/bin/ld: /tmp/cciyPLsk.ltrans0.ltrans.o: in function `ThreadPool::start_thinking(Position&, std::unique_ptr<std::deque<StateInfo, std::allocator<StateInfo> >, std::default_delete<std::deque<StateInfo, std::allocator<StateInfo> > > >&, Search::LimitsType const&, bool) [clone .constprop.62]':
<artificial>:(.text+0x3e34): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0x3e48): undefined reference to `__atomic_store_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans0.ltrans.o: in function `TimeManagement::elapsed() const [clone .isra.105] [clone .constprop.36]':
<artificial>:(.text+0x6294): undefined reference to `__atomic_load_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans0.ltrans.o: in function `Value (anonymous namespace)::search<((anonymous namespace)::NodeType)1>(Position&, Search::Stack*, Value, Value, int, bool) [clone .constprop.31]':
<artificial>:(.text+0x6f9c): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x7108): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x7cbc): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: <artificial>:(.text+0x7da8): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x8278): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0x8390): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x885c): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans5.ltrans.o: in function `dbg_print()':
<artificial>:(.text+0x8e44): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x8e58): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x8e84): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x8eb0): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x8edc): undefined reference to `__atomic_load_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans5.ltrans.o:<artificial>:(.text+0x8ef0): more undefined references to `__atomic_load_8' follow
/usr/bin/ld: /tmp/cciyPLsk.ltrans3.ltrans.o: in function `Thread::search()':
<artificial>:(.text+0x45e8): undefined reference to `__atomic_store_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans3.ltrans.o: in function `Position::do_move(Move, StateInfo&, bool)':
<artificial>:(.text+0x74a8): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans3.ltrans.o: in function `MainThread::search()':
<artificial>:(.text+0xa578): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0xa698): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xa920): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0xa954): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xaab8): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0xb090): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0xb138): undefined reference to `__atomic_store_8'
/usr/bin/ld: /tmp/cciyPLsk.ltrans3.ltrans.o: in function `Value (anonymous namespace)::search<((anonymous namespace)::NodeType)0>(Position&, Search::Stack*, Value, Value, int, bool) [clone .lto_priv.281]':
<artificial>:(.text+0xd748): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xd810): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xeb28): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xf0a0): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: <artificial>:(.text+0xf248): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0xf2ec): undefined reference to `__atomic_load_8'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:541: stockfish] Error 1
make[1]: Leaving directory '/home/pi/build/Stockfish-fishnet-180120/src'
make: *** [Makefile:458: build] Error 2

I solved it with these commands:

$ sudo apt-get install gcc-5 g++-5
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
$ sudo update-alternatives --set gcc "/usr/bin/gcc-5"

And it finally compiles...

Dantist commented 4 years ago

I can confirm that it compiles fine on Raspbian Stretch (GCC 6.3) and fails on Raspbian Buster (GCC 8.3).

The solution is pretty straightforward. You can still build with GCC 8 by adding -latomic to the linker flags:

make build ARCH=armv7 EXTRALDFLAGS=-latomic

I don't know whether there are any drawbacks. I think this flag could be added to the Makefile for ARM architectures.

~P.S. On the now-obsolete Raspberry Pi 1, the executable produced by GCC 8 (with -latomic) is 23% bigger and 9% slower than the one produced by GCC 6 (tested on SF10). But it still hits 23 kNps ))~

Correction: I forgot to strip the GCC 8 executable, so the size difference was a wrong conclusion. GCC 8 (with -latomic) and GCC 6 (with -latomic) are pretty much the same in terms of file size and performance.

-latomic itself is what causes the performance drop (~3.6% on an RPi 1, averaged over 16 bench runs at different depths).

Update 2: GCC on Raspbian Buster:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 8.3.0-6+rpi1' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1)

vondele commented 4 years ago

can -latomic also be used together with gcc 6.3 on raspbian stretch?

vondele commented 4 years ago

Some related GCC issues: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81358

Dantist commented 4 years ago

can -latomic also be used together with gcc 6.3 on raspbian stretch?

Just tested. It compiled fine on GCC 6.3 with and without -latomic. The resulting executables are the same size, but performance differs (the checksums also differ, as expected). I've updated my statement regarding performance in a previous comment.

Best runs:

$ ./stockfish-11-gcc6-latomic bench
===========================
Total time (ms) : 247409
Nodes searched  : 5156767
Nodes/second    : 20843
$ ./stockfish-11-gcc6 bench
===========================
Total time (ms) : 238671
Nodes searched  : 5156767
Nodes/second    : 21606

Hardware (of Raspberry Pi 1):

$ lscpu
Architecture:          armv6l
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1
Model:                 7
Model name:            ARMv6-compatible processor rev 7 (v6l)
CPU max MHz:           700.0000
CPU min MHz:           700.0000
BogoMIPS:              697.95
Flags:                 half thumb fastmult vfp edsp java tls
$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2835
Revision        : 000e
Serial          : 00000000b346df22

GCC on Raspbian Stretch:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 6.3.0-18+rpi1+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1+deb9u1)

MichaelB7 commented 4 years ago

Everything here compiles fine with my custom Makefile for the Honey fork on both 32-bit and 64-bit Buster with GCC 8.3 (different flags, of course). The atomic flag is required. GCC 10.1 is currently problematic on 64-bit Buster, but that appears to be a distro still in development; GCC 10.1 works fine on 32-bit Buster. Tested on both RPi 3+ and RPi 4. The RPi 4 now approaches 900K nps classical and 330K nps NNUE, single core.

gsobala commented 4 years ago

An update on compilation on a Raspberry Pi 4 running 64-bit Gentoo: a profile-build NNUE binary gets about 850,000 nps when running on all four cores. That is only 36% of the speed of classical, but fast enough to be about 70 Elo stronger at STC. Compilation is straightforward: make -j profile-build ARCH=armv8 compiles "out of the box". The GCC version is 10.1.

Dantist commented 4 years ago

Just wanted to note: @mayweed most probably had armv7 hardware (RPi 2). I have armv6 hardware (RPi 1), but have had no issues compiling with ARCH=armv7 for years. @gsobala and @MichaelB7 both have armv8 hardware (RPi 3, RPi 4).

I will check the newest master on my RPi 1 this weekend, but if someone can check it earlier, please give it a try.

vondele commented 4 years ago

I'm trying to understand the issue. For me, it would be resolved if we can compile 'out of the box' with the unmodified Makefile. Is this now the case (after the NNUE merge, which requires compiler versions >6), or not yet?

Dantist commented 4 years ago

OK, I have tested now :-)

I tried building again on Raspbian Buster (GCC 8.3) on an RPi 1 (armv6). It fails as in the first message, PLUS a few new warnings from the NNUE code:

pi@pi:~/sf_nnue/src $ make build ARCH=armv7

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'armv7'
bits: '32'
kernel: 'Linux'
os: 'GNU/Linux'
prefetch: 'yes'
popcnt: 'no'
sse: 'no'
ssse3: 'no'
sse41: 'no'
avx2: 'no'
pext: 'no'
avx512: 'no'
vnni: 'no'
neon: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto
LDFLAGS:  -Wl,--no-as-needed -lpthread -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto -flto=jobserver

Testing config sanity. If this fails, try 'make help' ...

make ARCH=armv7 COMP=gcc all
make[1]: Entering directory '/home/pi/sf_nnue/src'
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o position.o position.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o psqt.o psqt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o ucioption.o ucioption.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o tune.o tune.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o tbprobe.o syzygy/tbprobe.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o evaluate_nnue.o nnue/evaluate_nnue.cpp
nnue/evaluate_nnue.cpp: In function ‘Value Eval::NNUE::ComputeScore(const Position&, bool)’:
nnue/evaluate_nnue.cpp:135:61: warning: requested alignment 64 is larger than 8 [-Wattributes]
         transformed_features[FeatureTransformer::kBufferSize];
                                                             ^
nnue/evaluate_nnue.cpp:137:61: warning: requested alignment 64 is larger than 8 [-Wattributes]
     alignas(kCacheLineSize) char buffer[Network::kBufferSize];
                                                             ^
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto   -c -o half_kp.o nnue/features/half_kp.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o pawns.o position.o psqt.o search.o thread.o timeman.o tt.o uci.o ucioption.o tune.o tbprobe.o evaluate_nnue.o half_kp.o  -Wl,--no-as-needed -lpthread -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto -flto=jobserver
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans0.ltrans.o: in function `ThreadPool::start_thinking(Position&, std::unique_ptr<std::deque<StateInfo, std::allocator<StateInfo> >, std::default_delete<std::deque<StateInfo, std::allocator<StateInfo> > > >&, Search::LimitsType const&, bool) [clone .constprop.49]':
<artificial>:(.text+0x858c): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0x85a8): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0x85bc): undefined reference to `__atomic_store_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans0.ltrans.o: in function `TimeManagement::elapsed() const [clone .isra.75] [clone .constprop.37]':
<artificial>:(.text+0x948c): undefined reference to `__atomic_load_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans0.ltrans.o: in function `Value (anonymous namespace)::search<((anonymous namespace)::NodeType)1>(Position&, Search::Stack*, Value, Value, int, bool) [clone .constprop.34]':
<artificial>:(.text+0xa184): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xa32c): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: <artificial>:(.text+0xa558): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xacec): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: <artificial>:(.text+0xb074): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0xb4d4): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0xb580): undefined reference to `__atomic_load_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans4.ltrans.o: in function `dbg_print()':
<artificial>:(.text+0x3c40): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x3c54): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x3c88): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x3cc0): undefined reference to `__atomic_load_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans4.ltrans.o:<artificial>:(.text+0x3cf8): more undefined references to `__atomic_load_8' follow
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans2.ltrans.o: in function `Value (anonymous namespace)::search<((anonymous namespace)::NodeType)0>(Position&, Search::Stack*, Value, Value, int, bool) [clone .lto_priv.233]':
<artificial>:(.text+0x7218): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0x7308): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x78e8): undefined reference to `__atomic_fetch_add_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans3.ltrans.o: in function `Thread::search()':
<artificial>:(.text+0x3cb8): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x3f38): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x564c): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x5670): undefined reference to `__atomic_store_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans3.ltrans.o: in function `MainThread::search()':
<artificial>:(.text+0x5fc8): undefined reference to `__atomic_load_8'
/usr/bin/ld: <artificial>:(.text+0x63bc): undefined reference to `__atomic_store_8'
/usr/bin/ld: <artificial>:(.text+0x63f8): undefined reference to `__atomic_load_8'
/usr/bin/ld: /tmp/ccNhu0Kx.ltrans3.ltrans.o: in function `Position::do_move(Move, StateInfo&, bool)':
<artificial>:(.text+0x8d54): undefined reference to `__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:699: stockfish] Error 1
make[1]: Leaving directory '/home/pi/sf_nnue/src'
make: *** [Makefile:594: build] Error 2
pi@pi:~/sf_nnue/src $

Adding -latomic still helps (no make clean between builds):

pi@pi:~/sf_nnue/src $ make build ARCH=armv7 EXTRALDFLAGS=-latomic

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'armv7'
bits: '32'
kernel: 'Linux'
os: 'GNU/Linux'
prefetch: 'yes'
popcnt: 'no'
sse: 'no'
ssse3: 'no'
sse41: 'no'
avx2: 'no'
pext: 'no'
avx512: 'no'
vnni: 'no'
neon: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto
LDFLAGS: -latomic -Wl,--no-as-needed -lpthread -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto -flto=jobserver

Testing config sanity. If this fails, try 'make help' ...

make ARCH=armv7 COMP=gcc all
make[1]: Entering directory '/home/pi/sf_nnue/src'
g++ -o stockfish benchmark.o bitbase.o bitboard.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o pawns.o position.o psqt.o search.o thread.o timeman.o tt.o uci.o ucioption.o tune.o tbprobe.o evaluate_nnue.o half_kp.o -latomic -Wl,--no-as-needed -lpthread -Wall -Wcast-qual -fno-exceptions -std=c++17  -pedantic -Wextra -Wshadow -DNDEBUG -O3 -flto -flto=jobserver
make[1]: Leaving directory '/home/pi/sf_nnue/src'
pi@pi:~/sf_nnue/src $

vondele commented 4 years ago

The new warnings are most likely innocent; we're requesting over-alignment. They should go away with GCC 9.

So is the need for -latomic specific to armv7, or does it depend on the compiler version that happens to be available on that system? If we need -latomic for all armv7 builds, we could easily add it. I see that -latomic is sometimes needed in the Android issue too (https://github.com/official-stockfish/Stockfish/issues/2860). If we need -latomic on armv7 for both Linux and Android, it is easy enough to add to the Makefile. If, however, the flag interferes with building on certain OS/compiler versions, it is more difficult.

vondele commented 4 years ago

So, after reading up a bit, I think it would be fine to add -latomic to the linker flags for armv7. The library seems to be needed whenever certain atomic operations are not supported by the hardware, and it has the same name for both GCC and clang.

Dantist commented 4 years ago

@vondele, thank you for taking care of this! I am not a C++ dev, so I can't help with that and will leave it up to you :-) We could add it and test it in the wild to see whether any new complaints emerge. :-)

Just to recall:

  1. My actual hardware is armv6, but it seems that it's not relevant to the situation.
  2. -latomic isn't needed on gcc 6.3, but adding -latomic was safe and just dropped performance a little bit (3%).

Off-topic: when you freshly clone the repo and run make help, it prints an error before printing the help itself:

pi@pi:~/sf_nnue/src $ make help
make: [Makefile:738: .depend] Error 1 (ignored)

To compile stockfish, type:
....

It also creates an empty .depend file in the directory, after which the error is no longer printed on subsequent make help runs. The empty .depend file also (obviously) speeds up the start of the build. This seems unrelated to the issue we are discussing; I just wanted to let you know in case you had not noticed it before.

vondele commented 4 years ago

Interestingly, the error on make help is not visible here, so something seems to go wrong while generating the dependencies at that point. Maybe you can see what the error message is when you apply this change:

diff --git a/src/Makefile b/src/Makefile
index 38f607cb2..ba3564789 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -735,6 +735,6 @@ icc-profile-use:
        all

 .depend:
-       -@$(CXX) $(DEPENDFLAGS) -MM $(SRCS) > $@ 2> /dev/null
+       -@$(CXX) $(DEPENDFLAGS) -MM $(SRCS) > $@

 -include .depend

Dantist commented 4 years ago

Here it is:

pi@pi:~/sf_nnue/src $ make help
g++: error: unrecognized command line option ‘-msse’; did you mean ‘-fdse’?
make: [Makefile:738: .depend] Error 1 (ignored)

To compile stockfish, type:
....

vondele commented 4 years ago

Ah, clear: somehow it passes the DEPENDFLAGS to dependency generation, and they make no sense on ARM (since -msse is x86-only). I'll remove that as part of the patch.

vondele commented 4 years ago

Maybe you can check whether this fixes the issue: https://github.com/official-stockfish/Stockfish/pull/3006

Dantist commented 4 years ago

Yes, with this the binary builds on my setup, and it also resolves the -msse issue. Thank you! :-)

P.S. I was eager to bench the NNUE on RPi 1 with PGO:

$ ./stockfish bench 16 1 13 default depth mixed >/dev/null
===========================
Total time (ms) : 305108
Nodes searched  : 3905447
Nodes/second    : 12800

$ ./stockfish bench 16 1 13 default depth classical >/dev/null
===========================
Total time (ms) : 196804
Nodes searched  : 4243037
Nodes/second    : 21559

$ ./stockfish bench 16 1 13 default depth NNUE >/dev/null
===========================
Total time (ms) : 503838
Nodes searched  : 4189131
Nodes/second    : 8314

profile-build vs build:

Classic, nps : 21559 vs 19024 (+13.3%)
NNUE,    nps :  8314 vs  6470 (+28.5%)

NNUE/classical ratio: 38.56

vondele commented 4 years ago

actually that speed ratio has a typo, it is a more reasonable 3.856

Any chance you do a match between NNUE and Classical on the hardware to see what the Elo difference would be? Maybe you need a light-weight game manager (see e.g. https://github.com/lucasart/c-chess-cli)

This might be one of the few pieces of hardware where classical still outperforms NNUE, probably depending a bit on TC.

Dantist commented 4 years ago

actually that speed ratio has a typo, it is a more reasonable 3.856

Oh, sure, there is a typo, but a different one. I meant 38.56% (profile-build) :-)

Maybe you need a light-weight game manager

I communicate with Stockfish on the Pi via SSH; it acts as something like a "Remote UCI Engine" on my local network, so I can even run it using cutechess-cli on my PC.

Any chance you do a match between NNUE and Classical on the hardware

I will try to run cutechess-cli with noob_3moves.epd at STC, LTC and VLTC (with the TC adapted to this hardware, as fishtest does). Please reply if I am doing something wrong in my test, or if there are other cutechess-cli options I need to set :-)

Update: the scale factor is 74 for this hardware. But it seems the TC should not be scaled at all if we are measuring the Elo difference on a particular piece of hardware. I will run the usual 10+0.1, 60+0.6, and 120+1.2.

vondele commented 4 years ago

I have updated master with what I believe is the best patch so far. There might/will still be issues, let's try to improve as a follow up. Thanks for the feedback and testing.