NNUE merge - Githubissues

vondele commented 4 years ago

The NNUE branch maintained by @nodchip has demonstrated strong results and offers great potential, and we will proceed to merge it into master. This will assure that Stockfish remains a reference engine based on the top computer chess technology for the foreseeable future. This merge will introduce machine learning based coding to the engine, thus enlarging the community of developers, bringing in new skills. We are eager to keep everybody on board, including all developers and users of diverse hardware, aiming to be an inclusive community. We will keep and evolve the handcrafted evaluation, not only for the beauty and insight it offers, but for additional reasons, including its value on diverse hardware platforms, and for the potential of hybridization with NNUE, and use in search. The initial goal for the merge is the NNUE playing capability only, this focus makes sense given the complexity of the project, and the aim of Stockfish as a production quality chess engine. We will try to make changes such that the @nodchip repo and ours can be kept in sync as easily as possible. Written by knowledgeable people, NNUE is already of good quality, we will make small modifications needed to make it pass our standard CI, and integrate nicely in the existing engine. Rigorous testing will continue to be a cornerstone of our development, and accordingly integration in fishtest needs to be properly implemented. Afterwards, we will guarantee continuous progression in playing strength testing core engine and networks with our usual SPRT process. Some initial steps of the merge will be outlined below, the precise steps needed will become clearer as we proceed, I look forward to working with the community to make this happen!

vondele commented 4 years ago

some tasks needed for integration:

[x] create network server with download/upload api
- [x] merge https://github.com/glinscott/fishtest/pull/736
- [x] delete now unused repo (https://github.com/official-stockfish/networks)
- [x] make readme documenting process for NNUE testing (https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests)
- [x] all nets use a nn-sha256sum[12 digits].bin name for reference from engine
- [x] upload API (https://tests.stockfishchess.org/upload)
- [x] add first net (and more: https://tests.stockfishchess.org/nns)
[x] create nnue-player-wip branch in Stockfish repo, work in this branch till ready for a merge.
- [x] create branch (https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip)
- [x] review code comments throughout
- [x] go through open issues
- [x] remove non-player related code
- [x] properly adjust readme
- [x] clamp abs(NNUE eval) < VALUE_TB_WIN_IN_MAX_PLY
- [x] fix eval / evalnn output
- [x] no compiler warning generated
- [x] fix debug=yes build
- [x] passes the usual CI tests (sanitizers, valgrind, etc) when run by hand
  - [x] verify bench
  - [x] verify perft
  - [x] verify reprosearch
  - [x] instrumented valgrind
  - [x] instrumented valgrind-thread (very slow! (multiple hours))
  - [x] instrumented sanitizer-undefined
  - [x] instrumented sanitizer-thread
- [x] verify equal bench on all (18!) architectures supported (done except for armv7, armv8, ppc-32)
- [x] adjust bench to cover both handcrafted and NNUE evaluation (needs solution for network download as part of CI)
- [x] consider random / synthetic network to perform bench / pgo.
- [x] decide when to init_nnue() (startup, EvalFile, Use NNUE, is_ready, cmdline, ucinewgame, ....), add assert.
- [x] changes in core engine
  - [x] minimize
  - [x] coding style
  - [x] minimize performance impact on handcrafted running style
- [x] UCI default net name matching a net in networks repo.
- [x] complete/maintain AUTHORS file
- [x] integrate in CI
  - [x] deal with std::aligned_alloc
  - [x] perform NNUE testing in CI (needs net download solution)
  - [x] update appveyor / MSCV to use C++17
  - [x] update travis environment to support C++17
  - [x] unfreed memory (fixed with newer toolchain)
  - [x] clang build (needs at least clang 7.0)
  - [x] gcc build (needs at least gcc 7.4)
  - [x] valgrind uninit memory on loading TB
  - [x] valgrind uninit memory
  - [x] memset
  - [x] memcpy
- [x] figure out reason for bench change wrt to nodchip repo (we are up-to-date with master on the branch).
- [x] test performance locally.
- [x] Figure out if we need to update Position objects on a 'Use NNUE' change (https://github.com/official-stockfish/Stockfish/issues/2859)
- [x] SPRT test classical eval impacted less than 5 Elo on fishtest (looking good after https://github.com/official-stockfish/Stockfish/pull/2873, STC: https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b LTC: https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d)
- [x] SPRT test NNUE more than 30 Elo gain on fishtest (skipping SPRT. See >80Elo gain: https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c and https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58)
- [x] Merge master
- [x] squash commits ?
- [x] create PR
[x] update fishtest to support NNUE testing
- [x] communicate / document / update compiler requirements to C++17
- [x] communicate / document removal of custom_make.txt
- [x] download networks from network server as required by binary.
- [x] build code with proper architecture (avx2, bmi2, ?)
- [x] instruct cutechess to do the right thing (i.e. Use NNUE, find network)
- [x] adjust minimal nps for running ?
- [x] decide how to deal with hardware that is slower for nnue.
[x] contact abrok, possible new targets for build (but see https://abrok.eu/stockfish-nnue/)

vondele commented 4 years ago

Let's use this issue specifically for synchronizing work on these or related tasks. General discussion use https://github.com/official-stockfish/Stockfish/issues/2728

vondele commented 4 years ago

debug=yes build is broken it seems.

vondele commented 4 years ago

A quick verification / benchmark (single bench run) on the targets I can test easily (on zen2 architecture), gives a feeling for the order of magnitudes. Good thing is all benches match. If people can test x86-64-avx512 armv7 armv8 ppc-32 ppc-64 that would be appreciated.

target	nodes	nodes NNUE	nps	nps NNUE
general-32	4578298	3377227	1725705	134486
x86-32-old	4578298	3377227	1783520	133497
x86-32	4578298	3377227	1758178	133635
general-64	4578298	3377227	2423662	308000
x86-64	4578298	3377227	2485503	436108
x86-64-modern	4578298	3377227	2611693	424648
x86-64-sse3	4578298	3377227	2470749	1131778
x86-64-sse3-popcnt	4578298	3377227	2675802	1199299
x86-64-ssse3	4578298	3377227	2577870	1166975
x86-64-sse41	4578298	3377227	2445672	1243914
x86-64-sse42	4578298	3377227	2549163	1113126
x86-64-avx2	4578298	3377227	2678933	1635461
x86-64-bmi2	4578298	3377227	1906829	1411884

Edit: since 'x86-64-modern' on master enables both sse3 and popcnt already, we can easily adjust that one (actually, incorrect, needs ssse3 not sse3, the target x86-64-sse3 enables both, increased modern to require ssse3).

dorzechowski commented 4 years ago

Fix for debug build: PR #2827.

vondele commented 4 years ago

Some more tasks as seen from the CI (patches welcome):

[DONE] appveyor testing needs to add '/std:c++17' somewhere in the right place (needs solution for aligned_alloc with MSCV)
clang: nnue/evaluate_nnue.cpp:60:45: error: no member named 'aligned_alloc' in namespace 'std' (on osx)
[DONE] position.cpp:199:39: error: ‘void memset(void, int, size_t)’ clearing an object of non-trivial type ‘struct StateInfo’; use assignment or value-initialization instead [-Werror=class-memaccess]
[DONE] position.cpp:714:51: error: ‘void memcpy(void, const void*, size_t)’ writing to an object of a non-trivial type ‘struct StateInfo’ leaves 1272 bytes unchanged [-Werror=class-memaccess]
[DONE] unfreed memory: in use at exit: 72,704 bytes in 1 blocks (possibly only with older version of toolchain)
[DONE] valgrind: uninit memory access: at 0x426526: Position::set(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, bool, StateInfo, Thread) (position.cpp:222)

sf-x commented 4 years ago

* clang: nnue/evaluate_nnue.cpp:60:45: error: no member named 'aligned_alloc' in namespace 'std'

Compiler bug. std::aligned_alloc is standard C++17.

vondele commented 4 years ago

or header missing?

sf-x commented 4 years ago

hmm. std::aligned_alloc should exist after #include <cstdlib> which exists in "types.h", which is included from "position.h", which is included from nnue/evaluate_nnue.cpp...

vondele commented 4 years ago

OK, needs clang 7.0 (https://godbolt.org/z/rE3joM), CI is still based on 6.0. .travis.yml needs updating gcc needs to be >= 7.4

vondele commented 4 years ago

I have add a commit to upgrade the travis settings on Linux... probably somebody needs to come with a similar fix for the osx of our CI.

Edit: seems like this upgrade didn't work. I'll revert that patch and we'll address the issue later

JavaMast commented 4 years ago

@vondele

I'm afraid that assemblies for the x86_64 and x86_32 architecture (without sse3 or popcnt) are useless for testing in the framework, since they are much slower than the classic Stockfish assemblies for the same architecture and are much weaker. Testing them can significantly affect the quality of the tests.

vondele commented 4 years ago

almost certainly all machines on fishtest support x86-64-modern (which is right now equivalent to x86-64-sse3-popcnt) on the branch. The majority of the machines support avx2 or bmi2. We'll have to deal with that eventually, but right now it is not too much of a concern.

daylen commented 4 years ago

On compiling for macOS

On macOS 10.15.6 (Catalina) make build ARCH=x86-64-bmi2 COMP=clang has two issues:

The first one: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/malloc/_malloc.h:50:10: note: 'aligned_alloc' has been marked as being introduced in macOS 10.15 here, but the deployment target is macOS 10.9.0

So we need to change -mmacosx-version-min=10.9 in the Makefile to -mmacosx-version-min=10.15 but according to statcounter only ~52% of Mac users are on 10.15.

If we do change the min version, second issue: nnue/evaluate_nnue.cpp:60:40: error: no member named 'aligned_alloc' in namespace 'std'; did you mean simply 'aligned_alloc'? /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/malloc/_malloc.h:50:10: note: 'aligned_alloc' declared here

It seems like the C++ library that ships with Xcode dev tools has aligned_alloc in the global namespace. Could maybe get around this with a #if __APPLE__.

dorzechowski commented 4 years ago

Fix for memset/memcpy warnings and valgrind issue: PR #2830.

I can't see unfreed memory issue. Could you check again after applying these fixes?

MichaelB7 commented 4 years ago

error on build

$ make  build ARCH=x86-64 COMP=mingw

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'MINGW64_NT-10.0-18363'
os: 'Windows_NT'
prefetch: 'yes'
popcnt: 'no'
sse: 'yes'
sse3: 'no'
ssse3: 'no'
sse41: 'no'
sse42: 'no'
avx2: 'no'
pext: 'no'
avx512: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2
LDFLAGS:  -static

Testing config sanity. If this fails, try 'make help' ...

C:/Program Files/Git/mingw64/bin/make.exe ARCH=x86-64 COMP=mingw all
make[1]: Entering directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o position.o position.cpp
position.cpp: In member function 'Position& Position::set(const string&, bool, bool, StateInfo*, Thread*)':
position.cpp:199:39: warning: 'void* memset(void*, int, size_t)' clearing an object of non-trivial type 'struct StateInfo'; use assignment or value-initialization instead [-Wclass-memaccess]
  199 |   std::memset(si, 0, sizeof(StateInfo));
      |                                       ^
In file included from position.cpp:31:
position.h:40:8: note: 'struct StateInfo' declared here
   40 | struct StateInfo {
      |        ^~~~~~~~~
position.cpp: In member function 'void Position::do_move(Move, StateInfo&, bool)':
position.cpp:714:51: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of a non-trivial type 'struct StateInfo' leaves 1272 bytes unchanged [-Wclass-memaccess]
  714 |   std::memcpy(&newSt, st, offsetof(StateInfo, key));
      |                                                   ^
In file included from position.cpp:31:
position.h:40:8: note: 'struct StateInfo' declared here
   40 | struct StateInfo {
      |        ^~~~~~~~~
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o psqt.o psqt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o ucioption.o ucioption.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o tune.o tune.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o syzygy/tbprobe.o syzygy/tbprobe.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o nnue/evaluate_nnue.o nnue/evaluate_nnue.cpp
nnue/evaluate_nnue.cpp: In function 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&)':
nnue/evaluate_nnue.cpp:60:45: error: 'aligned_alloc' is not a member of 'std'; did you mean 'aligned_union'?
   60 |     pointer.reset(reinterpret_cast<T*>(std::aligned_alloc(alignof(T), sizeof(T))));
      |                                             ^~~~~~~~~~~~~
      |                                             aligned_union
make[1]: *** [<builtin>: nnue/evaluate_nnue.o] Error 1
make[1]: Leaving directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
make: *** [Makefile:595: build] Error 2

MichaelB7@VM-894787 MINGW64 ~/home/Github/Stockfish/src (nnue-player-wip)

daylen commented 4 years ago

@MichaelB7 Please provide more details, such as what version of g++ you have and your OS/platform. Also, you could try https://github.com/official-stockfish/Stockfish/pull/2831 and see if that fixes your issue—I think the PR makes the aligned_alloc issue more compatible in general for platforms other than macOS.

MichaelB7 commented 4 years ago

$ stockfish Stockfish 250720 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott compiler

Compiled by g++ (GNUC) 10.1.0 on MinGW64 VERSION macro expands to: 10.1.0

Windows 10 Pro Version 10.0.18363 Build 18363 will try your patch

MichaelB7 commented 4 years ago

Update: The patch did not resolve the issue:

$ make  build ARCH=x86-64 COMP=mingw

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'MINGW64_NT-10.0-18363'
os: 'Windows_NT'
prefetch: 'yes'
popcnt: 'no'
sse: 'yes'
sse3: 'no'
ssse3: 'no'
sse41: 'no'
sse42: 'no'
avx2: 'no'
pext: 'no'
avx512: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2
LDFLAGS:  -static

Testing config sanity. If this fails, try 'make help' ...

C:/Program Files/Git/mingw64/bin/make.exe ARCH=x86-64 COMP=mingw all
make[1]: Entering directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o nnue/evaluate_nnue.o nnue/evaluate_nnue.cpp
nnue/evaluate_nnue.cpp: In function 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&)':
nnue/evaluate_nnue.cpp:60:40: error: there are no arguments to 'aligned_malloc' that depend on a template parameter, so a declaration of 'aligned_malloc' must be available [-fpermissive]
   60 |     pointer.reset(reinterpret_cast<T*>(aligned_malloc(alignof(T), sizeof(T))));
      |                                        ^~~~~~~~~~~~~~
nnue/evaluate_nnue.cpp:60:40: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
nnue/evaluate_nnue.cpp: In instantiation of 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&) [with T = Eval::NNUE::FeatureTransformer; Eval::NNUE::AlignedPtr<T> = std::unique_ptr<Eval::NNUE::FeatureTransformer, Eval::NNUE::AlignedDeleter<Eval::NNUE::FeatureTransformer> >]':
nnue/evaluate_nnue.cpp:79:43:   required from here
nnue/evaluate_nnue.cpp:60:54: error: 'aligned_malloc' was not declared in this scope; did you mean '_aligned_malloc'?
   60 |     pointer.reset(reinterpret_cast<T*>(aligned_malloc(alignof(T), sizeof(T))));
      |                                        ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
      |                                        _aligned_malloc
nnue/evaluate_nnue.cpp: In instantiation of 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&) [with T = Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::InputSlice<512>, 32> >, 32> >, 1>; Eval::NNUE::AlignedPtr<T> = std::unique_ptr<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::InputSlice<512>, 32> >, 32> >, 1>, Eval::NNUE::AlignedDeleter<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::InputSlice<512>, 32> >, 32> >, 1> > >]':
nnue/evaluate_nnue.cpp:80:31:   required from here
nnue/evaluate_nnue.cpp:60:54: error: 'aligned_malloc' was not declared in this scope; did you mean '_aligned_malloc'?
   60 |     pointer.reset(reinterpret_cast<T*>(aligned_malloc(alignof(T), sizeof(T))));
      |                                        ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
      |                                        _aligned_malloc
make[1]: *** [<builtin>: nnue/evaluate_nnue.o] Error 1
make[1]: Leaving directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
make: *** [Makefile:595: build] Error 2

But if I make the suggested change from the error message above , it worked.

replacing this on line 60:

  pointer.reset(reinterpret_cast<T*>(aligned_alloc(alignof(T), sizeof(T))));

with this:

  pointer.reset(reinterpret_cast<T*>(_aligned_malloc(alignof(T), sizeof(T))));

vondele commented 4 years ago

@dorzechowski one more valgrind warning triggers when loading TB files:

uciok
setoption name SyzygyPath value ../tests/syzygy/
==122921== Conditional jump or move depends on uninitialised value(s)
==122921==    at 0x13DF43: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, StateInfo*, Thread*) (position.cpp:222)
==122921==    by 0x13EFDE: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Color, StateInfo*) (position.cpp:407)
==122921==    by 0x16D412: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>::TBTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:366)
==122921==    by 0x173AB9: void __gnu_cxx::new_allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>((anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (new_allocator.h:147)
==122921==    by 0x171CFC: void std::allocator_traits<std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >&, (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (alloc_traits.h:484)
==122921==    by 0x171D97: void std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::_M_push_back_aux<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:496)
==122921==    by 0x170C2F: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>& std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:174)
==122921==    by 0x16DAFB: (anonymous namespace)::TBTables::add(std::vector<PieceType, std::allocator<PieceType> > const&) (tbprobe.cpp:489)
==122921==    by 0x16F2A6: Tablebases::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:1367)
==122921==    by 0x167A8C: UCI::on_tb_path(UCI::Option const&) (ucioption.cpp:44)
==122921==    by 0x1699F0: UCI::Option::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ucioption.cpp:202)
==122921==    by 0x163837: (anonymous namespace)::setoption(std::__cxx11::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >&) (uci.cpp:100)
==122921== 
==122921== Conditional jump or move depends on uninitialised value(s)
==122921==    at 0x13DF43: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, StateInfo*, Thread*) (position.cpp:222)
==122921==    by 0x13EFDE: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Color, StateInfo*) (position.cpp:407)
==122921==    by 0x16D62F: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>::TBTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:385)
==122921==    by 0x173AB9: void __gnu_cxx::new_allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>((anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (new_allocator.h:147)
==122921==    by 0x171CFC: void std::allocator_traits<std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >&, (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (alloc_traits.h:484)
==122921==    by 0x171D97: void std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::_M_push_back_aux<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:496)
==122921==    by 0x170C2F: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>& std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:174)
==122921==    by 0x16DAFB: (anonymous namespace)::TBTables::add(std::vector<PieceType, std::allocator<PieceType> > const&) (tbprobe.cpp:489)
==122921==    by 0x16F2A6: Tablebases::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:1367)
==122921==    by 0x167A8C: UCI::on_tb_path(UCI::Option const&) (ucioption.cpp:44)
==122921==    by 0x1699F0: UCI::Option::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ucioption.cpp:202)
==122921==    by 0x163837: (anonymous namespace)::setoption(std::__cxx11::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >&) (uci.cpp:100)
==122921== 
info string Found 35 tablebases

Edit: this should be fixed by:

diff --git a/src/position.cpp b/src/position.cpp
index 52ba710ae..8f2c4b029 100644
--- a/src/position.cpp
+++ b/src/position.cpp
@@ -404,7 +404,7 @@ Position& Position::set(const string& code, Color c, StateInfo* si) {
   string fenStr = "8/" + sides[0] + char(8 - sides[0].length() + '0') + "/8/8/8/8/"
                        + sides[1] + char(8 - sides[1].length() + '0') + "/8 w - - 0 10";

-  return set(fenStr, false, use_nnue(), si, nullptr);
+  return set(fenStr, false, false, si, nullptr);
 }

This set() method is basically just there to create a materials key, and starts from a freshly created Position() object.

mstembera commented 4 years ago

@MichaelB7 Thank you. I can now at least compile. However, I believe the parameters for aligned_alloc and _aligned_malloc should be reversed. Also we should probably match it with _aligned_free(ptr) instead of std::free(ptr).

vondele commented 4 years ago

first green CI badge on Linux/gcc https://travis-ci.org/github/official-stockfish/Stockfish/jobs/711914367

dorzechowski commented 4 years ago

@vondele Another thing to ascertain before we can use NNUE in SF is a license to use it. As nodchip's implementation doesn't have any license attached, I'm not sure we can even use it legally now. We assume so but it's not enough I think. This should be solved together with @nodchip and those files should have either GPLv3 or compatible license attached.

vondele commented 4 years ago

@dorzechowski would be great @nodchip confirms this, but as the code (binaries and sources) are being distributed already, the license is GPL, this is also what the repository Copying file in the repository mentions.

vondele commented 4 years ago

Meanwhile also green on Linux/clang : https://travis-ci.org/github/official-stockfish/Stockfish/jobs/711961757

dorzechowski commented 4 years ago

@dorzechowski would be great @nodchip confirms this, but as the code (binaries and sources) are being distributed already, the license is GPL, this is also what the repository Copying file in the repository mentions.

That's great. All this licensing can be very confusing.

mstembera commented 4 years ago

I can confirm the x86-64-avx512 gives the same 3377227 bench as above.

dorzechowski commented 4 years ago

@vondele I believe task "remove non-player related code" is done already. I see nothing that could be removed without breaking it. Of course possible optimizations is another thing.

vondele commented 4 years ago

@dorzechowski OK, I'll tick that box, assuming you had a careful look.

vondele commented 4 years ago

I have also verified bench on linux-rhel7-ppc64le using gcc version 9.1.0 (GCC)

There are warnings however:

nnue/evaluate_nnue.cpp: In function ‘Value Eval::NNUE::ComputeScore(const Position&, bool)’:
nnue/evaluate_nnue.cpp:124:61: warning: requested alignment 64 is larger than 16 [-Wattributes]
  124 |         transformed_features[FeatureTransformer::kBufferSize];
      |                                                             ^
nnue/evaluate_nnue.cpp:126:61: warning: requested alignment 64 is larger than 16 [-Wattributes]
  126 |     alignas(kCacheLineSize) char buffer[Network::kBufferSize];
      |                                                             ^

seemingly that's not a bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68267) and gcc provides a __BIGGEST_ALIGNMENT__ macro which is arch specific (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html). It is however different from 64 for most arch I tried (except knl).

Maybe there is an easy solution?

gekkehenker commented 4 years ago

@vondele Another thing to ascertain before we can use NNUE in SF is a license to use it. As nodchip's implementation doesn't have any license attached, I'm not sure we can even use it legally now. We assume so but it's not enough I think. This should be solved together with @nodchip and those files should have either GPLv3 or compatible license attached.

AFAIK YaneuraOu is the original author and wrote about it with his full support:

http://yaneuraou.yaneu.com/2020/06/21/stockfish-nnue-subjective-review/

https://twitter.com/nodchip/status/1286947373374087168/retweets/with_comments

nodchip commented 4 years ago

YaneuraOu are distributed under GPLv3. https://github.com/yaneurao/YaneuraOu I also intended that the ported code and code that I wrote are distributed under GPLv3. If there are additional works to express that those code are distributed under GPLv3, please let me know.

mstembera commented 4 years ago

Do we know why the bench of the @nodchip branch with the default net is 3909820 but our nnue-player-wip is 3377227? The reason I ask is because I played a local STC match between SF and SFNNUE and after 100 games SF won them all. Obviously something seems broken. @dorzechowski Did you do any play testing w/ your branch and what were the results?

vondele commented 4 years ago

@nodchip thanks for confirming the license. There is nothing particular you need to do, I might add a copyright header to the added files, that makes things clear.

vondele commented 4 years ago

@mstembera I think I did play with the branch, and had good results. Obviously that's something to clarify. I'm not sure both branches have the same version of master merged.

vondele commented 4 years ago

@mstembera quick check of the current status of master vs branch looks good for me:

Score of master vs nnue: 17 - 49 - 84  [0.393] 150
Elo difference: -75.3 +/- 36.7, LOS: 0.0 %, DrawRatio: 56.0 %

mstembera commented 4 years ago

@vondele Thanks. I tried testing using the chessbase gui and all is fine there. Seems I only have the issue when using cutechess-cli. If I figure out what I'm doing wrong I'll report.

vondele commented 4 years ago

I have used:

./cutechess-cli -repeat -rounds 10000 -games 2 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 15 -openings file=noob_3moves.epd format=epd order=random plies=16  -engine name=master cmd=stockfish.master -engine name=nnue cmd=stockfish.nnue option.EvalFile=/home/vondele/chess/match/nn-c157e0a5755b.bin option."Use NNUE"=true -ratinginterval 1 -each tc=10.0+0.1 proto=uci option.Threads=1 -pgnout nnue.pgn

If I have to guess, most likely missing the full path to EvalFile ?

Meanwhile results with a few more games:

Score of master vs nnue: 498 - 1019 - 1879  [0.423] 3396
Elo difference: -53.7 +/- 7.8, LOS: 0.0 %, DrawRatio: 55.3 %

mstembera commented 4 years ago

Those are encouraging results. I wasn't using the EvalFile path because I have the net in the same directory as the .exe Adding it anyway seems to make no difference. When I test nnue vs nnue things seem normal except the draw ration is only about 10%. Most wins are by adjudication as normal.

vondele commented 4 years ago

@mstembera I think cutechess has the current working directory independent of where the engine binary is, and as a result the binary doesn't find the EvalFile, unless given explicitly with the full path. Adding -debug should show that in the output. After printing the error message the code continues, with a zero-ed net.

mstembera commented 4 years ago

@vondele Yes that was it. Thanks! I would have never guessed cutechess was overriding the current directory. I wonder if we should just fail when "Use NNUE"=true but the network fails to load?

vondele commented 4 years ago

you are touching on a difficult point, and that is how to deal with the net in general.

For example, do we expect people who build the code to download the default net to the src directory ? This would make it relatively easy to enable profile-build. However, to figure out the name of the default net, one should first build the binary, execute uci and look at EvalFile. So, require a build, afterward a profile-build.

Similar question for the bench command. One could test both NNUE and handcrafted eval in a single bench, if we assume the developer has the default net in the src directory. Similar question of course if 'Use NNUE' would default to true.

Probably that's something we could expect for developers, but I'm sure we will all be bitten by this requirement at some point. Other opinions?

mstembera commented 4 years ago

Yes this isn't straight forward. I think it's important than when inevitably something goes wrong it doesn't happen silently. This makes it much harder for a person to track down. The current net is about 21MB and compresses to about 8MB. Is it out of the question to simply add it to the source directory along with the source files?

mstembera commented 4 years ago

@vondele Here is a commit by @nodchip to solve the issue of allowing the net path to be relative to the binary. https://github.com/nodchip/Stockfish/commit/fbdb373b6482db2b462b30d7399c3c7fed1d4f26

stockchess commented 4 years ago

will profile build without net in directory give same optimizer compiler flags as with net in directory? does it matter if use nnue is false default? agree with @mstembera - good to include default net in source directory and have net in same directory as exe by default

vondele commented 4 years ago

@mstembera the problem with adding to the source directory is that the git repo will become very large to clone 20Mb with each new net. We might also run into github bandwidth limits if there are too many downloads. However, I don't think we should add the code you referenced for 'finding' the binary. This is OS/GUI specific, and right now, the problem is not very different from specifying the syzygy directory. Also in that case a relative path doesn't work.

@stockchess there is some impact of having NNUE false by default for the profile-build ... if the code is not executed the profiling will not gather statistics for it, and thus optimize less. The difference is likely not spectacular, but will be visible.

vondele commented 4 years ago

Meanwhile two tests with using nnue=false show some overhead, wrt to master.

https://tests.stockfishchess.org/tests/view/5f1fddb7c09435d870cb9f15 https://tests.stockfishchess.org/tests/view/5f1ff3bdc09435d870cb9f25

it is just below the self-put target of 5 Elo, but would be nice to improve upon this.

mstembera commented 4 years ago

@vondele Can't the old net be purged from the repo each time we add a new one so that it never has more than one and doesn't grow?

vondele commented 4 years ago

technically, yes, but that's kind of opposite of what a version control system like git wants, and certainly not good style https://docs.github.com/en/github/managing-large-files/removing-files-from-a-repositorys-history

dorzechowski commented 4 years ago

@mstembera Yes, I played a lot of test matches in cutechess-cli, LittleBlitzer and Arena and the results were as good as reported by other people. I don't have any Chessbase GUI to test. I also carefully checked that bench didn't change after I removed/refactored some code. I also updated it to latest master. It's still possible that I've missed something but I'm pretty confident I haven't.

official-stockfish / Stockfish

NNUE merge #2823