official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0
11.33k stars 2.25k forks source link

NNUE merge #2823

Closed vondele closed 4 years ago

vondele commented 4 years ago

The NNUE branch maintained by @nodchip has demonstrated strong results and offers great potential, and we will proceed to merge it into master. This will assure that Stockfish remains a reference engine based on the top computer chess technology for the foreseeable future. This merge will introduce machine learning based coding to the engine, thus enlarging the community of developers, bringing in new skills. We are eager to keep everybody on board, including all developers and users of diverse hardware, aiming to be an inclusive community. We will keep and evolve the handcrafted evaluation, not only for the beauty and insight it offers, but for additional reasons, including its value on diverse hardware platforms, and for the potential of hybridization with NNUE, and use in search. The initial goal for the merge is the NNUE playing capability only, this focus makes sense given the complexity of the project, and the aim of Stockfish as a production quality chess engine. We will try to make changes such that the @nodchip repo and ours can be kept in sync as easily as possible. Written by knowledgeable people, NNUE is already of good quality, we will make small modifications needed to make it pass our standard CI, and integrate nicely in the existing engine. Rigorous testing will continue to be a cornerstone of our development, and accordingly integration in fishtest needs to be properly implemented. Afterwards, we will guarantee continuous progression in playing strength testing core engine and networks with our usual SPRT process. Some initial steps of the merge will be outlined below, the precise steps needed will become clearer as we proceed, I look forward to working with the community to make this happen!

vondele commented 4 years ago

some tasks needed for integration:

vondele commented 4 years ago

Let's use this issue specifically for synchronizing work on these or related tasks. General discussion use https://github.com/official-stockfish/Stockfish/issues/2728

vondele commented 4 years ago

debug=yes build is broken it seems.

vondele commented 4 years ago

A quick verification / benchmark (single bench run) on the targets I can test easily (on zen2 architecture), gives a feeling for the order of magnitudes. Good thing is all benches match. If people can test x86-64-avx512 armv7 armv8 ppc-32 ppc-64 that would be appreciated.

target nodes nodes NNUE nps nps NNUE
general-32 4578298 3377227 1725705 134486
x86-32-old 4578298 3377227 1783520 133497
x86-32 4578298 3377227 1758178 133635
general-64 4578298 3377227 2423662 308000
x86-64 4578298 3377227 2485503 436108
x86-64-modern 4578298 3377227 2611693 424648
x86-64-sse3 4578298 3377227 2470749 1131778
x86-64-sse3-popcnt 4578298 3377227 2675802 1199299
x86-64-ssse3 4578298 3377227 2577870 1166975
x86-64-sse41 4578298 3377227 2445672 1243914
x86-64-sse42 4578298 3377227 2549163 1113126
x86-64-avx2 4578298 3377227 2678933 1635461
x86-64-bmi2 4578298 3377227 1906829 1411884

Edit: since 'x86-64-modern' on master enables both sse3 and popcnt already, we can easily adjust that one (actually, incorrect, needs ssse3 not sse3, the target x86-64-sse3 enables both, increased modern to require ssse3).

dorzechowski commented 4 years ago

Fix for debug build: PR #2827.

vondele commented 4 years ago

Some more tasks as seen from the CI (patches welcome):

sf-x commented 4 years ago
* clang: nnue/evaluate_nnue.cpp:60:45: error: no member named 'aligned_alloc' in namespace 'std'

Compiler bug. std::aligned_alloc is standard C++17.

vondele commented 4 years ago

or header missing?

sf-x commented 4 years ago

hmm. std::aligned_alloc should exist after #include <cstdlib> which exists in "types.h", which is included from "position.h", which is included from nnue/evaluate_nnue.cpp...

vondele commented 4 years ago

OK, needs clang 7.0 (https://godbolt.org/z/rE3joM), CI is still based on 6.0. .travis.yml needs updating gcc needs to be >= 7.4

vondele commented 4 years ago

I have add a commit to upgrade the travis settings on Linux... probably somebody needs to come with a similar fix for the osx of our CI.

Edit: seems like this upgrade didn't work. I'll revert that patch and we'll address the issue later

JavaMast commented 4 years ago

@vondele

I'm afraid that assemblies for the x86_64 and x86_32 architecture (without sse3 or popcnt) are useless for testing in the framework, since they are much slower than the classic Stockfish assemblies for the same architecture and are much weaker. Testing them can significantly affect the quality of the tests.

vondele commented 4 years ago

almost certainly all machines on fishtest support x86-64-modern (which is right now equivalent to x86-64-sse3-popcnt) on the branch. The majority of the machines support avx2 or bmi2. We'll have to deal with that eventually, but right now it is not too much of a concern.

daylen commented 4 years ago

On compiling for macOS

On macOS 10.15.6 (Catalina) make build ARCH=x86-64-bmi2 COMP=clang has two issues:

The first one: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/malloc/_malloc.h:50:10: note: 'aligned_alloc' has been marked as being introduced in macOS 10.15 here, but the deployment target is macOS 10.9.0

So we need to change -mmacosx-version-min=10.9 in the Makefile to -mmacosx-version-min=10.15 but according to statcounter only ~52% of Mac users are on 10.15.

If we do change the min version, second issue: nnue/evaluate_nnue.cpp:60:40: error: no member named 'aligned_alloc' in namespace 'std'; did you mean simply 'aligned_alloc'? /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/malloc/_malloc.h:50:10: note: 'aligned_alloc' declared here

It seems like the C++ library that ships with Xcode dev tools has aligned_alloc in the global namespace. Could maybe get around this with a #if __APPLE__.

dorzechowski commented 4 years ago

Fix for memset/memcpy warnings and valgrind issue: PR #2830.

I can't see unfreed memory issue. Could you check again after applying these fixes?

MichaelB7 commented 4 years ago

error on build

$ make  build ARCH=x86-64 COMP=mingw

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'MINGW64_NT-10.0-18363'
os: 'Windows_NT'
prefetch: 'yes'
popcnt: 'no'
sse: 'yes'
sse3: 'no'
ssse3: 'no'
sse41: 'no'
sse42: 'no'
avx2: 'no'
pext: 'no'
avx512: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2
LDFLAGS:  -static

Testing config sanity. If this fails, try 'make help' ...

C:/Program Files/Git/mingw64/bin/make.exe ARCH=x86-64 COMP=mingw all
make[1]: Entering directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o position.o position.cpp
position.cpp: In member function 'Position& Position::set(const string&, bool, bool, StateInfo*, Thread*)':
position.cpp:199:39: warning: 'void* memset(void*, int, size_t)' clearing an object of non-trivial type 'struct StateInfo'; use assignment or value-initialization instead [-Wclass-memaccess]
  199 |   std::memset(si, 0, sizeof(StateInfo));
      |                                       ^
In file included from position.cpp:31:
position.h:40:8: note: 'struct StateInfo' declared here
   40 | struct StateInfo {
      |        ^~~~~~~~~
position.cpp: In member function 'void Position::do_move(Move, StateInfo&, bool)':
position.cpp:714:51: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of a non-trivial type 'struct StateInfo' leaves 1272 bytes unchanged [-Wclass-memaccess]
  714 |   std::memcpy(&newSt, st, offsetof(StateInfo, key));
      |                                                   ^
In file included from position.cpp:31:
position.h:40:8: note: 'struct StateInfo' declared here
   40 | struct StateInfo {
      |        ^~~~~~~~~
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o psqt.o psqt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o ucioption.o ucioption.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o tune.o tune.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o syzygy/tbprobe.o syzygy/tbprobe.cpp
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o nnue/evaluate_nnue.o nnue/evaluate_nnue.cpp
nnue/evaluate_nnue.cpp: In function 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&)':
nnue/evaluate_nnue.cpp:60:45: error: 'aligned_alloc' is not a member of 'std'; did you mean 'aligned_union'?
   60 |     pointer.reset(reinterpret_cast<T*>(std::aligned_alloc(alignof(T), sizeof(T))));
      |                                             ^~~~~~~~~~~~~
      |                                             aligned_union
make[1]: *** [<builtin>: nnue/evaluate_nnue.o] Error 1
make[1]: Leaving directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
make: *** [Makefile:595: build] Error 2

MichaelB7@VM-894787 MINGW64 ~/home/Github/Stockfish/src (nnue-player-wip)
daylen commented 4 years ago

@MichaelB7 Please provide more details, such as what version of g++ you have and your OS/platform. Also, you could try https://github.com/official-stockfish/Stockfish/pull/2831 and see if that fixes your issue—I think the PR makes the aligned_alloc issue more compatible in general for platforms other than macOS.

MichaelB7 commented 4 years ago

$ stockfish Stockfish 250720 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott compiler

Compiled by g++ (GNUC) 10.1.0 on MinGW64 VERSION macro expands to: 10.1.0

Windows 10 Pro Version 10.0.18363 Build 18363 will try your patch

MichaelB7 commented 4 years ago

Update: The patch did not resolve the issue:

$ make  build ARCH=x86-64 COMP=mingw

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'MINGW64_NT-10.0-18363'
os: 'Windows_NT'
prefetch: 'yes'
popcnt: 'no'
sse: 'yes'
sse3: 'no'
ssse3: 'no'
sse41: 'no'
sse42: 'no'
avx2: 'no'
pext: 'no'
avx512: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2
LDFLAGS:  -static

Testing config sanity. If this fails, try 'make help' ...

C:/Program Files/Git/mingw64/bin/make.exe ARCH=x86-64 COMP=mingw all
make[1]: Entering directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
g++ -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -DUSE_SSE2   -c -o nnue/evaluate_nnue.o nnue/evaluate_nnue.cpp
nnue/evaluate_nnue.cpp: In function 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&)':
nnue/evaluate_nnue.cpp:60:40: error: there are no arguments to 'aligned_malloc' that depend on a template parameter, so a declaration of 'aligned_malloc' must be available [-fpermissive]
   60 |     pointer.reset(reinterpret_cast<T*>(aligned_malloc(alignof(T), sizeof(T))));
      |                                        ^~~~~~~~~~~~~~
nnue/evaluate_nnue.cpp:60:40: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
nnue/evaluate_nnue.cpp: In instantiation of 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&) [with T = Eval::NNUE::FeatureTransformer; Eval::NNUE::AlignedPtr<T> = std::unique_ptr<Eval::NNUE::FeatureTransformer, Eval::NNUE::AlignedDeleter<Eval::NNUE::FeatureTransformer> >]':
nnue/evaluate_nnue.cpp:79:43:   required from here
nnue/evaluate_nnue.cpp:60:54: error: 'aligned_malloc' was not declared in this scope; did you mean '_aligned_malloc'?
   60 |     pointer.reset(reinterpret_cast<T*>(aligned_malloc(alignof(T), sizeof(T))));
      |                                        ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
      |                                        _aligned_malloc
nnue/evaluate_nnue.cpp: In instantiation of 'void Eval::NNUE::Detail::Initialize(Eval::NNUE::AlignedPtr<T>&) [with T = Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::InputSlice<512>, 32> >, 32> >, 1>; Eval::NNUE::AlignedPtr<T> = std::unique_ptr<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::InputSlice<512>, 32> >, 32> >, 1>, Eval::NNUE::AlignedDeleter<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::ClippedReLU<Eval::NNUE::Layers::AffineTransform<Eval::NNUE::Layers::InputSlice<512>, 32> >, 32> >, 1> > >]':
nnue/evaluate_nnue.cpp:80:31:   required from here
nnue/evaluate_nnue.cpp:60:54: error: 'aligned_malloc' was not declared in this scope; did you mean '_aligned_malloc'?
   60 |     pointer.reset(reinterpret_cast<T*>(aligned_malloc(alignof(T), sizeof(T))));
      |                                        ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
      |                                        _aligned_malloc
make[1]: *** [<builtin>: nnue/evaluate_nnue.o] Error 1
make[1]: Leaving directory 'C:/Users/MichaelB7/home/Github/Stockfish/src'
make: *** [Makefile:595: build] Error 2

But if I make the suggested change from the error message above , it worked.

replacing this on line 60:

  pointer.reset(reinterpret_cast<T*>(aligned_alloc(alignof(T), sizeof(T))));

with this:

  pointer.reset(reinterpret_cast<T*>(_aligned_malloc(alignof(T), sizeof(T)))); 
vondele commented 4 years ago

@dorzechowski one more valgrind warning triggers when loading TB files:

uciok
setoption name SyzygyPath value ../tests/syzygy/
==122921== Conditional jump or move depends on uninitialised value(s)
==122921==    at 0x13DF43: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, StateInfo*, Thread*) (position.cpp:222)
==122921==    by 0x13EFDE: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Color, StateInfo*) (position.cpp:407)
==122921==    by 0x16D412: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>::TBTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:366)
==122921==    by 0x173AB9: void __gnu_cxx::new_allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>((anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (new_allocator.h:147)
==122921==    by 0x171CFC: void std::allocator_traits<std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >&, (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (alloc_traits.h:484)
==122921==    by 0x171D97: void std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::_M_push_back_aux<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:496)
==122921==    by 0x170C2F: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>& std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:174)
==122921==    by 0x16DAFB: (anonymous namespace)::TBTables::add(std::vector<PieceType, std::allocator<PieceType> > const&) (tbprobe.cpp:489)
==122921==    by 0x16F2A6: Tablebases::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:1367)
==122921==    by 0x167A8C: UCI::on_tb_path(UCI::Option const&) (ucioption.cpp:44)
==122921==    by 0x1699F0: UCI::Option::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ucioption.cpp:202)
==122921==    by 0x163837: (anonymous namespace)::setoption(std::__cxx11::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >&) (uci.cpp:100)
==122921== 
==122921== Conditional jump or move depends on uninitialised value(s)
==122921==    at 0x13DF43: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, StateInfo*, Thread*) (position.cpp:222)
==122921==    by 0x13EFDE: Position::set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Color, StateInfo*) (position.cpp:407)
==122921==    by 0x16D62F: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>::TBTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:385)
==122921==    by 0x173AB9: void __gnu_cxx::new_allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>((anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (new_allocator.h:147)
==122921==    by 0x171CFC: void std::allocator_traits<std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::construct<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> >&, (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (alloc_traits.h:484)
==122921==    by 0x171D97: void std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::_M_push_back_aux<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:496)
==122921==    by 0x170C2F: (anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>& std::deque<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0>, std::allocator<(anonymous namespace)::TBTable<((anonymous namespace)::TBType)0> > >::emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (deque.tcc:174)
==122921==    by 0x16DAFB: (anonymous namespace)::TBTables::add(std::vector<PieceType, std::allocator<PieceType> > const&) (tbprobe.cpp:489)
==122921==    by 0x16F2A6: Tablebases::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (tbprobe.cpp:1367)
==122921==    by 0x167A8C: UCI::on_tb_path(UCI::Option const&) (ucioption.cpp:44)
==122921==    by 0x1699F0: UCI::Option::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ucioption.cpp:202)
==122921==    by 0x163837: (anonymous namespace)::setoption(std::__cxx11::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >&) (uci.cpp:100)
==122921== 
info string Found 35 tablebases

Edit: this should be fixed by:

diff --git a/src/position.cpp b/src/position.cpp
index 52ba710ae..8f2c4b029 100644
--- a/src/position.cpp
+++ b/src/position.cpp
@@ -404,7 +404,7 @@ Position& Position::set(const string& code, Color c, StateInfo* si) {
   string fenStr = "8/" + sides[0] + char(8 - sides[0].length() + '0') + "/8/8/8/8/"
                        + sides[1] + char(8 - sides[1].length() + '0') + "/8 w - - 0 10";

-  return set(fenStr, false, use_nnue(), si, nullptr);
+  return set(fenStr, false, false, si, nullptr);
 }

This set() method is basically just there to create a materials key, and starts from a freshly created Position() object.

mstembera commented 4 years ago

@MichaelB7 Thank you. I can now at least compile. However, I believe the parameters for aligned_alloc and _aligned_malloc should be reversed. Also we should probably match it with _aligned_free(ptr) instead of std::free(ptr).

vondele commented 4 years ago

first green CI badge on Linux/gcc https://travis-ci.org/github/official-stockfish/Stockfish/jobs/711914367

dorzechowski commented 4 years ago

@vondele Another thing to ascertain before we can use NNUE in SF is a license to use it. As nodchip's implementation doesn't have any license attached, I'm not sure we can even use it legally now. We assume so but it's not enough I think. This should be solved together with @nodchip and those files should have either GPLv3 or compatible license attached.

vondele commented 4 years ago

@dorzechowski would be great @nodchip confirms this, but as the code (binaries and sources) are being distributed already, the license is GPL, this is also what the repository Copying file in the repository mentions.

vondele commented 4 years ago

Meanwhile also green on Linux/clang : https://travis-ci.org/github/official-stockfish/Stockfish/jobs/711961757

dorzechowski commented 4 years ago

@dorzechowski would be great @nodchip confirms this, but as the code (binaries and sources) are being distributed already, the license is GPL, this is also what the repository Copying file in the repository mentions.

That's great. All this licensing can be very confusing.

mstembera commented 4 years ago

I can confirm the x86-64-avx512 gives the same 3377227 bench as above.

dorzechowski commented 4 years ago

@vondele I believe task "remove non-player related code" is done already. I see nothing that could be removed without breaking it. Of course possible optimizations is another thing.

vondele commented 4 years ago

@dorzechowski OK, I'll tick that box, assuming you had a careful look.

vondele commented 4 years ago

I have also verified bench on linux-rhel7-ppc64le using gcc version 9.1.0 (GCC)

There are warnings however:

nnue/evaluate_nnue.cpp: In function ‘Value Eval::NNUE::ComputeScore(const Position&, bool)’:
nnue/evaluate_nnue.cpp:124:61: warning: requested alignment 64 is larger than 16 [-Wattributes]
  124 |         transformed_features[FeatureTransformer::kBufferSize];
      |                                                             ^
nnue/evaluate_nnue.cpp:126:61: warning: requested alignment 64 is larger than 16 [-Wattributes]
  126 |     alignas(kCacheLineSize) char buffer[Network::kBufferSize];
      |                                                             ^

seemingly that's not a bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68267) and gcc provides a __BIGGEST_ALIGNMENT__ macro which is arch specific (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html). It is however different from 64 for most arch I tried (except knl).

Maybe there is an easy solution?

gekkehenker commented 4 years ago

@vondele Another thing to ascertain before we can use NNUE in SF is a license to use it. As nodchip's implementation doesn't have any license attached, I'm not sure we can even use it legally now. We assume so but it's not enough I think. This should be solved together with @nodchip and those files should have either GPLv3 or compatible license attached.

AFAIK YaneuraOu is the original author and wrote about it with his full support:

http://yaneuraou.yaneu.com/2020/06/21/stockfish-nnue-subjective-review/

https://twitter.com/nodchip/status/1286947373374087168/retweets/with_comments

nodchip commented 4 years ago

YaneuraOu are distributed under GPLv3. https://github.com/yaneurao/YaneuraOu I also intended that the ported code and code that I wrote are distributed under GPLv3. If there are additional works to express that those code are distributed under GPLv3, please let me know.

mstembera commented 4 years ago

Do we know why the bench of the @nodchip branch with the default net is 3909820 but our nnue-player-wip is 3377227? The reason I ask is because I played a local STC match between SF and SFNNUE and after 100 games SF won them all. Obviously something seems broken. @dorzechowski Did you do any play testing w/ your branch and what were the results?

vondele commented 4 years ago

@nodchip thanks for confirming the license. There is nothing particular you need to do, I might add a copyright header to the added files, that makes things clear.

vondele commented 4 years ago

@mstembera I think I did play with the branch, and had good results. Obviously that's something to clarify. I'm not sure both branches have the same version of master merged.

vondele commented 4 years ago

@mstembera quick check of the current status of master vs branch looks good for me:

Score of master vs nnue: 17 - 49 - 84  [0.393] 150
Elo difference: -75.3 +/- 36.7, LOS: 0.0 %, DrawRatio: 56.0 %
mstembera commented 4 years ago

@vondele Thanks. I tried testing using the chessbase gui and all is fine there. Seems I only have the issue when using cutechess-cli. If I figure out what I'm doing wrong I'll report.

vondele commented 4 years ago

I have used:

./cutechess-cli -repeat -rounds 10000 -games 2 -tournament gauntlet -resign movecount=3 score=400 -draw movenumber=34 movecount=8 score=20 -concurrency 15 -openings file=noob_3moves.epd format=epd order=random plies=16  -engine name=master cmd=stockfish.master -engine name=nnue cmd=stockfish.nnue option.EvalFile=/home/vondele/chess/match/nn-c157e0a5755b.bin option."Use NNUE"=true -ratinginterval 1 -each tc=10.0+0.1 proto=uci option.Threads=1 -pgnout nnue.pgn

If I have to guess, most likely missing the full path to EvalFile ?

Meanwhile results with a few more games:

Score of master vs nnue: 498 - 1019 - 1879  [0.423] 3396
Elo difference: -53.7 +/- 7.8, LOS: 0.0 %, DrawRatio: 55.3 %
mstembera commented 4 years ago

Those are encouraging results. I wasn't using the EvalFile path because I have the net in the same directory as the .exe Adding it anyway seems to make no difference. When I test nnue vs nnue things seem normal except the draw ration is only about 10%. Most wins are by adjudication as normal.

vondele commented 4 years ago

@mstembera I think cutechess has the current working directory independent of where the engine binary is, and as a result the binary doesn't find the EvalFile, unless given explicitly with the full path. Adding -debug should show that in the output. After printing the error message the code continues, with a zero-ed net.

mstembera commented 4 years ago

@vondele Yes that was it. Thanks! I would have never guessed cutechess was overriding the current directory. I wonder if we should just fail when "Use NNUE"=true but the network fails to load?

vondele commented 4 years ago

you are touching on a difficult point, and that is how to deal with the net in general.

For example, do we expect people who build the code to download the default net to the src directory ? This would make it relatively easy to enable profile-build. However, to figure out the name of the default net, one should first build the binary, execute uci and look at EvalFile. So, require a build, afterward a profile-build.

Similar question for the bench command. One could test both NNUE and handcrafted eval in a single bench, if we assume the developer has the default net in the src directory. Similar question of course if 'Use NNUE' would default to true.

Probably that's something we could expect for developers, but I'm sure we will all be bitten by this requirement at some point. Other opinions?

mstembera commented 4 years ago

Yes this isn't straight forward. I think it's important than when inevitably something goes wrong it doesn't happen silently. This makes it much harder for a person to track down. The current net is about 21MB and compresses to about 8MB. Is it out of the question to simply add it to the source directory along with the source files?

mstembera commented 4 years ago

@vondele Here is a commit by @nodchip to solve the issue of allowing the net path to be relative to the binary. https://github.com/nodchip/Stockfish/commit/fbdb373b6482db2b462b30d7399c3c7fed1d4f26

stockchess commented 4 years ago

will profile build without net in directory give same optimizer compiler flags as with net in directory? does it matter if use nnue is false default? agree with @mstembera - good to include default net in source directory and have net in same directory as exe by default

vondele commented 4 years ago

@mstembera the problem with adding to the source directory is that the git repo will become very large to clone 20Mb with each new net. We might also run into github bandwidth limits if there are too many downloads. However, I don't think we should add the code you referenced for 'finding' the binary. This is OS/GUI specific, and right now, the problem is not very different from specifying the syzygy directory. Also in that case a relative path doesn't work.

@stockchess there is some impact of having NNUE false by default for the profile-build ... if the code is not executed the profiling will not gather statistics for it, and thus optimize less. The difference is likely not spectacular, but will be visible.

vondele commented 4 years ago

Meanwhile two tests with using nnue=false show some overhead, wrt to master.

https://tests.stockfishchess.org/tests/view/5f1fddb7c09435d870cb9f15 https://tests.stockfishchess.org/tests/view/5f1ff3bdc09435d870cb9f25

it is just below the self-put target of 5 Elo, but would be nice to improve upon this.

mstembera commented 4 years ago

@vondele Can't the old net be purged from the repo each time we add a new one so that it never has more than one and doesn't grow?

vondele commented 4 years ago

technically, yes, but that's kind of opposite of what a version control system like git wants, and certainly not good style https://docs.github.com/en/github/managing-large-files/removing-files-from-a-repositorys-history

dorzechowski commented 4 years ago

@mstembera Yes, I played a lot of test matches in cutechess-cli, LittleBlitzer and Arena and the results were as good as reported by other people. I don't have any Chessbase GUI to test. I also carefully checked that bench didn't change after I removed/refactored some code. I also updated it to latest master. It's still possible that I've missed something but I'm pretty confident I haven't.