official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

Ideal bench settings for the PGO build #2383

Closed Alayan-stk-2 closed 4 years ago

Alayan-stk-2 commented 5 years ago

Right now, the makefile instructs the PGO step to run the default bench, which runs on one core with the default hash (16MB) and the default depth (13). So the questions are: does this provide the best profile when intending to run Stockfish with a big hash, at higher depth and with many threads? Which parameters should be chosen depending on the machine and the intended use case? And how could SF's build system be tweaked?

The default bench, with its low depth and small hash, doesn't properly reflect the importance of memory pressure in longer searches, as pretty much all of the hash fits in cache. So if there is a different way to optimize the code that fares better in that situation, it could make a measurable difference.
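To put "pretty much all the hash fits in cache" into rough numbers, here is a small sketch. It assumes a hypothetical 32 MB last-level cache and treats transposition-table probes as spread uniformly over the table, which is a simplification. Under those assumptions, the share of probes that can be served from cache is at most the ratio of cache size to hash size.

```python
# Rough estimate of how much of the transposition table (TT) can stay in cache.
# Assumptions (not from the thread): a hypothetical 32 MB last-level cache, and
# TT probes spread uniformly over the whole table.
ASSUMED_CACHE_MB = 32


def cache_resident_fraction(hash_mb: float, cache_mb: float = ASSUMED_CACHE_MB) -> float:
    """Upper bound on the share of TT probes that can be served from cache."""
    if hash_mb <= 0:
        raise ValueError("hash size must be positive")
    return min(1.0, cache_mb / hash_mb)


if __name__ == "__main__":
    for hash_mb in (16, 2048):  # default bench hash vs. a 2 GB hash
        share = cache_resident_fraction(hash_mb)
        print(f"{hash_mb:>5} MB hash: at most {share:.1%} of probes cache-resident")
```

With these assumptions the default 16 MB hash can be fully cache-resident, while a 2 GB hash drops to about 1.6%. Other engine data competes for the same cache, so both figures are upper bounds.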

For the submission to the TCEC FRC bonus, I had Aloril try out a different bench run for the PGO profile.

Both builds were run with:
setoption name Threads value 256
go depth 28

Changed to PGOBENCH = ./$(EXE) bench 2048 256 14, otherwise way too slow ;)
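The positional arguments in that PGOBENCH line are hash (MB), threads and depth, in that order, matching the defaults described in the opening post (16 MB, 1 thread, depth 13). A small helper makes the mapping explicit. The BenchParams class is hypothetical and not part of Stockfish's build system, and ./stockfish is a placeholder for $(EXE).

```python
from dataclasses import dataclass


@dataclass
class BenchParams:
    """Hypothetical helper for the positional `bench <hash> <threads> <depth>` form."""
    hash_mb: int = 16   # default bench hash, per the opening post
    threads: int = 1    # default bench runs on one core
    depth: int = 13     # default bench depth

    def command(self, exe: str = "./stockfish") -> str:
        # Render the arguments in the same order as the PGOBENCH line.
        return f"{exe} bench {self.hash_mb} {self.threads} {self.depth}"


if __name__ == "__main__":
    print(BenchParams().command())               # default profile run
    print(BenchParams(2048, 256, 14).command())  # the modified TCEC profile run
```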

I had suggested d28, which was taking too long, but d14 is barely above the default d13 and is also a poor match for a 2GB hash.

The flag -fprofile-correction must be set for the profile from the multi-threaded run to be usable; otherwise GCC stops with an error when it detects the inconsistent profile.
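A multi-threaded profile can come out inconsistent when the instrumentation counters are bumped non-atomically: an increment is a load, an add and a store, so two threads can overwrite each other's updates. The sketch below replays that race deterministically instead of using real threads. Each thread alternates a LOAD step and a STORE step, and a schedule decides who moves next. It models a lost update in general, not GCC's actual instrumentation code.

```python
import random


def run_schedule(schedule: list[int]) -> int:
    """Replay non-atomic `counter += 1` operations under a given interleaving.

    Each entry names the thread that takes its next step. A thread alternates
    between LOAD (copy the shared counter into a private register) and
    STORE (write register + 1 back), so a STORE can clobber a newer value.
    """
    counter = 0
    registers: dict[int, int] = {}  # thread id -> value loaded, not yet stored
    for tid in schedule:
        if tid not in registers:
            registers[tid] = counter          # LOAD
        else:
            counter = registers.pop(tid) + 1  # STORE
    return counter


def random_schedule(threads: int, increments: int, seed: int = 0) -> list[int]:
    """Random interleaving in which every thread performs `increments` increments."""
    steps = [tid for tid in range(threads) for _ in range(2 * increments)]
    random.Random(seed).shuffle(steps)
    return steps


if __name__ == "__main__":
    print(run_schedule([0, 0, 1, 1]))  # serialized: both increments survive
    print(run_schedule([0, 1, 0, 1]))  # interleaved: one increment is lost
    expected = 4 * 1000
    observed = run_schedule(random_schedule(4, 1000, seed=1))
    print(f"expected {expected}, counted {observed}")
```

Counts that come up short in this way no longer add up consistently across the program's control flow, which is the kind of inconsistency -fprofile-correction smooths over.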

PGO build:
1st time: info depth 28 seldepth 42 multipv 1 score cp 57 nodes 3064323750 nps 99795601 hashfull 1000 tbhits 0 time 30706 pv d2d4 e7e6 c2c4 g8f6 g1f3 b7b6 g2g3 c8a6 b2b3 a6b7 f1g2 f8b4 c1d2 c7c5 e1g1 e8g8 d2b4 c5b4 a2a3 b8a6 e2e3 d8e7 a3a4 f6e4 f3d2 d7d5 d2e4 d5e4 b1d2 f7f5 a4a5 e6e5 f1e1 e5d4 e3d4 a8d8 a5b6
2nd time: info depth 28 seldepth 40 multipv 1 score cp 45 nodes 4083633575 nps 109754443 hashfull 1000 tbhits 0 time 37207 pv d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 b1c3 c7c6 e2e3 f8d6 f1d3 e8g8 e1g1 b8d7 c1d2 f8e8 d1c2 e6e5 c4d5 c6d5 c3b5 d6b8 d4e5 d7e5 f3e5 b8e5 d2c3 e5b8 c3f6 d8f6 d3h7 g8h8
3rd time: info depth 28 seldepth 40 multipv 1 score cp 51 nodes 3358646487 nps 103896015 hashfull 1000 tbhits 0 time 32327 pv d2d4 d7d5 c2c4 e7e6 b1c3 c7c6 c4d5 e6d5 c1f4 f8d6 f4d6 d8d6 e2e3 g8e7 f1d3 c8f5 g1f3 f5d3 d1d3 d6g6 d3g6 h7g6 e1c1 b8d7 g2g3 d7f6 c1c2 a8d8 f3e5 f6e4 c3e4

Ordinary build:
1st time: info depth 28 seldepth 39 multipv 1 score cp 64 nodes 4114545927 nps 103167993 hashfull 1000 tbhits 0 time 39882 pv d2d4 e7e6 c2c4 g8f6 g1f3 d7d5 c4d5 e6d5 c1f4 f8d6 f4d6 d8d6 e2e3 e8g8 f1e2 c8g4 d1b3 f8e8 e1g1 b8d7 b1d2 c7c6 a1c1 a8b8 e2d3 g4h5 b3c2 h5g6 d3g6 h7g6 h2h3
2nd time: info depth 28 seldepth 39 multipv 1 score cp 56 nodes 4457059738 nps 101234690 hashfull 1000 tbhits 0 time 44027 pv d2d4 e7e6 g1f3 g8f6 c2c4 d7d5 b1c3 c7c6 e2e3 b8d7 f1d3 d5c4 d3c4 b7b5 c4e2 c8b7 e1g1 a7a6 e3e4 c6c5 e4e5 f6d5 c3e4 f8e7 c1g5 e8g8 f1e1 c5c4 g5e7 d8e7 e4d6 b7c6 d1c2
3rd time: info depth 28 seldepth 40 multipv 1 score cp 53 nodes 3118260750 nps 104366448 hashfull 1000 tbhits 0 time 29878 pv d2d4 e7e6 c2c4 g8f6 g1f3 b7b6 g2g3 c8a6 e2e3 f8b4 c1d2 b4d2 d1d2 c7c5 b1c3 e8g8 f1e2 d7d5 c4d5 e6d5 e2a6 b8a6 a1c1 d8e8 e1g1 e8e6 g1g2 f8e8 d2e2 a6c7 f1d1 a8c8 h2h3 h7h6

So from this limited data, it's not clear which is better.
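One rough way to back up that conclusion is to compare the nps figures of the six runs. The sketch uses Welch's t statistic, which is a choice made here and not in the thread. With only three samples per build, the PGO build shows about 1.5% higher mean nps, but |t| is only around 0.5, so the gap is well inside run-to-run variation. With 256 threads the searches are nondeterministic, so nps is only a proxy for speed.

```python
from statistics import mean, stdev

# nps values copied from the three depth-28 runs of each build.
PGO_NPS = [99_795_601, 109_754_443, 103_896_015]
ORDINARY_NPS = [103_167_993, 101_234_690, 104_366_448]


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for the difference between two sample means."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


if __name__ == "__main__":
    ratio = mean(PGO_NPS) / mean(ORDINARY_NPS)
    print(f"mean nps ratio (PGO / ordinary): {ratio:.4f}")
    print(f"Welch t: {welch_t(PGO_NPS, ORDINARY_NPS):.2f}")
```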

MichaelB7 commented 5 years ago

The default makefile settings are fine for most people. You are possibly fretting over an Elo or two for an engine rated 3500 Elo. You would have to play 50,000 games for that to make an impact on the results.
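The order of magnitude can be checked with the logistic Elo model and a normal approximation. The sketch assumes near-equal engines (wins and losses about equally likely), a hypothetical 80% draw ratio and a two-standard-error detection threshold. All three are illustrative choices, not figures from the thread. Under those assumptions, a 2 Elo edge needs roughly 24,000 games and a 1 Elo edge roughly 97,000, which brackets the 50,000 mentioned.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score per game for the side that is `elo_diff` Elo stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, draw_ratio: float = 0.8, z: float = 2.0) -> int:
    """Games until an `elo_diff` edge sits `z` standard errors away from 50%.

    Assumes wins and losses are equally likely apart from the tiny edge, so the
    per-game score variance is about (1 - draw_ratio) / 4. The 0.8 draw ratio
    and z = 2 are illustrative assumptions.
    """
    edge = expected_score(elo_diff) - 0.5
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((z * per_game_sd / edge) ** 2)


if __name__ == "__main__":
    for elo in (1, 2):
        print(f"{elo} Elo: ~{games_needed(elo):,} games")
```

A higher draw ratio shrinks the per-game variance, which is why drawish top-level play needs fewer games than the no-draw case for the same Elo gap.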

vondele commented 5 years ago

No need to try anything too fancy. Threading in particular is not a good idea: the PGO counter updates are racy, and thus wrong, which is what -fprofile-correction tries to patch after the fact. PGO just gives the compiler hints about basic things such as branch probabilities or loop iteration counts, and it doesn't matter whether a branch probability is 67% or 69%. So I think the current setup is pretty good.
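The 67% versus 69% point can be illustrated with a toy model. Suppose the only decision taken from a branch's counters were which successor to lay out as the fall-through path. Then any probability estimate on the same side of 50% yields the same code, even if some racy counter updates were lost. Real compilers feed these probabilities into more heuristics than this, so the sketch only shows the intuition. The layout rule and the 5% loss rate are made-up assumptions.

```python
def taken_probability(taken: int, not_taken: int) -> float:
    """Branch probability estimated from profile counters."""
    total = taken + not_taken
    return taken / total if total else 0.5


def fallthrough_is_taken(taken: int, not_taken: int) -> bool:
    """Toy layout rule: put the more frequent successor on the fall-through path."""
    return taken_probability(taken, not_taken) > 0.5


def with_lost_updates(count: int, loss: float) -> int:
    """Counter value after a fraction `loss` of racy increments was dropped."""
    return round(count * (1.0 - loss))


if __name__ == "__main__":
    for taken, not_taken in ((67, 33), (69, 31)):
        noisy = with_lost_updates(taken * 1000, 0.05)  # hypothetical 5% loss
        print(
            f"p={taken_probability(taken, not_taken):.2f} "
            f"noisy p={taken_probability(noisy, not_taken * 1000):.3f} "
            f"taken-first={fallthrough_is_taken(noisy, not_taken * 1000)}"
        )
```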

snicolet commented 4 years ago

"it's not clear which is better"

So I am closing this issue for now :-)