official-stockfish / books

Chess books used to develop Stockfish

Another book with unbalanced human openings #39

Closed: vondele closed 9 months ago

vondele commented 9 months ago

A new book derived from Lichess games, with a model draw rate between 48% and 52%

It attempts to address the following points, relative to the currently used book:

The construction process involved:

1) Parsing all 15B Lichess games in the database https://database.lichess.org/ for the period Jan - Sept 2023. From these, extract the popular positions, i.e. those seen at least twice, within the first 16 plies played, following each newly added game for at most 8 previously unseen plies.

$ ./fastpopular --dir /mnt/md0/chess/lichessgames/2023/ --minCount 2 --stopEarly --countStopEarly 8 --maxPlies 16 --concurrency 9 -o popular_Lichess_JanSept_maxPlies16_stopEarly8.epd
Looking for pgn files in /mnt/md0/chess/lichessgames/2023/
Found 9 .pgn(.gz) files, creating 9 chunks for processing.
Processed 9 files
Retained 296993424 positions from 1127228493 unique visited in 15251265926 games.
Total time for processing: 7374.5 s

fastpopular is available at https://github.com/vondele/fastpopular
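
For readers who want the gist of step 1 without reading the fastpopular source, here is a minimal Python sketch of the counting idea. It is not the fastpopular implementation: positions are passed in as opaque keys (e.g. FEN strings), the function name `popular_positions` is hypothetical, and the exact early-stop rule (stop following a game once it has contributed `count_stop_early` previously unseen positions) is an assumption based on the description above.

```python
from collections import Counter


def popular_positions(games, min_count=2, max_plies=16, count_stop_early=8):
    """Count positions over many games and keep the popular ones.

    games: iterable of games, each a list of hashable position keys,
           one key per ply played (hypothetical input format).
    Returns a dict {position_key: count} for keys seen >= min_count times.
    """
    counts = Counter()
    for positions in games:
        new_seen = 0
        # Only the first max_plies positions of each game are considered.
        for key in positions[:max_plies]:
            if key not in counts:
                new_seen += 1
            counts[key] += 1
            # Assumed early-stop rule: give up on this game once it has
            # wandered through enough previously unseen positions.
            if new_seen >= count_stop_early:
                break
    return {key: n for key, n in counts.items() if n >= min_count}
```

With toy games `[["a", "b", "c"], ["a", "b", "d"], ["a", "x"]]` and the default settings, only the keys `"a"` and `"b"` are retained, since every other key occurs once.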

2) Score all these 296M positions with a modified Stockfish, based on master, that analyses positions up to depth 24 for as long as the predicted draw rate (UCI_ShowWDL) stays near 50%. Positions whose draw rate is already very different from 50% at low depth are only analysed to low depth. From these scored positions, extract those with a draw rate in the range 48 - 52%. That modified branch is available at https://github.com/vondele/Stockfish/tree/createUHO

   ./stockfish.createUHO bench 128 1 24 popular_Lichess_JanSept_maxPlies16_stopEarly8.epd > popular_Lichess_JanSept_maxPlies16_stopEarly8_scored.epd
   awk '{if ($15>480 && $15<520) print $0}' popular_Lichess_JanSept_maxPlies16_stopEarly8_scored.epd | cut -d';' -f1 | sed "s/ $//g" > UHO_Lichess_4852_v1.epd
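
The awk/cut/sed pipeline above can be mirrored in Python, which may be easier to adapt. This is a sketch only: the layout of the scored file is not shown in this thread, so the code assumes nothing beyond what the one-liner does (whitespace field 15 holds the draw rate in per mille, and the position is everything before the first `;`). The function name `filter_scored_epd` is made up for illustration.

```python
def filter_scored_epd(lines, lo=480, hi=520, field=15):
    """Keep positions whose draw-rate field lies strictly between lo and hi.

    Mirrors: awk '$15>480 && $15<520' | cut -d';' -f1 | sed 's/ $//'
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < field:
            continue  # awk would compare an empty field as 0 and skip it too
        try:
            draw = float(fields[field - 1])
        except ValueError:
            continue  # non-numeric field: skipped here (awk would compare as text)
        if lo < draw < hi:
            # cut -d';' -f1: text before the first ';'
            position = line.rstrip("\n").split(";", 1)[0]
            # sed "s/ $//": drop a single trailing space
            if position.endswith(" "):
                position = position[:-1]
            kept.append(position)
    return kept
```

Typical use would be `filter_scored_epd(open(path))` on the scored file, writing the returned lines to the new book.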

Short initial testing at STC shows the draw rate is, as expected, close to 50% for self-play games:

Score of master1 vs master2: 1048 - 1031 - 1921 [] 4000
Elo difference: 1.48 +/- 7.75, LOS: 64.54 %, DrawRatio: 48.02 %
Ptnml:        WW     WD  DD/WL     LD     LL
Distr:        21    473   1026    462     18
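
As a cross-check of the numbers above, the Elo difference and draw ratio can be recomputed from the win/loss/draw counts with the usual logistic Elo formula. The sketch below also recomputes the score from the pentanomial distribution, assuming the column order shown (WW, WD, DD/WL, LD, LL); the function names are hypothetical.

```python
import math


def match_summary(wins, losses, draws):
    """Return (score, elo_difference, draw_ratio) for a W-L-D result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400))
    elo = 400.0 * math.log10(score / (1.0 - score))
    return score, elo, draws / games


def pentanomial_score(ww, wd, dd_wl, ld, ll):
    """Score from game-pair outcomes; each pair is worth 0..2 points."""
    pairs = ww + wd + dd_wl + ld + ll
    points = 2.0 * ww + 1.5 * wd + 1.0 * dd_wl + 0.5 * ld
    return points / (2.0 * pairs)
```

Calling `match_summary(1048, 1031, 1921)` gives an Elo difference of about 1.48 and a draw ratio of about 48%, in line with the reported output, and `pentanomial_score(21, 473, 1026, 462, 18)` gives the same score as the W-L-D counts.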

robertnurnberg commented 4 months ago

  • positions at all game plies between 1 and 16

Just a tiny correction. The earliest game ply I could find is 2, e.g. for the position rnbqkbnr/p1pppppp/8/1p6/3P4/8/PPP1PPPP/RNBQKBNR w KQkq - 0 2.

Edit: Here is the complete list of frequencies.

game ply  2: 5 times
game ply  3: 47 times
game ply  4: 642 times
game ply  5: 3454 times
game ply  6: 12996 times
game ply  7: 29984 times
game ply  8: 60510 times
game ply  9: 99575 times
game ply 10: 156793 times
game ply 11: 217136 times
game ply 12: 288550 times
game ply 13: 353868 times
game ply 14: 420058 times
game ply 15: 470702 times
game ply 16: 517716 times
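
For anyone wanting to reproduce such a tally, the game ply can be derived from the side to move and the fullmove number of a FEN. The sketch below assumes each book line starts with a full six-field FEN (as in the example position above); if the book omits the move counters, this will not work as written. The function names are made up for illustration.

```python
from collections import Counter


def game_ply(fen):
    """Number of plies played before this position, from a six-field FEN."""
    fields = fen.split()
    side_to_move = fields[1]   # "w" or "b"
    fullmove = int(fields[5])  # starts at 1, incremented after Black moves
    return 2 * (fullmove - 1) + (1 if side_to_move == "b" else 0)


def ply_histogram(fens):
    """Count how many positions occur at each game ply."""
    return Counter(game_ply(fen) for fen in fens)
```

For the position quoted above, with White to move at fullmove number 2, `game_ply` returns 2.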