raccrompton / BookBuilder

An automatic Chess opening repertoire Builder
GNU General Public License v3.0

Cumulative Playrate is not based on starting position? #4

Closed jeremyjh closed 2 years ago

jeremyjh commented 2 years ago

Hi, this is an awesome tool, and I'm really excited about your ideas. I'm a little confused about how it works, though. It seems the cumulative playrate is based on the total likelihood of a move being played (from move 1). That may make sense if you start from a single move to build a top-level repertoire, but it doesn't seem to work when I'm trying to build a specific repertoire. For example, I've been thinking of playing the Leningrad Dutch recently and am trying to build out PGNs for the mainline and sidelines. Because I specifically want to play the Leningrad, I can't just start with 1. d4 f5. So I set up my opening book like this:

OPENINGBOOK: [{"Name": "Dutch London", "pgn": "1. d4 f5 2. Bf4 Nf6 3. e3 g6"}]

And the only thing generated is this:

[Event "Dutch London Line 1"]

1. d4 f5 2. Bf4 Nf6 3. e3 g6 4. Nf3 Bg7
{Move playrates:
+25.96% d4
+13.87% Bf4
+62.79% e3
+36.11% Nf3
Line cumulative playrate: +0.82%
Line winrate: +48.24% over 5232 games}%

Yet there are many other options that should be well within my threshold, just looking at the Lichess book configured the same as my config:

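[image] https://user-images.githubusercontent.com/90510/179419508-3ddf19f2-2b32-4d6c-bf5f-aa6c392644ad.png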

In addition to 4. Nf3 I expected to see h4, h3, etc. I'd probably see way too many lines with my current settings if it worked as I expected, and then I'd dial it back. My full config looks like this at the moment:

#BOOK SETTINGS
OPENINGBOOK: [{"Name": "Dutch London", "pgn": "1. d4 f5 2. Bf4 Nf6 3. e3 g6"}]
#add the starting point PGNs you want to create repertoires for, with starting point pgns. The format for multiple PGNs and chapters looks like this: [{"Name": "Book A", "pgn": "1. e4 e5"},{"Name": "Book B", "pgn": "1. e4 e5 2. f4"}]
LONGTOSHORT: 0
#if you want the chapter ordered from long lines to short lines, instead of short to long, change to "1". Else 0.

#DATABASE SETTINGS
VARIANT: 'standard'
#Variants to include in the analysis
SPEEDS: ['rapid,classical']
#comma separated Formats to include in the analysis
RATINGS: ['1800,2000,2200']
#Ratings of the players to include in the analysis
MOVES: 10
#The number of most played moves to search over for the best move (minimum 5)

#MOVE SELECTION SETTINGS
DEPTHLIKELIHOOD: 0.005
#this controls how deep moves and lines are generated. The smaller the number the deeper the lines. Once cumulative line likelihood reaches this probability threshold, no further continuations will be added (in percentage so 0.0025 = 0.25%)
ALPHA: 0.001
#The larger this number the more likely we are to select moves with less data. This is the confidence interval alpha (EG 0.05 = 95% CI), for deciding the lower bounds of how good a move's winrate is.
MINPLAYRATE: 0.001
#minimum frequency for a move to be played in a position to be considered as a 'best move' candidate, as a percentage (0.05= 5%)
MINGAMES: 19
#games where moves played this or less than this will be discarded (unless top engine move) (25 = 25 games).
CONTINUATIONGAMES: 10
# games where moves played this or less than this will not be considered a valid continuation (ie we don't want to be inferring cumulative probability or likely lines from tiny amounts of games/1 game)
DRAWSAREWINS: 0
#if you want to count draws as wins, for the win rate calculation, select 1. Else 0.

#ENGINE SETTINGS
ENGINEPATH: "/usr/local/bin/stockfish"
#the filepath where the engine is stored on your computer, so it can be accessed. Keep the 'r' character
CAREABOUTENGINE: 1
#care about engine eval of position or engine finishing = 1, dont care = 0
ENGINEDEPTH: 15
#what depth the engine should evaluate best moves. the higher the depth the longer the evaluation will take.
ENGINEFINISH: 1
#if we want the engine to complete lines to cumulative likelihood where data is insufficient, 1. Otherwise 0, and lines will end where there's no good human data
SOUNDNESSLIMIT: -99
#maximum centipawns we are willing to be down in engine eval, provided the winrate is better (-300 = losing by 3 pawns in eval). We never give up a forced mate, however.
MOVELOSSLIMIT: -99
#maximum centipawns we are willing to lose vs engine analysis pre move to play a higher winrate move. We never give up a forced mate, however.
IGNORELOSSLIMIT: 300
#centipawns advantage above which we won't care if we play a move that hits our loss limit, if it has a higher win rate (is easier to win)
ENGINETHREADS: 4 #my max 20
#how many threads you want the engine to use (check your comp and set 1 if unsure)
ENGINEHASH: 5120 #my max 10240
#how much hash you want the engine to use (check your comp and set to 16 if unsure)

# Change this to true/false depending if you want to see detailed output in your terminal when running the program
PRINT_INFO_TO_CONSOLE: true

raccrompton commented 2 years ago

Pretty sure this is working as intended:

If the depthlikelihood threshold is 0.005, BookBuilder stops finding replies for moves with less than 0.5% cumulative probability of occurring, i.e. any move that you face in fewer than 1 in every 200 games (with Black in this case), assuming you play your own moves with 100% probability.

In your example, getting to 4.Nf3 has 0.82% cumulative probability. All the other moves you mention instead of Nf3 occur 3.5x less often, so they don’t reach the 0.5% cumulative probability threshold.

You have two options to get more moves:

  1. make depthlikelihood smaller (e.g. 0.001 or 0.0005); and/or
  2. start with ‘1.d4 f5’ or add multiple starting PGNs. Perhaps the issue is that 2.Bf4 is actually a rareish move order compared with 2.c4, which makes the cumulative probability drop significantly. So you can either add the other move orders as PGNs into BookBuilder or, assuming BookBuilder picks the Leningrad in most lines (I think it does), just run ‘1.d4 f5’ and you’ll get lines following other move orders.
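
The arithmetic behind this explanation can be sketched in Python. This is only an illustration of the threshold check as described in this comment, not BookBuilder's actual code: the function names are hypothetical, the playrates are copied from the output in the first post, and the rarer move's playrate is derived from the "3.5x less often" estimate.

```python
from math import prod

# White's move playrates reported in the generated line (as fractions).
PLAYRATES = {"d4": 0.2596, "Bf4": 0.1387, "e3": 0.6279, "Nf3": 0.3611}
DEPTHLIKELIHOOD = 0.005  # threshold from the posted config (0.5%)


def cumulative_playrate(rates):
    """Multiply the opponent's move playrates along a line.

    Our own moves are assumed to be played with probability 1,
    so they do not appear in the product.
    """
    return prod(rates)


def is_expanded(cumulative, threshold=DEPTHLIKELIHOOD):
    """Hypothetical check: keep a line only while its cumulative
    playrate is at or above the threshold."""
    return cumulative >= threshold


nf3_line = cumulative_playrate(PLAYRATES.values())
# A fourth move played 3.5x less often than Nf3 in the same position.
rare_line = nf3_line / 3.5

print(f"4.Nf3 line: {nf3_line:.2%}, expanded: {is_expanded(nf3_line)}")
print(f"rarer 4th move: {rare_line:.2%}, expanded: {is_expanded(rare_line)}")
```

The 4.Nf3 line comes out at about 0.82% and clears the 0.5% threshold, while a move 3.5x rarer lands at about 0.23% and does not.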

jeremyjh commented 2 years ago

I guess my point is that the probabilities of the predecessor moves don't matter to me if I've already selected them as priors. You're right that it will often go into Leningrad-like positions, but what if I want to play a variation that is less popular? Then I have to make the probability threshold very low, but that would distort results in other books I'm generating from the same config.
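
The distinction drawn here can be made concrete with a small Python sketch. It does not describe BookBuilder's behavior; it only contrasts the playrate counted from move 1 with a hypothetical playrate conditioned on the starting PGN already being on the board. The function name is made up for illustration, and the numbers are the White playrates from the output in the first post.

```python
from math import prod

# White playrates from the generated line: moves inside the starting PGN
# (1. d4 f5 2. Bf4 Nf6 3. e3 g6) and the first move after it.
START_PGN_RATES = [0.2596, 0.1387, 0.6279]  # d4, Bf4, e3
AFTER_START_RATES = [0.3611]                # Nf3


def playrate_from_start(start_rates, later_rates):
    """Hypothetical alternative: condition on the starting position.

    Dividing the full cumulative playrate by the playrate of the
    starting PGN leaves only the moves played after it.
    """
    full = prod(start_rates) * prod(later_rates)
    return full / prod(start_rates)


from_move_one = prod(START_PGN_RATES + AFTER_START_RATES)
from_start = playrate_from_start(START_PGN_RATES, AFTER_START_RATES)

print(f"from move 1: {from_move_one:.2%}")
print(f"from the starting PGN: {from_start:.2%}")
```

Under this hypothetical rule, the 0.5% threshold would be compared against roughly 36.11% instead of 0.82%, so fourth-move alternatives that are 3.5x rarer than Nf3 (roughly 10%) would still be expanded.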

jeremyjh commented 2 years ago

There is probably a misconception on my part about how to approach this. It does make sense to consider the cumulative probability from move 1, since that indicates how likely I am to reach these positions. I'll just keep playing with the numbers and see if I can make it work. Thanks again for your time and work on this!

kamekura commented 2 years ago

@jeremyjh in that case, I think it's best to use multiple config files.
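
One way to act on this suggestion is to derive a separate config text per book from a shared base, changing only the threshold. The Python sketch below assumes the config is plain text with `KEY: value` lines, as in the config posted in the first comment. The helper name and the example value are hypothetical, and how each resulting file is handed to BookBuilder is not covered in this thread.

```python
import re


def override_setting(config_text, key, value):
    """Return config_text with the first `key: ...` line replaced.

    The whole line is rewritten, so any trailing comment on that line
    is dropped. Comment lines (starting with '#') are left untouched.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{key}: {value}", config_text, count=1)
    if count == 0:
        raise KeyError(f"{key} not found in config")
    return new_text


# Shortened stand-in for the config posted in the first comment.
BASE_CONFIG = """\
OPENINGBOOK: [{"Name": "Dutch London", "pgn": "1. d4 f5 2. Bf4 Nf6 3. e3 g6"}]
DEPTHLIKELIHOOD: 0.005
#this controls how deep moves and lines are generated.
"""

# A second config text for a rarer line, with a lower threshold (example value).
rare_line_config = override_setting(BASE_CONFIG, "DEPTHLIKELIHOOD", "0.0005")
print(rare_line_config)
```

Keeping one base text and overriding a single key per book means the lower threshold needed for a rare line does not affect the other books.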
