official-stockfish / books

Chess books used to develop Stockfish
refinement of UHO Lichess book #41

Closed robertnurnberg closed 4 months ago

robertnurnberg commented 6 months ago

This PR attempts to refine the book UHO_Lichess_4852_v1.epd, introduced in https://github.com/official-stockfish/books/pull/39, by removing very drawish and very one-sided openings, which in our LTC tests on fishtest have almost exclusively led to draws or wins, respectively.

To this end, I used the repo ana-opening-book to analyse the more than 62 million games that were played for the book on fishtest under LTC conditions since its introduction in October 2023. I then removed any exit with a draw rate outside of [0.1, 0.9], if at least 10 games were played for it.
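As a rough illustration of the filtering rule just described, here is a minimal sketch. It is not the code from ana-opening-book: the `ExitStats` record, the function names and the toy statistics are all hypothetical, and only the bounds [0.1, 0.9] and the 10-game minimum come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ExitStats:
    """Hypothetical per-exit record; not the ana-opening-book data format."""
    epd: str    # the book exit
    games: int  # games played from this exit
    draws: int  # how many of those games were drawn


def keep_exit(s, dr_min=0.1, dr_max=0.9, min_games=10):
    """Return True if the exit should stay in the book."""
    if s.games < min_games:
        return True  # too few games to judge, so keep the exit
    draw_rate = s.draws / s.games
    return dr_min <= draw_rate <= dr_max


def refine(book):
    # A list comprehension keeps the surviving exits in their original order.
    return [s for s in book if keep_exit(s)]


# Toy numbers, made up for illustration only.
book = [
    ExitStats("exit A", games=40, draws=20),  # draw rate 0.5: kept
    ExitStats("exit B", games=40, draws=39),  # almost always drawn: removed
    ExitStats("exit C", games=6, draws=6),    # fewer than 10 games: kept
    ExitStats("exit D", games=10, draws=0),   # never drawn: removed
]
kept = [s.epd for s in refine(book)]
print(kept)
```

Exits with fewer than the minimum number of games are kept rather than removed, which matches the cautious approach of only dropping exits for which there is some evidence.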

The new book UHO_Lichess_4852_v2.epd has the following properties:

The order of the exits has been preserved, to allow easy comparison using diff, for example.

As always, there is a balance to be struck between variety of exits, and sharpness of the book. I have tried to remove only very few exits, as seen by the mild draw rate bounds [0.1, 0.9] and the fact that I included games from PTs in the analysis. I believe it is best to be cautious, to not undo a lot of the good work that went into UHO_Lichess_4852_v1.epd.

Of course, if this new book were to be adopted for fishtest, then a similar revision process could be done after 50M-100M games have been played with the new book.

Finally, the commands to create the book, using the tools from ana-opening-book and cdblib/addons, are as follows:

./build/src/analysis --dir /disk2/fishtest/ --matchBook UHO_Lichess_4852_v1.epd --fixFENsource UHO_Lichess_4852_v1.epd
python post_process_csv.py --bookFile UHO_Lichess_4852_v1.epd
python post_process_csv.py UHO_Lichess_4852_v1.csv --drawRateMin 10 --drawRateMax 90 --drawRateGames 10 --outFile UHO_Lichess_4852_v2.csv
python ../cdblib/addons/fens_filter_overlap.py UHO_Lichess_4852_v1.epd UHO_Lichess_4852_v2.epd > removed.epd
python ../cdblib/addons/fens_filter_overlap.py UHO_Lichess_4852_v1.epd removed.epd > UHO_Lichess_4852_v2.epd
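The last two commands can be read as an order-preserving set difference between two EPD files. The sketch below only illustrates that idea. It assumes that fens_filter_overlap.py prints the positions of its first argument that do not occur in its second, which is how the commands above appear to use it; `fen_key` and `filter_overlap` are hypothetical names and not the cdblib implementation.

```python
def fen_key(epd_line):
    """Use the first four EPD fields (pieces, side to move, castling, en passant) as key."""
    return " ".join(epd_line.split()[:4])


def filter_overlap(source_lines, exclude_lines):
    """Return the lines of source_lines whose position is absent from exclude_lines.

    The order of source_lines is preserved.
    """
    excluded = {fen_key(line) for line in exclude_lines if line.strip()}
    return [line for line in source_lines
            if line.strip() and fen_key(line) not in excluded]


# Three illustrative positions standing in for a book; not taken from the real book.
a = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq -"
b = "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq -"
c = "rnbqkbnr/pppppppp/8/8/2P5/8/PP1PPPPP/RNBQKBNR b KQkq -"
v1 = [a, b, c]
v2_unordered = [c, a]  # the surviving exits, in arbitrary order

removed = filter_overlap(v1, v2_unordered)  # what v1 has and v2 lacks
v2 = filter_overlap(v1, removed)            # v1 minus removed, in v1's order
print(removed, v2)
```

Filtering the original book twice in this way yields the surviving exits in the same order as in v1, which is what makes a plain diff between the two books readable.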

robertnurnberg commented 6 months ago

Using the command

python post_process_csv.py UHO_Lichess_4852_v1.csv UHO_Lichess_4852_v2.csv

we can visualize the differences between the v1 and v2 books.

The first plot shows the frequencies of the observed draw rates for the two books: one can clearly see that for v2 the very drawish and very one-sided openings have been removed.

fencsv_drawrate

The second plot shows the number of games that were played for each exit in the two books: here it is nice to see that we have mainly removed exits that have seen quite many games, so their exclusion will not be down to chance.

fencsv_games

The third plot shows the distribution of game plies for the exits in the two books, and there is hardly any difference. That means that v2 keeps the nice coverage of the different book exit depths.

fencsv_depth

robertnurnberg commented 6 months ago

Using the commands

python ../cdblib/addons/score_fens_locally.py removed.epd uho_cdbpv.epd.gz > removed_cdb.epd
python ../cdblib/addons/plot_fens_cdb_dist.py removed_cdb.epd --density 0

to obtain the cdb evaluations, thanks to the scores from UHOtrack, we can visualize the eval distribution on cdb for the removed exits.

The evals of the removed exits seem to peak at around +/-1.00cp, which due to the damping of cdb may often already be completely won/lost. The fact that only very few exits with low evals are removed is probably due to the fact that the book v1 contains hardly any such exits.

removed_cdb

Disservin commented 6 months ago

Very nice to see some results of the new repo analysis. I guess this will be of particular interest to @vondele, so I'll leave it open for the time being.

vondele commented 6 months ago

interesting analysis. I wonder if a 10 game limit will not lead to many false positives, i.e. what's the probability that a fine opening (50% win rate) would be removed after 10 games? What fraction of the removed openings are in that case false positives?

robertnurnberg commented 6 months ago

A 50% opening that was played 10 times will be removed with probability 0.2%, I think.

It's like asking "what are the chances to see 10 times head or 10 times tail when you toss a coin 10 times", which I believe is 2 * 2^{-10} = 2^{-9} = 1/512 = 0.2%.

Increasing the game count will decrease that probability exponentially.
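The coin-toss estimate can be checked with a few lines of Python. This is only a sketch under the stated assumption that every game is an independent fair coin toss between "draw" and "decisive". `p_all_same` is the shortcut 2 * 2^{-n} from above; `p_outside_bounds` is my own addition, not from this thread, and sums over every draw count whose rate lies strictly outside [0.1, 0.9]. The two agree for 10 games, but for larger game counts a few non-unanimous outcomes can also fall outside the bounds, so the second value may be larger.

```python
from fractions import Fraction
from math import comb


def p_all_same(n):
    """Probability that n fair coin tosses all give the same result: 2 * 2^-n."""
    return Fraction(2, 2 ** n)


def p_outside_bounds(n, lo=Fraction(1, 10), hi=Fraction(9, 10)):
    """Probability that the draw rate d/n of n fair tosses lies outside [lo, hi]."""
    bad = sum(comb(n, d) for d in range(n + 1)
              if Fraction(d, n) < lo or Fraction(d, n) > hi)
    return Fraction(bad, 2 ** n)


for n in (10, 12, 14):
    print(n, f"{float(p_all_same(n)):.3%}", f"{float(p_outside_bounds(n)):.3%}")
```

Exact fractions are used so that boundary cases such as 1 draw in 10 games (a draw rate of exactly 0.1, which is not outside the interval) are not affected by floating-point rounding.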

robertnurnberg commented 6 months ago

Here are the distributions of the removed positions' statistics.

We see that almost all of these were played 12 times or more. So if we, for now, keep the 703 exits that were only played 10 times, we can reduce the probability of the false positives to 0.05%. Here it should be stressed that this probability of false positives will also reduce dramatically for any removed exits with more than 12 games played, of course. E.g. for exits with 14 games played it will be about 0.01% etc.

fencsv_drawrate

fencsv_games

fencsv_depth

Attached also the file removed.epd.gz.

PS: These plots were created with the command python post_process_csv.py --bookFile removed.epd.gz && python post_process_csv.py removed.csv

robertnurnberg commented 6 months ago

As discussed on discord, here is the scatter plot of observed draw rates vs. cdb evals. @vondele

cdb_scatter

This plot was created with the command python post_process_csv.py UHO_Lichess_4852_v1.csv --cdbFile uho_cdbpv.epd.gz

vondele commented 6 months ago

So, I could reproduce the analysis, I get this plot with a fit (linear and cubic) for the cdb eval depending on the draw rate:

image

(probably needs to be interpreted with care, as this is for a special set of positions, i.e. those with SF wdl 48-52).

robertnurnberg commented 6 months ago

Just putting this here from my message on discord:

looking at the new plots: we would expect 703 * 0.2% = 1.4 false positives amongst the exits having been played exactly 10 times
similarly 3990 * 0.05% = 2 for the exits played 12 times
and 4166 * 0.01% = 0.4 for the exits played 14 times
so in total for the current PR we would expect maybe 5-10 false positives, amongst 42K removed exits
personally, I think that is acceptable in the drive to make the book more efficient in distinguishing good and bad patches on fishtest
but if you prefer to re-run the filtering with different parameters, let me know. possibilities are [0.05, 0.95] or [0.1, 0.9] with a minimum of 12 or 14 games, for example
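The arithmetic in this message can be reproduced as follows. This is a small sketch that uses the exit counts quoted in the message (703, 3990 and 4166) together with the unrounded coin-toss probability 2 * 2^{-n}; the function name is hypothetical. Because the message rounds the percentages first, its figures differ slightly from the unrounded ones.

```python
def expected_false_positives(counts):
    """Map each game count n to (number of exits) * 2 * 2^-n."""
    return {n: exits * 2 * 0.5 ** n for n, exits in counts.items()}


# Removed exits by number of games played, as quoted in the message above.
counts = {10: 703, 12: 3990, 14: 4166}

expected = expected_false_positives(counts)
for n, value in expected.items():
    print(f"{n} games: about {value:.2f} expected false positives")
print(f"sum over these three groups: {sum(expected.values()):.2f}")
```

The sum covers only the three groups listed, so it is smaller than the 5-10 estimated for the whole PR, which also has to account for the other game counts.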

vondele commented 6 months ago

Now, just picking the subset of openings with exactly 50% draws, and plotting the corresponding distribution of cdb evals, together with a Gaussian fit:

image
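For readers who want to try a similar fit, here is a minimal sketch using only the Python standard library. It does not reproduce the analysis above: the eval values are made up for illustration, and `statistics.NormalDist.from_samples` simply estimates a Gaussian from the sample mean and standard deviation, which may differ from the fitting method used for the plot.

```python
from statistics import NormalDist

# Made-up cdb evals (in centipawns) for openings with exactly 50% draws.
# These are NOT the values from the real data set.
evals_cp = [70, 85, 90, 95, 100, 110, 88, 97]

# Estimate a Gaussian from the sample mean and sample standard deviation.
fit = NormalDist.from_samples(evals_cp)

print(f"mean  = {fit.mean:.1f} cp")
print(f"sigma = {fit.stdev:.1f} cp")
```

With real data, the fitted mean would be the natural estimate of the cdb eval at which the draw rate is about 50%.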

robertnurnberg commented 6 months ago

Oh, very interesting. Suggests the 50% winrate boundary on cdb is at about 92.5. Probably roughly chimes with what many bookmakers for computer chess competitions feel about it.

vondele commented 6 months ago

yes, that's probably the right value to use, at least for positions that are not too deeply explored (otherwise the decay will have some unclear effects).

vondele commented 4 months ago

after some additional testing, the removed.epd book by itself is still pretty good. Some STC Elo measurements of current master vs. SF16 showed:

UHO_4060_v3.epd, 29.79 +- 0.49
UHO_Lichess_4852_v1.epd, 30.80 +- 0.49
noob_3moves.epd, 13.13 +- 0.24
removed.epd, 25.78 +- 0.50

quoting from discord: "lichess book is so good even its worst parts still decent 😅"

While we have an interesting analysis, right now not worth the effort of updating the books.