lichess-org / lila-openingexplorer

Opening explorer for lichess.org that can handle all the variants and trillions of unique positions
http://lichess.org/analysis#explorer
GNU Affero General Public License v3.0

Opening Explorer does not know about some positions anymore #156

Closed SG7 closed 2 years ago

SG7 commented 2 years ago

In my opinion, the Opening Explorer is the best feature on lichess! I really admire its usefulness. Thank you for the great feature! Last July, I used it extensively to research thousands of lines. I made some studies and noted down hundreds of games and positions found by the Opening Explorer. Recently, I came back to my notes and could no longer find many of the positions I had written down. I have posted some of them here: https://lichess.org/forum/lichess-feedback/bug-the-reindexing-missed-games

An example from that post: enter this FEN into the Opening Explorer: r1bq1rk1/ppp2p1p/3b2p1/4Q3/3p4/3B4/PPP3PP/RNB1R1K1 w - - 1 14

In this position, only the move 14. Qe2 is shown, from the game [White "the-prod"] - [Black "cleber_x"] [Date "2017.09.28"] lichess.org/AhaDaNZT

However, 14. Qg5 was played in [White "keon5080"] - [Black "ramtel"] [Date "2021.06.09"] lichess.org/PlP8csWq

If you enter the FEN after 14. Qg5 into the Opening Explorer, r1bq1rk1/ppp2p1p/3b2p1/6Q1/3p4/3B4/PPP3PP/RNB1R1K1 b - - 2 14, the position is not found.

The Opening Explorer cannot find these positions, yet the games still exist in the DB. I am not sure what went wrong between July and now, but the Opening Explorer does not know about those positions anymore. Could you kindly take a look at this behavior?

niklasf commented 2 years ago

Thanks for reporting.

The games selected for the Lichess opening explorer are a random sample of all games, so whenever the explorer is reindexed, different games can be selected. And indeed, there have been various reindexing runs since July together with updates of the explorer server. Positions that had very few games may not have any games at all after reindexing, and positions that had no games may now have a few.
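To illustrate why sparsely populated positions are the ones most likely to vanish, here is a rough sketch. It assumes each game is kept independently with some probability `rate`; the function name and the rate used are illustrative assumptions, not the explorer's actual settings.

```python
def prob_position_vanishes(num_games: int, rate: float) -> float:
    """Chance that none of `num_games` games survives an independent sample at `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Each game is dropped with probability (1 - rate), independently of the others.
    return (1.0 - rate) ** num_games


# Hypothetical rate of 25%: positions with one or two games are at real risk,
# while positions with many games almost never disappear entirely.
for n in (1, 2, 10, 100):
    print(n, prob_position_vanishes(n, 0.25))
```

Under this assumption, a position reached in a single game disappears in most reindexing runs, which matches the behaviour reported above.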

I understand this is not ideal, so in the future the sample will be deterministic (via https://github.com/lichess-org/lila/commit/e47b54c6bca08f76087e0e4138d31c75008850c3 and https://github.com/lichess-org/lila-openingexplorer/commit/d1b55a43eb4bbaace45c244d7f33d86b11c7ee41).
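One common way to make a sample deterministic is to derive the keep/skip decision from a hash of the game ID instead of a random number generator. The sketch below shows that general technique only; it is an assumption for illustration and not necessarily what the linked commits do. The name `is_sampled` and the rate are hypothetical.

```python
import hashlib


def is_sampled(game_id: str, rate: float) -> bool:
    """Deterministically decide whether a game belongs to the sample."""
    digest = hashlib.sha256(game_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the game if its bucket falls below the threshold for this rate.
    return bucket < int(rate * 2**64)


# The same game ID always yields the same decision, so a reindexing run
# with an unchanged rate selects exactly the same games.
for game_id in ("AhaDaNZT", "PlP8csWq"):
    print(game_id, is_sampled(game_id, 0.5))
```

A side effect of this design is that raising the rate only ever adds games to the sample; games that were already included stay included.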

Unfortunately this can't work retroactively.

SG7 commented 2 years ago

"The games selected for the Lichess opening explorer are a random sample of all games"

"I understand this is not ideal, so in the future the sample will be deterministic"

"Unfortunately this can't work retroactively."

niklasf commented 2 years ago

"For example, after 1.e4 e5, the number for 2.Nf3 is not a real number but a number based on a random sample of all games?"

Yes. To clarify, there is no extrapolation going on, so every game that is counted there is real, but many more games have been played on Lichess.

Also, as you have seen, the sampling rate is chosen based on time control and rating, so the sampling is biased across time control and rating groups, but unbiased within each group.
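A minimal sketch of what per-group sampling looks like, to make "biased across groups, unbiased within each group" concrete. The groups, the rates in `SAMPLING_RATE`, and the game records are all made up for illustration and are not the real configuration.

```python
import random
from collections import Counter

# Hypothetical sampling rates per (time control, rating band); illustrative only.
SAMPLING_RATE = {
    ("blitz", "low"): 0.05,
    ("blitz", "high"): 0.5,
    ("classical", "high"): 1.0,
}


def sample_games(games, rng):
    """Keep each game with the rate of its group; unknown groups are dropped."""
    kept = []
    for game in games:
        rate = SAMPLING_RATE.get((game["speed"], game["band"]), 0.0)
        # Every game in the same group has the same chance of being kept.
        if rng.random() < rate:
            kept.append(game)
    return kept


rng = random.Random(0)
games = [
    {"id": f"{speed}-{band}-{i}", "speed": speed, "band": band}
    for (speed, band) in SAMPLING_RATE
    for i in range(1000)
]
kept = sample_games(games, rng)

# Counts are of real games only (no extrapolation), but the share kept
# differs from group to group.
counts = Counter((g["speed"], g["band"]) for g in kept)
print(counts)
```

Every game in `kept` is one of the original games, so the counts are real, yet groups with a low rate are underrepresented relative to how many games were actually played.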

SG7 commented 2 years ago

"As of the time of writing, the Lichess opening book features a nice summation row at the bottom, which tells the player the aggregate number of games played in position X"