SG7 closed this issue 2 years ago
Thanks for reporting.
The games selected for the Lichess opening explorer are a random sample of all games, so whenever the explorer is reindexed, different games can be selected. And indeed, there have been various reindexing runs since July together with updates of the explorer server. Positions that had very few games may not have any games at all after reindexing, and positions that had no games may now have a few.
I understand this is not ideal, so in the future the sample will be deterministic (via https://github.com/lichess-org/lila/commit/e47b54c6bca08f76087e0e4138d31c75008850c3 and https://github.com/lichess-org/lila-openingexplorer/commit/d1b55a43eb4bbaace45c244d7f33d86b11c7ee41).
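To illustrate what a deterministic sample can look like, here is a minimal sketch. It is not the code from the linked commits. It assumes a hypothetical approach in which the decision to index a game is derived from a hash of its game ID. The function name `is_sampled` and the rate of 0.5 are made up for the example.

```python
import hashlib

def is_sampled(game_id: str, rate: float) -> bool:
    """Decide from the game ID alone whether a game belongs to the sample.

    Hypothetical scheme: hash the ID, map the hash to [0, 1), compare to the rate.
    """
    digest = hashlib.sha256(game_id.encode("utf-8")).digest()
    # The first 6 bytes give an integer below 2**48, which a float represents exactly.
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < rate

# Two "reindexing runs" over the same games select exactly the same subset.
game_ids = ["AhaDaNZT", "PlP8csWq"]
first_run = [g for g in game_ids if is_sampled(g, 0.5)]
second_run = [g for g in game_ids if is_sampled(g, 0.5)]
assert first_run == second_run
```

With a scheme like this, a game that was indexed once stays indexed on every later run at the same rate, and raising the rate only adds games.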
Unfortunately, this can't work retroactively.
"The games selected for the Lichess opening explorer are a random sample of all games"
"I understand this is not ideal, so in the future the sample will be deterministic"
"Unfortunately this can't work retroactively."
So, for example, after 1.e4 e5, the number shown for 2.Nf3 is not the real total but a count based on a random sample of all games?
Yes. To clarify, there is no extrapolation going on, so every game that is counted there is real, but many more games have been played on Lichess.
Also, as you have seen, the sampling rate is chosen based on time control and rating, so the sampling is biased across time control and rating groups, but unbiased within each group.
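A small sketch of what this means in practice. The groups and rates below are invented for illustration and are not the real Lichess values. Every counted game is a real game (no extrapolation), but each group contributes with its own probability.

```python
import random

# Hypothetical sampling rates per (speed, rating band); NOT the real Lichess values.
SAMPLING_RATES = {
    ("blitz", "2200+"): 1.0,
    ("blitz", "1600-2199"): 0.25,
    ("bullet", "1600-2199"): 0.05,
}

def sample_games(games, rates, rng):
    """Keep each game with the probability assigned to its group."""
    return [g for g in games if rng.random() < rates[(g["speed"], g["band"])]]

def count_moves(games):
    """Count how often each move was played; no scaling is applied."""
    counts = {}
    for g in games:
        counts[g["move"]] = counts.get(g["move"], 0) + 1
    return counts

# Synthetic games after 1.e4 e5: same move distribution in every group.
rng = random.Random(0)
all_games = [
    {"speed": speed, "band": band, "move": rng.choice(["Nf3", "Nf3", "Bc4", "f4"])}
    for (speed, band) in SAMPLING_RATES
    for _ in range(1000)
]

indexed = sample_games(all_games, SAMPLING_RATES, rng)
real_counts = count_moves(all_games)
shown_counts = count_moves(indexed)

# The explorer-style counts never exceed the real counts.
assert all(shown_counts[m] <= real_counts[m] for m in shown_counts)
```

Within one group, the relative move frequencies of the indexed games estimate the real ones. Across groups, the totals are weighted toward the groups with higher rates.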
"As of the time of writing, the Lichess opening book features a nice summation row at the bottom, which tells the player the aggregate number of games played in position X"
As per your explanations: 1) This is NOT "the aggregate number of games played in position X" but the number of sampled games. 2) The move percentages are also NOT the real relative frequencies of the played moves, but frequencies derived from the sampled games.
What crossed my mind was a feature with a "real query" that finds all games for a given FEN over the full DB, not just over the sampled games. However, that would require substantial development effort. One of the many problems is the presentation: a given position may occur in millions of games, which alone would effectively kill the usability of such a feature.
In my opinion, the Opening Explorer is the best feature on Lichess! I really admire its usefulness. Thank you for the great feature! Last July, I used it extensively to research thousands of lines. I made some studies and noted hundreds of games and positions found by the Opening Explorer. Recently, I came back to my notes, and I can no longer find many of the positions I wrote down. I have posted some of them here: https://lichess.org/forum/lichess-feedback/bug-the-reindexing-missed-games Here is an example from that post. Enter this FEN into the Opening Explorer: r1bq1rk1/ppp2p1p/3b2p1/4Q3/3p4/3B4/PPP3PP/RNB1R1K1 w - - 1 14
For this position, only the move 14. Qe2 is shown, from the game [White "the-prod"] - [Black "cleber_x"] [Date "2017.09.28"] lichess.org/AhaDaNZT
However, 14. Qg5 was played in [White "keon5080"] - [Black "ramtel"] [Date "2021.06.09"] lichess.org/PlP8csWq
If you enter the resulting FEN into the Opening Explorer, r1bq1rk1/ppp2p1p/3b2p1/6Q1/3p4/3B4/PPP3PP/RNB1R1K1 b - - 2 14, the position is not found.
The Opening Explorer cannot find such positions, yet the games still exist in the DB. I am not sure what went wrong between July and now, but the Opening Explorer no longer knows about those positions. Could you kindly take a look at this behavior?
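For anyone who wants to re-check these two positions outside the web UI, here is a small sketch that builds the query URLs. It assumes the public explorer endpoint `https://explorer.lichess.ovh/lichess` with `variant` and `fen` query parameters, as I understand the Lichess API documentation. Please verify against the current docs before relying on it. The helper name `explorer_url` is made up, and the sketch only builds the URLs without sending a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint of the Lichess opening explorer; check the API docs.
EXPLORER_ENDPOINT = "https://explorer.lichess.ovh/lichess"

def explorer_url(fen: str) -> str:
    """Build an explorer query URL for a position given as a FEN string."""
    query = urlencode({"variant": "standard", "fen": fen})
    return f"{EXPLORER_ENDPOINT}?{query}"

# The two positions from the report: before 14. Qg5 and after it.
fen_before = "r1bq1rk1/ppp2p1p/3b2p1/4Q3/3p4/3B4/PPP3PP/RNB1R1K1 w - - 1 14"
fen_after = "r1bq1rk1/ppp2p1p/3b2p1/6Q1/3p4/3B4/PPP3PP/RNB1R1K1 b - - 2 14"

for fen in (fen_before, fen_after):
    url = explorer_url(fen)
    # Round-trip check: the encoded URL still carries the exact FEN.
    assert parse_qs(urlsplit(url).query)["fen"] == [fen]
    print(url)
```

Fetching each URL and comparing the returned move lists would show whether a given game is part of the current index.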