lichess-org / lila

♞ lichess.org: the forever free, adless and open source chess server ♞
https://lichess.org
GNU Affero General Public License v3.0

Evaluate position complexity from SF output #275

Open ornicar opened 9 years ago

ornicar commented 9 years ago

Would allow complexity stats on user profiles, and would let us interrupt SF computations once the evaluation has stabilized.

Read: http://en.chessbase.com/news/2006/world_champions2006.pdf
Contact: Twipsy on #lichess
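
A minimal sketch of the "interrupt when the evaluation stabilizes" part, assuming python-chess and a local Stockfish binary; the stability window and centipawn tolerance are arbitrary placeholder values, not a proposed implementation:

```python
# Sketch only: stop the search once the evaluation stops moving between depths.
# Window size and centipawn tolerance are made-up values for illustration.
import chess
import chess.engine

STABLE_WINDOW = 4   # consecutive scores that must agree
TOLERANCE_CP = 15   # allowed wobble in centipawns

def eval_until_stable(engine: chess.engine.SimpleEngine, board: chess.Board):
    scores = []
    with engine.analysis(board, chess.engine.Limit(depth=30)) as analysis:
        for info in analysis:
            if "score" not in info or "depth" not in info:
                continue
            cp = info["score"].white().score(mate_score=100000)
            scores.append((info["depth"], cp))
            recent = [s for _, s in scores[-STABLE_WINDOW:]]
            if len(recent) == STABLE_WINDOW and max(recent) - min(recent) <= TOLERANCE_CP:
                break  # evaluation has settled; interrupt the search early
    return scores

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # engine path is an assumption
print(eval_until_stable(engine, chess.Board()))
engine.quit()
```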

Vinvin20 commented 9 years ago

Here's an interesting post about complexity: http://www.talkchess.com/forum/viewtopic.php?p=636034#636034 You should be able to get the source easily, as Ferdinand Mosca shares a lot.

ddugovic commented 9 years ago

I believe ~this is an important feature, and~ it's a challenging problem to solve! What's complex for a human agent might not be complex for a computer, and vice versa!

@Vinvin20 I appreciate the enthusiasm and that's one possible measure of complexity. (I did try to contact Twipsy months ago but haven't heard back.)

Other possible measures of complexity are the "proof number" and "disproof number" from PN search.
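
For illustration only (not lichess code, and independent of Stockfish), here is a toy sketch of how proof and disproof numbers are propagated up an AND/OR tree:

```python
# Toy illustration: proof/disproof numbers on an AND/OR tree.
# pn == 0 means the goal is proved; dn == 0 means it is disproved.
import math
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    is_or: bool                      # OR node: attacker to move; AND node: defender to move
    value: Optional[bool] = None     # True = proved, False = disproved, None = unknown
    children: List["Node"] = field(default_factory=list)

def proof_numbers(node: Node):
    """Return (proof number, disproof number) for a node."""
    if node.value is True:
        return 0, math.inf
    if node.value is False:
        return math.inf, 0
    if not node.children:            # unexpanded frontier node
        return 1, 1
    child_pns = [proof_numbers(c) for c in node.children]
    if node.is_or:                   # prove any one child, disprove all children
        return min(pn for pn, _ in child_pns), sum(dn for _, dn in child_pns)
    else:                            # prove all children, disprove any one child
        return sum(pn for pn, _ in child_pns), min(dn for _, dn in child_pns)

root = Node(is_or=True, children=[Node(is_or=False, value=True), Node(is_or=False)])
print(proof_numbers(root))           # (0, inf): one proved child proves an OR node
```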

I have a feeling that we're overlooking some key concept here: complexity isn't simply about finding the best move, since it could also be about finding a good move, or about correctly evaluating the best move (or evaluating the position in general).

Vinvin20 commented 8 years ago

The Python source that measures complexity in order to find remarkable moves is here: http://home.scarlet.be/vincentlejeune/chess/game_analyzer_v38.zip — in the file "game_analyzer_v38.py", section "3. Calculation of position complexity by 'pv move changes'", which starts at iteration depth 9:

```python
def analyze_complexity(engineName, fen, _eng_option, movetimev, multipvv, nshortPv):
    """ Position is complex when the engine move changes more than once """
```
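
For reference, here is a rough, self-contained sketch of the same "pv move changes" idea using python-chess; it is not a reproduction of Mosca's file, and the engine path, depth range and "more than once" threshold are assumptions:

```python
# Rough sketch of the "pv move changes" complexity idea (not game_analyzer_v38.py).
import chess
import chess.engine

def pv_change_complexity(engine: chess.engine.SimpleEngine, board: chess.Board,
                         min_depth: int = 9, max_depth: int = 20) -> int:
    """Count how often the engine's preferred move changes as the search deepens."""
    changes = 0
    best_move = None
    with engine.analysis(board, chess.engine.Limit(depth=max_depth)) as analysis:
        for info in analysis:
            depth, pv = info.get("depth"), info.get("pv")
            if depth is None or not pv or depth < min_depth:
                continue
            if best_move is not None and pv[0] != best_move:
                changes += 1         # the PV move flipped at this depth
            best_move = pv[0]
    return changes                   # "complex" if the move changed more than once

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # engine path is an assumption
print(pv_change_complexity(engine, chess.Board()))
engine.quit()
```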

ddugovic commented 6 years ago

@Vinvin20 I downloaded the latest version of Mosca's "game analyzer" annotator and slightly modified it to not produce null moves, and to put comments after the main line instead of before a variation.

Then I uploaded the analyzed PGN as a study.

shermansiu commented 1 year ago

Stockfish already employs alpha-beta pruning to reduce the number of nodes evaluated. There currently isn't a way to make the engine evaluate a position more quickly without compromising the quality of the evaluation (i.e., the closest approximation is capping the max depth, which we already do).

However, reducing the depth has an impact on evaluating endgames with >7 pieces (where we can't use Syzygy tables).

Preserving Stockfish-level evaluations while evaluating more efficiently is equivalent to building a better chess engine, which is a hard problem. (The best-known techniques for producing faster versions of comparably performing neural nets are model distillation and post-training quantization. However, AFAIK Stockfish already applies PTQ (I'm not 100% sure about model distillation), and Leela applies both model distillation and PTQ.)

niklasf commented 1 year ago

Good point. So all we could hope for is a metric to display to the user. I guess that would be something like depth at a particular node count? Tempted to close this issue.
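
A rough sketch of that metric, assuming python-chess; the engine path and node budget are arbitrary example values, not anything lichess currently runs:

```python
# Sketch: the depth Stockfish reaches within a fixed node budget as a crude complexity signal.
import chess
import chess.engine

NODE_BUDGET = 500_000  # arbitrary example budget

def depth_at_node_budget(engine: chess.engine.SimpleEngine, board: chess.Board) -> int:
    info = engine.analyse(board, chess.engine.Limit(nodes=NODE_BUDGET))
    return info.get("depth", 0)      # lower depth for the same nodes ~ bushier, "harder" search

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # engine path is an assumption
print(depth_at_node_budget(engine, chess.Board()))
engine.quit()
```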

shermansiu commented 1 year ago

I suppose? But that would tell you more about how many nodes were pruned during the evaluation, which doesn't necessarily correlate well with how easy it is for a human to play the position ("Allows user profile complexity stats"). Depth vs. node count isn't easily interpretable in general.

The best alternative would be to train a CNN/ViT to estimate the minimum Glicko rating a human would need to find a decent move (above a certain probability threshold), but that would require loading an extra machine-learning model just for that estimate (a rough skeleton is sketched below).

Nevertheless, training such a model isn't exactly trivial. For example:

- Should we track the distribution of Glicko scores (better, but maybe less interpretable) or the minimum Glicko (easier to understand, but lossy), especially since players of very different skill levels play the same common openings?
- What threshold should we use for accepting a move as findable at a given Glicko?
- How do we handle positions with multiple "good enough" moves?
- And how do we ensure we aren't over-reliant on careful hyperparameter optimization?
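
For concreteness, a hypothetical skeleton of such a model (not anything that exists in lichess); the 12-plane board encoding, the architecture and the choice of PyTorch are all assumptions:

```python
# Hypothetical skeleton only: a small CNN mapping a 12x8x8 piece-plane encoding of a
# position to a scalar "minimum rating needed to find a decent move" estimate.
import chess
import torch
import torch.nn as nn

def board_to_planes(board: chess.Board) -> torch.Tensor:
    planes = torch.zeros(12, 8, 8)
    for square, piece in board.piece_map().items():
        channel = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[channel, square // 8, square % 8] = 1.0
    return planes

class RatingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(12, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 1),  # scalar rating estimate; a distribution head is another option
        )

    def forward(self, x):
        return self.net(x)

model = RatingNet()
print(model(board_to_planes(chess.Board()).unsqueeze(0)))  # untrained output, illustration only
```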

shermansiu commented 1 year ago

As for "interrupt[ing] SF computations": there isn't a better way to do that than choosing a lower depth, but that obviously lowers the quality of the evaluations.