rpdelaney-archive / python-chess-annotator

Reads chess games in PGN format and adds annotations using an engine
GNU General Public License v3.0

Negative ACPL values #8

Open rpdelaney opened 6 years ago

rpdelaney commented 6 years ago

python-chess-annotator sometimes returns games where one player has a negative ACPL. Anecdotally, it seems to happen more often on a tight time budget (1 minute). The cause is unknown.

I considered checking for negative values and enforcing a floor of 0, but that would just be masking the bug.

ddugovic commented 6 years ago

Try analyzing the game from last ply to first ply & see if the issue persists?

rpdelaney commented 6 years ago

If I'm understanding your suggestion, I think this is already the behavior: python-chess-annotator starts at the end of the game and works its way backward.

The idea is to populate the engine's hash table with analysis about future positions as we go, hopefully improving the quality of the analysis as we reach more complicated middlegame positions. I haven't done any testing to verify that this actually works as intended, however - we don't set, nor provide any options for the user to set, the engine's hash table size (though we probably should), and I don't even know if python-chess' UCI implementation would preserve the hash between engine calls.
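For illustration, here is a minimal sketch of how the hash size could be set using python-chess' current chess.engine API (the thread predates this API; the "stockfish" path and the 256 MB value are placeholders, not project defaults):

```python
import chess
import chess.engine

# Placeholder engine path; any UCI engine works.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")

# Set the transposition table size in MB. 256 is an arbitrary example.
engine.configure({"Hash": 256})

board = chess.Board()
# Successive analyse() calls reuse the same engine process, so the
# hash table persists between positions.
info = engine.analyse(board, chess.engine.Limit(time=1.0))
print(info["score"])

engine.quit()
```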

Regardless, can you expand on your thinking here? How do you suspect reversing the processing order might affect the ACPL calculations?

rpdelaney commented 6 years ago

Also, this happens rarely enough that it's very difficult to reproduce. Very often, if I get a negative ACPL, I'll re-run the analysis immediately after and it will turn up positive the second time around.

ddugovic commented 6 years ago

Oops, you're correct, that was my idea (to populate the hash table so that already-searched positions and evaluations may be re-used)... hm.

Of course, theoretically (with "accurate" evaluations) there should never be an evaluation gain (a negative centipawn loss) between positions. But the default hash size should be adequate, and Lichess analyses run on an only slightly less tight budget (4 CPU-seconds/move, so ~5 CPU-minutes/game) without major quality problems.
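For reference, a sketch of the conventional per-move centipawn loss and ACPL computation (a general definition, not necessarily this project's exact code); an "evaluation gain" shows up here as a negative loss:

```python
def centipawn_loss(eval_before: int, eval_after: int) -> int:
    """Loss for one move; both evaluations are in centipawns from the
    mover's point of view. A perfect evaluator never returns a
    negative value here."""
    return eval_before - eval_after


def acpl(losses: list[int]) -> float:
    """Average centipawn loss over one player's moves."""
    return sum(losses) / len(losses) if losses else 0.0


# With an imperfect search, eval_after can exceed eval_before (the
# engine "discovers" the position is better than it thought), giving
# a negative loss that can drag the game-level ACPL below zero.
print(centipawn_loss(eval_before=15, eval_after=40))  # -25
```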

It would be useful to see (rare) examples of the issue in order to measure to what extent any of the following ideas help:

  1. ML-based time budgeting (with the cost function being some measure of analysis quality), using all available data as input, including the numbers produced by Stockfish's eval command
  2. Derive a formula from Stockfish's timeman.cpp, or find some other way to use it to auto-budget CPU resources (time, threads, memory)
  3. Upon detecting an "inaccurate" evaluation, use some heuristic to gracefully recover and produce more "accurate" evaluations (see the sketch after this list)
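
As a rough illustration of idea 3, assuming python-chess' current chess.engine API: on detecting a negative loss, re-search both positions with a larger budget before accepting the result. The time limits and mate_score here are illustrative placeholders.

```python
import chess
import chess.engine

def score_cp(board, engine, limit):
    """Evaluation in centipawns from White's point of view."""
    info = engine.analyse(board, limit)
    return info["score"].white().score(mate_score=10000)

def robust_loss(board, move, engine):
    """If the quick search reports a negative loss, retry with a
    deeper search before accepting the result."""
    quick = chess.engine.Limit(time=0.5)   # placeholder budgets
    deep = chess.engine.Limit(time=4.0)
    sign = 1 if board.turn == chess.WHITE else -1  # mover's perspective

    def loss(limit):
        before = sign * score_cp(board, engine, limit)
        board.push(move)
        after = sign * score_cp(board, engine, limit)
        board.pop()
        return before - after

    result = loss(quick)
    if result < 0:  # "inaccurate" evaluation detected
        result = loss(deep)
    return result
```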

This all presumes that evaluations can be "accurate"; of course, with perfect play every legal chess position falls into one of three categories:

  1. a win for White
  2. a draw
  3. a win for Black

and evaluations are simply approximations in cases where a mate (or forced draw) cannot be detected.

niklasf commented 5 years ago

So yeah, fundamentally this is not a bug, just a consequence of the fact that chess engines are not perfect players/evaluators.

Capping at 0, or even reporting negative ACPL scores, seems fine.

If absolutely necessary, each position could be evaluated with a sufficiently large MultiPV. The loss would then be the difference between the picked move and the best move, as seen from the current position (rather than the difference between successive positions). This would always produce consistent results, but is rather inefficient.
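
For what that could look like, a rough sketch assuming python-chess' current chess.engine API (the engine path and search limit are placeholders, and a real implementation would need to handle engines that prune some MultiPV lines):

```python
import chess
import chess.engine

def multipv_loss(board, played, engine, limit):
    """Score every legal move from the current position in a single
    MultiPV search, then take the gap between the best move and the
    move actually played. Because both scores come from the same
    search, the loss is non-negative by construction."""
    infos = engine.analyse(board, limit, multipv=board.legal_moves.count())
    scores = {
        info["pv"][0]: info["score"].relative.score(mate_score=10000)
        for info in infos
    }
    return max(scores.values()) - scores[played]

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # placeholder path
board = chess.Board()
print(multipv_loss(board, chess.Move.from_uci("e2e4"), engine,
                   chess.engine.Limit(depth=12)))
engine.quit()
```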