lichess-org / lila

♞ lichess.org: the forever free, adless and open source chess server ♞
https://lichess.org
GNU Affero General Public License v3.0

Evaluation mismatch between gauge and computer analysis over whole game #9119

Closed vincentwoelfer closed 3 weeks ago

vincentwoelfer commented 3 years ago

Exact URL where the bug happened: https://lichess.org/MveigSpU/black#31

Steps to reproduce the bug: Open the URL and go to White's move 16, "dxe5".

What did you expect to happen? After White's move 16, "dxe5", the evaluation gauge shows a mate in seven for Black ("#-7"). This checks out as far as my not-so-advanced understanding of chess goes: I could not find a way for White to avoid this mate if Black plays the suggested moves. I requested a computer analysis of the whole game, and this mate should be visible there.

What happened instead? The whole-game evaluation does not show a mate in 7 for that move. Instead it shows "-4.7", both as the number displayed when hovering over that move and in the graph.

Operating system and browser version: Ubuntu 18.04, Firefox 89.0 (64-bit)

Local analysis settings: Stockfish 13+ NNUE, depth 28/28


benediktwerner commented 3 years ago

This is because the graph is based on the server analysis, while the eval shown on the right is based on your local analysis, which reaches a higher depth (and is therefore more accurate) if you let it run for a little while.

I guess we could update the graph when local analysis reaches a certain depth like we already do with the eval next to moves. The main question is whether it would cause weird inconsistencies because local analysis tends to be more sporadic and is not necessarily done on every move. But at least it seems less problematic than adjusting blunder annotations.
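A rough sketch of what such an update rule might look like, written in Python purely for illustration. This is not lila's actual code: the `Eval` type, the `pick_graph_eval` function, and the depth threshold of 20 are all hypothetical names and values chosen for this example. The idea is simply that a graph point keeps the server eval unless a local eval exists that is both deep enough and deeper than the server's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Eval:
    """Hypothetical evaluation record for one position."""
    depth: int
    cp: Optional[int] = None    # centipawns from White's point of view
    mate: Optional[int] = None  # moves to mate; negative means Black mates


# Assumed threshold; the real value (if any) would be a design decision.
MIN_LOCAL_DEPTH = 20


def pick_graph_eval(
    server: Optional[Eval],
    local: Optional[Eval],
    min_local_depth: int = MIN_LOCAL_DEPTH,
) -> Optional[Eval]:
    """Return the eval the graph should plot for one move."""
    # No local eval, or one that is too shallow to trust: keep the server's.
    if local is None or local.depth < min_local_depth:
        return server
    # A server eval that is at least as deep stays in place.
    if server is not None and server.depth >= local.depth:
        return server
    return local


# The position from this issue. The server depth of 18 is an assumption;
# the thread does not state it.
server_eval = Eval(depth=18, cp=-470)
local_eval = Eval(depth=28, mate=-7)
chosen = pick_graph_eval(server_eval, local_eval)
print(chosen)
```

Because the rule is applied per move, moves that were never analyzed locally would keep their server values, which is exactly where the inconsistency concern comes from.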

vincentwoelfer commented 3 years ago

While I understand that there are two different sources for the evaluation and that they may deviate from one another, which led to what I perceived as a bug: why are they so different? I was always under the impression that the whole-game analysis done through the "Request Analysis" button was performed on "the server" (from the user's point of view) because it is very deep and computationally expensive. Not finding a forced mate in 7 in a position where half of the moves involved give check (which I thought is a kind of move the engine almost always takes into consideration) makes the evaluation seem very shallow.

You may notice that I possess some "dangerous superficial knowledge" without actually understanding the evaluation process. If the answer simply is "the mismatch is because the whole-game evaluation is different from your locally computed evaluation and we can't increase the server evaluation depth because we don't have enough resources", then just ignore this bug.

benediktwerner commented 3 years ago

It is indeed a bit surprising that the server analysis didn't find this, but it's not completely unexpected: it happens fairly frequently that the server analysis misses mates of moderate length which the local analysis finds pretty quickly. At first glance, it seems like the problem might be that the first move "blunders" a rook and the last few moves of the mate are pretty open. The server analysis is indeed somewhat shallow, which is necessary for it to complete quickly. It's still much faster than doing the same amount of analysis in the browser. But if you want to analyze 60 moves in maybe 10 seconds, you can't really spend that much time on each move.
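To put a rough number on that, here is the arithmetic as a tiny Python sketch. The figures 60 moves and 10 seconds are the ballpark ones from the paragraph above, not measured values, and the function name is made up for this example. It also assumes the time is split evenly, which a real analysis queue need not do.

```python
def per_move_budget_ms(total_seconds: float, moves: int) -> float:
    """Average time available per move, in milliseconds, for an even split."""
    if moves <= 0:
        raise ValueError("moves must be positive")
    return total_seconds * 1000 / moves


# Ballpark figures from the comment above: 60 moves in about 10 seconds.
budget = per_move_budget_ms(10, 60)
print(f"{budget:.0f} ms per move")  # prints "167 ms per move"
```

That is a fraction of a second per move, compared with a local analysis that can keep running on a single position until it reaches depth 28.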

The general gist of the analysis is still correct though and shows a fairly overwhelming advantage for black.

But let's still keep this issue open for a bit to consider whether we want to update the graph with the local eval.

danispringer commented 8 months ago

As mentioned in #12813, this seems to be a "misconnection" between design choices and the actual UI design. It is unclear what happens when requesting an analysis: what depth is used? What depth can I choose (if any)?

I would rather wait (much) longer and be able to request something like "run analysis for this game with depth 30 for every move" than have a quick analysis which calls moves blunders and inaccuracies when they simply are not. For example, a move gets called a blunder because it supposedly makes mate unavoidable, when in reality mate was already unavoidable. This holds even if a refresh "fixes" it, which no user should have to guess, as mentioned in #14696.

niklasf commented 3 weeks ago

Closing, with #12813 remaining open.