official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

Blunder in TCEC Cup Finals #2138

Closed adentong closed 5 years ago

adentong commented 5 years ago

In TCEC Cup Finals Game 9, SF blundered at around move 156 in a 7-piece endgame that was a tb draw. I don't know how or why it happened, but I'm sure there are people who do, so I'm opening an issue for discussion.

vondele commented 5 years ago

Using this FEN, Q5K1/6N1/1kp3q1/8/1P6/8/8/8 b - - 62 155, 50 searches of 45s each with 32 threads, 16 GB hash, and 5-men syzygy all yield the correct bestmove (b6b5). Furthermore, all ~2000 PVs from depths 20-50 start with the correct move.

Maybe 6-men changes the picture dramatically...?

adentong commented 5 years ago

I do remember seeing someone in twitch chat mention a hash collision. I think that person said there's no problem if we just look at that particular fen, but if we start from a few moves prior then weird stuff starts happening?

xoto10 commented 5 years ago

Yes, that is what I was going to say. Jjoshua2 (I think) said it worked ok with fresh hash, but by playing through the previous 4 or 5 moves he could create the problem.

Matthies commented 5 years ago

I have tested SF_19042711 on the position 5K2/1k4N1/2p5/Q7/1P4q1/8/8/8 b - - 66 157, one move later, which is already lost according to 7-men. I'm using 2GB (clean) hash, one thread and 6-men syzygy as in TCEC. SF evaluates this to 0.00 at depth 80 and wants to play Qd4. The same misevaluation may have happened at move 156 in the real game. It seems that SF can't find a safe way to a 6-men board and just chooses a losing one. I doubt that this has to do with hash collision.

vondele commented 5 years ago

Could you repeat the test with a version that's a few commits newer, in particular 5c4002aa827653a125130a0d01d0bb96dd2b8bae aka stockfish_19050219?

Matthies commented 5 years ago

stockfish_19050219 also evaluates to 0.00 (even earlier/ at lower depth than 19042711). Qh4 is the preferred move.

vondele commented 5 years ago

ok thanks. Indeed, also master scores this lost fen as 0.0. I agree that's the likely cause.

Matthies commented 5 years ago

Hmm. I haven't looked at halfmovecounter, which is 66. Syzygy says "DTZ39" for the two preferred moves. So SF is right with these two moves and the game isn't lost here. But in the game it played 157. ... Qg6, which is the real blunder. One move later, at 5K2/1k4N1/2p3q1/4Q3/1P6/8/8/8 b - - 68 158 with hash NOT cleared, the evaluation still shows 0.00. When I clear the hash and search again, SF immediately sees the loss. I remember that someone in chat mentioned "no 50-moves-information in hash", which seems related to the problem.

vondele commented 5 years ago

right, overlooked it as well.

leesailer commented 5 years ago

SF was playing on increment, so time trouble needs to be considered.

On Fri, May 10, 2019 at 1:44 PM Joost VandeVondele notifications@github.com wrote:

right, overlooked it as well.


Matthies commented 5 years ago

I don't think that time trouble is the reason. SF searched to depth > 70 and I doubt that the eval would have changed if it searched longer. Someone with a big machine may test.

My simple explanation:

  1. During the search SF evaluates a position to be draw because halfmovecounter + dtz > 100
  2. SF stores draw score for this position in tt without the information for minimal halfmovecounter needed to secure the 50-moves draw
  3. SF later reads this draw score from tt and takes a "shortcut" to this position now with halfmovecounter + dtz < 100.

Explanation sounds simple. Solving the issue is probably harder. It seems difficult to save halfmovecounter information in the tt and use it without loss of elo.
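The three steps above can be reproduced in a toy model. This is only a sketch under loud assumptions: `ToyTT`, `trueScore` and `searchWithTT` are hypothetical names, not Stockfish code, the table is keyed by position only, and the draw rule is simplified to `halfmoveClock + pliesToConvert > 100` (the exact boundary case is ignored).

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model only: none of these names exist in Stockfish.
constexpr int DRAW = 0;
constexpr int LOSS = -1;

struct ToyTT {
    // The key identifies the position but ignores the halfmove counter.
    std::unordered_map<std::uint64_t, int> table;
};

// Simplified "true" result: the opponent needs pliesToConvert more plies to
// reach a won tablebase position. If the 50-move counter runs out first,
// the game is a draw.
int trueScore(int halfmoveClock, int pliesToConvert) {
    assert(halfmoveClock >= 0 && pliesToConvert >= 0);
    return halfmoveClock + pliesToConvert > 100 ? DRAW : LOSS;
}

// Probe first; on a hit, return the stored score without looking at the
// current halfmove counter. This is where the stale draw comes from.
int searchWithTT(ToyTT& tt, std::uint64_t key, int halfmoveClock, int pliesToConvert) {
    auto it = tt.table.find(key);
    if (it != tt.table.end())
        return it->second;
    int score = trueScore(halfmoveClock, pliesToConvert);
    tt.table[key] = score;
    return score;
}
```

In this toy, searching key 42 with the counter at 70 and 39 plies to convert stores a draw. Probing the same key again with the counter at 60 returns that stored draw, although `trueScore(60, 39)` is a loss.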

vdbergh commented 5 years ago

Texel hashes the 50 move counter if the position is sufficiently close to the 50 move limit. I would assume that with sufficient effort such a thing can be achieved without losing elo. But I am pretty sure it will never be an elo gain since the issue is too rare to make a difference. So a non-starter for SF.

MJZ1977 commented 5 years ago

It seems to me that SF verifies TT evaluations at PV. So it should see that there is a line leading to a non-draw after thinking for a long time. But indeed, TT values trouble the search because all preliminary non-PV searches lead to a draw. It is not hard to store separate TT positions for rule50 > threshold. You just have to add something like: posKey = pos.key() ^ Key(excludedMove << 16) ^ Key((pos.rule50_count() > 90) << 17)
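The key change suggested here can be looked at in isolation. A minimal sketch, assuming a 64-bit key: `saltedKey`, `Rule50Threshold` and `Rule50Bit` are hypothetical names, and only the threshold 90 and the bit position 17 are taken from the expression above.

```cpp
#include <cstdint>

using Key = std::uint64_t;

// Hypothetical helper, not a Stockfish function: fold a "close to the
// 50-move limit" flag into the hash key, so that positions above the
// threshold get their own TT entries.
constexpr int Rule50Threshold = 90;  // from the expression above
constexpr int Rule50Bit = 17;        // from the expression above

Key saltedKey(Key positionKey, int rule50Count) {
    Key flag = rule50Count > Rule50Threshold ? Key(1) : Key(0);
    return positionKey ^ (flag << Rule50Bit);
}
```

Note that this only splits entries into two bands. Two visits to the same position with counters on the same side of the threshold still share one entry, so it can only help when the stale score was stored on the other side of the threshold.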

Perhaps senior developers can correct me. I launched an STC test here : http://tests.stockfishchess.org/tests/view/5cd6c7950ebc5925cf050a7b

@Matthies, can you test it on your machine to see if it sees the blunder Qg6 quickly even after a previous long search?

Matthies commented 5 years ago

@MJZ1977 I did the following with both your patch and master:

setoption name Hash value 2048
setoption name SyzygyPath value h:\tb
position fen 5K2/1k4N1/2p5/Q7/1P4q1/8/8/8 b - - 66 157
go depth 80
position fen 5K2/1k4N1/2p3q1/4Q3/1P6/8/8/8 b - - 68 158
go depth 80

Bad luck that I couldn't reproduce what I saw yesterday (using Arena and some older SF). Both versions detect the loss in the second position: master at depth 62, your patch at depth 60. I will do a second test with a slightly lower depth search on the first position...

joergoster commented 5 years ago

@MJZ1977 Doesn't work for me. The loss in the 2nd position above isn't found. 1024 MB Hash, 3 Threads, syzygy-6, 2-minute search (after searching the 1st position to fill the hash):

info depth 89 seldepth 33 multipv 1 score cp 0 nodes 524958668 nps 4374619 hashfull 907 tbhits 18694459 time 120001 pv g6h6 e5d4 b7a6 d4c4 a6b7 f8g8 h6g6 c4f4 b7a6 g8f8 a6b5 g7e8 b5a4 e8d6 a4a3 d6c4 a3a4 c4b6 a4a3
bestmove g6h6 ponder e5d4

What works for me is this:

    // At non-PV nodes we check for an early TT cutoff
    if (  !PvNode
        && ttHit
        && tte->depth() >= depth
        && pos.rule50_count() < 80 // <== added condition
        && ttValue != VALUE_NONE // Possible in case of TT access race
        && (ttValue >= beta ? (tte->bound() & BOUND_LOWER)
                            : (tte->bound() & BOUND_UPPER)))

MJZ1977 commented 5 years ago

OK, unfortunately it doesn't work then. On my computer it works well, but perhaps because I don't have syzygy-6.

adentong commented 5 years ago

@joergoster Would you be able to submit a test to see if your change works as a simplification?

MJZ1977 commented 5 years ago

Can someone explain why we save TB results with depth + 6? Can this lead to a problem of bad TT results?

                    tte->save(posKey, value_to_tt(value, ss->ply), ttPv, b,
                              std::min(DEPTH_MAX - ONE_PLY, depth + 6 * ONE_PLY),
                              MOVE_NONE, VALUE_NONE);

vdbergh commented 5 years ago

@MJZ1977

Can someone explain why we save TB results with depth + 6? Can this lead to a problem of bad TT results?

To give them some protection against overwriting (TB scores are hard scores). They are not saved with DEPTH_MAX-1 since that might fill up the TT with useless entries which could never be overwritten.
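The effect of the bonus and the cap can be shown with small numbers. This is a sketch only: `ToyDepthMax` is an illustrative value rather than Stockfish's `DEPTH_MAX`, `ONE_PLY` is treated as 1, and `shouldReplace` is a hypothetical depth-preferred rule, not Stockfish's actual replacement scheme.

```cpp
#include <algorithm>

// Illustrative constants, not the values used in Stockfish.
constexpr int ToyDepthMax = 128;
constexpr int TbDepthBonus = 6;

// Depth recorded for a tablebase score: the search depth plus a bonus,
// capped just below the maximum so the entry is never "infinitely" deep.
int tbStoreDepth(int depth) {
    return std::min(ToyDepthMax - 1, depth + TbDepthBonus);
}

// Hypothetical replacement rule: a new entry only replaces a stored one
// if it was searched strictly deeper.
bool shouldReplace(int storedDepth, int newDepth) {
    return newDepth > storedDepth;
}
```

Under these assumptions a TB score found at depth 10 is stored as depth 16, so an ordinary result from depth 12 does not evict it, while one from depth 20 still can.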

joergoster commented 5 years ago

@adentong Why should this be accepted as simplification? Maybe as a bugfix, but then what exactly is the bug? However, if anybody wants to give it a try or even submit a PR, feel free to do so.

@Matthies Where should the info about dtz come from? During the search only WDL tables are being probed ...

Let me state the obvious: if you want perfect play in a rather complicated 7-man endgame, you need 7-man bases!

MJZ1977 commented 5 years ago

@joergoster: can you please verify that you still have the problem if you don't use syzygy tables? Perhaps it can give us an indication of how to find the bug?

Matthies commented 5 years ago

During the search SF evaluates a position to be draw because halfmovecounter + dtz > 100

@Matthies Where should the info about dtz come from? During the search only WDL tables are being probed ...

Here dtz doesn't mean probing the dtz TB but "opponent cannot reach winning 6-piece-position inside the 100 moves counter". But your conclusion is probably right.

adentong commented 5 years ago

@adentong Why should this be accepted as simplification? Maybe as a bugfix, but then what exactly is the bug? However, if anybody wants to give it a try or even submit a PR, feel free to do so.

@Matthies Where should the info about dtz come from? During the search only WDL tables are being probed ...

Let me state the obvious: if you want perfect play in a rather complicated 7-man endgame, you need 7-man bases!

Gosh that was stupid. I did mean to say as a bug fix.

miguel-l commented 5 years ago

Just an idea, maybe instead of VALUE_DRAW, we can make something like VALUE_DRAW_IN_PLY so we can check the 50 move rule after accessing the TT. Is something like this feasible?

svivanov72 commented 5 years ago

Well, there is room in the TT to store the 50-move counter along with draw scores. I submitted a test here: http://tests.stockfishchess.org/tests/view/5cd894650ebc5925cf054112 Since it won a 100-game match at home, I am trying it in the standard test (just dreaming).

Just an idea, maybe instead of VALUE_DRAW, we can make something like VALUE_DRAW_IN_PLY so we can check the 50 move rule after accessing the TT. Is something like this feasible?

miguel-l commented 5 years ago

I tried the value trick I mentioned: http://tests.stockfishchess.org/tests/view/5cd8f21b0ebc5925cf054bc5

I'm not too sure if it's necessary, and perhaps I'm just overthinking it, but it's another option.

mstembera commented 5 years ago

It may be useful to distinguish between draws where we are up material versus down. If we are up material we should reject 3-fold repetitions or stalemates, since a small inaccuracy from our opponent may result in a win for us. Conversely, if we are down material we should take a 3-fold/stalemate as soon as possible, because we could play an inaccuracy and lose. In the cup game this strategy would have resulted in taking the 3-fold. I'm not sure how best to implement it.

xoto10 commented 5 years ago

I've tried things along these lines before (different kinds of draw). All draws being the same value means sf does end up in some very tricky to hold draws, and perhaps doesn't try as hard as it could when slightly ahead.

One thing I tried at home ages ago was to increase the size of the values coming from evaluate by x2 or x4, to create a couple of extra bits for this purpose, either to get more accuracy out of evaluate() or to add info in the lsb. I never got it to work (probably endgame related). Or could we even use a 32-bit value, with the current value in the top 16 bits and the lower 16 then available for extra info to differentiate more desirable lines?
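A rough sketch of what the 32-bit idea could look like, purely as an illustration: `pack`, `scoreOf` and `extraOf` are hypothetical helpers, not anything that exists in Stockfish. The score goes in the upper 16 bits and a tie-break field in the lower 16, so plain integer comparison orders first by score and then by the extra info.

```cpp
#include <cstdint>

// Hypothetical packing: score in the top 16 bits, tie-break info below.
// Multiplication is used instead of a left shift so that negative scores
// are handled without relying on shifting negative values.
constexpr std::int32_t pack(std::int16_t score, std::uint16_t extra) {
    return std::int32_t(score) * 65536 + extra;
}

// Low 16 bits hold the tie-break field.
constexpr std::uint16_t extraOf(std::int32_t packed) {
    return std::uint16_t(packed & 0xFFFF);
}

// Remove the tie-break field, then divide exactly to recover the score.
constexpr std::int16_t scoreOf(std::int32_t packed) {
    return std::int16_t((packed - (packed & 0xFFFF)) / 65536);
}
```

With this layout two draws such as `pack(0, 5)` and `pack(0, 1)` compare as different, while any draw still ranks below `pack(1, 0)` and above `pack(-1, 65535)`.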

Alayan-stk-2 commented 5 years ago

I've tried things along these lines before (different kinds of draw). All draws being the same value means sf does end up in some very tricky to hold draws, and perhaps doesn't try as hard as it could when slightly ahead.

I share this impression from watching SF play a lot of drawn endgames.

MortenLohne commented 5 years ago

Were any PRs ever merged to fix this issue, or is it still present?

adentong commented 5 years ago

I think several tries have been tested, but none passed fishtest. I think I'm just going to close this now since this was literally the only time I've seen it happening in my 2 years of watching sf so it's rare enough to warrant not paying it too much attention.

ALAKTORN commented 4 years ago

This happened again on game 30 at CCC “Fat Fritz vs. Stockfish” https://www.chess.com/computer-chess-championship#event=fat-fritz-vs-stockfish&game=30

Alayan-stk-2 commented 4 years ago

The better the hardware, the longer the TC, and the higher the chance hash 50mr issues will strike.

This isn't very common, but isn't very rare either. It may also strike outside of the PV, harming search in a less obvious way than in those games with a sure win/draw thrown away.