adentong closed this issue 5 years ago.
Using this FEN, Q5K1/6N1/1kp3q1/8/1P6/8/8/8 b - - 62 155, 50 searches of 45 s each with 32 threads, 16 GB hash and 5-men Syzygy all yield the correct best move (b6b5). Furthermore, all ~2000 PVs from depths 20-50 start with the correct move.
Maybe 6-men changes the picture dramatically...?
I do remember seeing someone in the Twitch chat mention a hash collision. I think that person said there's no problem if we look at that particular FEN alone, but if we start from a few moves earlier, weird stuff starts happening?
Yes, that is what I was going to say. Jjoshua2 (I think) said it worked ok with fresh hash, but by playing through the previous 4 or 5 moves he could create the problem.
I have tested SF_19042711 on the position 5K2/1k4N1/2p5/Q7/1P4q1/8/8/8 b - - 66 157, one move later, which is already lost according to 7-men. I'm using 2 GB (clean) hash, one thread and 6-men Syzygy as in TCEC. SF evaluates this as 0.00 at depth 80 and wants to play Qd4. The same misevaluation may have happened at move 156 in the real game. It seems that SF can't find a safe way into a 6-men position and just chooses a losing one. I doubt that this has to do with a hash collision.
Could you repeat the test with a version that's a few commits newer, in particular 5c4002aa827653a125130a0d01d0bb96dd2b8bae aka stockfish_19050219?
stockfish_19050219 also evaluates it as 0.00 (even earlier, at a lower depth than 19042711). Qh4 is the preferred move.
OK, thanks. Indeed, master also scores this lost FEN as 0.0. I agree that's the likely cause.
Hmm. I overlooked the halfmove counter, which is 66. Syzygy says "DTZ39" for the two preferred moves, so SF is right with these two moves and the game isn't lost here. But in the game it played 157. ... Qg6, which is the real blunder. One move later, at 5K2/1k4N1/2p3q1/4Q3/1P6/8/8/8 b - - 68 158 with the hash NOT cleared, the evaluation still shows 0.00. When I clear the hash and search again, SF immediately sees the loss. I remember that someone in chat mentioned "no 50-moves-information in hash", which seems related to the problem.
right, overlooked it as well.
SF was playing on increment, so time trouble needs to be considered.
I don't think that time trouble is the reason. SF searched to depth > 70, and I doubt that the eval would have changed if it had searched longer. Someone with a big machine could test this.
My simple explanation:
The explanation sounds simple. Solving the issue is probably harder. It seems difficult to store halfmove counter information in the TT and use it without losing Elo.
Texel hashes the 50-move counter if the position is sufficiently close to the 50-move limit. I would assume that with sufficient effort such a thing can be achieved without losing Elo. But I am pretty sure it will never be an Elo gain, since the issue is too rare to make a difference. So it's a non-starter for SF.
It seems to me that SF verifies TT evaluations at PV nodes. So it should see that there is a line leading to a non-draw after thinking for a long time. But indeed, TT values trouble the search because all preliminary non-PV searches lead to a draw. It is not hard to store separate TT positions for rule50 > threshold. You just have to add something like: posKey = pos.key() ^ Key(excludedMove << 16) ^ Key((pos.rule50_count() > 90) << 17)
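A minimal, self-contained sketch of that keying idea, assuming 64-bit Zobrist keys and moves that fit in 16 bits (as in Stockfish's move encoding). `Key`, `Move` and `make_tt_key` are hypothetical stand-ins, not Stockfish's declarations. Once the counter crosses the threshold, the same position maps to a different TT key, so entries searched near the 50-move limit don't mix with entries searched far from it. One caveat: `excludedMove << 16` already occupies bits 16 to 31, so bit 17 from the one-liner can collide with a move bit; this sketch uses bit 32 instead to keep the two perturbations independent.

```cpp
#include <cassert>
#include <cstdint>

using Key = std::uint64_t;   // hypothetical stand-in for a 64-bit Zobrist key
using Move = std::uint16_t;  // hypothetical stand-in for a 16-bit move encoding

// Hypothetical helper modelled on the one-liner above.
constexpr Key make_tt_key(Key positionKey, Move excludedMove, int rule50, int threshold = 90) {
    return positionKey
         ^ (Key(excludedMove) << 16)        // separate keys for excluded-move (singular) searches
         ^ (Key(rule50 > threshold) << 32); // flip one bit once close to the 50-move limit
}
```

All counters up to the threshold share one key, and all counters above it share another, so only two "versions" of each position compete for TT slots.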
Perhaps senior developers can correct me. I launched an STC test here : http://tests.stockfishchess.org/tests/view/5cd6c7950ebc5925cf050a7b
@Matthies, can you test it on your machine to see if it sees the blunder Qg6 quickly even after a previous long search?
@MJZ1977 I did the following with both your patch and master:
setoption name Hash value 2048
setoption name SyzygyPath value h:\tb
position fen 5K2/1k4N1/2p5/Q7/1P4q1/8/8/8 b - - 66 157
go depth 80
position fen 5K2/1k4N1/2p3q1/4Q3/1P6/8/8/8 b - - 68 158
go depth 80
Bad luck: I couldn't reproduce what I saw yesterday (using Arena and some older SF). Both versions detect the loss in the second position: master at depth 62, your patch at depth 60. I will do a second test with a slightly lower search depth on the first position...
@MJZ1977 Doesn't work for me. The loss in the 2nd position above isn't found. 1024 MB Hash, 3 Threads, syzygy-6, 2-minute search (after searching the 1st position to fill the hash):
info depth 89 seldepth 33 multipv 1 score cp 0 nodes 524958668 nps 4374619 hashfull 907 tbhits 18694459 time 120001 pv g6h6 e5d4 b7a6 d4c4 a6b7 f8g8 h6g6 c4f4 b7a6 g8f8 a6b5 g7e8 b5a4 e8d6 a4a3 d6c4 a3a4 c4b6 a4a3 bestmove g6h6 ponder e5d4
What works for me is this:
// At non-PV nodes we check for an early TT cutoff
if (  !PvNode
    && ttHit
    && tte->depth() >= depth
    && pos.rule50_count() < 80   // <== added condition
    && ttValue != VALUE_NONE // Possible in case of TT access race
    && (ttValue >= beta ? (tte->bound() & BOUND_LOWER)
                        : (tte->bound() & BOUND_UPPER)))
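The condition can be read in isolation with a simplified sketch. The `TTEntry` struct, the `Bound` enum and `tt_cutoff_allowed` are hypothetical stand-ins (Stockfish's real types differ); only the extra `rule50 < 80` guard corresponds to the change posted above. Near the 50-move limit, a stored score may have been computed along a path with a different counter, so the sketch refuses the early cutoff there and lets the node be searched normally.

```cpp
#include <cassert>

enum Bound { BOUND_NONE = 0, BOUND_UPPER = 1, BOUND_LOWER = 2, BOUND_EXACT = BOUND_UPPER | BOUND_LOWER };

constexpr int VALUE_NONE = 32002;  // sentinel for "no stored value"

struct TTEntry {   // hypothetical, simplified TT entry
    int value;
    int depth;
    Bound bound;
};

// Returns true if a non-PV node may return the stored value immediately.
bool tt_cutoff_allowed(bool pvNode, const TTEntry* tte, int depth, int beta, int rule50,
                       int rule50Limit = 80) {
    if (pvNode || tte == nullptr)   // PV nodes and TT misses never cut off here
        return false;
    if (tte->depth < depth)         // stored search was too shallow
        return false;
    if (rule50 >= rule50Limit)      // the added guard: distrust the TT near the 50-move limit
        return false;
    if (tte->value == VALUE_NONE)   // possible in case of a TT access race
        return false;
    return tte->value >= beta ? (tte->bound & BOUND_LOWER) != 0
                              : (tte->bound & BOUND_UPPER) != 0;
}
```

The guard costs some cutoffs in every position with a high counter, which is why such a change has to prove itself on fishtest rather than just fix the one game.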
OK, unfortunately it doesn't work then. On my computer it works well, but perhaps that's because I don't have 6-men Syzygy.
@joergoster Would you be able to submit a test to see if your change works as a simplification?
Can someone explain why we save TB results with depth + 6? Can this lead to a problem with bad TT results?
tte->save(posKey, value_to_tt(value, ss->ply), ttPv, b,
std::min(DEPTH_MAX - ONE_PLY, depth + 6 * ONE_PLY),
MOVE_NONE, VALUE_NONE);
@MJZ1977
To give them some protection against overwriting (TB scores are hard scores). They are not saved with DEPTH_MAX-1 since that might fill up the TT with useless entries which could never be overwritten.
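To make the overwrite protection concrete: under a depth-preferred replacement policy, storing a tablebase score a few plies deeper than the current search makes it less likely that ordinary entries from later, shallower searches evict it, while the cap at `DEPTH_MAX - 1` keeps it from becoming permanent. The `DEPTH_MAX` value and the `should_replace` rule below are simplified assumptions for illustration; Stockfish's actual replacement policy also weighs entry age and bound type.

```cpp
#include <algorithm>
#include <cassert>

constexpr int DEPTH_MAX = 246;    // assumed maximum search depth (illustrative)
constexpr int TB_DEPTH_BONUS = 6; // the "+ 6" from the snippet above

// Depth recorded for a tablebase hit found at search depth `depth`.
int tb_store_depth(int depth) {
    return std::min(DEPTH_MAX - 1, depth + TB_DEPTH_BONUS);
}

// Hypothetical depth-preferred replacement: a result for another position
// only evicts the slot if it was searched at least as deep.
bool should_replace(int storedDepth, int newDepth) {
    return newDepth >= storedDepth;
}
```

For example, a TB hit found at depth 10 is recorded as depth 16, so under this rule a colliding depth-12 result would not evict it, but a depth-16 one would.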
@adentong Why should this be accepted as simplification? Maybe as a bugfix, but then what exactly is the bug? However, if anybody wants to give it a try or even submit a PR, feel free to do so.
@Matthies Where should the info about dtz come from? During the search only WDL tables are being probed ...
Let me state the obvious: if you want perfect play in a rather complicated 7-man endgame, you need 7-man bases!
@joergoster: can you please verify whether you still have the problem if you don't use Syzygy tables? Perhaps it can give us an indication of how to find the bug.
During the search, SF evaluates a position as a draw because halfmovecounter + dtz > 100.
Here dtz doesn't mean probing the DTZ tablebase, but rather "the opponent cannot reach a winning 6-piece position before the halfmove counter reaches 100". But your conclusion is probably right.
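A toy model of this explanation, under simplifying assumptions: the score of a position depends on both the board and the 50-move counter, but the cache (like a TT whose key ignores the counter) is indexed by the board only. `score`, `pliesToWin` and `BoardOnlyCache` are hypothetical; `pliesToWin` stands for how many plies the stronger side needs before the next capture or pawn move. None of this is Stockfish code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class Result { Win, Draw };

// A forced win needing `pliesToWin` plies before the next zeroing move only
// counts if it completes before the counter reaches 100.
Result score(int rule50, int pliesToWin) {
    return rule50 + pliesToWin > 100 ? Result::Draw : Result::Win;
}

// Cache keyed on the board alone, so the counter is silently ignored on probes.
struct BoardOnlyCache {
    std::unordered_map<std::string, Result> table;

    Result probe_or_compute(const std::string& board, int rule50, int pliesToWin) {
        auto it = table.find(board);
        if (it != table.end())
            return it->second;  // may have been computed with a different counter
        Result r = score(rule50, pliesToWin);
        table.emplace(board, r);
        return r;
    }
};
```

If the board is first reached with counter 80 and needs 30 plies, a draw is cached; probing the same board later with counter 60 returns that draw, although `score(60, 30)` is a win. A cached win reused with a too-high counter fails the same way in the other direction.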
@adentong Why should this be accepted as simplification? Maybe as a bugfix, but then what exactly is the bug? However, if anybody wants to give it a try or even submit a PR, feel free to do so.
Gosh, that was stupid. I meant to say as a bug fix.
Just an idea, maybe instead of VALUE_DRAW, we can make something like VALUE_DRAW_IN_PLY so we can check the 50 move rule after accessing the TT. Is something like this feasible?
Well, there is room in the TT to store the 50-move counter along with draw scores. I submitted a test here: http://tests.stockfishchess.org/tests/view/5cd894650ebc5925cf054112 Since it won a 100-game match at home, I am trying it in the standard test (just dreaming).
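One way the "store the counter with draw scores" idea could look, as a hedged sketch rather than what the linked tests actually implemented: each entry keeps the counter it was searched with, and a cached draw is trusted only when the current counter is at least as high. The reasoning (in the game-theoretic sense) is that raising the counter only removes winning attempts for both sides, so a draw at counter c stays a draw at any higher counter, but not necessarily at a lower one. `Entry`, `store` and `usable` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr int VALUE_DRAW = 0;

struct Entry {              // hypothetical TT entry with a spare byte for the counter
    std::int16_t value;
    std::uint8_t rule50;    // 50-move counter at the time the value was stored
};

Entry store(int value, int rule50) {
    assert(rule50 >= 0 && rule50 <= 255);
    return Entry{static_cast<std::int16_t>(value), static_cast<std::uint8_t>(rule50)};
}

// A stored draw is reusable only if it was found with a counter no higher
// than the current one; non-draw values are treated as usable in this sketch.
bool usable(const Entry& e, int currentRule50) {
    if (e.value != VALUE_DRAW)
        return true;
    return e.rule50 <= currentRule50;
}
```

Treating every non-draw score as usable is a deliberate simplification here; a cached win found with a low counter has the mirror-image problem.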
I tried the value trick I mentioned: http://tests.stockfishchess.org/tests/view/5cd8f21b0ebc5925cf054bc5
I'm not too sure if it's necessary, and perhaps I'm just overthinking it, but it's another option.
It may be useful to distinguish between draws where we are up material and those where we are down. If we are up material, we should reject 3-fold repetitions or stalemates, since a small inaccuracy from our opponent may result in a win for us. Conversely, if we are down material, we should take a 3-fold/stalemate as soon as possible, because we could play an inaccuracy and lose. In the Cup game this strategy would have resulted in taking the 3-fold. I'm not sure how best to implement it.
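A sketch of that idea, assuming the draw score is shifted by a small, capped amount based on the material balance from the side to move's perspective. The centipawn units, the 100 cp dead zone, the divisor of 50 and the cap of 20 are made-up parameters, not tuned values, and this is not how Stockfish scores draws.

```cpp
#include <algorithm>
#include <cassert>

constexpr int VALUE_DRAW = 0;

// Positive balance = the side to move is ahead in material (centipawns).
// Ahead: a draw scores slightly below zero, so the search avoids repetitions.
// Behind: slightly above zero, so the search steers toward them.
int draw_value(int materialBalance, int deadZone = 100, int maxShift = 20) {
    if (materialBalance > deadZone)
        return VALUE_DRAW - std::min(maxShift, (materialBalance - deadZone) / 50 + 1);
    if (materialBalance < -deadZone)
        return VALUE_DRAW + std::min(maxShift, (-materialBalance - deadZone) / 50 + 1);
    return VALUE_DRAW;
}
```

Because the balance changes sign with the side to move, the shift stays consistent under negamax; the small cap keeps it from overriding real evaluation differences.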
I've tried things along these lines before (different kinds of draw). All draws being the same value means SF does end up in some very tricky-to-hold draws, and perhaps doesn't try as hard as it could when slightly ahead.
One thing I tried at home ages ago was to scale the values coming from evaluate() by 2x or 4x, to create a couple of extra bits for this purpose: either to get more accuracy out of evaluate() or to add info in the least significant bits. I never got it to work (probably endgame-related). Or could we even use a 32-bit value, with the current value in the top 16 bits and the lower 16 bits available for extra info to differentiate more desirable lines?
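A sketch of the 32-bit variant, assuming a 16-bit score in the high half and a 16-bit tag in the low half. Packing as `value * 65536 + tag` (rather than shifting a possibly negative value) keeps ordinary integer comparison ordered by score first, with the tag only breaking ties. `pack`, `unpack_value` and `unpack_tag` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

using Packed = std::int32_t;

// value: score in [-32768, 32767]; tag: extra info in [0, 65535].
Packed pack(int value, int tag) {
    assert(value >= INT16_MIN && value <= INT16_MAX);
    assert(tag >= 0 && tag <= 0xFFFF);
    return static_cast<Packed>(value * 65536 + tag);
}

int unpack_tag(Packed p) {
    // Low 16 bits; value * 65536 contributes nothing there.
    return static_cast<int>(static_cast<std::uint32_t>(p) & 0xFFFF);
}

int unpack_value(Packed p) {
    return (p - unpack_tag(p)) / 65536;  // exact division: p - tag == value * 65536
}
```

One catch for a negamax search: negating a packed value does not negate the score and keep the tag (for a nonzero tag, -(v * 65536 + t) unpacks as score -v - 1 with tag 65536 - t), so the tag would have to be re-encoded after every negation.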
I've tried things along these lines before (different kinds of draw). All draws being the same value means SF does end up in some very tricky-to-hold draws, and perhaps doesn't try as hard as it could when slightly ahead.
I share this impression from watching SF play a lot of drawn endgames.
Were any PRs ever merged to fix this issue, or is it still present?
I think several attempts have been tested, but none passed fishtest. I'm going to close this now, since this was literally the only time I've seen it happen in my 2 years of watching SF, so it's rare enough not to warrant much attention.
This happened again on game 30 at CCC “Fat Fritz vs. Stockfish” https://www.chess.com/computer-chess-championship#event=fat-fritz-vs-stockfish&game=30
The better the hardware and the longer the TC, the higher the chance that hash/50-move-rule issues will strike.
This isn't very common, but it isn't very rare either. It may also strike outside the PV, harming the search in a less obvious way than in those games where a sure win/draw was thrown away.
In TCEC Cup Finals Game 9, SF blundered at around move 156 in a 7-piece endgame that was a tablebase draw. I don't know how or why it happened, but I'm sure there are people who do, so I'm opening an issue for discussion.