v0.97.6 eval/pruning issues

tpoppins commented 5 years ago

v0.97.1 has been tested at CCRL at 40/4 and received a 2088 Elo rating. Another tester ran several hundred games with v0.97.4 for a similar performance (the results will be up after the pending update by Sunday).

A 40/40 test for the latest version, 0.97.6, had to be aborted when SpaceDog scored 0/13 vs. the 2100-rated King's Out. In eight of those games SpaceDog didn't even make it to the first time control. Thinking there was a problem with the UCI mode, I switched to Xboard mode, but SpaceDog continued playing wildly, giving away pieces left and right. Here's a sample blitz game vs. Monik (1955 Elo):

[Event "?"]
[Site "?"]
[Date "2019.03.09"]
[Round "?"]
[White "SpaceDog 0.97.6 64-bit"]
[Black "Monik 2.2.7 64-bit"]
[Result "0-1"]
[ECO "A82"]
[GameDuration "00:01:52"]
[GameEndTime "2019-03-09T03:43:43.701 Eastern Standard Time"]
[GameStartTime "2019-03-09T03:41:50.922 Eastern Standard Time"]
[Opening "Dutch"]
[PlyCount "45"]
[Termination "adjudication"]
[TimeControl "40/120"]
[Variation "Staunton gambit"]

1. d4 {book} f5 {book} 2. e4 {book} fxe4 {book} 3. Nc3 {book} Nf6 {book}
4. f3 {book} exf3 {book} 5. Nxf3 {book} d5 {book} 6. Bd3 {book} g6 {book}
7. Bf4 {+0.12/9 3.5s} e6 {-0.02/8 3.8s} 8. O-O {-0.13/8 3.5s} Bd6 {-0.02/8 3.7s}
9. Bh6 {-0.21/9 3.5s} Rg8 {+0.04/8 3.7s} 10. Nb5 {0.00/9 3.5s}
Nc6 {+0.02/7 3.7s} 11. Ng5 {+0.94/11 3.5s} Be7 {-0.27/7 3.7s}
12. Nxh7 {+0.95/11 3.5s} Nxh7 {-0.03/8 3.7s} 13. Qg4 {+1.01/11 3.5s}
Nf6 {+1.89/8 3.7s} 14. Bxg6+ {+0.91/11 3.5s} Kd7 {+2.95/9 3.7s}
15. Qxe6+ {0.00/12 3.5s} Kxe6 {+8.81/3 0.19s} 16. Bg7 {0.00/12 3.5s}
Rxg7 {+11.72/10 3.8s} 17. Rae1+ {-7.19/13 3.5s} Kd7 {+11.72/10 3.8s}
18. Rxf6 {-7.19/12 3.5s} Nxd4 {+13.47/9 3.8s} 19. Nxd4 {-9.06/12 3.5s}
Bxf6 {+13.74/9 3.8s} 20. Bf5+ {-11.71/12 3.5s} Kd6 {+13.54/3 0.21s}
21. Nb5+ {-12.81/12 3.5s} Kc5 {+15.25/9 3.9s} 22. b4+ {-12.68/10 3.5s}
Kxb5 {+18.50/9 3.9s} 23. Bd3+ {-16.67/11 3.5s, Black wins by adjudication} 0-1

Note that on move 15 SpaceDog, already a piece down, gives up the queen for the e6-pawn and a few checks. A 2088-rated engine wouldn't play like that. Clearly, some change since v0.97.4 has cost SpaceDog hundreds of Elo points.

Tirsa@CCRL

thorsilver commented 5 years ago

Yikes! Thanks for this. I'm not entirely sure what this could be; in tests here I haven't seen this behaviour. I have just fixed some small bugs which were leading to some instances of undefined behaviour, so it's possible that may have played a role.

Regardless, I'll take a deeper look at the pruning; clearly something has gone massively wrong somewhere.

thorsilver commented 5 years ago

Right, I've had a look at this, and while I couldn't replicate the exact same behaviour in the sense that SpaceDog didn't actually play 15. Qxe6+, it was popping up in the search (albeit with a -4.31 eval). When I removed LMR, futility pruning and LMP, SpaceDog only ever considers Qg3 as below:

When I give SpaceDog 5 minutes to think on the same position, it switches to Qg5:

This move may also be stupid, but at least it's certainly less stupid than Qxe6+. So clearly something is wrong with my pruning methods here.

Anyway, I'm going to rescind 0.97.6 and replace 0.97.5 with this version (no LMR/LMP/FP, keeping the recent bugfixes) until I can figure this out. macOS and Windows executables will be uploaded within the next 30 minutes or so.

Thank you so much for bringing this to my attention! As you can see I'm quite new to the chess programming hobby so I'm really pleased that you folks are putting SpaceDog through its paces.

bctboi23 commented 4 years ago

Hi! If you're still at this engine programming thing, I think I may have found a problem with your LMR that may be causing this issue.

In your commented-out LMR code, I saw that one of the conditions was that FoundPV has to be FALSE. This, I believe, is a mistake, as the engine mustn't reduce or prune in the PV node, and that is what seems to be happening in your LMR code. I would check to make sure, but in my LMR code (I started from a copy of VICE as well!), FoundPV must be TRUE in order to reduce.
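A minimal sketch of the condition being described, in C. This is not SpaceDog's or VICE's actual code: `lmr_reduction`, its parameters, and the depth and move-count thresholds are all hypothetical and chosen only for illustration. The point is the first test: a move is only eligible for reduction once a PV move has already been found at this node.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns how many plies to reduce a move by (0 = none).
 * All names and thresholds are illustrative assumptions, not taken from
 * SpaceDog or VICE.
 */
static int lmr_reduction(bool found_pv, int depth, int moves_searched,
                         bool is_capture, bool is_promotion,
                         bool in_check, bool gives_check)
{
    /* No PV move found yet at this node: search at full depth. */
    if (!found_pv)
        return 0;

    /* Too close to the leaves for a reduction to be worthwhile (assumed cutoff). */
    if (depth < 3)
        return 0;

    /* Only "late" moves in the ordering are candidates (assumed cutoff). */
    if (moves_searched < 4)
        return 0;

    /* Tactical moves and check situations are searched at full depth. */
    if (is_capture || is_promotion || in_check || gives_check)
        return 0;

    return 1; /* reduce by one ply */
}
```

With a helper like this, the search would call the reduced-depth search when the result is non-zero and re-search at full depth if the reduced search still raises alpha. Flipping the first test to require `found_pv == false` would reduce exactly the moves searched before a PV move exists, which is the situation described above.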

Hope this helps! (if you are still on this project)
