Closed by tpoppins 6 years ago
Thanks for the nice report! I'm checking those issues.
Despite encountering the same issues on another of my boxes (a dual Xeon X5670, 12 cores, HT off, max concurrent games = 8), I wasn't able to capture a complete log for any of the problem games. I realize that a PGN alone may not be enough to track down the bug(s), so if you'd like me to run a debug version that produces a log of its own, I can try to reproduce the issues.
I've made a release (v1.9.3) with some changes addressing all reported issues. The "stalled connection" problem should be fixed.
A test gauntlet with v1.9.3 produced four stalls out of 29 completed games. Here is one short example:
[Event "Pirarucu 1.9.3 64-bit 40/40 Gauntlet"]
[Site "Dual X5670"]
[Date "2018.05.23"]
[Round "1"]
[White "Pirarucu 1.9.3 64-bit"]
[Black "Aice 0.99.2"]
[Result "0-1"]
[ECO "A10"]
[GameDuration "00:23:51"]
[GameEndTime "2018-05-23T14:35:21.546 Eastern Daylight Time"]
[GameStartTime "2018-05-23T14:11:30.536 Eastern Daylight Time"]
[Opening "English Opening"]
[PlyCount "28"]
[Termination "stalled connection"]
[TimeControl "40/1260"]
1. c4 {book} g6 {book} 2. Nc3 {book} Bg7 {book} 3. g3 {book} d6 {book}
4. Bg2 {book} Nc6 {book} 5. e4 {book} e5 {book} 6. Nge2 {book} h5 {book}
7. h3 {book} h4 {book} 8. g4 {book} f5 {book} 9. gxf5 {+0.18/24 32s}
gxf5 {-0.09/11 15s} 10. exf5 {+0.28/22 39s} Bxf5 {-0.10/11 28s}
11. d3 {+0.28/23 30s} Qd7 {+0.53/11 35s} 12. Nd5 {+0.28/18 24s}
O-O-O {+0.69/11 31s} 13. Nec3 {+0.27/21 46s} Nf6 {+0.98/10 18s}
14. Ne4 {+0.14/21 41s} Nxd5 {+1.81/12 21s, White's connection stalls} 0-1
and a longer one:
[Event "Pirarucu 1.9.3 64-bit 40/40 Gauntlet"]
[Site "Dual X5670"]
[Date "2018.05.23"]
[Round "1"]
[White "Popochin 4.1 64-bit"]
[Black "Pirarucu 1.9.3 64-bit"]
[Result "1-0"]
[ECO "D85"]
[GameDuration "00:56:44"]
[GameEndTime "2018-05-23T17:05:11.658 Eastern Daylight Time"]
[GameStartTime "2018-05-23T16:08:27.043 Eastern Daylight Time"]
[Opening "Gruenfeld"]
[PlyCount "81"]
[Termination "stalled connection"]
[TimeControl "40/1260"]
[Variation "Modern Exchange Variation"]
1. d4 {book} Nf6 {book} 2. c4 {book} g6 {book} 3. Nc3 {book} d5 {book}
4. cxd5 {book} Nxd5 {book} 5. e4 {book} Nxc3 {book} 6. bxc3 {book} Bg7 {book}
7. Nf3 {book} c5 {book} 8. h3 {book} O-O {book} 9. Be2 {book} Nc6 {book}
10. Be3 {book} f5 {book} 11. Qb3+ {+0.55/18 42s} e6 {-0.52/24 26s}
12. exf5 {+0.51/20 41s} gxf5 {-0.52/25 28s} 13. dxc5 {+0.51/21 39s}
Qa5 {-0.52/28 39s} 14. O-O {+0.70/24 76s} Qxc3 {-0.43/27 273s}
15. Qxc3 {+0.57/24 35s} Bxc3 {-0.49/29 26s} 16. Rad1 {+0.57/22 34s}
f4 {-0.50/27 23s} 17. Bd2 {+0.62/21 33s} Nd4 {-0.59/25 20s}
18. Rfe1 {+0.73/22 32s} Nxe2+ {-0.81/23 19s} 19. Rxe2 {+0.81/22 31s}
Rd8 {-1.25/24 22s} 20. Re4 {+1.09/22 30s} e5 {-1.36/25 23s}
21. Ree1 {+0.85/23 29s} Bxd2 {-1.41/28 14s} 22. Rxd2 {+1.03/22 28s}
Rxd2 {-1.48/30 22s} 23. Nxd2 {+1.04/23 27s} Bd7 {-1.44/27 9.5s}
24. Nc4 {+1.06/25 26s} Re8 {-1.46/28 17s} 25. Rxe5 {+1.04/27 25s}
Kg7 {-1.44/28 14s} 26. Rxe8 {+1.03/29 24s} Bxe8 {-1.12/42 19s}
27. Na5 {+1.01/31 24s} b6 {-1.10/45 15s} 28. cxb6 {+1.01/32 16s}
axb6 {-1.10/45 20s} 29. Nc4 {+1.01/29 22s} b5 {-1.09/42 8.2s}
30. Nd6 {+0.99/24 22s} Bc6 {-1.09/43 16s} 31. Nf5+ {+1.00/29 21s}
Kf6 {-1.10/46 17s} 32. Nd4 {+1.02/28 20s} Bd5 {-1.10/53 14s}
33. a3 {+1.07/29 19s} Bc4 {-1.09/42 16s} 34. g3 {+1.09/27 19s}
Ke5 {-1.09/41 31s} 35. Nf3+ {+1.09/27 18s} Kf5 {-1.08/38 35s}
36. g4+ {+1.08/27 18s} Kf6 {-1.09/39 15s} 37. Kh2 {+1.08/27 17s}
Bf1 {-1.09/45 13s} 38. h4 {+0.93/28 16s} Bd3 {-1.03/39 12s}
39. Ng5 {+1.19/25 16s} Bc2 {-1.10/41 22s} 40. Kg2 {+1.21/27 15s}
h6 {-1.11/43 9.3s} 41. Nh3 {+1.44/32 46s, Black's connection stalls} 1-0
I wasn't able to save the game log for any of the stalled games; however, I managed to reproduce a loss on time (this one with Margin = 2000 ms):
[Event "?"]
[Site "?"]
[Date "2018.05.23"]
[Round "?"]
[White "Pirarucu 1.9.3 64-bit"]
[Black "Absolute Zero 2.4.7.2 64-bit"]
[Result "0-1"]
[ECO "C17"]
[GameDuration "00:24:56"]
[GameEndTime "2018-05-23T19:41:01.035 Eastern Daylight Time"]
[GameStartTime "2018-05-23T19:16:04.749 Eastern Daylight Time"]
[Opening "French"]
[PlyCount "38"]
[Termination "time forfeit"]
[TimeControl "40/1260"]
[Variation "Winawer, Advance, Russian Variation"]
1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. Nc3 {book} Bb4 {book}
4. e5 {book} c5 {book} 5. Qg4 {book} Ne7 {book} 6. Nf3 {book} Nbc6 {book}
7. Bb5 {book} cxd4 {book} 8. Nxd4 {book} O-O {book} 9. Nxc6 {+0.24/27 38s}
Bxc3+ {+0.24/19 23s} 10. bxc3 {+0.23/26 22s} bxc6 {+0.28/20 23s}
11. Bd3 {+0.23/29 29s} Qc7 {+0.28/18 18s} 12. Qh5 {+0.23/26 16s}
Ng6 {+0.39/17 23s} 13. f4 {+0.22/29 32s} c5 {+0.30/17 23s}
14. O-O {+0.28/26 27s} c4 {+0.59/18 23s} 15. Bxg6 {+0.27/24 18s}
Qb6+ {+0.54/20 23s} 16. Kh1 {+0.23/29 51s} hxg6 {+0.58/20 18s}
17. Qd1 {+0.24/32 64s} Qa5 {+0.63/22 23s} 18. Rf3 {+0.23/30 94s}
Bb7 {+0.68/20 19s} 19. Qe1 {+0.28/23 15s}
Qa4 {+0.61/19 18s, White loses on time} 0-1
and here's the full game log: Pirarucu193-forfeit.txt
Interesting lines at the end:
>Absolute Zero 2.4.7.2 64-bit(1): go wtime 853844 btime 1043937 movestogo 22
>Pirarucu 1.9.3 64-bit(0): go wtime 853844 btime 1026222 movestogo 21
<Pirarucu 1.9.3 64-bit(0): info depth 1 time 0 score cp 28 nps 0 nodes 10 hashfull 999 pv f3f2 f8b8 h1g1 a4e8 c1e3 b7c6 e3d4 a7a6 a2a3 b8b2
<snip>
<Pirarucu 1.9.3 64-bit(0): info depth 37 time 12520 score cp 28 nps 1204310 nodes 15077965 hashfull 999 pv f3f2 f8b8 h1g1 a4e8 c1e3 b7c6 e3d4 a7a6 a2a3 b8b2
>Pirarucu 1.9.3 64-bit(0): stop
<Pirarucu 1.9.3 64-bit(0): info depth 38 time 856013 score cp 28 nps 1136036 nodes 972462225 hashfull 1000 pv f3f2 f8b8 h1g1 a4e8 c1e3 b7c6 g2g4
<Pirarucu 1.9.3 64-bit(0): bestmove f3f2
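As a rough check of the excerpt above, the last `go` command gave White 853844 ms, while the final `info` line reports a search time of 856013 ms. The following is a minimal Python sketch of that comparison, not a Cute Chess or Pirarucu tool: the function name `overrun_ms`, the regular expressions, and the abridged `LOG` sample are illustrative assumptions, and the engine's self-reported time may differ slightly from what the GUI's clock measured.

```python
import re

# Abridged copies of the log lines quoted above (hashfull and pv fields dropped).
# '>' marks GUI-to-engine traffic, '<' marks engine-to-GUI traffic.
LOG = """\
>Pirarucu 1.9.3 64-bit(0): go wtime 853844 btime 1026222 movestogo 21
<Pirarucu 1.9.3 64-bit(0): info depth 37 time 12520 score cp 28 nps 1204310 nodes 15077965
>Pirarucu 1.9.3 64-bit(0): stop
<Pirarucu 1.9.3 64-bit(0): info depth 38 time 856013 score cp 28 nps 1136036 nodes 972462225
<Pirarucu 1.9.3 64-bit(0): bestmove f3f2
"""

GO_RE = re.compile(r"^>.*: go .*\bwtime (\d+) btime (\d+)")
INFO_TIME_RE = re.compile(r"^<.*: info .*\btime (\d+)")


def overrun_ms(log_text, engine="Pirarucu", engine_is_white=True):
    """Return (clock_ms, last_search_ms, overrun_ms) for the engine's last go command.

    Returns None if no go command or no timed info line is found.
    """
    clock = None
    last_time = None
    for line in log_text.splitlines():
        if engine not in line:
            continue  # skip traffic belonging to the other engine
        go = GO_RE.match(line)
        if go:
            # A new go command resets the search we are tracking.
            clock = int(go.group(1 if engine_is_white else 2))
            last_time = None
            continue
        info = INFO_TIME_RE.match(line)
        if info and clock is not None:
            last_time = int(info.group(1))
    if clock is None or last_time is None:
        return None
    return clock, last_time, last_time - clock


clock, spent, over = overrun_ms(LOG)
print(f"clock={clock} ms, search={spent} ms, overrun={over} ms")
```

On these lines the reported search time exceeds the remaining clock by 2169 ms, which is more than the 2000 ms margin used for this game.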
Thanks for the report. Do the "stalled connection" results happen when running at high concurrency? From the logs, it seems the engine is working fine but goes into a really long search at depth 38; when the GUI sends a stop (after the game ends), it stops the current search and reports the best move found.
The concurrency for the above gauntlet was 10 games (the Dual X5670 has 12 physical cores). The forfeit game was produced by running eight separate instances of Cute Chess GUI, one game each, for a total of about 20 games.
With the latest release (v1.9.4), the time losses should be gone. Since I'm not able to reproduce the "stalled connection" bug, I've added some debug information to help identify whether the search thread is starting properly.
A short 40-game 40/40 gauntlet for v1.9.4 just finished on the X5670 (concurrency 8). Zero forfeits and stalls! Not to mention a clean 10-0 score vs. Wuttang r2 64-bit (2018 Elo). Well done!
I'm running an additional 200 games on another box with double the concurrency for a larger sample. Meanwhile you can expect v1.9.2 to appear on our 40/40 list by Monday.
Wow! Great news! Thanks for the help!
Bad news and good news.
The bad news is that out of the 100 extra games completed at concurrency=16, three ended up as "stalls". But that's not half bad considering that two of them were stalls by both engines, so probably not Pirarucu's fault. One stall out of a hundred -- that's a vast improvement over a dozen out of 128. And no forfeits or illegal moves any more. I can name engines hundreds of Elo points higher that behave much worse.
The good news is that v1.9.2 results are in and they exceed my expectations considerably:
Good job and thank you for the prompt response and the fix!
Do you have the log from the stalled connection games? It indeed had great results; I'll check the games and try to identify improvements.
Thanks for the help.
I don't, unfortunately. And now with your recent fixes it should be a very rare beast exceedingly hard to catch.
However, I got an idea: how 'bout running a short gauntlet with v1.9.2 installed via InBetween (which acts as a middleman between the engine and the GUI)? InBetween does provide a log of communications between the engine and the GUI, so the data for any stalled games may be helpful in tracking down the issue.
The only catch is that I've never used it, and there might be a lot of data to sift through.
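To illustrate the middleman idea in general terms, here is a minimal Python sketch of a relay that sits between a GUI and an engine and logs every line with a timestamp. It is not InBetween and does not reflect how InBetween is configured; `relay_lines`, `run_middleman`, and their parameters are hypothetical names made up for this example.

```python
import subprocess
import sys
import threading
import time


def relay_lines(source, sink, log, direction, clock=time.monotonic):
    """Copy text lines from source to sink, logging each one with a timestamp.

    direction is a short tag such as '>' (GUI to engine) or '<' (engine to GUI).
    Returns the number of lines relayed.
    """
    count = 0
    for line in source:
        sink.write(line)
        sink.flush()  # forward immediately so the relay adds no buffering delay
        log.write(f"{clock():.3f} {direction} {line.rstrip()}\n")
        log.flush()   # keep the log current even if the game later stalls
        count += 1
    return count


def run_middleman(engine_cmd, log_path):
    """Start the engine and relay stdin/stdout through relay_lines (sketch only)."""
    engine = subprocess.Popen(
        engine_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes
    )
    with open(log_path, "a", encoding="utf-8") as log:
        # Engine output goes back to the GUI on a background thread.
        to_gui = threading.Thread(
            target=relay_lines,
            args=(engine.stdout, sys.stdout, log, "<"),
            daemon=True,
        )
        to_gui.start()
        # GUI commands are forwarded to the engine on the main thread.
        relay_lines(sys.stdin, engine.stdin, log, ">")
        engine.stdin.close()
        to_gui.join()
    return engine.wait()
```

The GUI would be pointed at a small script calling `run_middleman` with the real engine command, so a stalled game leaves behind a timestamped record of the last command sent and the last line the engine produced. A production relay would also need to handle a broken pipe when the engine exits first.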
You should be able to do it with cutechess-cli, but I believe we can live with that low error rate for now.
Agreed!
A couple of examples from an ongoing 40/40 gauntlet for CCRL:
Illegal move
Time forfeit
Stalled connection
The "stalled connection" problem is the most common one, with ten more games out of 128 ending this way.
Setup:
Dual Xeon E5-2670 @2.6 GHz, 16 cores, 32 GB RAM
Hyperthreading OFF
Win 7 x64 Pro
Java v10.0.1 (build 10.0.1+10) amd64
Cute Chess GUI 20180328 dev build
Margin = 5000 ms (under the Time Control settings)
Max concurrent games = 14 (problems persist even at 12)
I'll post a complete game log for one or more of the examples above once I manage to save it (Cute Chess doesn't show the logs for the games that don't have the viewing focus, unfortunately).