Closed yetisyny closed 1 year ago
Can you provide explicit details regarding this issue?
Here is how you can reproduce the bug. I have tried to simplify things, but unfortunately it is still a bit complicated. The steps below are a simplified version that leaves out a number of other things I have changed in my own configuration, and they are still many steps. I don't think PGNs and engine logs are the most useful way to reproduce this issue; it is better to recreate the settings that cause it, so I have put together instructions that make the bug occur consistently in a way you can observe and test.
Steps to reproduce bug with time management in Berserk 11.1:
If you follow these steps, you should be able to replicate this bug pretty easily and see it happen for yourself on a regular basis. For instance, I was doing these steps as I was writing them and testing them to make sure these steps cause this bug to occur, and then right in the first 10 games between Berserk 11.1 and Stockfish 9, 4 of the 10 games between them were lost by Berserk 11.1 because of timing out, 3 of them were won by Berserk 11.1 because it is stronger than Stockfish 9, and 3 of the games were ties because Stockfish 9 is able to put up a good fight despite being weaker than Berserk 11.1 (typical results). After that, Berserk 11.1 won 9 games against Toga II 1.3.1 because it is stronger and lost 1 because it timed out (usually it times out more often against Toga but this time it had good luck). Then it won 6 games against Phalanx XXIV because it is stronger and lost 4 because it timed out (typical results). Finally it won 6 games against Scidlet because it is stronger and lost 4 because it timed out (usually it wins more consistently against Scidlet, this was an unusually high number of timeouts against Scidlet this time).
Here is the crosstable from that tournament, showing Berserk 11.1 winning 24 times out of 40 because it is stronger than the other engines here, losing 13 times out of 40 because of this timeouts bug, and getting 3 draws against Stockfish 9 because Stockfish 9 put up a better fight than the weaker engines (it looks like there is also a completely unrelated bug in either Toga II 1.3.1 or in Scid vs. PC 4.23 that made Scid vs. PC list Toga as having an age of 33 and being Turkish here, but we can just ignore that, it is irrelevant to the bug we are discussing here in Berserk 11.1):
Scid vs. PC ?, 2023.04.18
                 Age Nat  Score      Berserk 11 Stockfish  Phalanx XX Scidlet    Toga
---------------------------------------------------------------------------------------------
1: Berserk 11.1           25.5 / 40  XXXXXXXXXX 1010100=== 1100010111 1101101010 1111111110  (+24 -13 =3)
2: Stockfish               5.5 / 10  0101011=== XXXXXXXXXX .......... .......... ..........  (+4 -3 =3)
3: Phalanx XXIV            4.0 / 10  0011101000 .......... XXXXXXXXXX .......... ..........  (+4 -6 =0)
4: Scidlet                 4.0 / 10  0010010101 .......... .......... XXXXXXXXXX ..........  (+4 -6 =0)
5: Toga           33 TUR   1.0 / 10  0000000001 .......... .......... .......... XXXXXXXXXX  (+1 -9 =0)
---------------------------------------------------------------------------------------------
40 games: +22 -15 =3
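As a quick sanity check on the crosstable above, the per-opponent result strings can be tallied mechanically. This is only a minimal sketch: the `tally` helper and the `berserk_rows` dictionary are names made up for illustration, and it assumes the usual crosstable convention that `1` is a win, `0` is a loss, and `=` is a draw.

```python
def tally(results):
    """Count wins, losses, draws and the score in one crosstable result string."""
    wins = results.count("1")
    losses = results.count("0")
    draws = results.count("=")
    return wins, losses, draws, wins + 0.5 * draws


# Berserk 11.1's row from the crosstable, keyed by opponent (illustrative name).
berserk_rows = {
    "Stockfish": "1010100===",
    "Phalanx XXIV": "1100010111",
    "Scidlet": "1101101010",
    "Toga": "1111111110",
}

# Sum each component (wins, losses, draws, score) across all four opponents.
totals = [sum(column) for column in zip(*(tally(r) for r in berserk_rows.values()))]
print(totals)  # [24, 13, 3, 25.5]
```

The totals agree with the `(+24 -13 =3)` and `25.5 / 40` shown on Berserk 11.1's row.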
This bug happens more often against opponents that give Berserk 11.1 more of a challenge, but it still happens intermittently against weaker opponents too. In tournament results it shows up as wins for weaker opponents that Berserk 11.1 usually beats outright quite easily and that are never able to get even a draw.
In previous tournaments I noticed this bug also seems to happen strangely often against Phalanx, whether it is the bundled Phalanx XXIV or the slightly more powerful Phalanx XXV. So if you want to test against one specific engine that reproduces this issue most often, I think Phalanx, particularly the final version, Phalanx XXV, is the best choice. It is also useful because Phalanx XXIV and Phalanx XXV have a much lower Elo rating than Berserk 11.1 and never draw against it, so every Phalanx win is because Berserk 11.1 timed out. I only gave directions for playing against all the other engines to demonstrate that it intermittently happens against all of them, and because I usually have all the engines play each other. Also, if you just have 2 engines play each other, the maximum number of games Scid vs. PC 4.23 lets you play in a tournament is 10, and occasionally Berserk 11.1 can have a winning streak where this intermittent bug doesn't happen at all for 10 games in a row, and I wanted to make sure the bug occurred for you so you could replicate it. It never gets through 40 games in a row without the bug occurring if the multithreading and Ponder settings are set like this.
This bug also seems to happen more often if you do things in other programs at the same time as the chess tournament. And it seems to happen less often if all the chess engines are set to only use 1 thread each with Ponder off and with Permanent Thinking off and with time set to be per game rather than per move and with the default setting of 60 seconds per game with a 1 second increment, and if you don't do anything in other programs when it is running. It still occasionally happens then but not as much.
Playing with 12 threads and ponder on requires 12 logical cores per engine.
Yes, since I discovered this bug I have realized that these are not the best settings, that Leela Chess Zero ignores the Ponder setting, and that it is better to have Ponder off for tournaments. I was just trying to explain how to recreate the bug. Obviously these are not the best settings or conditions to run chess engines in; they put added stress on the engines and make it more difficult for them to get a move done on time. You are certainly 100% correct that requiring 12 logical cores per engine and having both run at the same time is a little bit odd and a bit overkill, and has the chess engines running many threads competing for CPU time on each other's turns. I have switched to having Ponder set to off for all the chess engines in my setup.
But this bug also occurs intermittently under normal circumstances with default settings. The settings I provided just increase the probability of this bug occurring and make it easier to duplicate. I do not recommend that other people use those settings unless their goal is to duplicate this sort of bug.
So, to summarize, it seems like Berserk times out on your system (or at least gets to very low time) if you run more threads than your system can handle?
NOTE: I have edited this comment since I originally posted it, based on further testing, to be as accurate as possible about my findings about what settings cause this issue to occur, after doing more testing.
I have done some further testing under various settings, and it turns out the number of threads is unimportant: it can be left at the default of 1 and the bug still happens as long as Ponder is on. What causes this bug is having Ponder set to on for Berserk 11.1, and it doesn't seem to matter what the settings for the other chess engines are. They can be left at their defaults. This bug seems to occur when Ponder is on and not occur when Ponder is off. It also requires Permanent Thinking to be set to on for the tournament, since I think Permanent Thinking overrides the Ponder setting for each chess engine.
This even happens against much weaker opponents, and notably happens fairly often against Phalanx, which I think might be the best engine to test this against, either the Phalanx XXIV bundled with Scid vs. PC or the latest Phalanx XXV. I don't see the other engines I tested timing out like this when stress-tested. And Phalanx only uses one thread and does not support multithreading, but Berserk even regularly times out against it, even when Berserk is set to just use one thread too, as long as Berserk has Ponder turned on.
So I think this is a serious issue or else I would not have posted it. Berserk is one of the best chess engines and has very strong gameplay, and it losing games it would otherwise win seems like a major issue holding it back from being even better. Even though this issue seems to not occur when Ponder is turned off, so it does not seem to affect computer chess tournaments where Ponder is off for all the engines, I still think it is a major problem.
It seems like this might be connected to past issues like #282 and #348 that were related to Pondering, as well as maybe being related to the Berserk 11.1 release notes and the issue fixed in that release from the Berserk 11 release, which was a time management issue like this one. So this might help us figure out what was going on with regard to issues people found in the past regarding Pondering. I think for now it is probably good to tell people to turn Pondering off with Berserk until this issue is resolved, because Berserk seems to do much better with Pondering off. This issue doesn't involve crashing with Ponder on, but just timing out with Ponder on, but the causes might be similar to those earlier issues. I have not witnessed Berserk 11.1 crash at all under any circumstance, just time out, and only when Ponder is on.
It seems the threads settings change is completely unnecessary and irrelevant regarding this timing out bug. On the other hand, the Time Control settings for the computer chess tournament should be fairly strict, as in one second or just a few seconds, to make this bug happen more often. Those settings, as well as which engine Berserk 11.1 is playing against, do affect how often this bug happens, and it seems to only happen with Ponder/Permanent Thinking on. It is good to have the Permanent Thinking and Ponder settings be set to consistent values in the Scid vs. PC GUI to make sure it is either consistently on or consistently off. This probably occurs in other chess GUIs as well as long as Ponder is on and there are strict time controls.
I recommend testing it against the Phalanx chess engine, either Phalanx XXIV (bundled with Scid vs. PC) or the slightly better Phalanx XXV. This chess engine, Phalanx, seems to most consistently get this issue to show up in 1-on-1 games against Berserk 11.1 in my testing. Since Phalanx is a simple xboard engine and not UCI, it has no configuration settings at all, so that is not a factor, which helps simplify things. Phalanx XXV is available at https://sourceforge.net/projects/phalanx/files/Version%20XXV/ and is very good at getting this bug to happen often, and it also has the benefit of always losing games to Berserk 11.1 when Berserk doesn't time out, 100% of the time, so you can tell that every time Phalanx wins, it is from Berserk timing out.
I'll take a look, but I ran a few hundred ponder games in cute-chess yesterday against SF 11 and Berserk did not under perform.
Also worth noting, other lists have tested Ponder on and not seen issue - https://cegt.forumieren.com/t1884-testing-berserk-11-1nn
It would be really helpful if you could upload a PGN from one of these time losses
Sure, here is one I just did where it went over time after only 9 moves (I used really strict time controls of requiring every move to take less than 1 second here to make the error more likely).
[Event "Scid vs. PC"]
[Site "?"]
[Date "2023.04.19"]
[Round "9.1"]
[White "Berserk 11.1"]
[Black "Phalanx XXV"]
[Result "0-1"]
[Movetime "1"]

1.e4 e5 2.♘f3 ♘c6 3.♗c4 ♘f6 4.d3 d5 5.exd5 ♘xd5 6.O-O ♗e7 7.♖e1 f6 8.h3 ♕d6 9.d4 ♘b6
White movetime 3.005 secs
0-1
Here is a second one where it lost after 22 moves:
[Event "Scid vs. PC"]
[Site "?"]
[Date "2023.04.19"]
[Round "8.1"]
[White "Phalanx XXV"]
[Black "Berserk 11.1"]
[Result "1-0"]
[Movetime "1"]

1.d4 ♘f6 2.♘c3 d5 3.♘f3 e6 4.♗g5 ♗e7 5.♗xf6 ♗xf6 6.e4 O-O 7.e5 ♗e7 8.♗b5 c5 9.O-O c4 10.♕e2 a6 11.♗xc4 dxc4 12.♕xc4 b5 13.♕d3 ♘d7 14.a4 b4 15.♘e4 ♗b7 16.c3 ♖c8 17.a5 bxc3 18.♘xc3 ♘c5 19.♕d1 ♘d7 20.♕d3 ♕c7 21.♖a4 ♖fd8 22.♕e2
Black movetime 2.004 secs
1-0
And in this one it lost after 30 moves:
[Event "Scid vs. PC"]
[Site "?"]
[Date "2023.04.19"]
[Round "10.1"]
[White "Phalanx XXV"]
[Black "Berserk 11.1"]
[Result "1-0"]
[Movetime "1"]

1.d4 d5 2.c4 e6 3.♘f3 c6 4.♘c3 dxc4 5.e4 b5 6.e5 ♗b7 7.♗e2 ♘e7 8.a4 ♘d5 9.axb5 ♘xc3 10.bxc3 cxb5 11.O-O ♗e7 12.♕d2 O-O 13.♕f4 ♕d5 14.♗a3 ♗xa3 15.♖xa3 a5 16.♖fa1 b4 17.cxb4 axb4 18.♖xa8 ♗xa8 19.♗xc4 ♕xc4 20.♖xa8 b3 21.♕d2 ♕c2 22.♔f1 ♘c6 23.♖xf8+ ♔xf8 24.h3 ♔g8 25.h4 h6 26.♔e1 ♕a2 27.♕d3 ♘b4 28.♕d1 ♘c2+ 29.♔f1 b2 30.♘d2
Black movetime 2.004 secs
1-0
It did still win 7/10 of them this time around so it won more than it lost, heh. But there is about a 1000-point Elo rating gap between Berserk 11.1 and Phalanx XXV so it would never normally lose. Again the things that make this happen are having Ponder/Permanent Thinking on and having strict time controls, and it seems to happen a bit more often against Phalanx on average than other opponents. I don't really see the pattern of why it happens in these specific games though, it seems pretty random.
If you play without Ponder + Permanent brain does this still occur? I still cannot re-create this issue.
Also, it's worth noting that those movetimes are all exactly on a second mark, which says this is not random behavior.
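The observation about the movetimes can be checked with a few lines of Python. This is a rough sketch, not anything Scid vs. PC or Berserk provides: the `forfeits` helper and the regular expression are illustrative, the sample text copies only the forfeit notes from the three PGNs quoted in this thread, and it assumes the `[Movetime "1"]` tag means a limit of 1 second per move.

```python
import re

# Matches the time-forfeit note that appears at the end of the PGNs quoted above.
MOVETIME_RE = re.compile(r"(White|Black) movetime (\d+(?:\.\d+)?) secs")


def forfeits(pgn_text, limit_secs):
    """Return (side, measured, overshoot, offset_from_whole_second) per note."""
    rows = []
    for side, secs in MOVETIME_RE.findall(pgn_text):
        measured = float(secs)
        overshoot = round(measured - limit_secs, 3)     # time beyond the limit
        offset = round(measured - round(measured), 3)   # distance from a whole second
        rows.append((side, measured, overshoot, offset))
    return rows


# Only the forfeit notes, copied from the three games in this thread.
sample = """
White movetime 3.005 secs 0-1
Black movetime 2.004 secs 1-0
Black movetime 2.004 secs 1-0
"""

for row in forfeits(sample, limit_secs=1.0):
    print(row)
```

In all three games the reported time sits within a few milliseconds of a whole second (3 s, 2 s, 2 s), and the overshoot is one or two full seconds rather than a small, variable amount.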
No, it stops happening if I turn off Ponder and Permanent Thinking. I guess it is possible the issue might be with the chess GUI or something but it still happens if I do a clean installation of Scid vs. PC 4.23. I could try using a different Berserk 11.1 .exe file that does not use all the best CPU instructions to see if that makes a difference. Or I could try this in a different chess GUI. It is interesting that you cannot recreate this issue, that might indicate it has something to do with my CPU, so then using a different build that doesn't use all the latest instructions might make it go away.
For reference, my CPU is an Intel Core i7 9750H, 6-core, a 9th generation Intel Core laptop processor, Coffee Lake Refresh, which is x64 and supports AVX2 and BMI2 (including PEXT) but doesn't support AVX-512. It differs from the original Coffee Lake processors in having hardware mitigations for the Meltdown and Spectre vulnerabilities and these processors were first released in the 4th quarter of 2018. This particular CPU model was introduced in the second quarter of 2019. The graphics card is a mobile NVIDIA GeForce RTX 2060. The computer itself is an HP laptop, the OMEN by HP 17t-cb000 CTO, a high-end gaming laptop I got in around May 2019, which has a 3840x2160 display rather than the 1920x1080 that was more common then, and it runs the latest Windows 11 Pro version 22H2 build 22621.1555 currently and has 32 gigabytes of dual-channel DDR4 memory. It comes with both a 512 gigabyte SSD drive and a 1 terabyte traditional hard disk internally, but I only use the SSD drive (C: drive) and haven't needed to move anything to the slower hard disk (D: drive) since I haven't filled it up quite yet. It can run most programs quite well and faster than most other computers and is very good at multitasking and it does a decent job running things that require CUDA instructions like Leela Chess Zero. I suppose it is almost 4 years old now, as a high-end laptop from May 2019, but its specifications are all pretty good for a 2019 computer and better than most new computers sold today, including desktops, since most of them are lower-end cheaper machines. It wasn't quite the most expensive and didn't have an i9 processor or the RTX 2080 or 64 gigabytes of RAM or a multi-terabyte disk, which some computers had then, and its CPU doesn't support AVX-512, but it was still pretty high-end for a 2019 laptop.
Anyway I think the hardware it is running on might be important, especially the CPU. I will try to see what happens if I use a different build of Berserk 11.1 or if I use it in a different chess GUI from Scid vs. PC. It does seem to only occur with Ponder on. I expected that this would be a bug that could be easily replicated on other computers, so it surprised me that this didn't happen on your computer, so I need to do more testing to see what conditions cause it to happen. Also I could test it out on some other computers I have access to, too, which are lower-end than mine and from other computer manufacturers.
I think I need to do some further testing, involving other builds of Berserk 11.1 that don't use all the CPU instructions such as the regular x64 build without AVX2 or PEXT, trying Berserk 10 to see if this is a newer issue or has been around longer, trying this in other chess GUIs, and trying this on other 64-bit Windows computers that have different hardware. I will have to do more tests of this kind and report back to you what happens, seeing as this doesn't seem to happen on your computer.
I have done some testing of this under different conditions, such as on a different computer using the same program and the exact same settings, as well as in a different chess GUI (Arena 3.5.1), among other things, but so far my results are inconclusive about how to replicate this issue. I need to continue testing to figure out what is going on. There are other chess engines with time management issues, and some of them have worse issues than Berserk, such as SOS 5.1, which is bundled with Arena 3.5.1.
While this does seem to be an intermittent issue in Berserk, other chess engines have intermittent issues as well, and the observation that it only seems to happen with Berserk but not other chess engines seems to be specific to the combination of settings, chess GUI, and what computer I was on.
I will need to keep testing all the different variables and try to isolate what is causing this issue and see if it can be replicated in different situations. In Arena 3.5.1 I noticed that while Berserk 11.1 still occasionally has time management issues, several of the chess engines bundled with Arena have similar sorts of issues too, many of them worse. Testing on a different computer with lower specifications than my own, I found a lot of chess engines failing to get moves in on time with only 1 second per move under the same settings I used in Scid vs. PC 4.23, so the problem is not unique to Berserk 11.1.
This does not mean that Berserk 11.1 doesn't have any issues with time management, but that the specific issues I found only seem to occur under certain circumstances. I need to do more testing to figure out what exactly those circumstances are. I did notice Berserk 11.1 sometimes having issues with time management in these other scenarios, but not significantly more than other chess engines. I do consistently get the same results in Scid vs. PC 4.23 on my own computer with Ponder on but this is frustratingly hard to duplicate in other scenarios, at least as far as Berserk having such bad issues and the issues being limited to Berserk and not affecting other chess engines too.
I am not saying that this is not an issue, but that it is a difficult issue for me to replicate if I change the circumstances I am testing under. However, I am still working on a better way to replicate it and I am testing different things. Along with testing on different computers and in different chess GUIs, I am also trying to answer: whether turning Ponder off completely eliminates the issue or just makes it less likely; whether changing the number of threads actually affects this at all; whether this happens more or less often against certain opponents; whether it matters which binary of Berserk I use (AVX2+PEXT, AVX2, or regular x64; I am not testing AVX-512 because none of the computers I have access to support it); whether it happens on Berserk 10 or older versions; and how exactly the Ponder and Permanent Thinking settings relate in Scid vs. PC 4.23.
It is possible that this could turn out to be an issue more related to the chess GUI Scid vs. PC than to the chess engine Berserk, if it does not occur in other GUIs in the same way. I have observed time management issues occasionally happening in the first other chess GUI I am testing, Arena. I also plan to test it in Banksia GUI, Cute Chess, and Lucas Chess. My testing in Arena so far is of Berserk against the 8 chess engines bundled with Arena 3.5.1, because if this is a general sort of issue affecting Berserk 11.1, it should happen at least sometimes against those chess engines as opponents, too. So far my results are inconclusive but I am still working on replicating this issue in different circumstances. I have not ruled out the issue happening in other circumstances, but I have not duplicated it to the same degree as in the initial circumstances yet, either. I will keep working on it for now until I can figure out what is going on more conclusively, because so far my results of testing in other circumstances are inconclusive and I have not learned much.
I will try to figure out if I can find a better way to duplicate this issue or find out what exactly is causing it. These tournaments are unfortunately a bit slow, and setting things up, like installing a new chess GUI, adding engines to a GUI, figuring out the right settings, and doing this on multiple computers, takes a while. Following the exact instructions I listed in an earlier comment seems to lead to different results on different computers. I will try to do more testing to narrow things down and see if I can get a more conclusive answer about what is going on here, but I need to test a bunch of different things. There is not really much you need to do in this situation, for now. I might figure out a better way to trigger this issue that can be duplicated more easily or I might not, I am not sure yet. And it might turn out to be more of an issue with Scid vs. PC than with Berserk, as another possibility. This issue is definitely not as easy to duplicate as I originally thought it would be.
Right now I think I should probably close this issue as one that cannot easily be duplicated. If my testing comes up with a way to duplicate time management issues like this that cause Berserk to lose games, that works on multiple computers, and it actually works consistently at duplicating the issue not just on my computer, I will try to come up with a better issue, but I think this has not worked out the way I thought it would, as I expected it to be easy to duplicate. I will continue testing it and if I come up with something that actually works for duplicating it on more than one computer, I may reopen that as a new issue, but I think this issue here is not very good and can't easily be duplicated, so I will be closing it as one that is not really possible to consistently duplicate, at least not with the instructions I laid out earlier, which don't seem to consistently work on other computers.
It does not make sense for this to stay listed as an open issue when nobody else besides me can duplicate it on their computers, after all. I do still think there is a definite issue here, of course, but it is too intermittent and hard to duplicate for it to be feasible for you to debug easily when it does not occur on your computer in any sort of consistent way. I will try to be more careful if I open another issue to make sure I can duplicate it on other computers before opening it here. If I do find any sort of conclusive findings that allow me to replicate the same issue on multiple computers, I think it would be best to start over fresh with a new issue once I have everything figured out. That might or might not happen, and currently it is looking like I definitely don't have this issue figured out yet, as far as how to consistently replicate it on more than one computer. So it should probably be closed now as one that can't be reproduced/duplicated. Thanks.
Berserk 11.1 has an intermittent issue where it sometimes takes longer than it is supposed to when making a move. This is obvious if you have it play a bunch of games in the Scid vs. PC 4.23 GUI against the built-in chess engines Scidlet, Phalanx, or Toga II. Most of the chess engines with high Elo ratings pretty much always beat Scidlet, almost always beat Phalanx, and pretty consistently beat Toga II. You see this with Stockfish 15.1, Lc0 v0.29.0, or Fat Fritz 2 if they are installed and configured correctly and running on good enough hardware (Lc0 requires a decent NVIDIA video card and the right CUDA build).
Anyway the problem with Berserk 11.1 is, you see it losing sometimes to the weakest chess engines, even the really really bad Scidlet, on regular occasions, something that just doesn't happen at all with other high-Elo chess engines I mentioned earlier like Stockfish, Lc0, or the controversial Fat Fritz 2. And when you look at the reason why it loses to them, it's always going over the time limit, using more time than it is allowed to use by the GUI. If a chess engine goes over the time limit, for whatever reason, it automatically loses the game, which is what happens regularly to Berserk 11.1, at least in this setup. It seems to happen a bit more often against Toga II and Phalanx than Scidlet because they are advanced enough that the game keeps going for more moves, giving Berserk more chances to go over the time limit.
To really trigger this issue more often, you can set the time control to be per move and just give 1 second per move, or set the time control to be per game and start out at 1 second and add an additional second for each move. This also makes games go faster as well as triggering this bug to occur more often, so I recommend using those settings to duplicate this bug, and testing it against an engine that is significantly weaker, enough to consistently lose if there isn't a timeout, but not so weak that it loses after just a few moves. This issue comes up remarkably often in games between Berserk and Phalanx... Phalanx is better than Scidlet so it can last longer, but it is nowhere near as good as modern NNUE engines so its loss would only be a matter of time if it weren't for this timeout issue. Since Berserk vs. Phalanx games go on longer than Berserk vs. Scidlet games, you see more of them where Berserk loses and Phalanx wins on time. If you fixed this bug in Berserk 11.1, obviously Berserk would win pretty much every game against these much weaker engines. I think the quick fix for a time issue included in Berserk 11.1 wasn't quite perfect and this issue seems to be a relic of that. Berserk DOES do pretty good moves when it does do a move, still, but this timeout issue affects it no matter what chess engine it is up against, and makes it lose against good chess engines even more than the weak ones, at least in the Scid vs. PC GUI. I have it set with Ponder = On which might be relevant, the other settings are mostly defaults other than the number of threads being the number of cores my CPU has. Everything besides those 2 is default and it is the right version for my CPU (x64-avx2-pext since I have a relatively recent CPU). Oh and this is on 64-bit Windows 11.
None of the other chess engines I have have any issues like this with time management. This issue is a real shame because I have been doing some computer chess engine tournaments just for testing purposes on my computer and Berserk 11.1 does much worse in the tournaments than it is supposed to do according to its online ratings, and I would like to get this fixed so it is at peak performance as one of the best engines with a unique evaluation function that gives different results from other top engines but still does very well. I assume that if I tested Berserk 10 or earlier, it would probably not have this bug, but I have not really done that, since I figure you can figure this out for yourself. To duplicate this, of course, you need to use a chess GUI to set strict time limits of like, one second or something, and watch how it sometimes goes over the limits and loses. Thank you.
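For anyone who wants to check move-time overshoot without a GUI in the loop, a small UCI harness along these lines may help. This is only a sketch under stated assumptions: `time_bestmove` is an illustrative name, the engine path in the usage comment is hypothetical, and it only measures a plain `go movetime` search from the start position. It does not exercise Ponder or Permanent Thinking, which the thread identifies as the trigger, so a clean result here would not rule the bug out.

```python
import subprocess
import time


def time_bestmove(engine_cmd, movetime_ms):
    """Send one `go movetime` to a UCI engine; return (reply, elapsed_s, overshoot_s)."""
    with subprocess.Popen(
        engine_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    ) as proc:

        def send(command):
            proc.stdin.write(command + "\n")
            proc.stdin.flush()

        def wait_for(prefix):
            # Skip banners and info lines until the expected reply arrives.
            for line in proc.stdout:
                if line.startswith(prefix):
                    return line.strip()
            raise RuntimeError(f"engine exited before sending {prefix!r}")

        try:
            send("uci")
            wait_for("uciok")
            send("isready")
            wait_for("readyok")
            send("position startpos")
            start = time.perf_counter()
            send(f"go movetime {movetime_ms}")
            reply = wait_for("bestmove")
            elapsed = time.perf_counter() - start
        finally:
            try:
                send("quit")
            except OSError:
                pass  # the engine already exited and closed its stdin
    return reply, elapsed, elapsed - movetime_ms / 1000.0


# Hypothetical usage; replace the path with wherever your engine binary lives:
# reply, elapsed, overshoot = time_bestmove(["./berserk"], 1000)
# print(reply, f"{elapsed:.3f}s", f"overshoot {overshoot:+.3f}s")
```

Running this in a loop and recording the overshoot per call would show whether the engine itself exceeds the requested time, independently of how a particular GUI measures it.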