Closed mstembera closed 2 years ago
To begin, Stockfish isn't necessarily underperforming. If you run 10K or so games of SF vs Lc0, for example, SF may win the majority of game pairs and still lose some.
In CCC, the most likely explanation is that an unlucky game pair like the one mentioned occurred, or that there were game pairs where SF couldn't get a win while other engines were luckier in the dice roll.
This is why both CCC and TCEC are considered small sample size (SSS) tournaments, like many others. They play too few games for firm conclusions to be drawn from their results.
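To put rough numbers on the small-sample point, here is a minimal sketch of how the uncertainty on a measured Elo difference shrinks with the number of games. The win/draw/loss counts are made up for illustration, the function name `elo_interval` is hypothetical, and the calculation treats games as independent (it ignores the correlation within a game pair played from the same opening), so take it as an approximation rather than the method any tournament or testing framework actually uses.

```python
import math

def elo_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high): Elo difference with an approximate 95% interval.

    Assumes independent games; z=1.96 is the normal quantile for 95%.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")

    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)

    def to_elo(s):
        # Logistic Elo model: expected score s corresponds to this rating gap.
        return -400.0 * math.log10(1.0 / s - 1.0)

    eps = 1e-9  # keep the interval ends strictly inside (0, 1)
    low = to_elo(max(score - z * stderr, eps))
    high = to_elo(min(score + z * stderr, 1.0 - eps))
    return to_elo(score), low, high

# Hypothetical results with the same win/draw/loss ratio at two sample sizes.
for w, d, l in [(30, 40, 30), (3000, 4000, 3000)]:
    elo, low, high = elo_interval(w, d, l)
    print(f"{w + d + l:>6} games: {elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With the same ratio of results, the interval for the 100-game sample is about ten times wider than for the 10,000-game one, since the standard error scales with one over the square root of the number of games.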
I completely agree with you regarding small sample size, and we will know better after more games. The problem is that we want to be sure as soon as possible because the DivP submission is looming. Until we are sure, it may be prudent to submit the version prior to the last two patches.
> I completely agree with you regarding small sample size, and we will know better after more games. The problem is that we want to be sure as soon as possible because the DivP submission is looming.
I mean, I could run 5K LTC games between SF and Ethereal, and I can say confidently that SF would come out with the higher Elo.
The latest SF is currently performing quite a bit worse than expected (behind both Lc0 and Komodo) at https://www.chess.com/computer-chess-championship#, including losing a game pair to Ethereal. Could one of the latest patches be a regression, or perhaps scale very poorly? @vondele What do you think about scheduling a progress test? We need to select a version to submit for TCEC Premier very soon.