lucametehau / CloverEngine

UCI chess engine
GNU General Public License v3.0

Clover 6.2 x64(AVX2) is weaker than Clover 6.1 x64(AVX2) #232

Closed. toni-chess closed this issue 3 months ago

toni-chess commented 3 months ago

In my test, Clover 6.2 x64(AVX2) is 15 Elo weaker than Clover 6.1 x64(AVX2)!

Hash=16
Threads=1

Timecontrol:   09s+0.1s
Games:   1040

 Clover 6.2 x64(AVX2)           1034.0 (626.0 : 408.0)
----------------------------------------------------------
    22.0 (  7.5 :  14.5) Stockfish 16.1 x64 NN(AVX2)
    22.0 (  8.0 :  14.0) Alexandria 7.0.0 x64(BMI2)
    22.0 (  7.5 :  14.5) PlentyChess 2.0.0 x64(BMI2)
    22.0 (  9.0 :  13.0) Alexandria 7.0.0 x64(AVX2)
    22.0 (  9.5 :  12.5) Caissa 1.18 x64(BMI2)
    22.0 (  9.5 :  12.5) Caissa 1.18 x64(AVX2)
    22.0 (  7.5 :  14.5) Caissa 1.17 x64(BMI2)
    22.0 (  7.5 :  14.5) Obsidian 12.0 x64(BMI2)
    22.0 ( 11.0 :  11.0) RubiChess 20240112 x64 NN(AVX2)
    22.0 (  9.0 :  13.0) Obsidian 12.0 x64(AVX2)
    22.0 ( 10.0 :  12.0) Caissa 1.17 x64(AVX2)
    22.0 ( 10.0 :  12.0) Alexandria 6.1.0 x64(BMI2)
    22.0 ( 10.5 :  11.5) Caissa 1.15 x64(BMI2)
    22.0 ( 10.0 :  12.0) PlentyChess 1.0.0 x64(BMI2)
    22.0 ( 11.5 :  10.5) Caissa 1.16 x64
    22.0 ( 12.0 :  10.0) Caissa 1.15 x64
    22.0 ( 12.0 :  10.0) Titan 1.0 x64(AVX2)
    22.0 (  9.5 :  12.5) Titan 1.1 x64(AVX2)
    22.0 ( 11.0 :  11.0) Starzix 5.0 x64(AVX2)
    22.0 ( 13.0 :   9.0) Clarity 7.0.0 x64(BMI2)
    22.0 ( 11.0 :  11.0) Igel 3.5.5 x64(AVX2)
    22.0 ( 11.5 :  10.5) Velvet 7.2.0 x64(AVX2)
    22.0 ( 11.0 :  11.0) Igel 3.5.0 x64(AVX2)
    22.0 ( 13.5 :   8.5) Uralochka 3.41a x64(AVX2)
    22.0 ( 15.5 :   6.5) Motor 0.5.0 x64(AVX2)
    22.0 ( 13.5 :   8.5) Velvet 7.3.0 x64(AVX2)
    22.0 ( 14.5 :   7.5) Wasp 7.00 x64 NN(AVX)
    22.0 ( 15.0 :   7.0) Motor 0.4.0 x64(AVX2)
    22.0 ( 15.5 :   6.5) Lizard 10.4 x64(BMI2)
    22.0 ( 15.0 :   7.0) Arasan 24.2.1 x64 NN(AVX2)
    22.0 ( 15.5 :   6.5) Arasan 24.2.2 x64(AVX2)
    22.0 ( 18.0 :   4.0) Peacekeeper 3.00 x64(AVX2)
    22.0 ( 17.0 :   5.0) Arasan 24.2.1 x64 NN(BMI2)
    22.0 ( 13.5 :   8.5) Lizard 10.2 x64(BMI2)
    22.0 ( 15.5 :   6.5) Arasan 24.2.2 x64(BMI2)
    22.0 ( 15.0 :   7.0) Saturn 1.3.0 x64(AVX2)
    22.0 ( 19.5 :   2.5) Peacekeeper 2.40 x64(AVX2)
    22.0 ( 16.5 :   5.5) Arasan 24.1 x64 NN(AVX2)
    22.0 ( 18.0 :   4.0) Cheng 4.48 x64(AVX2)
    22.0 ( 15.5 :   6.5) Tucano 11.19 x64(AVX2)
    22.0 ( 16.5 :   5.5) Lizard 10.1 x64(BMI2)
    22.0 ( 19.0 :   3.0) Arasan 24.1 x64 NN(BMI2)
    22.0 ( 20.0 :   2.0) Texel 1.11 x64
    22.0 ( 19.0 :   3.0) Marvin 6.3.0 x64 NN(AVX2)
    22.0 ( 18.0 :   4.0) Molybdenum 3.1 x64(BMI2)
    22.0 ( 17.5 :   4.5) Xiphos 0.6.1 x64(BMI2)
    22.0 ( 20.5 :   1.5) Critter 1.6a x64

lucametehau commented 3 months ago

Your testing methodology is not very accurate. I assume you got -15 Elo, but with a huge error margin, since you played only 22 games against each engine in the pool and only 1040 games in total. I would recommend heavily reducing the number of engines you play against and increasing the number of games played per engine. You could follow what CEGT does, which is still inaccurate but a bit better, since at least they only play against engines in the tested engine's rating range.

Also, Clover 6.2 scales better at longer time controls, and 09s+0.1s is very short.

TLDR: improve your testing methodology.
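
To put a rough number on the "huge +-" mentioned above, here is a minimal sketch that turns a match score into an Elo difference with an approximate 95% interval. It is not taken from the thread: the function names `elo_from_score` and `elo_interval` are made up for illustration, the whole pool is treated as a single opponent (a simplification), and because the posted table has no draw counts the variance is bounded from above by the no-draw case, so the real interval would be somewhat narrower.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert a score fraction (0 < score < 1) to an Elo difference
    using the standard logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(points: float, games: int, z: float = 1.96):
    """Return (elo, low, high) for `points` scored in `games` games.

    Assumption: draw counts are unknown, so the per-game variance is
    bounded from above by score * (1 - score), its value with no draws.
    """
    score = points / games
    stderr = math.sqrt(score * (1.0 - score) / games)
    # Clamp so the logarithm stays defined for very small samples.
    low = max(score - z * stderr, 1e-6)
    high = min(score + z * stderr, 1.0 - 1e-6)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    # Whole pool, from the posted table: 626.0 points out of 1034 games.
    elo, low, high = elo_interval(626.0, 1034)
    print(f"pool (1034 games): {elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")

    # A single 22-game match from the table, e.g. an 11.0 : 11.0 result.
    elo, low, high = elo_interval(11.0, 22)
    print(f"one opponent (22 games): {elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumptions the pool result carries an uncertainty of around 20 Elo either side, and a single 22-game match well over 100 Elo. Comparing two versions that were each measured this way compounds the uncertainty further, so a 15 Elo gap is well inside the noise.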