syzygy1 / Cfish

C port of Stockfish
GNU General Public License v3.0

Additional optimization flag to consider #98

Closed LouisZulli closed 5 years ago

LouisZulli commented 5 years ago

Under gcc, -O2 turns on the flag -ftree-pre. Thus, this flag is on when Cfish is built with optimization, which is the default.

It seems that disabling this one flag increases Cfish's search speed by about 1%, as measured by the bench command. I tested this with both gcc-8 and gcc-9.

So you might consider adding -fno-tree-pre after -O3 in the Makefile.

I should add that my testing was done only on an Intel(R) Xeon(R) CPU E5-2687W v3.

JavaMast commented 5 years ago

@LouisZulli On my Ryzen 1700X it gives about +2% speed, but on other PCs Cfish gets slower. Android builds also get slower.

CoffeeOne commented 5 years ago

Please note that a speed difference of 1% is very hard to measure. On the other hand, it would be a lot for just changing a flag; if you found 10 such flags, would it add up to 10%? :D That said, I also measured a speedup of 0.264% (AMD FX 8-core, no turbo, 20 runs of bench 16 1 17, gcc 9.1 on Windows), which could easily change with another gcc version.

d3vv commented 5 years ago

@CoffeeOne About the 10%: if such flags can produce "perfect" binaries close to asmFish, then the answer is yes! :) In any case, investigation is needed. And if a flag speeds up the binary in a significant percentage of cases, that is a good reason to include it, at least as an option.

syzygy1 commented 5 years ago

Adding -fno-tree-pre also gives a 1% speedup on my old i7-3930K when compiling with gcc-9.1 (make profile-build extra=yes).

The way I test is:

$ ./cfish bench 2>&1 >/dev/null | grep Nodes

And if several runs give inconsistent results:

$ taskset -c 3 ./cfish bench 2>&1 >/dev/null | grep Nodes

where I vary 3 to find a cpu core that doesn't get bothered by whatever is bothering the cpu.

Of course it is also a good idea to vary the bench parameters (while sticking to single core or things will be inherently random).
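
The repeated pinned bench runs described above can be automated. What follows is a minimal sketch, not part of Cfish: it assumes the bench summary is printed on stderr with a line of the form `Nodes/second    : <number>` (the Stockfish-style layout that `grep Nodes` would match), and the helper names `parse_nps`, `run_bench`, `summarize` and `main` are hypothetical. It needs Linux `taskset` and Python 3.7 or later.

```python
import re
import statistics
import subprocess

# Assumed summary line format, e.g. "Nodes/second    : 1802113".
NPS_RE = re.compile(r"Nodes/second\s*:\s*(\d+)")


def parse_nps(bench_output):
    """Return the nodes-per-second value found in the bench summary text."""
    match = NPS_RE.search(bench_output)
    if match is None:
        raise ValueError("no 'Nodes/second' line in bench output")
    return int(match.group(1))


def run_bench(binary="./cfish", core=3, bench_args=("bench",)):
    """Run one bench pinned to a single core (Linux taskset) and return its NPS."""
    cmd = ["taskset", "-c", str(core), binary, *bench_args]
    # The summary is read from stderr; stdout (the search output) is discarded,
    # mirroring "2>&1 >/dev/null" in the shell command.
    done = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE, text=True, check=True)
    return parse_nps(done.stderr)


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of NPS values."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


def main(runs=10):
    """Run the bench several times and print mean and spread of the NPS."""
    samples = [run_bench() for _ in range(runs)]
    mean, stdev = summarize(samples)
    print(f"{runs} runs: mean {mean:.0f} nps, "
          f"stdev {stdev:.0f} ({100 * stdev / mean:.3f}%)")
```

Calling `main()` from the directory that contains the binary prints the mean and the relative spread. A spread that is not clearly smaller than the roughly 1% effect under discussion suggests trying another core or more runs.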

CoffeeOne commented 5 years ago

@syzygy1 After reading your last message, I decided to re-test, because you use exactly the same compiler version as me (gcc 9.1), and you mentioned exactly how you compiled and tested :). On the other hand, I have a different cpu and a different operating system. Since I have an AMD cpu, I always use the default ARCH (modern), and there is no doubt about -march=native being a gain. There was also never a doubt that profile-build is better than a normal build. But I always used lto=yes with cfish (it was always a gain of more than 1% for me) and did not use extra=yes (it was always +-0 for me up to gcc8). You use lto=no and extra=yes, so two things are different.

So I repeated the test with the latest cfish (where -fno-tree-pre is activated with extra), but now I had to do a lot of compilations:

plain: make profile-build comp=mingw -j
extra: additionally adding extra=yes
lto: additionally adding lto=yes
lto-extra: additionally adding lto=yes extra=yes
extra-tree-pre: extra=yes, but removing the -fno-tree-pre
lto-extra-tree-pre: lto=yes extra=yes, but removing the -fno-tree-pre
lto-no-tree-pre: lto=yes, but adding -fno-tree-pre to the makefile (since extra=no)

Remarks on the bench run: I used 100 runs of bench 16 1 17 default depth for each version, with a total duration of 2 hours. Please note the super low std deviation (only cfish-extra.exe has a higher one, most likely because of one unusually slow worst run). Those low std deviations are obtained on Windows by 1) not doing anything else on the computer and 2) having the clock speed nailed down to the maximum (no turbo, but also no cpu c-states active).

Looking at the results, I can confirm the speedup from -fno-tree-pre both when only extra=yes is set (your test) and when only lto=yes is set (my test). Nevertheless, the fastest build for me now is to set both, but without -fno-tree-pre. :D

cfish-bench

Build Tester: 1.4.6.0
Windows 10 (Version 10.0, Build 0, 64-bit Edition)
AMD FX(tm)-9590 Eight-Core Processor
SafeMode: No
Running In VM: No
HyperThreading Enabled: Yes
CPU Warmup: Yes
Command Line: bench 16 1 17 default depth
Tests per Build: 100
ANOVA: Passed

Engine# (NPS)                                  Speedup    Sp          Conf. 95%   S.S.

2 (1 802 113,5) ---> 1 (1 767 799,2) --->      1,941%     14 521,1    Yes         No
3 (1 781 495,2) ---> 1 (1 767 799,2) --->      0,775%      9 682,1    Yes         No
4 (1 793 895,8) ---> 1 (1 767 799,2) --->      1,476%      8 533,5    Yes         No
5 (1 802 983,7) ---> 1 (1 767 799,2) --->      1,990%      9 328,0    Yes         No
6 (1 826 133,5) ---> 1 (1 767 799,2) --->      3,300%     10 902,2    Yes         No
7 (1 818 702,2) ---> 1 (1 767 799,2) --->      2,879%      8 573,3    Yes         No
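
The Speedup column appears to be the relative NPS gain of each build over build 1, which is consistent with the figures in the table. A minimal sketch of that arithmetic, assuming this reading is right (the function name `speedup_percent` is hypothetical; the NPS values are copied from the table with decimal commas written as points):

```python
def speedup_percent(nps, baseline_nps):
    """Relative gain of nps over baseline_nps, in percent."""
    return (nps / baseline_nps - 1.0) * 100.0


# NPS per engine number, taken from the Build Tester table; engine 1 is the baseline.
BASELINE_NPS = 1_767_799.2
BUILD_NPS = {
    2: 1_802_113.5,
    3: 1_781_495.2,
    4: 1_793_895.8,
    5: 1_802_983.7,
    6: 1_826_133.5,
    7: 1_818_702.2,
}

for engine, nps in BUILD_NPS.items():
    print(f"engine {engine}: {speedup_percent(nps, BASELINE_NPS):.3f}%")
```

The absolute differences are small (tens of thousands of nodes per second out of about 1.8 million), which is why the thread stresses many runs and low standard deviations before trusting a gain of this size.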

abdulbadii commented 5 years ago

How do I set -fno-tree-pre after -O3 on the build command line? I got this error:

$ make profile-build ARCH=x86-64-modern COMP=mingw lto=yes -O3 -fno-tree-pre
make: *** unknown output-sync type '3'.

Please help me out. Thanks in advance.

LouisZulli commented 5 years ago

make clean

and then try

make profile-build ARCH=x86-64-modern COMP=mingw lto=yes extra=yes

d3vv commented 5 years ago

@abdulbadii for custom flags: CFLAGS="myflags" make

Denzwell commented 5 years ago

Hi all, who can help me get the gcc-9.1 binaries to compile cfish? Currently I use gcc-8.1. Thanks.

d3vv commented 5 years ago

@Denzwell Just install the latest MSYS2; in the MSYS2 MINGW64 console you will have:

$ gcc -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\gcc.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.1.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.1.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-plugin --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev3, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 9.1.0 (Rev3, Built by MSYS2 project)

d3vv commented 5 years ago

@Denzwell This is all about Windows. Under Linux I have gcc 5.x in most cases (and it works well) :)