rwbc closed this issue 4 years ago
that's probably a different fork, since c++17 is standard in the branch.
My bad, I copied the previous makefile (one week old for comparison and searching for SSE4.1) and forgot to remove it! It just compiles fine after cloning the repo branch fully again. Therefore I changed the title of the issue to something more meaningful.
But something is also fishy with the speed of the Abrok NNUE compiles vs. the original nodchip compilations (at least here on my special hardware, see above). The Abrok NNUE compile for generic x64 is around 7 times slower than the original nodchip compile with the same net (the PV also diverges soon after depth 6 or so; all tested from cmd with go depth 20), while the one I newly compiled myself is still 2.5 times slower.
The nodchip compilation was specially for SSE4.1, thus I need to try that too now for better comparison than against generic x64, but that is removed from the default makefile, thus I need to add an optimized one now manually.
Is it really normal that the PV diverges for the same net so early even on the start position, vs. the original nodchip compilation from 07-19?
Saved the output (for readability I eliminated the currlines) (renamed nn.bin from the other folder to nn-97f742aaefcd.nnue for the test)
Stockfish+NNUE 190720 64 by T. Romstad, M. Costalba, J. Kiiski, G. Linscott, H. Noda, Y. Nasu, M. Isozaki
uci
id name Stockfish+NNUE 190720 64
id author T. Romstad, M. Costalba, J. Kiiski, G. Linscott, H. Noda, Y. Nasu, M. Isozaki
option name Debug Log File type string default
option name Contempt type spin default 24 min -100 max 100
option name Analysis Contempt type combo default Both var Off var White var Black var Both
option name Threads type spin default 1 min 1 max 512
option name Hash type spin default 16 min 1 max 33554432
option name Clear Hash type button
option name Ponder type check default false
option name MultiPV type spin default 1 min 1 max 500
option name Skill Level type spin default 20 min 0 max 20
option name Move Overhead type spin default 10 min 0 max 5000
option name Slow Mover type spin default 100 min 10 max 1000
option name nodestime type spin default 0 min 0 max 10000
option name UCI_Chess960 type check default false
option name UCI_AnalyseMode type check default false
option name UCI_LimitStrength type check default false
option name UCI_Elo type spin default 1350 min 1350 max 2850
option name UCI_ShowWDL type check default false
option name SyzygyPath type string default <empty>
option name SyzygyProbeDepth type spin default 1 min 1 max 100
option name Syzygy50MoveRule type check default true
option name SyzygyProbeLimit type spin default 7 min 0 max 7
option name EvalFile type string default ./eval/nn.bin
option name SkipLoadingEval type check default false
option name BookMoves type spin default 16 min 0 max 10000
uciok
ucinewgame
info string NNUE ./eval/nn.bin found & loaded
go depth 20
info depth 1 seldepth 1 multipv 1 score cp 31 nodes 20 nps 10000 tbhits 0 time 2 pv e2e4
info depth 2 seldepth 2 multipv 1 score cp 44 nodes 52 nps 26000 tbhits 0 time 2 pv e2e4 a7a6
info depth 3 seldepth 3 multipv 1 score cp 50 nodes 164 nps 54666 tbhits 0 time 3 pv d2d4 e7e6 e2e4
info depth 4 seldepth 4 multipv 1 score cp 58 nodes 305 nps 76250 tbhits 0 time 4 pv d2d4 e7e6 e2e4 a7a6
info depth 5 seldepth 5 multipv 1 score cp 26 nodes 1197 nps 171000 tbhits 0 time 7 pv d2d4 d7d5 e2e3 e7e6 c2c4
info depth 6 seldepth 6 multipv 1 score cp 22 nodes 2651 nps 241000 tbhits 0 time 11 pv g1f3 g8f6 e2e3 d7d5 c2c4 e7e6
info depth 7 seldepth 8 multipv 1 score cp 27 nodes 4345 nps 255588 tbhits 0 time 17 pv g1f3 g8f6 d2d4 d7d5 e2e3 e7e6 c2c4 c7c5
info depth 8 seldepth 12 multipv 1 score cp 18 nodes 7605 nps 281666 tbhits 0 time 27 pv g1f3 g8f6 c2c4 e7e6 b1c3 d7d5 c4d5 e6d5 d2d4
info depth 9 seldepth 14 multipv 1 score cp 21 nodes 13629 nps 289978 tbhits 0 time 47 pv g1f3 g8f6 d2d4 e7e6 e2e3 d7d5 f1d3 f8e7 e1g1 c7c5 d4c5
info depth 10 seldepth 12 multipv 1 score cp 33 nodes 22695 nps 310890 tbhits 0 time 73 pv e2e4 c7c5 g1f3 b8c6 f1e2 a7a6 d2d4 e7e6 e1g1 c5d4 f3d4
info depth 11 seldepth 17 multipv 1 score cp 20 nodes 48397 nps 320509 tbhits 0 time 151 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 g1f3 g8f6 d2d4 e7e6 f1d3 b8c6 e1g1 c5d4 c3d4 c6d4
info depth 12 seldepth 20 multipv 1 score cp 30 nodes 62427 nps 323455 tbhits 0 time 193 pv e2e4 c7c5 c2c3 g8f6 e4e5 f6d5 d2d4 d7d6 g1f3 c5d4 c3d4 d6e5 d4e5
info depth 13 seldepth 22 multipv 1 score cp 30 nodes 196644 nps 338457 tbhits 0 time 581 pv e2e4 c7c5 g1f3 e7e6 f1e2 d7d6 b1c3 g8f6 d2d4 c5d4 f3d4 c8d7 e1g1 b8c6
info depth 14 seldepth 25 multipv 1 score cp 28 nodes 267895 nps 336551 tbhits 0 time 796 pv e2e4 c7c5 g1f3 e7e6 c2c3 d7d5 e4d5 d8d5 f1e2 g8f6 d2d4 f8e7 c3c4 d5d8 e1g1 e8g8 d4c5 e7c5
info depth 15 seldepth 24 multipv 1 score cp 18 nodes 414491 nps 339468 hashfull 184 tbhits 0 time 1221 pv g1f3 d7d5 e2e3 g8f6 c2c4 e7e6 d2d4 f8e7 c4d5 e6d5 f1d3 e8g8 h2h3 b8d7 b1c3 c7c6
info depth 16 seldepth 23 multipv 1 score cp 13 nodes 506707 nps 339843 hashfull 228 tbhits 0 time 1491 pv g1f3 d7d5 d2d4 g8f6 c2c4 e7e6 b1c3 c7c5 d4c5 f8c5 c4d5 f6d5 c3d5 d8d5 d1d5 e6d5 c1d2 e8g8 e2e3 c8f5
info depth 17 seldepth 28 multipv 1 score cp 21 nodes 690282 nps 336887 hashfull 306 tbhits 0 time 2049 pv g1f3 d7d5 d2d4 g8f6 c2c4 e7e6 b1c3 c7c5 e2e3 b8c6 a2a3 f6e4 f1d3 e4c3 b2c3 f8e7 c4d5 e6d5 d4c5 e7c5 e1g1 e8g8 c3c4
info depth 18 seldepth 27 multipv 1 score cp 16 nodes 835139 nps 338936 hashfull 360 tbhits 0 time 2464 pv g1f3 g8f6 c2c4 e7e6 d2d4 d7d5 b1c3 c7c5 e2e3 b8c6 a2a3 d5c4 f1c4 f8e7 d4c5 d8d1 c3d1 e7c5 b2b4 c5e7
info depth 19 seldepth 29 multipv 1 score cp 25 lowerbound nodes 1272938 nps 334895 hashfull 537 tbhits 0 time 3801 pv e2e4
info depth 19 seldepth 29 multipv 1 score cp 20 nodes 1485989 nps 330955 hashfull 613 tbhits 0 time 4490 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 g1f3 e7e6 c3c4 d5d7 d2d4 c5d4 d1d4 d7d4 f3d4 c8d7 f1e2 b8c6 d4b5
info depth 20 seldepth 27 multipv 1 score cp 20 nodes 1609267 nps 328220 hashfull 649 tbhits 0 time 4903 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 g1f3 e7e6 d2d4 g8f6 f1e2 c5d4 d1d4 f8c5 d4d5 e6d5 e1g1 e8g8 f1d1 f8d8 h2h3 b8c6 b1d2 c5b6
bestmove e2e4 ponder c7c5
Stockfish 050820 by the Stockfish developers (see AUTHORS file)
uci
id name Stockfish 050820
id author the Stockfish developers (see AUTHORS file)
option name Debug Log File type string default
option name Contempt type spin default 24 min -100 max 100
option name Analysis Contempt type combo default Both var Off var White var Black var Both
option name Threads type spin default 1 min 1 max 512
option name Hash type spin default 16 min 1 max 33554432
option name Clear Hash type button
option name Ponder type check default false
option name MultiPV type spin default 1 min 1 max 500
option name Skill Level type spin default 20 min 0 max 20
option name Move Overhead type spin default 10 min 0 max 5000
option name Slow Mover type spin default 100 min 10 max 1000
option name nodestime type spin default 0 min 0 max 10000
option name UCI_Chess960 type check default false
option name UCI_AnalyseMode type check default false
option name UCI_LimitStrength type check default false
option name UCI_Elo type spin default 1350 min 1350 max 2850
option name UCI_ShowWDL type check default false
option name SyzygyPath type string default <empty>
option name SyzygyProbeDepth type spin default 1 min 1 max 100
option name Syzygy50MoveRule type check default true
option name SyzygyProbeLimit type spin default 7 min 0 max 7
option name Use NNUE type check default false
option name EvalFile type string default nn-97f742aaefcd.nnue
uciok
ucinewgame
isready
readyok
setoption name Use NNUE value true
go depth 20
info string NNUE evaluation using nn-97f742aaefcd.nnue enabled.
info depth 1 seldepth 1 multipv 1 score cp 31 nodes 20 nps 5000 tbhits 0 time 4 pv e2e4
info depth 2 seldepth 2 multipv 1 score cp 44 nodes 52 nps 10400 tbhits 0 time 5 pv e2e4 a7a6
info depth 3 seldepth 3 multipv 1 score cp 50 nodes 164 nps 23428 tbhits 0 time 7 pv d2d4 e7e6 e2e4
info depth 4 seldepth 4 multipv 1 score cp 58 nodes 305 nps 30500 tbhits 0 time 10 pv d2d4 e7e6 e2e4 a7a6
info depth 5 seldepth 5 multipv 1 score cp 26 nodes 1201 nps 46192 tbhits 0 time 26 pv d2d4 d7d5 e2e3 e7e6 c2c4
info depth 6 seldepth 6 multipv 1 score cp 20 nodes 2135 nps 47444 tbhits 0 time 45 pv d2d4 g8f6 c2c4 c7c6 e2e3 d7d5 c4d5
info depth 7 seldepth 9 multipv 1 score cp 25 nodes 4189 nps 46032 tbhits 0 time 91 pv d2d4 g8f6 g1f3 e7e6 b1d2 d7d5 e2e3 a7a6
info depth 8 seldepth 11 multipv 1 score cp 35 nodes 5848 nps 46412 tbhits 0 time 126 pv e2e4 c7c5 g1f3 a7a6 d2d4 c5d4 d1d4
info depth 9 seldepth 11 multipv 1 score cp 40 nodes 9723 nps 46971 tbhits 0 time 207 pv e2e4 e7e6 d2d4 d7d5 e4d5 e6d5 g1f3 a7a6 c2c4
info depth 10 seldepth 15 multipv 1 score cp 33 nodes 26516 nps 48653 tbhits 0 time 545 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 d2d4 g8f6 g1f3 e7e6 f1d3 c5d4
info depth 11 seldepth 17 multipv 1 score cp 31 nodes 34514 nps 48679 tbhits 0 time 709 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 d2d4 g8f6 g1f3 e7e6 f1d3 c5d4 c3d4 d5d8 e1g1 b8c6 f1e1
info depth 12 seldepth 16 multipv 1 score cp 26 nodes 60059 nps 48591 hashfull 31 tbhits 0 time 1236 pv e2e4 e7e5 g1f3 b8c6 d2d4 e5d4 f3d4 g8f6 d4c6 d7c6 d1d8 e8d8 b1c3 f8c5
info depth 13 seldepth 19 multipv 1 score cp 18 nodes 93263 nps 48803 hashfull 45 tbhits 0 time 1911 pv e2e4 e7e5 g1f3 d7d6 b1c3 g8f6 d2d4 e5d4 f3d4 g7g6 g2g3 f8g7 f1g2 e8g8 e1g1 b8c6
info depth 14 seldepth 23 multipv 1 score cp 29 nodes 158806 nps 47789 hashfull 74 tbhits 0 time 3323 pv d2d4 g8f6 g1f3 e7e6 c2c4 b7b6 e2e3 d7d5 c4d5 e6d5 b1c3 c8b7 f1d3 f8e7
info depth 15 seldepth 23 multipv 1 score cp 20 upperbound nodes 221845 nps 46285 hashfull 99 tbhits 0 time 4793 pv d2d4 g8f6
info depth 15 seldepth 23 multipv 1 score cp 17 nodes 257924 nps 46339 hashfull 115 tbhits 0 time 5566 pv d2d4 g8f6 g1f3 e7e6 c2c4 b7b6 c1f4 c8b7 e2e3 c7c5 f1d3 c5d4 e3d4 d7d5 e1g1 d5c4 d3c4 f8e7 b1c3
info depth 16 seldepth 24 multipv 1 score cp 27 lowerbound nodes 305240 nps 46558 hashfull 139 tbhits 0 time 6556 pv d2d4
info depth 16 seldepth 24 multipv 1 score cp 16 nodes 341708 nps 46579 hashfull 159 tbhits 0 time 7336 pv d2d4 g8f6 c2c4 e7e6 b1c3 d7d5 c4d5 e6d5 c1f4 c7c6 g1f3 c8f5 e2e3 b8d7 f1d3 f5d3
info depth 17 seldepth 23 multipv 1 score cp 26 lowerbound nodes 408356 nps 46546 hashfull 199 tbhits 0 time 8773 pv d2d4
info depth 17 seldepth 23 multipv 1 score cp 26 nodes 433931 nps 46609 hashfull 211 tbhits 0 time 9310 pv d2d4 g8f6 c2c4 e7e6 b1c3 f8b4 c1g5 d7d6 g1f3 h7h6 g5h4 c7c5 d4c5 b4c3 b2c3 d6c5 d1d8 e8d8 e1c1 d8e7 e2e3
info depth 18 seldepth 24 multipv 1 score cp 22 nodes 607303 nps 46533 hashfull 294 tbhits 0 time 13051 pv d2d4 g8f6 c2c4 e7e6 b1c3 f8b4 e2e3 e8g8 f1d3 c7c5 d4d5 e6d5 c4d5 f6d5 d3h7 g8h7 d1d5 d7d6 g1e2 h7g8 e3e4 b8c6 e1g1 b4c3 e2c3
info depth 19 seldepth 28 multipv 1 score cp 13 upperbound nodes 937150 nps 46545 hashfull 437 tbhits 0 time 20134 pv d2d4 g8f6
info depth 19 seldepth 28 multipv 1 score cp 22 lowerbound nodes 1022757 nps 46495 hashfull 477 tbhits 0 time 21997 pv d2d4
info depth 19 seldepth 28 multipv 1 score cp 22 nodes 1057569 nps 46519 hashfull 487 tbhits 0 time 22734 pv d2d4 g8f6 c2c4 e7e6 b1c3 f8b4 c1d2 b4c3 d2c3 f6e4 d1c2 e4c3 c2c3 e8g8 e2e3 b7b6 a1d1 c8b7 d4d5 f8e8
info depth 20 seldepth 26 multipv 1 score cp 13 upperbound nodes 1270521 nps 46355 hashfull 569 tbhits 0 time 27408 pv d2d4 g8f6
info depth 20 seldepth 26 multipv 1 score cp 22 lowerbound nodes 1320191 nps 46408 hashfull 581 tbhits 0 time 28447 pv d2d4
info depth 20 seldepth 26 multipv 1 score cp 21 nodes 1328766 nps 46354 hashfull 584 tbhits 0 time 28665 pv d2d4 g8f6 c2c4 e7e6 b1c3 f8b4 c1d2 e8g8 a2a3 b4c3 d2c3 b7b6 e2e3 c8b7 g1f3 d7d6 f1d3 f6e4 d1c2 f7f5 e1g1 b8d7 f1d1 e4c3 c2c3
bestmove d2d4 ponder g8f6
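The speed gap between the two runs can be read off the final nps values in the logs above. A minimal sketch, assuming the logs are available as plain text; the helper name `final_nps` and the variable names are made up for illustration, and the PVs are shortened:

```python
import re

def final_nps(log_text):
    """Return the nps value from the last 'info depth' line that reports one."""
    values = re.findall(r"^info depth .*?\bnps (\d+)", log_text, flags=re.MULTILINE)
    if not values:
        raise ValueError("no 'info depth ... nps N' line found")
    return int(values[-1])

# Final depth-20 lines of the two runs above (PV shortened).
log_190720 = "info depth 20 seldepth 27 multipv 1 score cp 20 nodes 1609267 nps 328220 hashfull 649 tbhits 0 time 4903 pv e2e4 c7c5"
log_050820 = "info depth 20 seldepth 26 multipv 1 score cp 21 nodes 1328766 nps 46354 hashfull 584 tbhits 0 time 28665 pv d2d4 g8f6"

ratio = final_nps(log_190720) / final_nps(log_050820)
print(f"Stockfish+NNUE 190720 is {ratio:.1f}x faster")  # prints 7.1x
```

This only compares raw nodes per second; it says nothing about whether the two builds search the same tree.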
but that is removed from the default makefile
I see target x86-64-sse41
Is it really normal that the PV diverges for the same net so early even on the start position, vs. the original nodchip compilation from 07-19?
As normal as with any two Stockfish versions whose search is different.
but that is removed from the default makefile
I see target x86-64-sse41
Must have been a consequence of my messing up the makefile, as mentioned before, and of reading something somewhere about the SSE4.1 target being removed. Anyhow, there is some change there, as the SSE4.1 compile now always assumes popcount=yes by default:
ifeq ($(ARCH),x86-64-sse41)
arch = x86_64
prefetch = yes
popcnt = yes
sse = yes
sse3 = yes
ssse3 = yes
sse41 = yes
After removing the popcount entry (which produces crashing binaries here of course) for that ARCH, I get a much faster compile. Still around 10-15% slower than the nodchip one from 07-19.
Is it really normal that the PV diverges for the same net so early even on the start position, vs. the original nodchip compilation from 07-19?
As normal as with any two Stockfish versions whose search is different.
That's the question: would changes to the search of SF dev (if they happened; I have not checked) in just two weeks result in such a big difference in speed and PV from the start position with the same net?
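To pin down where the two runs actually part ways, the PVs can be compared depth by depth. A rough sketch, assuming both logs are plain text in the format shown above; `pv_by_depth` and `first_divergence` are hypothetical helper names, and only the depth 5 and depth 6 lines from the two logs are used as sample input:

```python
import re

def pv_by_depth(log_text):
    """Map depth -> PV move list, keeping the last exact (non-bound) line per depth."""
    pvs = {}
    for line in log_text.splitlines():
        m = re.match(r"info depth (\d+) .* pv (.+)$", line)
        if m and "lowerbound" not in line and "upperbound" not in line:
            pvs[int(m.group(1))] = m.group(2).split()
    return pvs

def first_divergence(log_a, log_b):
    """Return the lowest shared depth whose PVs differ, or None if all match."""
    a, b = pv_by_depth(log_a), pv_by_depth(log_b)
    for depth in sorted(a.keys() & b.keys()):
        if a[depth] != b[depth]:
            return depth
    return None

log_190720 = """\
info depth 5 seldepth 5 multipv 1 score cp 26 nodes 1197 nps 171000 tbhits 0 time 7 pv d2d4 d7d5 e2e3 e7e6 c2c4
info depth 6 seldepth 6 multipv 1 score cp 22 nodes 2651 nps 241000 tbhits 0 time 11 pv g1f3 g8f6 e2e3 d7d5 c2c4 e7e6
"""
log_050820 = """\
info depth 5 seldepth 5 multipv 1 score cp 26 nodes 1201 nps 46192 tbhits 0 time 26 pv d2d4 d7d5 e2e3 e7e6 c2c4
info depth 6 seldepth 6 multipv 1 score cp 20 nodes 2135 nps 47444 tbhits 0 time 45 pv d2d4 g8f6 c2c4 c7c6 e2e3 d7d5 c4d5
"""

print(first_divergence(log_190720, log_050820))  # prints 6
```

Note that the node counts already differ at depth 5 (1197 vs. 1201) although the PV is identical there, which would be consistent with the two builds not searching exactly the same tree.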
@rwbc can you post the output of g++ -march=native -Q --help=target on your system
changes in search are normal.
If on Linux also the output of cat /proc/cpuinfo or the precise model of your CPU.
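On Linux, the relevant part of /proc/cpuinfo is the flags line. A small sketch of how one might check it for the instruction sets discussed here, assuming the usual Linux flag names (`ssse3`, `sse4_1`, `sse4_2`, `popcnt`); the function name `cpu_flags` and the sample text are invented for illustration:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

# Shortened, hypothetical sample; on a real Linux system use
# open("/proc/cpuinfo").read() instead.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 pni ssse3 sse4_1 lahf_lm\n"

flags = cpu_flags(sample)
for name in ("ssse3", "sse4_1", "sse4_2", "popcnt"):
    print(f"{name}: {'yes' if name in flags else 'no'}")
```

A CPU that reports sse4_1 but not popcnt is exactly the case described in this issue.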
Well, that's a lot of output. I guess this is the most interesting part:
# g++ -march=native -Q --help=target
The following options are target specific:
-m128bit-long-double [enabled]
-m16 [disabled]
-m32 [disabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [enabled]
-m80387 [enabled]
-m8bit-idiv [disabled]
-m96bit-long-double [disabled]
-mabi= ms
-mabm [disabled]
-maccumulate-outgoing-args [enabled]
-maddress-mode= long
-madx [disabled]
-maes [disabled]
-malign-data= compat
-malign-double [enabled]
-malign-functions= 0
-malign-jumps= 0
-malign-loops= 0
-malign-stringops [enabled]
-march= core2
-masm= att
-mavx [disabled]
-mavx2 [disabled]
-mavx256-split-unaligned-load [disabled]
-mavx256-split-unaligned-store [disabled]
-mavx5124fmaps [disabled]
-mavx5124vnniw [disabled]
-mavx512bitalg [disabled]
-mavx512bw [disabled]
-mavx512cd [disabled]
-mavx512dq [disabled]
-mavx512er [disabled]
-mavx512f [disabled]
-mavx512ifma [disabled]
-mavx512pf [disabled]
-mavx512vbmi [disabled]
-mavx512vbmi2 [disabled]
-mavx512vl [disabled]
-mavx512vnni [disabled]
-mavx512vpopcntdq [disabled]
-mbmi [disabled]
-mbmi2 [disabled]
-mbranch-cost=<0,5> 3
-mcall-ms2sysv-xlogues [disabled]
-mcet-switch [disabled]
-mcld [disabled]
-mcldemote [disabled]
-mclflushopt [disabled]
-mclwb [disabled]
-mclzero [disabled]
-mcmodel= [default]
-mconsole [disabled]
-mcpu=
-mcrc32 [disabled]
-mcrtdll=
-mcx16 [enabled]
-mdispatch-scheduler [disabled]
-mdll [disabled]
-mdump-tune-features [disabled]
-mf16c [disabled]
-mfancy-math-387 [enabled]
-mfentry [enabled]
-mfentry-name=
-mfentry-section=
-mfma [disabled]
-mfma4 [disabled]
-mforce-drap [disabled]
-mforce-indirect-call [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath= sse
-mfsgsbase [disabled]
-mfunction-return= keep
-mfused-madd
-mfxsr [enabled]
-mgeneral-regs-only [disabled]
-mgfni [disabled]
-mhard-float [enabled]
-mhle [disabled]
-miamcu [disabled]
-mieee-fp [enabled]
-mincoming-stack-boundary= 0
-mindirect-branch-register [disabled]
-mindirect-branch= keep
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-minstrument-return= none
-mintel-syntax
-mlarge-data-threshold=<number> 65536
-mlong-double-128 [disabled]
-mlong-double-64 [disabled]
-mlong-double-80 [enabled]
-mlwp [disabled]
-mlzcnt [disabled]
-mmanual-endbr [disabled]
-mmemcpy-strategy=
-mmemset-strategy=
-mmitigate-rop [disabled]
-mmmx [enabled]
-mmovbe [disabled]
-mmovdir64b [disabled]
-mmovdiri [disabled]
-mmpx [disabled]
-mms-bitfields [enabled]
-mmwaitx [disabled]
-mno-align-stringops [disabled]
-mno-default [disabled]
-mno-fancy-math-387 [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [disabled]
-mnop-fun-dllimport [disabled]
-mnop-mcount [disabled]
-momit-leaf-frame-pointer [disabled]
-mpc32 [disabled]
-mpc64 [disabled]
-mpc80 [disabled]
-mpclmul [disabled]
-mpcommit [disabled]
-mpconfig [disabled]
-mpe-aligned-commons [enabled]
-mpku [disabled]
-mpopcnt [disabled]
-mprefer-avx128
-mprefer-vector-width= none
-mpreferred-stack-boundary= 0
-mprefetchwt1 [disabled]
-mprfchw [disabled]
-mptwrite [disabled]
-mpush-args [enabled]
-mrdpid [disabled]
-mrdrnd [disabled]
-mrdseed [disabled]
-mrecip [disabled]
-mrecip=
-mrecord-mcount [disabled]
-mrecord-return [disabled]
-mred-zone [enabled]
-mregparm= 4
-mrtd [disabled]
-mrtm [disabled]
-msahf [enabled]
-msgx [disabled]
-msha [disabled]
-mshstk [disabled]
-mskip-rax-setup [disabled]
-msoft-float [disabled]
-msse [enabled]
-msse2 [enabled]
-msse2avx [disabled]
-msse3 [enabled]
-msse4 [disabled]
-msse4.1 [enabled]
-msse4.2 [disabled]
-msse4a [disabled]
-msse5
-msseregparm [disabled]
-mssse3 [enabled]
-mstack-arg-probe [enabled]
-mstack-protector-guard-offset=
-mstack-protector-guard-reg=
-mstack-protector-guard-symbol=
-mstack-protector-guard= global
-mstackrealign [enabled]
-mstringop-strategy= [default]
-mstv [disabled]
-mtbm [disabled]
-mthreads [disabled]
-mtls-dialect= gnu
-mtls-direct-seg-refs [disabled]
-mtune-ctrl=
-mtune= core2
-municode [disabled]
-mvaes [disabled]
-mveclibabi= [default]
-mvect8-ret-in-mem [disabled]
-mvpclmulqdq [disabled]
-mvzeroupper [enabled]
-mwaitpkg [disabled]
-mwbnoinvd [disabled]
-mwin32 [disabled]
-mwindows [disabled]
-mx32 [disabled]
-mxop [disabled]
-mxsave [disabled]
-mxsavec [disabled]
-mxsaveopt [disabled]
-mxsaves [disabled]
so the best target for your system that is supported is x86-64-ssse3
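The same conclusion can be reached mechanically from the g++ output. The sketch below is only an illustration: the requirement table `TARGET_NEEDS` is an assumption derived from this thread (the sse41 entry includes popcnt because the makefile fragment quoted above sets popcnt = yes for that ARCH), and the fallback name `x86-64` is assumed as well.

```python
import re

# Hypothetical requirement table, ordered from most to least demanding.
TARGET_NEEDS = [
    ("x86-64-sse41", {"-mssse3", "-msse4.1", "-mpopcnt"}),
    ("x86-64-ssse3", {"-mssse3"}),
]

def enabled_options(help_target_text):
    """Collect option names marked [enabled] in `g++ -Q --help=target` output."""
    return set(re.findall(r"^\s*(-m[\w.\-]+)\s+\[enabled\]",
                          help_target_text, flags=re.MULTILINE))

def best_target(help_target_text, fallback="x86-64"):
    """Pick the first target whose required options are all enabled."""
    enabled = enabled_options(help_target_text)
    for name, needs in TARGET_NEEDS:
        if needs <= enabled:
            return name
    return fallback

# The relevant lines from the output posted above.
sample = """\
  -mpopcnt                    [disabled]
  -msse3                      [enabled]
  -msse4.1                    [enabled]
  -msse4.2                    [disabled]
  -mssse3                     [enabled]
"""
print(best_target(sample))  # prints x86-64-ssse3
```

With an sse41 entry that did not require popcnt, the same input would select x86-64-sse41 instead, which is the point under discussion.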
We won't be able to support all combinations of flags... there are just too many options. The names used for 'supported archs' are at this point just names; they enable various flags. Maybe this just needs to be documented in the Makefile at this point.
There actually were separate -sse41 and -sse42 targets in nodchip's, and I don't think it was too many. Maybe in the future -avx2/bmi2 would need to be split into two, depending on multiply perf. And so on.
there still are separate sse41 and sse42 targets btw.
There's no explicit SSE4.2 code in Stockfish and it is not likely to be visibly useful (CRC? Useless. String comparison? Maybe for UCI, but it's not optimized as is. 64-bit int comparison? Might theoretically be, with a smarter compiler). I am unable to read @nodchip's mind but I think that he added -sse42 as a shorthand for SSE4.1 + popcount. Therefore, -sse41 should not assume popcount (and there really are chips with sse4.1 and no popcount); that's what -sse42 is for.
If you think that there's too many targets, consider getting rid of SSE4.1; it is, at present, theoretically, a very tiny improvement over SSSE3. I can't measure any speedup, maybe because I have hardware with BMI2 support.
Thanks for all the comments, I will compile a version with SSE3 today and compare to my previous compile.
Edit: FYI the SSE3 compile is just 40% of the speed of the SSE4.1 with popcount disabled here.
FYI the SSE3 compile is just 40% of the speed of the SSE4.1 with popcount disabled here.
You mixed up SSE3 and SSSE3.
FYI the SSE3 compile is just 40% of the speed of the SSE4.1 with popcount disabled here.
You mixed up SSE3 and SSSE3.
Oops, yes: SSSE3 and SSE4.1 (w/o popcount) are nearly the same speed. I guess for now I will attribute the speed difference to nodchip's compile to search changes.
I think removing SSE41 and SSE42 would make sense.
A week ago there were zero problems in compiling (and never for the master branch ofc). This is on old hardware (quadcore core2) w/o popcount, but SSE4.1.
But today it failed. First I had to change std from c++11 to c++17 in the makefile, because of bazillions of those warnings
syzygy/../nnue/architectures/../features/features_common.h:27:11: warning: nested namespace definitions only available with '-std=c++17' or '-std=gnu++17' [-Wpedantic]
Then it nearly compiled and created all object files until
Any idea?