Open syzygy1 opened 4 years ago
The number of search threads might also have an impact on which is faster...
311020_1 = Correctly display castling rights for Chess960.
311020_2 = Improve non-sparse multiplication.
*Ryzen 3900X @3.8 GHz
Thanks, so sparse AVX2 is still clearly better on AMD. Were these all tested on Ryzen 3900X?
Yes, all on my Ryzen 3900X.
I hope to get tests on other CPUs soon.
Intel i5 760 (Nehalem), 2.95 GHz
Athlon_x4_870K
Thanks again!
So on Nehalem, no_sparse is now better than sparse, which was the other way around before the improvement. On my Sandy Bridge PC, no_sparse is improved, but sparse is still better. So there is no clear Intel rule here.
The Athlon results have a pretty high variance, but seem to suggest sparse is better.
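When run-to-run variance is high, it can help to look at the spread next to the mean before calling a winner. The following is only a rough sketch: the function names are made up for illustration, the numbers are placeholders and not measurements from this thread, and the "difference larger than the combined standard deviations" rule is a crude heuristic rather than a proper statistical test.

```python
import statistics

def summarize(label, nps_runs):
    """Mean, sample standard deviation and relative spread of repeated speed runs."""
    mean = statistics.mean(nps_runs)
    stdev = statistics.stdev(nps_runs)
    return {
        "label": label,
        "mean": mean,
        "stdev": stdev,
        "cv_percent": 100.0 * stdev / mean,  # coefficient of variation
    }

def clearly_faster(a, b):
    """Crude check: a's mean beats b's by more than both spreads combined."""
    return a["mean"] - b["mean"] > a["stdev"] + b["stdev"]

# Placeholder nodes-per-second values, NOT taken from any test in this thread.
sparse = summarize("sparse", [1_000_000, 1_020_000, 980_000, 1_010_000])
no_sparse = summarize("no_sparse", [990_000, 940_000, 1_050_000, 1_000_000])

for s in (sparse, no_sparse):
    print(f"{s['label']}: mean={s['mean']:.0f} stdev={s['stdev']:.0f} ({s['cv_percent']:.1f} %)")
print("sparse clearly faster:", clearly_faster(sparse, no_sparse))
```

If neither build is clearly faster under a check like this, more runs (or a cooler, non-boosting CPU) are probably needed before drawing a conclusion.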
Intel Core i5-7600K
Intel 6800k
i7-7700HQ @2.80GHz
Thanks. So sparse=no is now better on Intel AVX2. For SSE2, sparse=yes is better. (I have now improved non-sparse for SSE2, but it still doesn't get close to sparse.) For SSSE3/SSE41, there is no clear winner on Intel.
On AMD, sparse=yes seems better.
It looks like this.
I am very confused by the results on Athlon 870K - today more tests were carried out and the variance has become even greater.
Was tested with network nn-cb26f10b1fd9.nnue
Maybe the cpu is overheating and then throttles down?
Hello guys! The above test was made on my PC, same as the speed tests below. Recently my brother made a small update on my PC, and he didn't tell me that I now have Turbo Boost, so now I have to learn how to switch Turbo Boost off (lol). I've repeated the speed test with "Warm up CPU", and the speed looks more or less correct.
@AlexB123 Which CPU is that? It seems non-sparse might be a little bit better with 1 thread (except for SSE2, which is expected) but loses to sparse with multiple threads. Non-sparse probably uses a bit more power and therefore increases CPU temps more.
@syzygy1 This is Athlon 870K
Ah, I see now.
Looks like no_sparse is faster on new AMD CPUs.
AMD RYZEN 9 5950x
Hope to see BMI2 builds in the speed test soon.
AMD RYZEN 9 5950x
After the update "AVX512, AVX2 and SSSE3 speedups", on Ryzen 3900X:
What is the difference between SSSE3.exe and SSSE3_popcnt_mingw_10.exe ?
I think the fact that no_sparse now beats sparse on Zen 3 shows that AMD has improved their AVX2 implementation in Zen 3.
What is the difference between SSSE3.exe and SSSE3_popcnt_mingw_10.exe ?
SSSE3 and SSSE3_sparse are 32-bit builds (compiled with MinGW i686-8.1.0-posix-dwarf-rt_v6-rev0).
OK, so for 64-bit SSSE3 on Zen 2, sparse=yes is still faster than sparse=no.
But it seems sparse=no is now faster than sparse=yes for AVX2 on Zen 2. I thought sparse=yes was clearly faster before the AVX2 speed up. This suggests that sparse=no is now faster on all CPUs with AVX2.
I just tested a Ryzen 4500U laptop and also found that sparse=yes was faster than sparse=no before the AVX2 speedup patch and is now slower.
Hello!
sparse=no is faster for all builds except SSE2 on Core i5-11400F.
AVX512_VNNI fastest
Just curious, on my i5-11400F Cfish is faster in Pure mode:
Only for AVX2 builds and higher. Not for SSE builds. On Ryzen 3900X - NNUE is still faster than Pure.
Pure being faster is pretty nice. Is it also stronger?
No, Hybrid is still stronger.
BMI2 10+0,1 concurrency 6
Score of Cfish_x64_120421_ELTO_BMI2 vs Cfish_x64_130421_ELTO_BMI2_Pure: 668 - 521 - 6564 [0.509]
... Cfish_x64_120421_ELTO_BMI2 playing White: 520 - 138 - 3219 [0.549] 3877
... Cfish_x64_120421_ELTO_BMI2 playing Black: 148 - 383 - 3345 [0.470] 3876
... White vs Black: 903 - 286 - 6564 [0.540] 7753
Elo difference: 6.6 +/- 3.0, LOS: 100.0 %, DrawRatio: 84.7 %
7758 of 20000 games finished.
AVX512_VNNI 10+0,1 concurrency 5
Score of Cfish_x64_120421_ELTO_AVX512_VNNI vs Cfish_x64_130421_ELTO_AVX512_VNNI_Pure: 527 - 507 - 6038 [0.501]
... Cfish_x64_120421_ELTO_AVX512_VNNI playing White: 406 - 119 - 3011 [0.541] 3536
... Cfish_x64_120421_ELTO_AVX512_VNNI playing Black: 121 - 388 - 3027 [0.462] 3536
... White vs Black: 794 - 240 - 6038 [0.539] 7072
Elo difference: 1.0 +/- 3.1, LOS: 73.3 %, DrawRatio: 85.4 %
7076 of 20000 games finished.
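For readers who want to check how the Elo figures relate to the win/loss/draw counts, here is a minimal sketch. It assumes the usual logistic Elo model and a normal-approximation 95 % interval on the score; the function name is made up, and this is not claimed to be the exact code of the match tool that produced the output above, only a calculation that appears consistent with it.

```python
import math

def elo_from_results(wins, losses, draws):
    """Return (elo_difference, error_margin_95, draw_ratio_percent).

    Breaks down if the score is exactly 0 or 1 (log of zero / division by zero).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game

    def to_elo(s):
        # Logistic Elo model: expected score s <-> rating difference
        return -400.0 * math.log10(1.0 / s - 1.0)

    # Per-game variance of the result (1, 0 or 0.5) around the mean score
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)

    # 95 % interval on the score, mapped to Elo and halved
    z = 1.959964
    margin = (to_elo(score + z * stderr) - to_elo(score - z * stderr)) / 2.0
    return to_elo(score), margin, 100.0 * draws / games

# W - L - D from the BMI2 match above
elo, margin, draw_ratio = elo_from_results(668, 521, 6564)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, DrawRatio: {draw_ratio:.1f} %")
```

With the counts from the two matches above, the results should land close to the reported Elo differences and error margins, which illustrates why a gap of about 1 Elo with a margin of about 3 is not conclusive.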
@syzygy1 do you know how much faster Cfish is on old CPUs? My friend with a Phenom II X6 1100T (SSE2 build compatible) told me that Cfish is 2 times faster than Stockfish... On my i5-11400F it is "only" 50% faster.
Even the 32-bit build is faster.
On my AVX2 laptop, sparse multiplication now turns out to be slower than the non-sparse multiplication. I suspect that this is not the case on some other AVX2 CPUs, in particular Zen 1.
I have therefore added a compilation option. To compile with sparse multiplication:
make -j pgo sparse=yes
To compile without sparse multiplication:
make -j pgo sparse=no
By default "sparse=yes" except for AVX2 targets (including BMI2, VNNI, AVX512).
If it is clear that "sparse=yes" is still faster on Zen 1 or on other CPUs with AVX2, I can make it the default on those CPUs. I cannot test this myself, so if anyone is willing to try sparse=yes/no on Zen 1 or other CPUs, that would be very welcome.
It would also be interesting to know if sparse=no is faster on any non-AVX2 CPUs.
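For anyone scripting a comparison, the default rule and the two build commands above could be sketched roughly as follows. The target labels and function names here are hypothetical and chosen only for illustration; they are not the Makefile's actual identifiers, and only the `make -j pgo sparse=yes|no` invocation itself comes from this post.

```python
# Hypothetical labels for AVX2-class targets; NOT the Makefile's real names.
AVX2_CLASS_TARGETS = {"avx2", "bmi2", "vnni", "avx512"}

def default_sparse(target):
    """Default described above: sparse=yes, except for AVX2-class targets."""
    return "no" if target.lower() in AVX2_CLASS_TARGETS else "yes"

def make_command(sparse):
    """Build the make invocation for an explicit sparse setting."""
    if sparse not in ("yes", "no"):
        raise ValueError("sparse must be 'yes' or 'no'")
    return ["make", "-j", "pgo", f"sparse={sparse}"]

# Print both variants so each can be built and speed-tested on the same machine.
for setting in ("yes", "no"):
    print(" ".join(make_command(setting)))
print("default for an SSE2 target:", default_sparse("sse2"))
print("default for an AVX2 target:", default_sparse("avx2"))
```

Building both variants explicitly, rather than relying on the default, keeps the comparison independent of whichever default the Makefile currently picks.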