-DTRANSPOSE is slower than default on Armv8 / Raspi

syzygy1 / Cfish

C port of Stockfish

GNU General Public License v3.0


Closed: gsobala closed this issue 3 years ago

gsobala commented 3 years ago

Just some feedback: the new NEON sparse multiplication code is about 10% slower when enabled with -DTRANSPOSE in a 64-bit Armv8 build on a Raspberry Pi.

syzygy1 commented 3 years ago

Thanks. For me it seems to be about 5% slower.

syzygy1 commented 3 years ago

The sparse multiplication code should now be faster. Please try again if you have time.

-DTRANSPOSE is now the default. If you want to try without it, remove the #define TRANSPOSE line: https://github.com/syzygy1/Cfish/blob/421c10e9ab814746d2af3927122096347d29e47b/src/nnue.c#L102-L103 (Or just remove one character from TRANSPOSE, so that the macro the code tests for is no longer defined.)
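For anyone unfamiliar with this kind of compile-time switch, here is a minimal sketch of how a `#define TRANSPOSE` line can select between two weight layouts for a sparse affine layer. This is not the Cfish code: `affine_sparse`, `pack_weights`, `weight_index`, and the tiny dimensions are all hypothetical names chosen for illustration, and the scalar loops stand in for the real NEON code. The idea being illustrated is that with an input-major (transposed) layout, zero inputs can be skipped entirely and each nonzero input touches one contiguous run of weights.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toggle. Delete this line, or rename the macro
   (e.g. to TRANSPOS), to build the non-transposed path instead. */
#define TRANSPOSE

enum { IN_DIMS = 4, OUT_DIMS = 3 };

/* Map logical weight (output o, input i) to its position in memory. */
static size_t weight_index(size_t o, size_t i)
{
#ifdef TRANSPOSE
    return i * OUT_DIMS + o;   /* input-major: one run of weights per input */
#else
    return o * IN_DIMS + i;    /* output-major: one run of weights per output */
#endif
}

/* Copy a row-major logical matrix w[o][i] into the layout chosen above. */
void pack_weights(const int8_t logical[OUT_DIMS][IN_DIMS], int8_t *packed)
{
    for (size_t o = 0; o < OUT_DIMS; o++)
        for (size_t i = 0; i < IN_DIMS; i++)
            packed[weight_index(o, i)] = logical[o][i];
}

/* out[o] = bias[o] + sum over i of w[o][i] * in[i] */
void affine_sparse(const int8_t *w, const int32_t *bias,
                   const uint8_t *in, int32_t *out)
{
    assert(w && bias && in && out);
#ifdef TRANSPOSE
    for (size_t o = 0; o < OUT_DIMS; o++)
        out[o] = bias[o];
    for (size_t i = 0; i < IN_DIMS; i++) {
        if (in[i] == 0)
            continue;          /* sparse case: skip zero inputs entirely */
        for (size_t o = 0; o < OUT_DIMS; o++)
            out[o] += w[i * OUT_DIMS + o] * in[i];
    }
#else
    for (size_t o = 0; o < OUT_DIMS; o++) {
        int32_t sum = bias[o];
        for (size_t i = 0; i < IN_DIMS; i++)
            sum += w[o * IN_DIMS + i] * in[i];
        out[o] = sum;
    }
#endif
}
```

Both paths compute the same result as long as the weights are packed with the matching layout, so timing a build with the macro against one without it is a fair comparison of the two code paths. Passing `-DTRANSPOSE` on the compiler command line has the same effect as the `#define` line in the source.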