jtheardw / mantissa

Chess Engine
GNU General Public License v3.0

Old hardware compilation #3

Closed rwbc closed 2 years ago

rwbc commented 2 years ago

Hi Jeremy,

It would be nice to be able to compile for old hardware too, despite the 'rating' loss.

Quite a few people in computer chess still have good ol' working quad-cores from around ten years ago. Most of them have at least SSSE3, and some, like me, don't even have popcnt ;-) Amanj can probably suggest an alternative vectorization.

Guenther (RWBC - maintainer of the XB/UCI chronology)

jtheardw commented 2 years ago

Thanks for bringing this to my attention. I'll do some playing around with the compilation and see what I can do. I'll check back in here to let you know. I think sse3 compilation shouldn't be too difficult.

jtheardw commented 2 years ago

I've added some new build scripts and compiled some SSE3 binaries, which I've added to the 3.0.0 release. I tried out the linux one on my own machine and it seemed to still function, but could you go ahead and try out one of them and let me know if it doesn't work?

rwbc commented 2 years ago

Hi again. I tried it out and it crashed here. I suspected it was still built with popcnt, and a look at your build script confirmed that.

After compiling it without the popcnt flag, it still doesn't run here though (WIN7-64 Ultimate):

RUSTFLAGS='-C target-feature=+sse3' cargo rustc --release --target=x86_64-pc-windows-gnu

Anyhow, for now I think you have made quite some people happy, just not me ;-)

jtheardw commented 2 years ago

Hmm. I'm sorry to hear that. I'm surprised that even without popcnt it still didn't function (I had just re-read your original post and was about to make one without popcnt, but it looks like there's more to it than that). Is there a particular error/panic/trace that shows up when trying to invoke it from a command line?

I'll keep looking into this, and I'll stop by the chess forums also to see if anyone has any tips. Once I have something, I hope I can get you to check if it works for me, since I don't have a machine of my own which can test those same conditions.

I'll keep this issue open, of course, until I can find a solution.

rwbc commented 2 years ago

Sorry, no trace or special error message, just the usual app crash you get on the wrong hardware.

BTW, 2.5.0 from your release and my own compilation of 2.1.1 (target-cpu=native) run fine here, but that won't help much as 3.0.0 has those NNUE changes.

jtheardw commented 2 years ago

Out of curiosity, if you do something like:

RUSTFLAGS='-C target-cpu=native' cargo build --release

for 3.0.0, what happens (rather than specifying the target and features directly)?

rwbc commented 2 years ago

I think I had tried this already, but I repeated it now anyway. The result is the same: it compiles fine, but the binary crashes after 2-3 seconds when I start it from cmd.

jtheardw commented 2 years ago

Thanks. I'll see what I can find out.

jtheardw commented 2 years ago

Sorry to keep pestering you, but do you know the specific processor model your machine uses?
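
If the exact model is hard to look up, a small program can report which instruction-set extensions the CPU itself advertises. The sketch below uses the standard library's `is_x86_feature_detected!` macro; the function name `cpu_features` and the choice of features to list are assumptions made for illustration and are not part of Mantissa.

```rust
// Hypothetical helper: reports which x86 extensions the running CPU supports.
// Detection happens at run time, so the result does not depend on RUSTFLAGS.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse3", std::is_x86_feature_detected!("sse3")),
        ("ssse3", std::is_x86_feature_detected!("ssse3")),
        ("popcnt", std::is_x86_feature_detected!("popcnt")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
    ]
}

// On non-x86 targets the macro is unavailable, so report nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}
```

Printing each pair from `cpu_features()` (for example with `println!("{name}: {supported}")`) would show at a glance whether popcnt or AVX is missing on the machine in question.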

jtheardw commented 2 years ago

I believe I've figured out the issue. The x86_64 arch library for Rust has code in its implementations that effectively "promises" the compiler that it's safe to emit AVX code for certain types and functions, regardless of the target flags.

To get around this, I've set up a second Network implementation that should only require SSE3 and not AVX support, and made the choice of which Network struct gets compiled conditional on the target features specified. I then compiled the binaries with target-feature=+sse3,-avx and inspected the disassembly. The AVX instructions I had previously noticed there (even in the sse3 version) were gone, so I think this should solve it. The new code is in the master branch also if you want to look at or build it yourself (though the source code bundled with the release is still the same as before).
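
For readers following along, selecting an implementation based on the target features might look roughly like the sketch below. This is not Mantissa's actual code: the module name `network`, the `dot` function, the `BACKEND` constant, and the use of f32 values are all assumptions made for illustration.

```rust
// Hypothetical AVX variant: only compiled when AVX is enabled for the whole
// build (e.g. -C target-cpu=native on an AVX-capable machine).
#[cfg(all(target_arch = "x86_64", target_feature = "avx"))]
mod network {
    use std::arch::x86_64::*;

    pub const BACKEND: &str = "avx";

    pub fn dot(a: &[f32], b: &[f32]) -> f32 {
        assert_eq!(a.len(), b.len());
        let chunks = a.len() / 8;
        let mut lanes = [0.0f32; 8];
        // SAFETY: AVX is guaranteed by the cfg above, and every load reads
        // 8 floats that lie fully inside the slices.
        unsafe {
            let mut acc = _mm256_setzero_ps();
            for i in 0..chunks {
                let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
                let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
                acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
            }
            _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        }
        // Sum the 8 vector lanes, then handle the leftover tail elements.
        let mut sum: f32 = lanes.iter().sum();
        for i in chunks * 8..a.len() {
            sum += a[i] * b[i];
        }
        sum
    }
}

// Hypothetical fallback: plain Rust with no arch intrinsics, so the compiler
// only emits instructions allowed by the flags given (e.g. +sse3,-avx).
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx")))]
mod network {
    pub const BACKEND: &str = "portable";

    pub fn dot(a: &[f32], b: &[f32]) -> f32 {
        assert_eq!(a.len(), b.len());
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}
```

Callers use `network::dot` without caring which variant was built; `network::BACKEND` only exists here to make the selection visible. Because exactly one of the two modules is compiled, the intrinsics-based code never reaches a binary built without AVX.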

I've updated the release binaries once more. The sse3 binaries have been replaced by the new ones I just compiled. I also compiled some that don't require popcnt; they're included in the release as well now.

Of course, I can't test these with certainty because my computer will just gleefully chug along with avx instructions that logically shouldn't be there. Could I trouble you to try out the new binaries and let me know if that works for you?

Thanks for all of your help.

rwbc commented 2 years ago

This is less trouble for me. Thanks for nailing it down! Below is the sse3-nopop output from cmd from your new binary. (the pop one would crash after depth 1 info as expected)

uci
id name Mantissa v3.0.0
id author jtwright
option name Hash type spin default 64 min 1 max 32768
option name Threads type spin default 1 min 1 max 64
option name Move Overhead type spin default 10 min 1 max 1000
uciok
go depth 15
info depth 1 seldepth 1 score cp 46 time 3 nodes 47 nps 15666 pv e2e4
info depth 2 seldepth 2 score cp 33 time 3 nodes 195 nps 65000 pv e2e4 c7c5
info depth 3 seldepth 3 score cp 42 time 4 nodes 525 nps 131250 pv e2e4 g8f6 e4e5
info depth 4 seldepth 4 score cp 48 time 4 nodes 880 nps 220000 pv e2e4 a7a6 g1f3 g8f6
info depth 5 seldepth 5 score cp 37 time 6 nodes 1947 nps 324500 pv e2e4 c7c5 b1c3 g8f6 e4e5
info depth 6 seldepth 6 score cp 50 time 8 nodes 2723 nps 340375 pv e2e4 c7c5 b1c3 a7a6 d2d4 c5d4
info depth 7 seldepth 8 score cp 52 time 18 nodes 7870 nps 437222 pv e2e4 e7e6 d2d4 d7d5 e4d5 g8f6 d5e6
info depth 8 seldepth 10 score cp 34 time 41 nodes 19897 nps 485292 pv d2d4 g8f6 c2c4 e7e6 d4d5 e6d5 c4d5 a7a6
info depth 9 seldepth 11 score cp 32 time 59 nodes 29387 nps 498084 pv e2e4 e7e6 d2d4 d7d5 b1c3 d5e4 c3e4 f8e7 g1f3
info depth 10 seldepth 11 score cp 37 time 94 nodes 46843 nps 498329 pv e2e4 e7e6 d2d4 d7d5 b1c3 d5e4 c3e4 f8e7 g1f3 g8f6
info depth 11 seldepth 13 score cp 37 time 153 nodes 76765 nps 501732 pv e2e4 e7e6 d2d4 d7d5 b1c3 d5e4 c3e4 f8e7 g1f3 g8f6 e4f6
info depth 12 seldepth 15 score cp 41 time 281 nodes 139842 nps 497658 pv e2e4 e7e6 d2d4 d7d5 e4d5 e6d5 c2c4 c7c5 g1f3 c5d4 c4d5 d8d5
info depth 13 seldepth 16 score cp 44 time 729 nodes 357654 nps 490609 pv e2e4 c7c5 b1c3 b8c6 g1f3 g8f6 f1b5 g7g6 d2d3 d7d6 b5c6 b7c6 e1g1 a7a6
info depth 14 seldepth 18 score cp 43 time 1274 nodes 619687 nps 486410 pv e2e4 c7c5 g1f3 b8c6 d2d4 c5d4 f3d4 g8f6 d4c6 b7c6 c1e3 f6e4 f1e2 d7d5 e1g1
info depth 15 seldepth 20 score cp 37 time 3325 nodes 1628343 nps 489727 pv d2d4 g8f6 c2c4 e7e6 g1f3 f8e7 e2e3 d7d5 f1e2 a7a6 c4d5 d8d5 e1g1 e8g8 b1c3
bestmove d2d4

I have had no time yet to build my own, but I am sure it will work now. So, thanks again for supporting the dinosaurs ;-)

Guenther

rwbc commented 2 years ago

All my compilations of Mantissa 3.0.0 with various flags now run successfully (I haven't even tried 'native'). The speed lies somewhere between 460 and 500 kn/s from the start position over a dozen or so runs with 'go depth 15'.

You can surely close this issue now. Thanks again for considering compilation for old hardware!

Guenther Simon@CAPPUCCINO MINGW64 ~/mantissa
$ RUSTFLAGS='-C target-cpu=core2' cargo rustc --release
   Compiling mantissa v3.0.0 (C:\msys64\home\Guenther Simon\mantissa)
    Finished release [optimized] target(s) in 22.99s

Guenther Simon@CAPPUCCINO MINGW64 ~/mantissa
$ RUSTFLAGS='-C target-feature=+sse3' cargo rustc --release
   Compiling mantissa v3.0.0 (C:\msys64\home\Guenther Simon\mantissa)
    Finished release [optimized] target(s) in 14.38s

jtheardw commented 2 years ago

Very glad to hear it's working now! Thanks for testing it for me.

I'm always glad to help. I really want to have a very strong engine, but I also want to make sure that people who want to can use/test/have fun with Mantissa so I want to support that where I can.