syzygy1 / Cfish

C port of Stockfish
GNU General Public License v3.0

NUMA is broken #12

Closed hero2017 closed 8 years ago

hero2017 commented 8 years ago

I think NUMA is broken. I should be getting double the speed (kN/s), but I'm only getting half of that now. It's likely related to what I noticed in Task Manager: the engine's CPU usage is reported as only 50% instead of 100%. If I disable HT, it runs at 100%. The same thing happens with the latest SF dev, which I think is because it's not NUMA-aware.

Works fine in asmFish (100% cpu and speed is doubled).

Using dual E5-2696 v3 (36 cores) with HT on (72 logical cores).

syzygy1 commented 8 years ago

Are you using a compile that supports NUMA? What is the first line you see when you start cfish from cmd.exe? If you then type uci and hit enter, is a NUMA option listed?

hero2017 commented 8 years ago

Yes of course.

Cfish 061116 64 BMI2 NUMA by Syzygy based on Stockfish
info string NUMA enabled.

Yes, the NUMA option is listed and set to all in my GUI, same as asmFish.

uci
id name Cfish 061116 64 BMI2 NUMA
id author T. Romstad, M. Costalba, J. Kiiski, G. Linscott

option name Debug Log File type string default
option name Contempt type spin default 0 min -100 max 100
option name Threads type spin default 1 min 1 max 128
option name Hash type spin default 16 min 1 max 1048576
option name Clear Hash type button
option name Ponder type check default false
option name MultiPV type spin default 1 min 1 max 500
option name Skill Level type spin default 20 min 0 max 20
option name Move Overhead type spin default 30 min 0 max 5000
option name Minimum Thinking Time type spin default 20 min 0 max 5000
option name Slow Mover type spin default 89 min 10 max 1000
option name nodestime type spin default 0 min 0 max 10000
option name UCI_Chess960 type check default false
option name SyzygyPath type string default
option name SyzygyProbeDepth type spin default 1 min 1 max 100
option name Syzygy50MoveRule type check default true
option name SyzygyProbeLimit type spin default 6 min 0 max 6
option name LargePages type check default true
option name NUMA type string default all
uciok

syzygy1 commented 8 years ago

Thanks. Yes, the question may have sounded silly, but it is good to start at the very beginning when debugging this type of thing.

Next question: what output do you get if you start cfish and then type:

setoption name Threads value 36
isready

You should get:

info string Binding thread 0 to node 0
...
info string Binding thread 17 to node 0
info string Binding thread 18 to node 1
...
info string Binding thread 35 to node 1

And now if you type:

setoption name Threads value 72
isready

You should get:

info string Binding thread 36 to node 0
info string Binding thread 37 to node 1
info string Binding thread 38 to node 0
info string Binding thread 39 to node 1
etc.
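The expected pattern above (fill each node's 18 physical cores first, then alternate between the nodes for the remaining threads) can be sketched as follows. This is only an illustration reconstructed from the expected output; `expected_node` and its parameters are made-up names, not Cfish's actual code.

```python
def expected_node(thread_idx, num_nodes=2, cores_per_node=18):
    """Node a thread should land on under the pattern described above.

    Hypothetical helper: the defaults match the dual 18-core machine
    in this thread, not anything read from Cfish itself.
    """
    physical_cores = num_nodes * cores_per_node
    if thread_idx < physical_cores:
        # First pass: one thread per physical core, node by node.
        return thread_idx // cores_per_node
    # Past one thread per physical core: alternate between the nodes.
    return (thread_idx - physical_cores) % num_nodes


for idx in (0, 17, 18, 35, 36, 37, 38, 39):
    print(f"info string Binding thread {idx} to node {expected_node(idx)}")
```

With 72 threads this puts 36 threads on each node, which is the split both engines report later in the thread.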

Btw, if you run 36 threads then 50% CPU usage might be normal: 36 of the logical cores are running (1 per physical core) and the 36 remaining logical cores are idle. (But if your nps is only half of what you got with HT disabled in the BIOS, then something is wrong of course.)

If you go from 36 to 72 threads then something like 30% higher nps seems to be normal. Your CPU usage should then show as 100%.
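As a rough check of the percentages above: the Task Manager figure is approximately the number of busy logical cores divided by the total number of logical cores. A minimal sketch, assuming each search thread keeps exactly one logical core fully busy (the function name is hypothetical):

```python
def expected_cpu_usage(threads, logical_cores):
    """Approximate overall CPU usage in percent.

    Assumes every thread saturates its own logical core and that
    nothing else on the machine is using CPU time.
    """
    busy = min(threads, logical_cores)
    return 100.0 * busy / logical_cores


print(expected_cpu_usage(36, 72))  # 50.0
print(expected_cpu_usage(72, 72))  # 100.0
```

So 50% is expected with 36 threads on this machine, but not with 72.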

hero2017 commented 8 years ago

Yes, I'm aware that with HT on, using 36 cores will give me 50% CPU usage, but I'm setting it to 72 cores with HT on and I still get 50% CPU usage. With asmFish I get 100% CPU at 72 cores with HT on, which is what I should get with Cfish as well.

setoption name Threads value 36

Entering the above just goes to the next line. I get nothing, and neither do I get anything with asmFish.

Setting the Cfish engine to 72 in Aquarium gives me 50% CPU usage (not good), and in Task Manager I can see it's using 74 threads (I always see two extra threads on top of the core count I set; this happens with other engines too, so I consider that normal). Speed is less than half of what I get with 72 cores using asmFish.

If I open up Resource Manager (from Task Manager) I can see the utilization per CPU and per NUMA node. It shows me that it's 100% for NUMA Node 0 and zero percent for NUMA Node 1. See screenshot:

image

So it's as I said in the beginning that it looks to be an issue with NUMA. Notice the cpu usage is 50% under the column CPU for the engine.

Now look what I get with asmFish and the exact same engine configuration as cFish:

image

And for further comparison here's the latest SF8 dev with same engine config (72 cores):

image

What you don't see there is that if I scroll down, CPU 0-35 for Node 1 shows 100% CPU, but CPU 0-35 for Node 0 (as shown right above) is at 0% CPU.

syzygy1 commented 8 years ago

You forgot to type isready on the next line:

setoption name Threads value 36
isready

and then:

setoption name Threads value 72
isready

I need to know what Cfish outputs to be able to understand what is wrong.
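For anyone who keeps forgetting isready, the exchange can be scripted. A minimal sketch, assuming the engine reads UCI commands on stdin and prints its replies on stdout; `ask_engine` is a hypothetical helper, and the engine path in the comment is a placeholder, not a real location:

```python
import subprocess


def ask_engine(engine_cmd, commands, sentinel="readyok"):
    """Send UCI commands, then return the output lines up to `sentinel`."""
    proc = subprocess.Popen(
        engine_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    lines = []
    try:
        proc.stdin.write("\n".join(commands) + "\n")
        proc.stdin.flush()
        for raw in proc.stdout:
            line = raw.strip()
            lines.append(line)
            if line == sentinel:
                break
    finally:
        try:
            # Ask the engine to exit cleanly.
            proc.stdin.write("quit\n")
            proc.stdin.close()
        except OSError:
            pass  # engine has already exited
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        proc.stdout.close()
    return lines


# Example call (placeholder path, adjust to your own binary):
# ask_engine(["cfish.exe"], ["setoption name Threads value 36", "isready"])
```

The returned list includes any startup banner the engine prints, followed by the "info string Binding ..." lines and the final readyok.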

hero2017 commented 8 years ago

Darn it, I always forget about 'isready'. Well, in fairness you messed up too in the example above, since that wouldn't work either; the correct command is setoption name Threads value 36... Anyway, meaningless stuff aside:

One more screenshot, this time with BrainFish:

image

All works fine just like asmFish but is about 10% slower.

OK, here are the Cfish results for 36 cores:

Cfish 061116 64 BMI2 NUMA by Syzygy based on Stockfish
info string NUMA enabled.
setoption Threads value 36
No such option:
setoption name Threads value 36
isready
info string Binding thread 0 to node 0
info string Binding thread 1 to node 0
info string Binding thread 2 to node 0
info string Binding thread 3 to node 0
info string Binding thread 4 to node 0
info string Binding thread 5 to node 0
info string Binding thread 6 to node 0
info string Binding thread 7 to node 0
info string Binding thread 8 to node 0
info string Binding thread 9 to node 0
info string Binding thread 10 to node 0
info string Binding thread 11 to node 0
info string Binding thread 12 to node 0
info string Binding thread 13 to node 0
info string Binding thread 14 to node 0
info string Binding thread 15 to node 0
info string Binding thread 16 to node 0
info string Binding thread 17 to node 0
info string Binding thread 18 to node 1
info string Binding thread 19 to node 1
info string Binding thread 20 to node 1
info string Binding thread 21 to node 1
info string Binding thread 22 to node 1
info string Binding thread 23 to node 1
info string Binding thread 24 to node 1
info string Binding thread 25 to node 1
info string Binding thread 26 to node 1
info string Binding thread 27 to node 1
info string Binding thread 28 to node 1
info string Binding thread 29 to node 1
info string Binding thread 30 to node 1
info string Binding thread 31 to node 1
info string Binding thread 32 to node 1
info string Binding thread 33 to node 1
info string Binding thread 34 to node 1
info string Binding thread 35 to node 1
info string Transposition table allocated using large pages.
readyok

And here with 72 cores:

Cfish 061116 64 BMI2 NUMA by Syzygy based on Stockfish
info string NUMA enabled.
setoption name Threads value 72
isready
info string Binding thread 0 to node 0
info string Binding thread 1 to node 0
info string Binding thread 2 to node 0
info string Binding thread 3 to node 0
info string Binding thread 4 to node 0
info string Binding thread 5 to node 0
info string Binding thread 6 to node 0
info string Binding thread 7 to node 0
info string Binding thread 8 to node 0
info string Binding thread 9 to node 0
info string Binding thread 10 to node 0
info string Binding thread 11 to node 0
info string Binding thread 12 to node 0
info string Binding thread 13 to node 0
info string Binding thread 14 to node 0
info string Binding thread 15 to node 0
info string Binding thread 16 to node 0
info string Binding thread 17 to node 0
info string Binding thread 18 to node 1
info string Binding thread 19 to node 1
info string Binding thread 20 to node 1
info string Binding thread 21 to node 1
info string Binding thread 22 to node 1
info string Binding thread 23 to node 1
info string Binding thread 24 to node 1
info string Binding thread 25 to node 1
info string Binding thread 26 to node 1
info string Binding thread 27 to node 1
info string Binding thread 28 to node 1
info string Binding thread 29 to node 1
info string Binding thread 30 to node 1
info string Binding thread 31 to node 1
info string Binding thread 32 to node 1
info string Binding thread 33 to node 1
info string Binding thread 34 to node 1
info string Binding thread 35 to node 1
info string Binding thread 36 to node 0
info string Binding thread 37 to node 1
info string Binding thread 38 to node 0
info string Binding thread 39 to node 1
info string Binding thread 40 to node 0
info string Binding thread 41 to node 1
info string Binding thread 42 to node 0
info string Binding thread 43 to node 1
info string Binding thread 44 to node 0
info string Binding thread 45 to node 1
info string Binding thread 46 to node 0
info string Binding thread 47 to node 1
info string Binding thread 48 to node 0
info string Binding thread 49 to node 1
info string Binding thread 50 to node 0
info string Binding thread 51 to node 1
info string Binding thread 52 to node 0
info string Binding thread 53 to node 1
info string Binding thread 54 to node 0
info string Binding thread 55 to node 1
info string Binding thread 56 to node 0
info string Binding thread 57 to node 1
info string Binding thread 58 to node 0
info string Binding thread 59 to node 1
info string Binding thread 60 to node 0
info string Binding thread 61 to node 1
info string Binding thread 62 to node 0
info string Binding thread 63 to node 1
info string Binding thread 64 to node 0
info string Binding thread 65 to node 1
info string Binding thread 66 to node 0
info string Binding thread 67 to node 1
info string Binding thread 68 to node 0
info string Binding thread 69 to node 1
info string Binding thread 70 to node 0
info string Binding thread 71 to node 1
info string Transposition table allocated using large pages.
readyok

With asmFish it's quite different:

asmFishW_2016-11-05_bmi2
setoption name threads value 36
isready
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
info string node 1 has threads 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
readyok

setoption name threads value 72
isready
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
info string node 1 has threads 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71
readyok

syzygy1 commented 8 years ago

You were just too fast reading my reply before I corrected myself ;-)

Thanks, this is very helpful. Cfish detects 2 nodes, so it is correctly using the processor group API. Somehow it does not succeed in binding search threads to node 1. Probably a silly bug. I now know where to look.

asmFish's output is a bit more concise but the result is essentially the same (except that it does succeed in binding threads to node 1).
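One way to check that the two engines intend the same thread-to-node layout is to parse both log formats into a thread → node mapping and compare them. A minimal sketch; the parser names are hypothetical, and the sample lines are rebuilt to match the 36-thread logs above rather than captured from a new run:

```python
import re


def parse_cfish(lines):
    """Map thread -> node from 'info string Binding thread T to node N' lines."""
    pat = re.compile(r"info string Binding thread (\d+) to node (\d+)")
    matches = (pat.fullmatch(line) for line in lines)
    return {int(m.group(1)): int(m.group(2)) for m in matches if m}


def parse_asmfish(lines):
    """Map thread -> node from 'info string node N has threads ...' lines."""
    pat = re.compile(r"info string node (\d+) has threads((?: \d+)*)")
    mapping = {}
    for line in lines:
        m = pat.fullmatch(line)
        if m:
            for thread in m.group(2).split():
                mapping[int(thread)] = int(m.group(1))
    return mapping


# Sample input mirroring the 36-thread output of both engines.
cfish_log = [f"info string Binding thread {t} to node {t // 18}" for t in range(36)]
asmfish_log = [
    "info string node 0 has threads " + " ".join(str(t) for t in range(18)),
    "info string node 1 has threads " + " ".join(str(t) for t in range(18, 36)),
]

print(parse_cfish(cfish_log) == parse_asmfish(asmfish_log))  # True
```

Matching mappings only show that both engines request the same layout; they say nothing about whether the operating system actually honoured the request.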

hero2017 commented 8 years ago

Glad I was able to help. Hopefully we'll see a fix for this in the next release soon?

And when it is fixed we'll be able to sing numa numa... :-)

https://www.youtube.com/watch?v=0HR4hp_-kSI

syzygy1 commented 8 years ago

Unfortunately I don't see anything wrong (yet) with how threads are bound to nodes. Could you compile and run this version: https://github.com/syzygy1/Cfish/archive/numa_debug.zip It will not fix the problem, but it will print some extra output that might help find the cause of the problem.

If you can't compile it yourself, then I can try posting a binary tomorrow. That's it for today :)

hero2017 commented 8 years ago

No problem. Here you go:

Cfish 071116 64 BMI2 NUMA by Syzygy based on Stockfish
GROUPS
node = 0, node_number = 0, mask.group = 0, mask.mask = 68719476735
node = 1, node_number = 1, mask.group = 1, mask.mask = 68719476735
info string NUMA enabled.
setoption name threads value 36
isready
info string Binding thread 0 to group 0, node 0
info string Binding thread 1 to group 0, node 0
info string Binding thread 2 to group 0, node 0
info string Binding thread 3 to group 0, node 0
info string Binding thread 4 to group 0, node 0
info string Binding thread 5 to group 0, node 0
info string Binding thread 6 to group 0, node 0
info string Binding thread 7 to group 0, node 0
info string Binding thread 8 to group 0, node 0
info string Binding thread 9 to group 0, node 0
info string Binding thread 10 to group 0, node 0
info string Binding thread 11 to group 0, node 0
info string Binding thread 12 to group 0, node 0
info string Binding thread 13 to group 0, node 0
info string Binding thread 14 to group 0, node 0
info string Binding thread 15 to group 0, node 0
info string Binding thread 16 to group 0, node 0
info string Binding thread 17 to group 0, node 0
info string Binding thread 18 to group 1, node 1
info string Binding thread 19 to group 1, node 1
info string Binding thread 20 to group 1, node 1
info string Binding thread 21 to group 1, node 1
info string Binding thread 22 to group 1, node 1
info string Binding thread 23 to group 1, node 1
info string Binding thread 24 to group 1, node 1
info string Binding thread 25 to group 1, node 1
info string Binding thread 26 to group 1, node 1
info string Binding thread 27 to group 1, node 1
info string Binding thread 28 to group 1, node 1
info string Binding thread 29 to group 1, node 1
info string Binding thread 30 to group 1, node 1
info string Binding thread 31 to group 1, node 1
info string Binding thread 32 to group 1, node 1
info string Binding thread 33 to group 1, node 1
info string Binding thread 34 to group 1, node 1
info string Binding thread 35 to group 1, node 1
info string Transposition table allocated using large pages.
readyok
setoption name threads value 72
isready
info string Binding thread 36 to group 0, node 0
info string Binding thread 37 to group 1, node 1
info string Binding thread 38 to group 0, node 0
info string Binding thread 39 to group 1, node 1
info string Binding thread 40 to group 0, node 0
info string Binding thread 41 to group 1, node 1
info string Binding thread 42 to group 0, node 0
info string Binding thread 43 to group 1, node 1
info string Binding thread 44 to group 0, node 0
info string Binding thread 45 to group 1, node 1
info string Binding thread 46 to group 0, node 0
info string Binding thread 47 to group 1, node 1
info string Binding thread 48 to group 0, node 0
info string Binding thread 49 to group 1, node 1
info string Binding thread 50 to group 0, node 0
info string Binding thread 51 to group 1, node 1
info string Binding thread 52 to group 0, node 0
info string Binding thread 53 to group 1, node 1
info string Binding thread 54 to group 0, node 0
info string Binding thread 55 to group 1, node 1
info string Binding thread 56 to group 0, node 0
info string Binding thread 57 to group 1, node 1
info string Binding thread 58 to group 0, node 0
info string Binding thread 59 to group 1, node 1
info string Binding thread 60 to group 0, node 0
info string Binding thread 61 to group 1, node 1
info string Binding thread 62 to group 0, node 0
info string Binding thread 63 to group 1, node 1
info string Binding thread 64 to group 0, node 0
info string Binding thread 65 to group 1, node 1
info string Binding thread 66 to group 0, node 0
info string Binding thread 67 to group 1, node 1
info string Binding thread 68 to group 0, node 0
info string Binding thread 69 to group 1, node 1
info string Binding thread 70 to group 0, node 0
info string Binding thread 71 to group 1, node 1
readyok
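The mask.mask value in the debug output above is easier to read in binary. A minimal sketch, assuming the value is an affinity bitmask with one bit per logical processor in the group (the helper name is hypothetical):

```python
def describe_mask(mask):
    """Return (number of set bits, lowest set bit, highest set bit)."""
    bits = [i for i in range(64) if (mask >> i) & 1]
    return len(bits), bits[0], bits[-1]


count, lowest, highest = describe_mask(68719476735)
print(count, lowest, highest)        # 36 0 35
print(68719476735 == (1 << 36) - 1)  # True
```

Under that assumption each group covers logical processors 0 to 35, which fits 18 physical cores with HT per socket.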

syzygy1 commented 8 years ago

Thanks again. The output is exactly what it should be, so I still don't understand why it is not working.

I have made one more modification. Could you please try once more: https://github.com/syzygy1/Cfish/archive/numa_debug.zip

This adds another "info string" line after each "info string Binding ..." line.

If all these lines are "info string OK", then please type "go infinite" and see if Cfish is still leaving node 1 unused.

If some of the lines are "info string error code = ...", then the Windows call that should bind the search thread to a node is failing, and the error code might tell us why.

syzygy1 commented 8 years ago

Hmm I may have found the problem... I will make another modification.

syzygy1 commented 8 years ago

Ok, please try again: https://github.com/syzygy1/Cfish/archive/numa_debug.zip

I don't know if this fixes the problem, but it might. If it does not, please see if there are any "info string error code = ..." lines.

hero2017 commented 8 years ago

Same issue but at least you have more info:

Cfish 081116 64 BMI2 NUMA by Syzygy based on Stockfish
GROUPS
node = 0, node_number = 0, mask.group = 0, mask.mask = 68719476735
node = 1, node_number = 1, mask.group = 1, mask.mask = 68719476735
info string NUMA enabled.
setoption name threads value 36
isready
info string Binding thread 0 to group 0, node 0
info string error code = 87
info string Binding thread 1 to group 0, node 0
info string error code = 87
info string Binding thread 2 to group 0, node 0
info string error code = 87
info string Binding thread 3 to group 0, node 0
info string error code = 87
info string Binding thread 4 to group 0, node 0
info string error code = 87
info string Binding thread 5 to group 0, node 0
info string error code = 87
info string Binding thread 6 to group 0, node 0
info string error code = 87
info string Binding thread 7 to group 0, node 0
info string error code = 87
info string Binding thread 8 to group 0, node 0
info string error code = 87
info string Binding thread 9 to group 0, node 0
info string error code = 87
info string Binding thread 10 to group 0, node 0
info string error code = 87
info string Binding thread 11 to group 0, node 0
info string error code = 87
info string Binding thread 12 to group 0, node 0
info string error code = 87
info string Binding thread 13 to group 0, node 0
info string error code = 87
info string Binding thread 14 to group 0, node 0
info string error code = 87
info string Binding thread 15 to group 0, node 0
info string error code = 87
info string Binding thread 16 to group 0, node 0
info string error code = 87
info string Binding thread 17 to group 0, node 0
info string error code = 87
info string Binding thread 18 to group 1, node 1
info string error code = 87
info string Binding thread 19 to group 1, node 1
info string error code = 87
info string Binding thread 20 to group 1, node 1
info string error code = 87
info string Binding thread 21 to group 1, node 1
info string error code = 87
info string Binding thread 22 to group 1, node 1
info string error code = 87
info string Binding thread 23 to group 1, node 1
info string error code = 87
info string Binding thread 24 to group 1, node 1
info string error code = 87
info string Binding thread 25 to group 1, node 1
info string error code = 87
info string Binding thread 26 to group 1, node 1
info string error code = 87
info string Binding thread 27 to group 1, node 1
info string error code = 87
info string Binding thread 28 to group 1, node 1
info string error code = 87
info string Binding thread 29 to group 1, node 1
info string error code = 87
info string Binding thread 30 to group 1, node 1
info string error code = 87
info string Binding thread 31 to group 1, node 1
info string error code = 87
info string Binding thread 32 to group 1, node 1
info string error code = 87
info string Binding thread 33 to group 1, node 1
info string error code = 87
info string Binding thread 34 to group 1, node 1
info string error code = 87
info string Binding thread 35 to group 1, node 1
info string error code = 87
info string Transposition table allocated using large pages.
readyok

setoption name threads value 72
isready
info string Binding thread 36 to group 0, node 0
info string error code = 87
info string Binding thread 37 to group 1, node 1
info string error code = 87
info string Binding thread 38 to group 0, node 0
info string error code = 87
info string Binding thread 39 to group 1, node 1
info string error code = 87
info string Binding thread 40 to group 0, node 0
info string error code = 87
info string Binding thread 41 to group 1, node 1
info string error code = 87
info string Binding thread 42 to group 0, node 0
info string error code = 87
info string Binding thread 43 to group 1, node 1
info string error code = 87
info string Binding thread 44 to group 0, node 0
info string error code = 87
info string Binding thread 45 to group 1, node 1
info string error code = 87
info string Binding thread 46 to group 0, node 0
info string error code = 87
info string Binding thread 47 to group 1, node 1
info string error code = 87
info string Binding thread 48 to group 0, node 0
info string error code = 87
info string Binding thread 49 to group 1, node 1
info string error code = 87
info string Binding thread 50 to group 0, node 0
info string error code = 87
info string Binding thread 51 to group 1, node 1
info string error code = 87
info string Binding thread 52 to group 0, node 0
info string error code = 87
info string Binding thread 53 to group 1, node 1
info string error code = 87
info string Binding thread 54 to group 0, node 0
info string error code = 87
info string Binding thread 55 to group 1, node 1
info string error code = 87
info string Binding thread 56 to group 0, node 0
info string error code = 87
info string Binding thread 57 to group 1, node 1
info string error code = 87
info string Binding thread 58 to group 0, node 0
info string error code = 87
info string Binding thread 59 to group 1, node 1
info string error code = 87
info string Binding thread 60 to group 0, node 0
info string error code = 87
info string Binding thread 61 to group 1, node 1
info string error code = 87
info string Binding thread 62 to group 0, node 0
info string error code = 87
info string Binding thread 63 to group 1, node 1
info string error code = 87
info string Binding thread 64 to group 0, node 0
info string error code = 87
info string Binding thread 65 to group 1, node 1
info string error code = 87
info string Binding thread 66 to group 0, node 0
info string error code = 87
info string Binding thread 67 to group 1, node 1
info string error code = 87
info string Binding thread 68 to group 0, node 0
info string error code = 87
info string Binding thread 69 to group 1, node 1
info string error code = 87
info string Binding thread 70 to group 0, node 0
info string error code = 87
info string Binding thread 71 to group 1, node 1
info string error code = 87
readyok
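Every binding attempt in the log above reports error code 87, which is the Windows system error ERROR_INVALID_PARAMETER. The thread does not say which parameter was rejected. Long logs like this are easier to summarise with a small script; the sketch below uses hypothetical names and tallies the status line that follows each "Binding thread" line.

```python
import re
from collections import Counter

BIND_RE = re.compile(r"info string Binding thread (\d+) to group (\d+), node (\d+)")
ERROR_RE = re.compile(r"info string error code = (\d+)")

# Only the code seen in this thread; 87 is Windows' ERROR_INVALID_PARAMETER.
KNOWN_ERRORS = {87: "ERROR_INVALID_PARAMETER"}


def tally_bind_results(lines):
    """Count the status reported after each 'Binding thread' line."""
    results = Counter()
    pending = False  # True while a binding line awaits its status line
    for line in lines:
        line = line.strip()
        if BIND_RE.fullmatch(line):
            pending = True
        elif pending:
            err = ERROR_RE.fullmatch(line)
            if err:
                code = int(err.group(1))
                results[KNOWN_ERRORS.get(code, f"error {code}")] += 1
            elif line == "info string OK":
                results["OK"] += 1
            pending = False
    return results


sample = [
    "info string Binding thread 0 to group 0, node 0",
    "info string error code = 87",
    "info string Binding thread 18 to group 1, node 1",
    "info string error code = 87",
]
print(tally_bind_results(sample))
```

Run against the full log, a tally like this would show whether any thread was bound successfully.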

syzygy1 commented 8 years ago

OK, I think we are getting closer :) I have made another modification. Hopefully this will make the difference.

syzygy1 commented 8 years ago

Seems to be fixed by https://github.com/syzygy1/Cfish/commit/403811a0c6435ecc7ecf0c43e42dfcb56261f39f