official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0
11.49k stars 2.26k forks

Why does my computer’s 128-thread test run slower than the 64-thread speed? #5635

Open Sivous opened 4 days ago

Sivous commented 4 days ago

Describe the issue

On Windows 11 24H2, Stockfish gains almost nothing from the processor’s hyper-threading: calculation speed is nearly identical with 64 threads and with 128 threads. Task Manager also shows that the 64-thread run does not make full use of the processor’s physical cores.

[Screenshots: test02, 64-thread-01, test]

After I reset my system to Windows 11 23H2, the 64-thread run makes better use of the processor’s physical cores. However, the speed at full load is still no higher, and 128 threads is actually slower than 64 threads. Task Manager also shows that the processor frequency at full load is unexpectedly higher with 128 threads than with 64 threads, on both 24H2 and 23H2.

[Screenshots: untitled image, 128-thread]

Expected behavior

I would like to be able to reach higher full-load speeds on a many-core processor under Windows 11.

Steps to reproduce

Launch stockfish_24101214_x64_avx2 on both Windows 11 24H2 and Windows 11 23H2, then enter speedtest 64 16384 and speedtest 128 16384.
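The reproduction steps can be scripted so that both thread counts run back to back under the same conditions. This is a minimal sketch, not part of the original report: it assumes the engine executes a command passed on its command line and then exits (as it does for `bench`), and the helper names are hypothetical.

```python
import subprocess


def speedtest_argv(engine_path, threads, hash_mib):
    """Build the argument list for one `speedtest <threads> <hash>` run."""
    return [engine_path, "speedtest", str(threads), str(hash_mib)]


def run_speedtest(engine_path, threads, hash_mib):
    """Run one speedtest and return the engine's raw text output.

    Assumption: the engine runs the command given on its command line
    and then exits. stdout and stderr are both captured because this
    sketch does not assume which stream the report is written to.
    """
    result = subprocess.run(
        speedtest_argv(engine_path, threads, hash_mib),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr
```

For example, calling `run_speedtest(path_to_engine, 64, 16384)` and then `run_speedtest(path_to_engine, 128, 16384)` reproduces the two runs above, where `path_to_engine` is a placeholder for the location of the stockfish_24101214_x64_avx2 executable.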

Anything else?

No response

Operating system

Windows

Stockfish version

stockfish_24101214_x64_avx2

Sopel97 commented 4 days ago

  1. how many active memory channels?
  2. is there anything else running on the system? the screenshots from 64 thread runs look a bit abnormal, there seems to be more than 50% utilization
  3. I assume the first screenshot is from a 64 thread run? not sure why the processor assignment is so chaotic, though we know already from other tickets that it doesn't really follow the internal numbering

Sivous commented 4 days ago

The system has 4 memory channels and is not running any other programs.
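For reference, the theoretical ceiling of a four-channel configuration follows from the channel count and the transfer rate. The sketch below assumes DDR4 with a 64-bit (8-byte) data path per channel. The DDR4-3200 speed is an assumption, since the memory speed is not stated in this thread.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# 4 channels as reported above; DDR4-3200 is an assumed speed.
print(peak_bandwidth_gb_s(4, 3200))  # 102.4
```

Under that assumption, 128 threads sharing about 102.4 GB/s would each get at most 0.8 GB/s, and sustained bandwidth in practice is lower than the theoretical peak.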

Sopel97 commented 4 days ago

The higher clocks in the 128-thread run suggest it may be heavily memory bound, further exacerbated by larger cache pressure. I don't know what performance to expect on a 3990X, so it's hard for me to say what's going on. A test under Linux would be a good start; the behaviour there should be more predictable.

I can't explain the >50% utilization reading other than Windows jumping the job between SMT threads and Task Manager aggregating the data inexactly.
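The memory-bound hypothesis can be illustrated with a toy roofline-style model: throughput grows with thread count until the memory system saturates, after which extra SMT threads add nothing. This is only a sketch. Every number below is invented for illustration and is not a measurement from this system.

```python
def modeled_nps(threads, nps_per_thread, bytes_per_node, bandwidth_bytes_per_s):
    """Toy model: throughput is capped by compute or by memory bandwidth."""
    compute_limit = threads * nps_per_thread
    memory_limit = bandwidth_bytes_per_s / bytes_per_node
    return min(compute_limit, memory_limit)


# Hypothetical figures, chosen only to show the plateau.
NPS_PER_THREAD = 1_000_000  # nodes/s one thread could reach if never stalled
BYTES_PER_NODE = 2_000      # DRAM traffic generated per node
BANDWIDTH = 100e9           # sustained memory bandwidth in bytes/s

for t in (32, 64, 128):
    print(t, modeled_nps(t, NPS_PER_THREAD, BYTES_PER_NODE, BANDWIDTH))
```

With these made-up inputs the 64-thread and 128-thread cases hit the same memory ceiling. The model ignores cache pressure and the fact that two SMT threads share one core, so it can show a plateau but not an outright slowdown.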

Sopel97 commented 4 days ago

https://www.hwinfo.com/ can provide DRAM Read/Write Bandwidth information from CPU sensors on some systems. May be worth checking if it's available.

Sivous commented 4 days ago

But I haven’t noticed any such problems when running Stockfish 16 or earlier versions.

Sivous commented 4 days ago

[Screenshot: cachemem]

Sivous commented 4 days ago

Perhaps it is related to the new NumaPolicy strategy?

Disservin commented 4 days ago

NUMA would probably only be a problem here if the 128-thread run used only 64 threads, which I don't think is the case?

Sivous commented 3 days ago

With Stockfish 15, I ran the commands “bench 8196 64 20” and “bench 8196 128 20” five times each. The 64-thread runs made effective use of the physical cores, and the CPU frequency under full load showed no abnormalities. Hyper-threading also gave a significant speed increase, a gain that is absent in the latest Stockfish version.

64thread test: [6 screenshots]

128thread test: [6 screenshots]

Disservin commented 3 days ago

If you repeat the bench you did in Stockfish 15 in the latest version, I assume 64 and 128 threads are also close to each other, NPS-wise?
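One way to make that comparison concrete is to reduce each engine version to a single ratio. This is a small sketch: the function name and the sample figures are hypothetical and are not results from this thread.

```python
def smt_scaling(nps_physical, nps_smt):
    """Ratio of NPS with all SMT threads to NPS with one thread per core."""
    if nps_physical <= 0:
        raise ValueError("nps_physical must be positive")
    return nps_smt / nps_physical


# Placeholder values; substitute the NPS reported by each bench run.
ratio = smt_scaling(nps_physical=100_000_000, nps_smt=95_000_000)
print(f"128 vs 64 threads: {ratio:.2f}x")
```

A ratio clearly above 1 means the extra SMT threads help. A ratio near or below 1 corresponds to the behaviour reported for the latest version.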

Sopel97 commented 3 days ago

Regarding HWiNFO, I was thinking about this readout from the CPU sensors while the engine is running: [screenshot]

If it's similar and high when running both 64 and 128 threads, it could be memory bound.

Stockfish 15 had a much smaller network, AFAIK. I'm not sure there's a comparable test to be done here.

Sivous commented 3 days ago

Alright, it looks like the confusion has been resolved. With the expansion of the network, there’s an increasing need for memory bandwidth.

Stockfish 15: [screenshots: 64thread, 128thread]

Stockfish dev-20241012-9766db81: [screenshots: 64thread, 128thread]