Open Sivous opened 4 days ago
The system has 4 memory channels and is not running any other programs.
The higher clocks in the 128-thread run suggest it may be heavily memory bound, further exacerbated by greater cache pressure. I don't know what performance to expect on a 3990X, so it's hard for me to say what's going on. A test under Linux would be a good start; the behaviour there should be more predictable.
I can't explain the >50% utilization reading other than by Windows moving the job between SMT threads and Task Manager aggregating the data inexactly.
https://www.hwinfo.com/ can provide DRAM Read/Write Bandwidth information from CPU sensors on some systems. May be worth checking if it's available.
But I haven’t noticed any such problems when running Stockfish 16 or earlier versions.
Perhaps it is related to the new NumaPolicy strategy?
NUMA would probably only be a problem here if the 128-thread run actually used only 64 threads, which I don't think is the case?
With Stockfish 15, I ran “bench 8196 64 20” and “bench 8196 128 20” five times each. The 64-thread configuration made full use of the physical cores, and the CPU frequency under full load showed no abnormalities. Enabling hyper-threading also gave a significant speed increase, which the latest Stockfish version does not show.
If you repeat the bench you did in Stockfish 15 with the latest version, I assume 64 and 128 threads are also close to each other, nps-wise?
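To make "close to each other, nps-wise" concrete, the repeated runs can be reduced to a single scaling ratio. This is only a minimal sketch: the function name `smt_scaling` is made up for illustration, and the nps samples have to be copied in by hand from the engine output of each run.

```python
from statistics import mean


def smt_scaling(nps_physical, nps_smt):
    """Mean nps of the SMT runs divided by mean nps of the physical-core runs.

    A value near 1.0 means the extra SMT threads add almost nothing;
    a value below 1.0 means the 128-thread runs are slower.
    """
    if not nps_physical or not nps_smt:
        raise ValueError("need at least one nps sample per configuration")
    return mean(nps_smt) / mean(nps_physical)
```

Pass the five 64-thread nps values as the first list and the five 128-thread values as the second, then compare the ratio between Stockfish 15 and the current build.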
regarding hwinfo i was thinking about this readout from the CPU sensors while the engine is running
if it's similar and high when running both 64 and 128 threads it could be memory bound
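The check described above can be written down roughly as follows. It is a sketch under assumptions: the theoretical peak uses the usual channels x bus width x transfer rate formula with 8-byte (64-bit) channels, the DDR4-3200 figure in the usage note is an assumed memory speed (the thread only states that there are 4 channels), and the `high_fraction` and `similar_tolerance` thresholds are arbitrary illustrative values, not established cut-offs.

```python
def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * bus_width_bytes * mega_transfers_per_s / 1000


def looks_memory_bound(gbs_64t, gbs_128t, peak_gbs,
                       high_fraction=0.7, similar_tolerance=0.1):
    """Heuristic: both readings are high relative to peak and close to each other.

    The thresholds are assumptions chosen for illustration only.
    """
    both_high = min(gbs_64t, gbs_128t) >= high_fraction * peak_gbs
    similar = abs(gbs_64t - gbs_128t) <= similar_tolerance * max(gbs_64t, gbs_128t)
    return both_high and similar
```

For example, `peak_dram_bandwidth_gbs(4, 3200)` gives about 102.4 GB/s for four channels of DDR4-3200. The bandwidth readings from the 64-thread and 128-thread runs can then be passed to `looks_memory_bound` together with that peak.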
stockfish 15 had a much smaller network afaik. i'm not sure if there's a comparable test to be done here.
Alright, it looks like the confusion has been resolved. With the expansion of the network, there’s an increasing need for memory bandwidth.
Stockfish 15: 64 threads, 128 threads
Stockfish dev-20241012-9766db81: 64 threads, 128 threads
Describe the issue
On Windows 11 24H2, Stockfish gains no significant performance from the processor's hyper-threading: there is a negligible difference in calculation speed between 64 threads and 128 threads. Task Manager also shows that the 64-thread setup does not fully use the processor's physical cores.
After I reset my system to Windows 11 23H2, the 64-thread configuration made better use of the physical cores. The reduced speed under maximum load remains, however, and 128 threads are now slower than 64 threads. Task Manager also shows that the processor frequency at peak load is unexpectedly higher with 128 threads than with 64 threads, on both 24H2 and 23H2.
Expected behavior
I would like Stockfish to reach higher full-load speeds on a processor with many cores under Windows 11.
Steps to reproduce
Launch stockfish_24101214_x64_avx2 on both Windows 11 24H2 and Windows 11 23H2, then enter speedtest 64 16384 and speedtest 128 16384.
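The two runs above could be scripted roughly like this, so that both configurations are measured the same way on each Windows version. It is a sketch under assumptions: the helper names are hypothetical, the engine path is whatever the local binary is called, and it assumes the engine finishes `speedtest` before reading the following `quit` from standard input. The output is returned raw, with no attempt to parse it.

```python
import subprocess


def build_speedtest_commands(thread_counts=(64, 128), hash_mb=16384):
    """Build the speedtest command lines used in the reproduction steps."""
    return [f"speedtest {threads} {hash_mb}" for threads in thread_counts]


def run_engine_command(engine_path, command, timeout_s=None):
    """Send one command followed by 'quit' and return the combined raw output."""
    result = subprocess.run(
        [engine_path],
        input=f"{command}\nquit\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams so nothing is lost
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout


def main(engine_path):
    """Run every speedtest configuration in turn and print its output."""
    for command in build_speedtest_commands():
        print(f"== {command} ==")
        print(run_engine_command(engine_path, command))
```

Call `main` with the path to the engine binary, once on 24H2 and once on 23H2, and compare the printed results.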
Anything else?
No response
Operating system
Windows
Stockfish version
stockfish_24101214_x64_avx2