sp00n / corecycler

Script to test single core stability, e.g. for PBO & Curve Optimizer on AMD Ryzen or overclocking/undervolting on Intel processors

Doesn't work with more than 64 threads #64

Open dkit opened 2 months ago

dkit commented 2 months ago

It looks like processor groups need to be used in this case, so that the affinity mask is split into groups of 64 bits each.

Cannot convert value "2.07691874341393E+34" to type "System.Int64". Error: "Arithmetic operation resulted in an overflow." At CoreCycler-v0.9.5.0alpha2\script-corecycler.ps1:4908 char:21
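The number in the error message is consistent with a single-bit affinity mask for a logical processor index far beyond 63, since 2^114 is about 2.0769 × 10^34. That reading is an inference from the value itself and is not stated in the thread. A minimal Python check, with an illustrative helper name:

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold

def single_cpu_mask(cpu_index: int) -> int:
    """Return the flat (ungrouped) affinity bitmask for one logical processor."""
    return 1 << cpu_index

mask = single_cpu_mask(114)      # index 114 is an assumption derived from the error value
print(f"{mask:.14E}")            # same magnitude as the value in the error message
print(mask > INT64_MAX)          # the mask does not fit into a signed 64-bit integer
```

Any flat mask for an index of 63 or higher exceeds the positive range of a signed 64-bit integer, which is why a per-group mask is needed.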

sp00n commented 2 months ago

Fittingly this issue has the number 64.

Not sure I'm going to fix this, leaving this here for future reference: https://learn.microsoft.com/en-us/windows/win32/procthread/processor-groups https://stackoverflow.com/questions/76317127/windows-11-thread-affinities-spanning-multiple-processor-groups-explicitly https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity

dkit commented 1 month ago

Didn't even notice the number match!

Thank you for the references. And having done a PoC in C++, I understand the reluctance to tackle this issue. The CPU groups thing is such a hack that I have yet to find a tool that shows the information correctly. I haven't been able to see if my code does what it's supposed to (despite the functions returning true for success). I'm putting it down for now as well.

dkit commented 1 month ago

I was unable to get SetThreadSelectedCpuSetMasks working reliably (probably because some other function set affinities, as mentioned in the Stack Overflow answer). I ended up looking at https://github.com/winsiderss/systeminformer because it was able to set the affinity correctly. It appears to use SetThreadGroupAffinity, which did work reliably when I used it. I wrote a C# helper program that takes a process ID and a list of logical CPU IDs to pin to, and so far it looks like it's working.

djangoa commented 1 month ago

@dkit Please can you share your helper program source? I'm trying to get my 56 core CPU working with corecycler and have hit the same problem.

@sp00n please reconsider a fix for this issue, it's only going to affect more users in the future.

djangoa commented 1 month ago

FYI: I've hacked together a PowerShell script that implements SetThreadGroupAffinity, see attached zip: affinity.ps1.zip.

I imagine integrating this into corecycler.ps1 should be fairly trivial. Using SetProcessDefaultCpuSetMasks & SetThreadSelectedCpuSetMasks might be a good alternative for Windows 11+ and for stress tests that do not manage their own affinity.

The downside of the SetThreadGroupAffinity approach is that it cannot alter the process itself, only its existing threads. If the process spawns new threads, they do not inherit the affinity mask, so it has to be reapplied. I don't think this is an issue for its use in corecycler. Additionally, each thread can only belong to a single processor group at any one time. I think this limitation is fine for corecycler too, as we only need to apply affinity masks with 1 or 2 logical processors.
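The single-group limitation follows from the argument that SetThreadGroupAffinity takes: a GROUP_AFFINITY structure holding exactly one group number and one group-relative mask. The Python sketch below only mirrors that structure with ctypes and builds a value for it. It does not call any Windows API, and `make_group_affinity` is a hypothetical helper, not part of the attached script.

```python
import ctypes

class GROUP_AFFINITY(ctypes.Structure):
    """Mirror of the Win32 GROUP_AFFINITY structure (one group, one mask)."""
    _fields_ = [
        ("Mask", ctypes.c_size_t),          # KAFFINITY: pointer-sized bitmask, relative to the group
        ("Group", ctypes.c_uint16),         # WORD: processor group number
        ("Reserved", ctypes.c_uint16 * 3),  # WORD[3]: reserved, left at zero
    ]

def make_group_affinity(group, cpus_in_group):
    """Build an affinity value from group-relative CPU indices (hypothetical helper)."""
    mask = 0
    for cpu in cpus_in_group:
        mask |= 1 << cpu
    return GROUP_AFFINITY(Mask=mask, Group=group)

# Both threads of the first core in group 1
ga = make_group_affinity(1, [0, 1])
print(ga.Group, hex(ga.Mask))
```

Because the structure has room for only one group, pinning to logical processors in two different groups would need two separate threads with one call each.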

@sp00n can you take a look and see what you think? We're getting closer to where 64+ thread machines are going to become more and more common.

sp00n commented 1 month ago

@djangoa Thanks for this, I looked into it yesterday, but couldn't find a way to do this purely with PowerShell, so would've needed to look up how to invoke the relevant system calls. It seems you did this work for me already. 👍 I had also opened a question on StackOverflow for this.

Can you check if the processor groups are actually filled up to 64 before they spill over to another group, or if they are evenly split between the cores? I.e. if for your 56-core CPU with 112 threads it is 64 + 48 or 56 + 56.

According to this blog post, his groups are 48+48, so evenly split up. He also mentions SysInternals CoreInfo to check the core grouping/assignment.

Unfortunately there's no way I can test all this myself, I'll have to dry-code all of it.

djangoa commented 1 month ago

Can you check if the processor groups are actually filled up to 64 before they spill over to another group, or if they are evenly split between the cores?

Sure, the logical processors fill up Group 0 to 64 and then spill over into Group 1, which in my case holds the other 48. Logical processors (i.e. threads) are numbered consecutively within each group. E.g. Core 0 has Logical Processors 0 and 1 in Group 0, and Core 32 has Logical Processors 0 and 1 in Group 1:

image
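The layout described above can be expressed as a small lookup from a system-wide logical processor index to a group and a group-relative index. This is a Python sketch under the fill-first assumption reported here (64 + 48). The function name and the list of group sizes are illustrative.

```python
def locate_cpu(global_cpu, group_sizes):
    """Map a system-wide logical processor index to (group, index within group)."""
    remaining = global_cpu
    for group, size in enumerate(group_sizes):
        if remaining < size:
            return group, remaining
        remaining -= size
    raise ValueError(f"CPU {global_cpu} does not exist in layout {group_sizes}")

GROUP_SIZES = [64, 48]  # layout reported in the thread for the 56-core / 112-thread CPU

# With two threads per core, Core 32 covers the system-wide CPUs 64 and 65
print(locate_cpu(64, GROUP_SIZES))  # (1, 0)
print(locate_cpu(65, GROUP_SIZES))  # (1, 1)
print(locate_cpu(63, GROUP_SIZES))  # (0, 63)
```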

I think that when a CPU makes use of groups, the OS changes how logical cores are presented too, so that they remain in the same group. I used a combination of System Informer (the successor to Process Hacker) and HWInfo64 to deduce what cores are in what group.

I understand this is probably going to be difficult to test if you don't have a multi-group machine, so I'm happy to help in any way needed. I'd be happy to fund some time on a cloud provider if that's helpful for testing too.

djangoa commented 1 month ago

Here's the output from coreinfo for my machine: coreinfo.txt

I think in the blog you reference the reason his groups are split evenly is due to the machine having multiple sockets.

sp00n commented 1 month ago

Yeah, apparently it was an 8 socket system (the E7540 in that blog had 6 cores / 12 threads, and 12*8 = 96 = 48+48 makes sense).

So it seems there can be multiple setups, one-socket systems just fill up the first group and then proceed to the next one, while multi-socket systems want to evenly distribute them. But multi-socket systems probably won't be using CoreCycler, so I guess I could stick to filling up the group to 64.
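The two layouts discussed here can be compared with a short Python sketch. Both functions are simplifications for illustration: the even split just divides the total across the minimum number of groups, which reproduces the 48 + 48 example but is not claimed to be the rule Windows actually applies.

```python
import math

def fill_first_layout(total_cpus, group_capacity=64):
    """Fill each group to capacity before starting the next one."""
    sizes = []
    remaining = total_cpus
    while remaining > 0:
        sizes.append(min(remaining, group_capacity))
        remaining -= sizes[-1]
    return sizes

def even_layout(total_cpus, group_capacity=64):
    """Spread the CPUs evenly over the minimum number of groups (simplified)."""
    groups = math.ceil(total_cpus / group_capacity)
    base, extra = divmod(total_cpus, groups)
    return [base + (1 if i < extra else 0) for i in range(groups)]

print(fill_first_layout(112))   # [64, 48]  single socket, 56 cores / 112 threads
print(even_layout(8 * 12))      # [48, 48]  eight sockets with 12 threads each
print(fill_first_layout(24))    # [24]      a single group, no spill-over
```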

@dkit Could you also post a CoreInfo output? n=2 would be twice as precise as n=1! 😁 Unless you're running a 64 core Threadripper, which makes it ambiguous (but still informative).

sp00n commented 1 month ago

I'm currently testing your code @djangoa. One thing I noticed is that the main process affinity does not change if you change its threads' affinity values. It also shows up unchanged in Process Explorer (and I assume in the Task Manager as well). So it doesn't seem to propagate upwards to the main process.

This seems to be mostly just a visual problem though, as both y-Cruncher and Prime95 correctly switched the processor load to the provided cores when using SetThreadGroupAffinity on my Windows 10 machine.

I also noticed that you commented out the call to SetProcessAffinityMask due to apparent bugs. This call apparently doesn't take the Processor Group as an argument, and since all affinity bitmasks are the same across the various groups, I assume it would effectively be the same as the vanilla .ProcessorAffinity from PowerShell itself?

If that's the case, maybe I could use this to then set the affinity of the main process as well, after having assigned the thread group affinity. Even if it's only to avoid having an open ticket because someone was confused that the CPU affinity wasn't "correctly" set. 😁

But of course I can't test this myself over multiple processor groups.

Anyway, I'll try to add the code to the main CoreCycler script now, so a first test version shouldn't be too far away.

sp00n commented 1 month ago

Here's a first experimental version. It does try to set the main process affinity after having set the thread group affinity, let me know if this actually sets the correct CPUs in the correct Processor Group or not. And if it somehow interferes with the program running correctly, comment out line 4531:

# Maybe also set the process affinity now?
$Script:stressTestProcess.ProcessorAffinity = $affinity

It seems to work fine with my 24 core system, but I only have one Processor Group, so no idea how this will interact when there are multiple groups.

I've forced it to run the new code in this version, but I plan to use the old default .ProcessorAffinity setting for systems with fewer than 64 cores. No need to set the individual threads there I guess.

script-corecycler-0.9.5.0alpha4-experimental.zip

djangoa commented 1 month ago

@sp00n Hi,

One thing I noticed is that the main process affinity does not change if you change its threads affinity values. It also shows up unchanged in Process Explorer (and I assume in the Task Manager as well). So it doesn't seem to propagate upwards to the main process.

Yes it's a limitation and unfortunately I think this means that if a process spawns a new thread it will have the parent's process affinity instead of the affinity of the threads previously set.

I assume it would effectively be the same as the vanilla .ProcessorAffinity from PowerShell itself?

I think your assertion here is correct.

If that's the case, maybe I could use this to then set the affinity of the main process as well, after having assigned the thread group affinity. Even if it's only to avoid having an open ticket because someone was confused that the CPU affinity wasn't "correctly" set.

From my testing, SetProcessAffinityMask only sets the mask within the processor group that the process was assigned to when it was spawned. If you've moved threads from this process into another group, making the process "multi-group", it breaks further affinity interaction with the process.

Here's a first experimental version.

Perfect, thanks! I'll give it a try tomorrow and report back.

It does try to set the main process affinity after having set the thread group affinity, let me know if this actually sets the correct

As per my previous comment this will not do any harm but will stop the process from being changeable in the future. I'll capture a screen shot of the affinity after this is applied and show you what I mean.

I was thinking it might be worth seeing if a virtual machine can be used to test different processor group layouts.

djangoa commented 1 month ago

@sp00n Hi,

So I tested and can report that affinity is set correctly but I had to comment out:

# Maybe also set the process affinity now?
#$Script:stressTestProcess.ProcessorAffinity = $affinity

As it gives the error: FATAL ERROR: Could not set the affinity to Core 0 (CPU 0)!

Which makes sense as after you set the affinity of a thread the process is now multi-group and SetProcessAffinityMask no longer functions.

I also noticed after each core is cycled, I get the following error:

ERROR: 12:33:18
ERROR: There has been an error while running Prime95!
ERROR: At Core 33 (CPU 66)
ERROR MESSAGE: The Prime95 process doesn't use enough CPU power anymore (only 0.89% instead of the expected 0.89%)
                 + No FFT size provided in the error message, make an educated guess.
ERROR: The last *passed* FFT size before the error was: 8960K
ERROR: Unfortunately FFT size fail detection only works for Smallest, Small or Large FFT sizes.
                 + The max FFT size was outside of the range where it still follows a numerical order

Any ideas on that one?

djangoa commented 1 month ago

Hi again,

I disabled the CPU utilisation check and that resolved the previous error.

But I also checked using 2 threads and have come across another problem:

13:26:03 - Set to Core 31 (CPU 62 and 63)
                 + Setting affinity to CPU(s): 62 and 63
                 + More than 64 cores detected, try to get the correct group affinity
                 + The number of Processor Groups:       2
                 + The number of CPUs in the last group: 48
                 + The group ID of the CPU to set to: 0
                 + The number of processors in this group: 64
                 + The IDs of the CPUs in its own group: 62 63
                 + Setting the affinity has failed, trying again...
                 + Setting affinity to CPU(s): 62 and 63
                 + More than 64 cores detected, try to get the correct group affinity
                 + The number of Processor Groups:       2
                 + The number of CPUs in the last group: 48
                 + The group ID of the CPU to set to: 0
                 + The number of processors in this group: 64
                 + The IDs of the CPUs in its own group: 62 63
                 + Trying to close the stress test program
                 + Trying to close Prime95
                 + Trying to gracefully close Prime95
                 + Could not gracefully close Prime95, killing the process
FATAL ERROR: Could not set the affinity to Core 31 (CPU 62 and 63)!

I've attached a couple of logs showing this behaviour: logs.zip. It seems setting affinity on the last core (CPU 63) in group 0 fails when assigning the stress test to both logical processors (i.e. threads).

As far as I can tell and other than the above, everything is working correctly including setting the affinity of threads to logical CPUs.

sp00n commented 1 month ago

The signed [Int64] I was using actually overflowed at core 63. I changed it to [UInt64], so it should work now.

At least for the new functionality with SetThreadGroupAffinity, the regular PowerShell .ProcessorAffinity property doesn't take [UInt64] or [BigInt], even when trying to set to core 0. 😑

sp00n commented 1 month ago

@djangoa Can you check which affinity is returned after you've manually set a process to CPU 63 (and 62+63), e.g. via the Task Manager?

The calculated value is 2^63, so 9223372036854775808. But this is actually 1 above the 64-bit signed integer max value of 9223372036854775807, so it might return a negative value instead (-9223372036854775808). And since apparently you cannot assign an unsigned integer above the signed max value to .ProcessorAffinity, it might actually need a negative value instead.
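The wrap-around described here is ordinary two's-complement reinterpretation of the same 64 bits. A minimal Python sketch follows, with an illustrative function name. Python integers are unbounded, so the 64-bit behaviour has to be modelled explicitly.

```python
def as_int64(unsigned_mask):
    """Reinterpret a 64-bit unsigned bit pattern as a signed two's-complement value."""
    if not 0 <= unsigned_mask < 2**64:
        raise ValueError("mask must fit into 64 bits")
    return unsigned_mask - 2**64 if unsigned_mask >= 2**63 else unsigned_mask

print(as_int64(1 << 63))                # CPU 63 alone: -9223372036854775808
print(as_int64((1 << 62) | (1 << 63)))  # CPUs 62 and 63: -4611686018427387904
print(as_int64(1 << 62))                # CPU 62 alone stays positive: 4611686018427387904
```

Only masks that include bit 63 turn negative. Everything up to CPU 62 fits into the positive range of a signed 64-bit integer.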

sp00n commented 1 month ago

Also, here's a second version, which should fix the error for core 63. script-corecycler-0.9.5.0alpha4-experimental2.zip

Can you check that both the new functionality as well as the old one works? For the new one, just let it run (but make sure it does include core 63), and to check the old one, uncomment #$hasMoreThan64Cores = $false in line 333 and set the coreTestOrder in the config to something like 61, 62, 63 (so it doesn't try to set anything beyond the last core in your first processor group).

djangoa commented 1 month ago

@djangoa Can you check which affinity is returned after you've manually set a process to CPU 63 (and 62+63), e.g. via the Task Manager?

The calculated value is 2^63, so 9223372036854775808. But this is actually 1 above the 64-bit signed integer max value of 9223372036854775807, so it might return a negative value instead (-9223372036854775808). And since apparently you cannot assign an unsigned integer above the signed max value to .ProcessorAffinity, it might actually need a negative value instead.

Sorry I don't fully understand your question. What method would you like me to invoke to check affinity set to CPU 63?

In my implementation I used System.UInt64 only but experienced issues with the toString method and displaying the affinity of CPU 63 too. This looks like an issue with System.Int64 as it has a range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

https://devblogs.microsoft.com/scripting/understanding-numbers-in-powershell/

I assume that setting core 63 with the old method if it uses System.Int64 wouldn't work either.

djangoa commented 1 month ago

Also, here's a second version, which should fix the error for core 63. script-corecycler-0.9.5.0alpha4-experimental2.zip

Can you check that both the new functionality as well as the old one works? For the new one, just let it run (but make sure it does include core 63), and to check the old one, uncomment #$hasMoreThan64Cores = $false in line 333 and set the coreTestOrder in the config to something like 61, 62, 63 (so it doesn't try to set anything beyond the last core in your first processor group).

Yes I'll do that now and report back shortly.

sp00n commented 1 month ago

@djangoa Can you check which affinity is returned after you've manually set a process to CPU 63 (and 62+63), e.g. via the Task Manager? The calculated value is 2^63, so 9223372036854775808. But this is actually 1 above the 64-bit signed integer max value of 9223372036854775807, so it might return a negative value instead (-9223372036854775808). And since apparently you cannot assign an unsigned integer above the signed max value to .ProcessorAffinity, it might actually need a negative value instead.

Sorry I don't fully understand your question. What method would you like me to invoke to check affinity set to CPU 63?

In my implementation I used UInt64 only but experienced issues with the toString method and displaying the affinity of CPU 63 too. This looks like an issue with Int64, as it has a range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

https://devblogs.microsoft.com/scripting/understanding-numbers-in-powershell/

You can just run (Get-Process 'notepad').ProcessorAffinity, which returns a value when executed in a PowerShell terminal.

In the new version above, I'm now using your bit mask string function, but convert it to Int64 instead of UInt64, which will actually return a negative value for core 63. Which is why I'd like to see if it actually works correctly also with the old functionality. SetThreadGroupAffinity for the new functionality actually accepts unsigned int 64, so the problem doesn't appear there (now that I changed it to be passed an UInt64 value).

djangoa commented 1 month ago

You can just run (Get-Process 'notepad').ProcessorAffinity, which returns a value when executed in a PowerShell terminal.

It returns -1 as you expected.

djangoa commented 1 month ago

Can you check that both the new functionality as well as the old one works? For the new one, just let it run (but make sure it does include core 63), and to check the old one, uncomment #$hasMoreThan64Cores = $false in line 333 and set the coreTestOrder in the config to something like 61, 62, 63 (so it doesn't try to set anything beyond the last core in your first processor group).

Sorry neither worked, I've attached the logs:

Cores 61, 62, 63 and uncommented Line 333: $hasMoreThan64Cores = $false CoreCycler_2024-05-24_23-55-14_PRIME95_AVX2.log

Cores 63, 64 and uncommented Line 332: $hasMoreThan64Cores = $true CoreCycler_2024-05-24_23-56-56_PRIME95_AVX2.log

sp00n commented 1 month ago

You can just run (Get-Process 'notepad').ProcessorAffinity, which returns a value when executed in a PowerShell terminal.

It returns -1 as you expected.

Huh, I actually expected it to return -9223372036854775808 for core 63. 😶 Because [System.Convert]::ToInt64('1000000000000000000000000000000000000000000000000000000000000000', 2) does return this instead of -1. Let's see if it still works correctly. If it doesn't accept -9223372036854775808 for core 63, then I guess I will have to do an if/else clause.
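Going the other way, a signed value read from the property can be decoded back into the set of CPUs it represents. The Python sketch below uses an illustrative function name. It shows that -9223372036854775808 has only bit 63 set, while -1 has all 64 bits set and therefore includes CPU 63 along with every other CPU. Whether that accounts for the -1 reported above is an assumption, as the thread does not say which affinity the process had when it was queried.

```python
def cpus_in_mask(signed_mask):
    """List the CPU indices whose bits are set in a signed 64-bit affinity value."""
    bits = signed_mask & (2**64 - 1)  # view the value as its unsigned bit pattern
    return [cpu for cpu in range(64) if (bits >> cpu) & 1]

# Counterpart of parsing the binary string "1" followed by 63 zeros as Int64
parsed = int("1" + "0" * 63, 2) - 2**64
print(parsed)                        # -9223372036854775808
print(cpus_in_mask(parsed))          # [63]
print(len(cpus_in_mask(-1)))         # 64, i.e. every CPU of the group
```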

Can you also check if setting the affinity via PowerShell with (Get-Process 'notepad').ProcessorAffinity = -9223372036854775808 and/or (Get-Process 'notepad').ProcessorAffinity = -1 correctly sets the affinity to core 63 in the Task Manager? That would all be things that would be easier if I had access to a 64+ core system. But at least locally I cannot set up a virtual machine, as any "virtual" core needs at least one physical core to match to (so I can't get more than my 24 cores in a VM).

Can you check that both the new functionality as well as the old one works? For the new one, just let it run (but make sure it does include core 63), and to check the old one, uncomment #$hasMoreThan64Cores = $false in line 333 and set the coreTestOrder in the config to something like 61, 62, 63 (so it doesn't try to set anything beyond the last core in your first processor group).

Sorry neither worked, I've attached the logs:

Cores 61, 62, 63 and uncommented Line 333: $hasMoreThan64Cores = $false CoreCycler_2024-05-24_23-55-14_PRIME95_AVX2.log

Cores 63, 64 and uncommented Line 332: $hasMoreThan64Cores = $true CoreCycler_2024-05-24_23-56-56_PRIME95_AVX2.log

Aaand I forgot about Hyperthreading. Instead of core 61, 62, 63 in the config it should be core 29, 30, 31 in the config file (the latter becomes the logical processors 62 & 63 if used with 2 threads and Hyperthreading active).
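The core-to-CPU translation mentioned here is a multiplication by the number of threads per core. This Python sketch assumes the consecutive numbering described earlier in the thread, and the function name is illustrative.

```python
def logical_cpus_for_core(core, threads_per_core=2):
    """Return the logical processor indices that belong to a physical core."""
    first = core * threads_per_core
    return list(range(first, first + threads_per_core))

print(logical_cpus_for_core(31))     # [62, 63]
print(logical_cpus_for_core(29))     # [58, 59]
print(logical_cpus_for_core(63, 1))  # [63] when Hyperthreading is disabled
```

So with Hyperthreading active, a coreTestOrder entry of 63 would refer to logical processors 126 and 127, well outside the first processor group.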

djangoa commented 1 month ago

Can you also check if setting the affinity via PowerShell with (Get-Process 'notepad').ProcessorAffinity = -9223372036854775808 and/or (Get-Process 'notepad').ProcessorAffinity = -1 correctly sets the affinity to core 63 in the Task Manager?

Both -1 and -9223372036854775808 set the affinity correctly to logical processor 63 in the task manager.

Instead of core 61, 62, 63 in the config it should be core 29, 30, 31 in the config file (the latter becomes the logical processors 62 & 63 if used with 2 threads and Hyperthreading active).

Uncomment #$hasMoreThan64Cores = $false with 29, 30, 31 still fails: CoreCycler_2024-05-25_01-00-22_PRIME95_AVX2.log

With #$hasMoreThan64Cores = $false commented and cores 31, 32 everything works as expected.

djangoa commented 1 month ago

But at least locally I cannot set up a virtual machine, as any "virtual" core needs at least one physical core to match to (so I can't get more than my 24 cores in a VM).

That's a shame. If you have access to a Linux box, there should be no such limitation with libvirt and kvm/qemu: https://libvirt.org/formatdomain.html#cpu-model-and-topology "Guest NUMA topology can be specified using the numa element", but that's probably a lot of work to set up.

sp00n commented 1 month ago

Can you also check if setting the affinity via PowerShell with (Get-Process 'notepad').ProcessorAffinity = -9223372036854775808 and/or (Get-Process 'notepad').ProcessorAffinity = -1 correctly sets the affinity to core 63 in the Task Manager?

Both -1 and -9223372036854775808 set the affinity correctly to logical processor 63 in the task manager.

Instead of core 61, 62, 63 in the config it should be core 29, 30, 31 in the config file (the latter becomes the logical processors 62 & 63 if used with 2 threads and Hyperthreading active).

Uncomment #$hasMoreThan64Cores = $false with 29, 30, 31 still fails: CoreCycler_2024-05-25_01-00-22_PRIME95_AVX2.log

With #$hasMoreThan64Cores = $false commented and cores 31, 32 everything works as expected.

Ah ok, forcing $hasMoreThan64Cores = $false actually interferes with the processor group calculation.

Instead you can manually set the number of cores after line 298, e.g.

$numLogicalCores = 64
$numPhysCores    = $numLogicalCores/2

This will then also cause the old behavior. I actually expect this to fail for core 63 as well, since if both -1 and -9223372036854775808 correctly set the affinity to core 63, but reading the property only returns -1 instead of -9223372036854775808, then the affinity check will not match.

djangoa commented 1 month ago

Instead you can manually set the number of cores after line 298, e.g.

$numLogicalCores = 64
$numPhysCores    = $numLogicalCores/2

This will then also cause the old behavior. I actually expect this to fail for core 63 as well, since if both -1 and -9223372036854775808 correctly set the affinity to core 63, but reading the property only returns -1 instead of -9223372036854775808, then the affinity check will not match.

It failed earlier on the first core 29 (58/59) CoreCycler_2024-05-25_02-08-59_PRIME95_AVX2.log

sp00n commented 1 month ago

Ok, that's weird. Can you set that affinity over the command line? (Get-Process 'notepad').ProcessorAffinity = 864691128455135232
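The value in this command is the mask for logical processors 58 and 59, i.e. the two threads of core 29 from the failed run. A short Python check, with an illustrative helper name:

```python
def mask_for_cpus(cpus):
    """Combine CPU indices into a single affinity bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

mask = mask_for_cpus([58, 59])
print(mask)       # 864691128455135232
print(hex(mask))  # 0xc00000000000000
```

Since bit 63 is not involved, this mask is still a positive Int64 value, so the failure on core 29 cannot be explained by the sign issue discussed for core 63.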

djangoa commented 1 month ago

Ok, that's weird. Can you set that affinity over the command line? (Get-Process 'notepad').ProcessorAffinity = 864691128455135232

Yes but only if notepad is created on group 0:

PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> (Get-Process 'notepad').ProcessorAffinity = 864691128455135232
PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> (Get-Process 'notepad').ProcessorAffinity = -1
PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> (Get-Process 'notepad').ProcessorAffinity = 9223372036854775808
Exception setting "ProcessorAffinity": "Cannot convert the
"9223372036854775808" value of type "System.Decimal" to type "System.IntPtr"."
At line:1 char:1
+ (Get-Process 'notepad').ProcessorAffinity = 9223372036854775808
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], SetValueInvocationExceptio
   n
    + FullyQualifiedErrorId : ExceptionWhenSetting

PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> (Get-Process 'notepad').ProcessorAffinity = 9223372036854775807
PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> (Get-Process 'notepad').ProcessorAffinity = 864691128455135232
PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2>

image

If the process gets created in group 1, then it fails.

The operating system initially assigns each process to a single group in a round-robin manner across the groups in the system[1]

[1] https://learn.microsoft.com/en-us/windows/win32/procthread/processor-groups

I can understand wanting to keep the original method for setting affinity. But if the thread method also works on machines with only a single group, using only that would probably save you time, since you'd avoid having to maintain both in the future, even if it's not as pretty given that you can't see the affinity on the parent process. You can still use tools like System Informer to view the affinity of threads visually.

sp00n commented 1 month ago

I wanted to keep the original functionality to be able to debug issues more easily. It's much more convenient if you can tell people to just check the affinity in the Task Manager, instead of having to go through the process of checking each thread.

I already assumed that your error might have something to do with the created process maybe being in the wrong processor group, or already being a "multi-group" process, and this caused the affinity to fail. I guess I could create a special revision for you so that the stress test process is always created in the first processor group. I really wanted to test two things:

sp00n commented 1 month ago

Here's a new script which should initially start the stress test program assigned to processor group 0 and CPUs 2+3. Although the description for start.exe only mentions NUMA nodes for that command, I assume it also directly or indirectly sets the processor group as well.

At least it starts just fine for my computer that doesn't have any NUMA nodes if I set the value to 0, while setting it to 1 and 2 failed.

script-corecycler-0.9.5.0alpha4-experimental3.zip

djangoa commented 1 month ago

script-corecycler-0.9.5.0alpha4-experimental3.zip

I'll test again later this evening.

djangoa commented 1 month ago

@sp00n I tested the new version but it errors when I set:

$numLogicalCores = 64
$numPhysCores    = $numLogicalCores/2

and coreTestOrder = 29, 30, 31

CoreCycler_2024-05-26_00-27-15_PRIME95_SSE.log

Testing coreTestOrder = 31, 32 there were no issues.

CoreCycler_2024-05-26_00-33-24_PRIME95_SSE.log

Let me know if you want to test anything else.

sp00n commented 1 month ago

ffffffffff.... I changed the command for the 'prime95_dev' entry and not the regular 'prime95' entry, so nothing has actually changed in your last test -.-

Sooo here's a new version... script-corecycler-0.9.5.0alpha4-experimental4.zip

djangoa commented 1 month ago

@sp00n I tested the new version but it errors when I set:

$numLogicalCores = 64
$numPhysCores    = $numLogicalCores/2

and coreTestOrder = 29, 30, 31

Still errors for me using the new version: CoreCycler_2024-05-26_01-46-43_PRIME95_AVX2.log

sp00n commented 1 month ago

Mh. Maybe the command doesn't work at all, or something else doesn't. I've created a trimmed-down script that tries to set the affinities for notepad, just to see what happens. test_affinities_group_1.zip

djangoa commented 1 month ago

@sp00n I've run the script: test_affinities_group_1.log

sp00n commented 1 month ago

Ok, this is wild. It fails at CPU 48 (core 24, incorrectly displayed in the script).

So it may actually not have run in processor group 0. You have 56 cores, that minus 24 is 32, which translates to 64 threads/logical processors/CPUs. That would be the full processor group 0, so if it had run in group 1, it would make sense that it fails when it tries to assign to the non-existing core 24 (the 25th one).
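The reasoning above can be checked with a little arithmetic: a group of 48 logical processors only has the group-relative indices 0 to 47, so a mask containing bit 48 refers to a processor that does not exist in that group. The Python sketch below models only this arithmetic. That Windows rejects such a mask is the hypothesis under discussion, not something the code demonstrates, and the function name is illustrative.

```python
def fits_group(local_cpus, group_size):
    """True if every group-relative CPU index exists in a group of this size."""
    active_mask = (1 << group_size) - 1  # one bit per processor present in the group
    requested = 0
    for cpu in local_cpus:
        requested |= 1 << cpu
    return requested & ~active_mask == 0

print(56 * 2 - 64)               # 48 logical processors are left for group 1
print(fits_group([46, 47], 48))  # True: core 23 is the last core of a 48-CPU group
print(fits_group([48, 49], 48))  # False: core 24 does not exist in a 48-CPU group
print(fits_group([48, 49], 64))  # True: the same mask is valid in a full group
```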

If not, I have no idea. I don't think it's a 48-bit limitation.

What happens if you replace the $command with node 1 instead of node 0? $command = 'cmd /C start /MIN /NODE 1 /AFFINITY 0xC "" "%windir%\system32\notepad.exe"'

If it doesn't start at all, then you cannot actually set the processor group with the /NODE parameter. I.e. for me this says "The system cannot accept the START command parameter 1." For you it should at least start IF the node equals the processor group.

Besides that, I've now seen that QEMU is also available for Windows. I'm trying to get it to work right now, so that hopefully I can test this myself without having to bother you.

djangoa commented 1 month ago

With node 1:

PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> .\test_affinities_group_1.ps1
Number of logical cores: 64
The command using to start:
cmd /C start /MIN /NODE 1 /AFFINITY 0xC "" "%windir%\system32\notepad.exe"
The system cannot accept the START command parameter 1.
More than one open Notepad window found!
At C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2\test_affinities_grou
p_1.ps1:205 char:9
+         throw('More than one open Notepad window found!')
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (More than one open Notepad wi
   ndow found!:String) [], RuntimeException
    + FullyQualifiedErrorId : More than one open Notepad window found!

Don't worry about bothering me, Happy to help :)

I've not used QEMU on Windows, so it will be interesting to see how you get on with it.

sp00n commented 1 month ago

It seems you had more than one notepad window open. Can you close all and try again, just to make sure? Although it seems that setting the node to 1 actually doesn't work, and with node 0 it just selects a random processor group.

Maybe execute the script multiple times (while closing the notepad window after each attempt), until by chance it's assigned to group 0 and can proceed beyond CPU 48. 😅

QEMU proves a bit difficult to set up in Windows, my first attempt with the QTEmu GUI failed (won't even start a window for the VM), so I'm now trying directly over the CLI.

djangoa commented 1 month ago

Sorry for not checking the error carefully. Selecting node 1 works: test_affinities_group_1.log

Doesn't matter how many times I try with node 0, it always errors out at CPU 48.

sp00n commented 1 month ago

The command using to start:
cmd /C start /MIN /NODE 1 /AFFINITY 0xC "" "%windir%\system32\notepad.exe"
The system cannot accept the START command parameter 1.
The process ID: 9608
The assigned affinity at startup: -1

Wait, what? 😄 So it doesn't accept the parameter 1, but still starts the notepad process? And runs through?

Here's another script revision where I added assignment to the second thread alone, instead of only to the first or to both threads, just to see if CPU 63 is -1 or that other large negative number. Not really necessary, since apparently for CPU 62+63 the set affinity matches the queried affinity, so it should also work fine in CoreCycler! But I'm curious now.

And also curious... what happens if you set the node parameter to 2 instead of 1 after that. Will it also start? Will it then assign to the processor group with the remaining 24 processors?🤔 test_affinities_group_1.zip

djangoa commented 1 month ago

@sp00n Hi again,

Does seem strange it runs but when running the latest script unedited and when setting "NODE 2" without notepad open I'm getting the following errors:

PS C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2> .\test_affinities_group_1.ps1
Number of logical cores: 64
The command using to start:
cmd /C start /MIN /NODE 1 /AFFINITY 0xC "" "%windir%\system32\notepad.exe"
The system cannot accept the START command parameter 1.
Get-Process : Cannot find a process with the name "notepad". Verify the
process name and call the cmdlet again.
At C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2\test_affinities_grou
p_1.ps1:202 char:33
+     $Script:stressTestProcess = Get-Process 'notepad'
+                                 ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (notepad:String) [Get-Process],
   ProcessCommandException
    + FullyQualifiedErrorId : NoProcessFoundForGivenName,Microsoft.PowerShell.
   Commands.GetProcessCommand

Notepad wasn't started?
At C:\Users\Django\Desktop\Overlocking\CoreCycler-v0.9.4.2\test_affinities_grou
p_1.ps1:209 char:9
+         throw('Notepad wasn''t started?')
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Notepad wasn't started?:Strin
   g) [], RuntimeException
    + FullyQualifiedErrorId : Notepad wasn't started?

If I open Notepad and then run the script, I get one of the following outputs, depending on which group is chosen: test_affinities_group_1_a.log / test_affinities_group_1_b.log

I don't think that setting the NUMA node on start would set the processor group; from what I've read, the two are distinct from each other.

dkit commented 1 month ago

Hello. Apologies for not responding sooner, I didn't have access to the code for a few days.

I have a single CPU with 64 cores and 128 threads (hyper threading). I used the attached C# program to help me set the affinity for stressTestProgram = YCRUNCHER. It seemed to work, but I agree that for another stressTestProgram it might not work if that program spawns threads after the affinities are set.

AffinityModifier.zip

The helper program calculates the group from the core ID (it actually sets it based on the last core ID given), so I don't expect it to work for multi-CPU systems or for any application that requires affinities across groups.
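
The arithmetic behind "group from the core ID" can be sketched as follows. This is not the source of AffinityModifier (which is only attached as a zip), just a minimal Python illustration that assumes every processor group holds exactly 64 logical CPUs; the names `GROUP_SIZE`, `split_by_group` and `single_group_affinity` are hypothetical. Unlike the helper, which takes the group of the last ID, this sketch rejects ID lists that span more than one group.

```python
from collections import defaultdict

GROUP_SIZE = 64  # assumption: each processor group holds 64 logical CPUs


def split_by_group(cpu_ids):
    """Map global logical CPU ids to {group number: bitmask within that group}."""
    masks = defaultdict(int)
    for cpu in cpu_ids:
        group, bit = divmod(cpu, GROUP_SIZE)
        masks[group] |= 1 << bit
    return dict(masks)


def single_group_affinity(cpu_ids):
    """Return (group, mask) if all ids fall into one group, else raise ValueError."""
    masks = split_by_group(cpu_ids)
    if len(masks) != 1:
        raise ValueError(f"CPU ids span {len(masks)} groups: {sorted(masks)}")
    return next(iter(masks.items()))


print(single_group_affinity([64, 65]))  # (1, 3): group 1, bits 0 and 1
print(split_by_group([63, 64]))         # two groups, so one group affinity is not enough
```

A pair such as (group, mask) is the shape of input that a call like SetThreadGroupAffinity works with, which is why a single call cannot cover CPUs from two groups.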

Here is how I used it:

diff -r ./script-corecycler.ps1 ..\CoreCycler-v0.9.5.0alpha2\/script-corecycler.ps1
4908c4908
<                     $affinity        += [Int64] [Math]::Pow(2, $cpuNumber)
---
>                     # $affinity        += [Int64] [Math]::Pow(2, $cpuNumber)
4911c4911
<
---
>
4922c4922
<                         $affinity        += [Int64] [Math]::Pow(2, $cpuNumber)
---
>                         # $affinity        += [Int64] [Math]::Pow(2, $cpuNumber)
4932c4932
<                     $affinity         = [Int64] [Math]::Pow(2, $cpuNumber)
---
>                     #$affinity         = [Int64] [Math]::Pow(2, $cpuNumber)
5032a5033,5035
>             $paramCPUString = (($cpuNumbersArray | sort) -Join ' ')
>
>             $affinityCmd = 'helpers\AffinityModifier.exe ' + $stressTestProcessId + ' ' + $paramCPUString
5036,5037c5039,5042
<
<                 $stressTestProcess.ProcessorAffinity = $affinity
---
>                 Write-ColorText ('Running: ' + $affinityCmd) Yellow
>                 iex $affinityCmd
>                 # $stressTestProcess.ProcessorAffinity = $affinity
5045c5050,5055
<                     $stressTestProcess.ProcessorAffinity = $affinity
---
>
>                     # $stressTestProcess.ProcessorAffinity = $affinity
>                     Write-ColorText ('Running: ' + $affinityCmd) Yellow
>                     iex $affinityCmd
>
5054c5064
<             $checkingAffinity = $stressTestProcess.ProcessorAffinity
---
>             # $checkingAffinity = $stressTestProcess.ProcessorAffinity
5056,5062c5066,5072
<             if ($checkingAffinity -ne $affinity) {
<                 Write-Verbose('The affinity could NOT be set correctly!')
<                 Write-Verbose(' - affinity trying to set: ' + $affinity)
<                 Write-Verbose(' - actual affinity:        ' + $checkingAffinity)
<
<                 Exit-WithFatalError('The affinity could not be set correctly!')
<             }
---
>             # if ($checkingAffinity -ne $affinity) {
>             #    Write-Verbose('The affinity could NOT be set correctly!')
>             #    Write-Verbose(' - affinity trying to set: ' + $affinity)
>             #    Write-Verbose(' - actual affinity:        ' + $checkingAffinity)
>             #
>             #    Exit-WithFatalError('The affinity could not be set correctly!')
>             #}
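
The lines commented out in this diff are the ones that triggered the original overflow: a single-bit mask for a CPU number of 63 or higher no longer fits into a signed 64-bit integer. A minimal Python sketch of that limit (illustration only, not CoreCycler code; `INT64_MAX` and `fits_int64` are hypothetical names):

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def fits_int64(cpu_number):
    """True if the single-bit mask 2**cpu_number fits a signed 64-bit integer."""
    return (1 << cpu_number) <= INT64_MAX


print(fits_int64(62))   # True
print(fits_int64(63))   # False: 2**63 is one past INT64_MAX
print(fits_int64(114))  # False

# 2**114 as a floating point value, formatted like the value in the reported error
print(format(2.0 ** 114, ".14E"))  # 2.07691874341393E+34
```

The value in the reported error message matches 2^114, which suggests the script was building the mask for logical CPU 114 when it failed.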

sp00n commented 1 month ago

@djangoa

Yeah, it seems the NODE parameter doesn't set the processor group then. And I've found no other way to actually do so from the command line. Well, at least you managed to get a complete run of the script, and actually everything worked fine there! And since this is basically the same code as in the main script, I'm pretty confident now that it'll work for a 1-group 64 processor.

So I guess CoreCycler now supports CPUs up to and above 64 cores/threads. Thanks to you. 👍

Only setups with multiple sockets probably won't work with the current implementation, as they will likely split up the processor groups evenly instead of filling up the first group before proceeding to the next one. But I don't assume someone with a true server mainboard will want to check single-core overclocking/undervolting stability. 🤷
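
To illustrate the difference between the two layouts, here is a minimal Python sketch that maps a global logical CPU number to a group and an index within that group, given a list of group sizes. The group sizes below are made-up examples for a 96-CPU machine, and `locate_cpu` is a hypothetical helper; on a real system the sizes would have to be queried from Windows (for example via GetActiveProcessorCount) rather than assumed.

```python
def locate_cpu(cpu, group_sizes):
    """Return (group, index within group) for a global logical CPU number."""
    if cpu < 0:
        raise ValueError("CPU number must not be negative")
    remaining = cpu
    for group, size in enumerate(group_sizes):
        if remaining < size:
            return group, remaining
        remaining -= size  # skip past this group
    raise ValueError(f"CPU {cpu} is beyond the last processor group")


filled = [64, 32]  # hypothetical: first group filled up before the next one
even = [48, 48]    # hypothetical: CPUs split evenly, e.g. one group per socket

print(locate_cpu(50, filled))  # (0, 50)
print(locate_cpu(50, even))    # (1, 2)
```

With an even split, the same CPU number lands in a different group, so code that assumes "group = CPU number divided by 64" would pick the wrong one.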

PS: QEMU on Windows is not something I can really recommend so far. I was able to set up a VM with 96 cores (no hyperthreading though), but haven't been able to get network access so far.

sp00n commented 1 month ago

@djangoa @dkit

Here's a CoreCycler version that should be able to run on 64+ cores. Let me know of any problems.

script-corecycler-0.9.5.0alpha4-experimental5.zip

djangoa commented 1 month ago

So I guess CoreCycler now supports CPUs up to and above 64 cores/threads. Thanks to you. 👍

No problem, thank you for taking the time to develop support for hardware you don't have access to. I know it's been challenging.

PS: QEMU on Windows is not something I can really recommend so far. I was able to set up a VM with 96 cores (no hyperthreading though), but haven't been able to get network access so far.

I've never used it on Windows, only on Linux; hopefully you can get it working. If you can get it working with libvirt, it will do most of the heavy lifting in configuring the hardware of a VM for you.

Here's a CoreCycler version that should be able to run on 64+ cores. Let me know of any problems.

script-corecycler-0.9.5.0alpha4-experimental5.zip

I'll report back if I experience any issues.

djangoa commented 3 weeks ago

Hi again, I just wanted to report back and state I've been using experimental5 for some time now and over the last 2 weeks of testing I have not experienced any issues.