sp00n / corecycler

Script to test single core stability, e.g. for PBO & Curve Optimizer on AMD Ryzen or overclocking/undervolting on Intel processors

The Prime95 process doesn't use enough CPU power anymore #49

Open roaminghawk opened 11 months ago

roaminghawk commented 11 months ago

Hi,

Does this error indicate instability?

ERROR: 10:53:50
ERROR: There has been an error while running Prime95!
ERROR: At Core 0 (CPU 0)
ERROR MESSAGE: The Prime95 process doesn't use enough CPU power anymore (only 0% instead of the expected 4.17%)
ERROR: The last passed FFT size before the error was: 13824K
ERROR: Unfortunately FFT size fail detection only works for Smallest, Small or Large FFT sizes.

I'm getting it at stock settings on a 7900X.
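
A note on the 4.17% in that message: a 7900X has 12 cores with SMT, i.e. 24 logical CPUs, and 100% / 24 ≈ 4.17%, so the figure is consistent with one fully loaded thread. Below is a minimal sketch of that arithmetic. The function name is hypothetical, and this is an assumption about how the expected value comes about, not CoreCycler's actual code.

```python
def expected_usage_percent(logical_cpus: int, threads: int = 1) -> float:
    """Share of total CPU time that `threads` fully busy threads would occupy."""
    if logical_cpus <= 0 or threads <= 0:
        raise ValueError("logical_cpus and threads must be positive")
    return round(100.0 * threads / logical_cpus, 2)


# 7900X: 24 logical CPUs, one stress test thread
print(expected_usage_percent(24))      # 4.17
# 6 cores with SMT (12 logical CPUs), two stress test threads
print(expected_usage_percent(12, 2))   # 16.67
```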

roaminghawk commented 11 months ago

For anyone having that error in the future: That error is a combination of CoreCycler/Prime95/Windows 11/CPU C-State. Testing with yCruncher passes. Testing in safe mode with Prime95 passes. Disabling Global C-State and testing in Windows (Not safe mode) passes.

sp00n commented 11 months ago

The CPU utilization error fires if, over the course of 8 seconds, the stress test program's process doesn't run with the expected amount of CPU utilization and no error message can be found in the stress test log file. This CPU utilization check depends on the Windows Performance Counters, which can become corrupted. Check the readme file for further details on this; there are ways to fix them if they have somehow become corrupted (which happens more often than I had imagined).

Normally it should run with C-States enabled if you're running with stock settings (which means no PBO2 and no Curve Optimizer active). I neither have a Ryzen 7000 nor do I run Windows 11 though, so there may be some incompatibilities with the new chips. Or the chip is actually faulty. C-States have been known to cause trouble while overclocking, but this is normally not the case with stock settings, so a faulty chip is a real possibility. Or the Performance Counters have just freaked out, as explained above. Do you still have both the CoreCycler and the Prime95 log files from this run?

roaminghawk commented 11 months ago

Thanks for the reply. I've passed the 6-minute test on all cores with C-States disabled, with no errors. I've attached logs that include one of the "not enough power" runs. Currently I'm running a test with a Curve Optimizer offset of -30.

logs.zip

sp00n commented 11 months ago

You have quite a few of the CPU utilization messages in your log with the C-States enabled. Most of the time it recovered in time though, so the error wasn't thrown. There are no messages at all with the C-States disabled, so there might actually be a problem for your setup with the C-States. Or disabling them somehow reset the Performance Counters and they're now working as expected. It's hard to tell.

The Prime95 log file also doesn't show any errors, so it seems to be running fine for now, although 6 minutes per core is only a first check for the most obvious instabilities. If the problem re-occurs you could try to fix the Performance Counters as described in the readme, or as the ultimate workaround you could disable the CPU utilization check altogether by setting disableCpuUtilizationCheck = 1 in the config.ini file, under the Debug section.
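
For illustration, the setting described above would look roughly like the sample below. The snippet parses an inline sample with Python's configparser purely to show the section and key. The sample text is a made-up excerpt, not a copy of the shipped config.ini, which contains many more settings.

```python
import configparser

# Illustrative excerpt only; the real config.ini has many more entries.
SAMPLE_CONFIG = """
[Debug]
# 1 = skip the CPU utilization check, 0 = keep it enabled
disableCpuUtilizationCheck = 1
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep the key's original casing
parser.read_string(SAMPLE_CONFIG)

check_disabled = parser.getint("Debug", "disableCpuUtilizationCheck") == 1
print(check_disabled)  # True
```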

I'll have to keep the combination of Windows 11 and C-States in mind for when this error is reported in the future.

roaminghawk commented 11 months ago

I'm now testing in safe mode with C-States at auto with default settings and it does not show those errors. In your opinion, what is the next step after passing the 6min test, for less than obvious instabilities, but still not the 12 hr per core test?

LucidLuxxx commented 9 months ago

Not sure if this could be helpful to the dev or devs, but I have a sensor panel in my tower that uses AIDA64 to show different sensors (CPU utilization, speed, etc.). AIDA64 also had a bug with CPU utilization in Windows 11. My CPU was showing 3% while running Cinebench multicore lol. AIDA64 got an update in their latest beta: in the main application's stability settings there's a new checkbox for a Windows 11 CPU utilization fix. Now it's reading correctly. I'm not a programmer and don't understand code, but maybe the devs could look at AIDA64's application files to see how they fixed it and maybe piggyback off that? I always just run it in safe mode for absolute best stability anyway, but just wanted to throw that out there in case it can help.

sp00n commented 9 months ago

After googling around, it seems to be a problem that was introduced with Windows 11 22H2. Many monitoring programs suddenly reported very low usage, even if the CPU was almost fully loaded.

I've found this (Unwinder being the author of MSI Afterburner, if I remember correctly):

[image: AUWUXfg]

I'm using \Process(name_of_process)\% Processor Time for the Performance Counter path to get the CPU utilization, which seems to be affected by this Windows bug. Unfortunately I have no way of testing any alternative myself.
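
The counter path above can be assembled as in the small sketch below. The helper names are hypothetical, and the code only builds the path string; it does not query Windows. The division by the number of logical CPUs reflects that the Process object's % Processor Time is reported relative to a single logical processor (so a multi-threaded process can exceed 100%). Whether CoreCycler normalizes the value exactly this way is an assumption.

```python
def process_counter_path(process_name: str) -> str:
    """Build the Performance Counter path for one process instance."""
    return rf"\Process({process_name})\% Processor Time"


def to_total_percent(raw_counter_value: float, logical_cpus: int) -> float:
    """Convert a per-processor-relative value into a share of the whole CPU."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return raw_counter_value / logical_cpus


print(process_counter_path("prime95"))
# One fully busy thread on a CPU with 24 logical processors
print(round(to_total_percent(100.0, 24), 2))  # 4.17
```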

LucidLuxxx commented 9 months ago

Is there any way to use the system idle process and then some kind of calculation to get the opposite? Like (100%_total_cpu-4%_idle=96%_usage)? Kinda like that MSI article you shared was saying. Maybe use a VM if your setup is what's preventing you from being able to test an alternative? Again, I'm not a programmer and I'm probably way off on offering any help lol. Figured it's worth a shot. If it helps, good. If not, I still enjoy your program regardless lol.

LucidLuxxx commented 9 months ago

I should also say that I had windows 11 pro before and I had this issue with corecycler reporting not enough power. I did a fresh install of windows 11 pro, for other reasons, and corecycler is now working correctly and doesn't report the power issue anymore. No clue why that would fix it but it did.

sp00n commented 9 months ago

The idle "process" is not a real process, it's just the remainder of the CPU resources that are currently not used. There is a performance counter for % idle time. However, I really need to keep track of the stress test process itself. If I instead checked the total CPU usage (or the total idle percentage, which is just the reverse), other running processes could interfere and cause false (negative) detections.
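
A tiny sketch of why the total (or idle) figure can mislead. The function names and all numbers are made up for illustration: the stress test has stalled at 0%, while some unrelated process happens to use 10% of the CPU.

```python
def passes_process_check(process_percent: float, expected_percent: float) -> bool:
    """Check based on the stress test process's own CPU usage."""
    return process_percent >= expected_percent


def passes_total_check(idle_percent: float, expected_percent: float) -> bool:
    """Check based on total usage, derived as 100% minus the idle percentage."""
    total_usage = 100.0 - idle_percent
    return total_usage >= expected_percent


EXPECTED = 4.17           # one busy thread out of 24 logical CPUs
stress_test_usage = 0.0   # the stress test has stalled
other_usage = 10.0        # an unrelated process is busy
idle = 100.0 - stress_test_usage - other_usage

print(passes_process_check(stress_test_usage, EXPECTED))  # False: stall detected
print(passes_total_check(idle, EXPECTED))                 # True: stall missed
```

The total-based check passes even though the stress test is doing nothing, which is the kind of missed detection described above.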

leorg99 commented 4 months ago

I am also seeing this issue with a 7600x at stock settings on Windows 11 23H2 with latest cumulative update.

15:22:56 - Set to Core 2 (CPU 4 and 5)
                 + Setting the affinity to 48
                 + Successfully set the affinity to 48
           Running until all FFT sizes have been tested...
                 + 15:23:05 - Suspending the stress test process for 1000 milliseconds
                 +            Resuming the stress test process
                 + 15:23:07 - Checking CPU usage: 8.2%
                 + 15:23:09 - ...the CPU usage was too low, waiting 2000ms for another check...
                 + Process Id: 148932
                 + 15:23:13 - Checking CPU usage again (#1): 8.29%
                 +            Still not enough usage (#1)
                 + 15:23:13 - ...the CPU usage was too low, waiting 2000ms for another check...
                 + Process Id: 148932
                 + 15:23:18 - Checking CPU usage again (#2): 8.15%
                 +            Still not enough usage (#2)
                 + 15:23:18 - ...the CPU usage was too low, waiting 2000ms for another check...
                 + Process Id: 148932
                 + 15:23:22 - Checking CPU usage again (#3): 8.41%
                 +            The process seems to have recovered, continuing with stress testing
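
The readings in this log can be replayed against a simple recheck loop. The sketch below decodes the affinity mask 48 into CPUs 4 and 5, derives an expected usage of 2/12 of the total (assuming the 7600X's 12 logical CPUs), and assumes the check passes at half of that, about 8.33%. That threshold is inferred only from the fact that 8.2, 8.29 and 8.15 were rejected while 8.41 was accepted; it is not confirmed from the CoreCycler source. The number of rechecks is likewise a guess, and all function names are hypothetical.

```python
import time
from typing import Callable, List


def cpus_in_mask(affinity_mask: int) -> List[int]:
    """Return the logical CPU indices whose bits are set in the mask."""
    return [bit for bit in range(affinity_mask.bit_length())
            if (affinity_mask >> bit) & 1]


def usage_recovers(read_usage: Callable[[], float],
                   min_usage: float,
                   max_rechecks: int = 3,
                   wait_seconds: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """True if usage reaches min_usage on the first check or on any recheck."""
    if read_usage() >= min_usage:
        return True
    for _ in range(max_rechecks):
        sleep(wait_seconds)  # wait before looking again
        if read_usage() >= min_usage:
            return True
    return False


LOGICAL_CPUS = 12                      # assumed: 6 cores with SMT
cpus = cpus_in_mask(48)                # 48 == 0b110000
expected = 100.0 * len(cpus) / LOGICAL_CPUS
threshold = 0.5 * expected             # assumed threshold, see note above

# Replay the four readings from the log without actually sleeping.
readings = iter([8.2, 8.29, 8.15, 8.41])
recovered = usage_recovers(lambda: next(readings), threshold,
                           sleep=lambda seconds: None)
print(cpus, round(threshold, 2), recovered)  # [4, 5] 8.33 True
```

Notably, every reading sits close to one fully loaded logical CPU out of twelve (about 8.33%), i.e. roughly half of what two busy threads would produce.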

When I run prime95 manually (even the latest beta version 30.19 build 9) after corecycler has created the prime.txt and stress.txt files, I see no errors.

Interestingly enough, when I connect to this PC while corecycler is running prime95, the CPU usage does not recover and it throws an error for CPU usage being too low (usually <1%). Again, no issue when running prime95 directly. This happens on every core. My suspicion, from looking through the source, is that it's just not detecting the CPU usage/load properly.

Edit: I can give you access to this box as it's just sitting to the side while I test it. You can remote in through anydesk or something and try to debug?

config.zip

Zrrrrrrg commented 3 months ago

I mostly run into this problem in the following scenario: I set up the program, let it run by itself, and leave the machine alone with the screen turned off. When I come back a few hours later, turn the screen on and check the status, it shows that all iterations have passed and that it is still running. But just a few seconds later (maybe 10?), it throws the "doesn't use enough CPU power anymore" error. If this is caused by a Windows mechanism, could we develop a WSL version using the Linux ABI? Apart from Aida64, the stress test programs (y-cruncher and Prime95) should have Linux executables.

sp00n commented 3 weeks ago

I now have a Windows 11 box which seems to show this behavior. I have disabled the CPU utilization check in the latest alpha3, so the error itself shouldn't appear anymore.

However I noticed that when I re-enable this CPU check, it happens when I enable 2 threads, and the Windows Task Manager (or Process Explorer, System Informer, etc, whatever you're using) then shows that it doesn't fully load both virtual CPUs of the core.

It looks something like this:

[image]

If anyone wants to test this, let me know if you see the same happening. And if you're using the latest 0.9.5.0alpha3, make sure to re-enable the CPU check by setting disableCpuUtilizationCheck = 0 in the config.ini.

I have no idea what's going on there right now. My current suspicion is that the Windows Thread Director, or whatever it's called, is interfering.

LucidLuxxx commented 3 weeks ago

I downloaded alpha3 and left everything at default settings except disableCpuUtilizationCheck, which I set to 0, and Threads, which I set to 2. Mine seems to be working normally. I'm on Windows 11 23H2 Build 22631.3672.

[Screenshot 2024-06-07 162436]

sp00n commented 3 weeks ago

Well, if you don't encounter the error anymore, as you previously said, it's not too unexpected that you also don't see this happening. 😁 I was hoping someone with the error could test this, but at least you did confirm that not all Windows 11 installations suffer from this weird behavior.