sp00n / corecycler

Script to test single core stability, e.g. for PBO & Curve Optimizer on AMD Ryzen or overclocking/undervolting on Intel processors

Script "frozen" for long time (2,5h) #53

Closed Grzywax closed 3 weeks ago

Grzywax commented 10 months ago

Hi, first of all, great job! I started to play with this script and it's great. The fact that even Ryzen Master does not test in a similar way when setting PBO is beyond me. It often baffles me that obvious solutions like this end up being created by individuals like you.

Anyway, I'm seeing this weird behavior where the script sometimes freezes when I leave it unattended. I have a feeling that it only continues when I wake up my PC and click on the CMD window; then it "wakes up". When it wakes up, it also spits out the Prime95 log entries for the 2.5h that were missing. It's as if it got stuck on one core for some reason.

Here's a fragment of the log -> note the jump from 22:16:06 to 00:42:30 (below I have pasted the output as well + Prime95 log attached):

                 + 22:15:50 - Tick 22 of max 36
                 +            Remaining max runtime: -14s
                 +            The remaining run time (0) is less than the tick interval (10), this will be the last interval
                 + 22:15:50 - Suspending the stress test process for 1000 milliseconds
                 +            Suspended: True
                 +            Resuming the stress test process
                 +            Resumed: True
                 + 22:15:52 - Getting new log file entries
                 +            Getting new log entries starting at position 3562 / Line 135
                 +            The new log file entries:
                 +            - [Line 136] Self-test 9216K passed!
                 +            New file position: 3587 / Line 136
                 + 22:15:52 - Checking CPU usage: 3.12%
                 + One last error check before finishing this core
                 + 22:15:54 - Checking CPU usage: 3.12%
22:15:55 - Completed the test on Core 10 (CPU 20)
                 + Still available cores: 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
                 + The selected core to test: 3
22:15:55 - Set to Core 3 (CPU 6)
                 + Setting the affinity to 64
                 + Successfully set the affinity to 64
           Running for 6 minutes...
                 + 
                 + 22:15:55 - Tick 1 of max 36
                 +            Remaining max runtime: 360s
                 + 22:16:04 - Suspending the stress test process for 1000 milliseconds
                 +            Suspended: True
                 +            Resuming the stress test process
                 +            Resumed: True
                 + 22:16:06 - Getting new log file entries
                 +            Getting new log entries starting at position 3587 / Line 136
                 +            The new log file entries:
                 +            - [Line 137] Self-test 9600K passed!
                 +            New file position: 3612 / Line 137
                 + 22:16:06 - Checking CPU usage: 27441.55%
                 + 
                 + 00:42:30 - Tick 2 of max 36
                 +            Remaining max runtime: -8435s
                 +            The remaining run time (0) is less than the tick interval (10), this will be the last interval
                 + 00:42:30 - Suspending the stress test process for 1000 milliseconds
                 +            Suspended: True
                 +            Resuming the stress test process
                 +            Resumed: True
                 + 00:42:32 - Getting new log file entries
                 +            Getting new log entries starting at position 3612 / Line 137
                 +            The new log file entries:
                 +            - [Line 138] [Fri Aug 18 22:16:18 2023]
                 +            - [Line 139] Self-test 10240K passed!
                 +            - [Line 140] Self-test 10752K passed!
                 +            - [Line 141] Self-test 11200K passed!
                 +            - [Line 142] [Fri Aug 18 22:17:25 2023]
                 +            - [Line 143] Self-test 20480K passed!
                 +            - [Line 144] Self-test 11520K passed!
                 +            - [Line 145] Self-test 12288K passed!
                 +            - [Line 146] Self-test 12800K passed!
                 +            - [Line 147] [Fri Aug 18 22:18:33 2023]
                 +            - [Line 148] Self-test 13440K passed!
                 +            - [Line 149] Self-test 13824K passed!
                 +            - [Line 150] Self-test 21504K passed!
                 +            - [Line 151] Self-test 14336K passed!
                 +            - [Line 152] [Fri Aug 18 22:19:46 2023]
                 +            - [Line 153] Self-test 15360K passed!
                 +            - [Line 154] Self-test 16000K passed!
                 +            - [Line 155] Self-test 16384K passed!
                 +            - [Line 156] Self-test 17920K passed!
                 +            - [Line 157] [Fri Aug 18 22:20:55 2023]
                 +            - [Line 158] Self-test 22400K passed!
                 +            - [Line 159] Self-test 18432K passed!
                 +            - [Line 160] Self-test 19200K passed!
                 +            - [Line 161] Self-test 20480K passed!
                 +            - [Line 162] [Fri Aug 18 22:22:11 2023]
                 +            - [Line 163] Self-test 21504K passed!
                 +            - [Line 164] Self-test 22400K passed!
                 +            - [Line 165] Self-test 23040K passed!
                 +            - [Line 166] Self-test 23040K passed!
                 +            - [Line 167] [Fri Aug 18 22:23:34 2023]
                 +            - [Line 168] Self-test 24576K passed!
                 +            - [Line 169] Self-test 25600K passed!
                 +            - [Line 170] Self-test 26880K passed!
                 +            - [Line 171] [Fri Aug 18 22:24:45 2023]
                 +            - [Line 172] Self-test 27648K passed!
                 +            - [Line 173] Self-test 24576K passed!
                 +            - [Line 174] Self-test 28672K passed!
                 +            - [Line 175] [Fri Aug 18 22:25:47 2023]
                 +            - [Line 176] Self-test 30720K passed!
                 +            - [Line 177] Self-test 32000K passed!
                 +            - [Line 178] Self-test 32768K passed!
                 +            - [Line 179] [Fri Aug 18 22:26:49 2023]
                 +            - [Line 180] Self-test 8960K passed!
                 +            - [Line 181] Self-test 25600K passed!
                 +            - [Line 182] Self-test 9216K passed!
                 +            - [Line 183] Self-test 9600K passed!
                 +            - [Line 184] [Fri Aug 18 22:27:57 2023]
                 +            - [Line 185] Self-test 10240K passed!
                 +            - [Line 186] Self-test 10752K passed!
                 +            - [Line 187] Self-test 11200K passed!
                 +            - [Line 188] Self-test 26880K passed!
                 +            - [Line 189] [Fri Aug 18 22:29:17 2023]
                 +            - [Line 190] Self-test 11520K passed!
                 +            - [Line 191] Self-test 12288K passed!

..... cut....

                 +            - [Line 683] Self-test 28672K passed!
                 +            - [Line 684] Self-test 30720K passed!
                 +            - [Line 685] [Sat Aug 19 00:39:39 2023]
                 +            - [Line 686] Self-test 32000K passed!
                 +            - [Line 687] Self-test 32768K passed!
                 +            - [Line 688] Self-test 8960K passed!
                 +            - [Line 689] Self-test 32768K passed!
                 +            - [Line 690] [Sat Aug 19 00:40:57 2023]
                 +            - [Line 691] Self-test 9216K passed!
                 +            - [Line 692] Self-test 9600K passed!
                 +            - [Line 693] Self-test 10240K passed!
                 +            - [Line 694] [Sat Aug 19 00:41:59 2023]
                 +            - [Line 695] Self-test 10752K passed!
                 +            - [Line 696] Self-test 11200K passed!
                 +            New file position: 18352 / Line 696
                 + 00:42:32 - Checking CPU usage: 3.15%
                 + One last error check before finishing this core
                 + 00:42:34 - Checking CPU usage: 3.14%
00:42:35 - Completed the test on Core 3 (CPU 6)
                 + Still available cores: 11, 4, 12, 5, 13, 6, 14, 7, 15
                 + The selected core to test: 11
00:42:35 - Set to Core 11 (CPU 22)
                 + Setting the affinity to 4194304
                 + Successfully set the affinity to 4194304
           Running for 6 minutes...
                 + 
                 + 00:42:35 - Tick 1 of max 36
                 +            Remaining max runtime: 360s
                 + 00:42:44 - Suspending the stress test process for 1000 milliseconds
                 +            Suspended: True
                 +            Resuming the stress test process
                 +            Resumed: True
                 + 00:42:46 - Getting new log file entries
                 +            Getting new log entries starting at position 18352 / Line 696
                 +            The new log file entries:
                 +            - [Line 697] Self-test 8960K passed!
                 +            New file position: 18377 / Line 697
                 + 00:42:46 - Checking CPU usage: 3.12%
                 + 
                 + 00:42:47 - Tick 2 of max 36
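The affinity values in the log fragment above are consistent with a plain bitmask in which only the bit of the selected logical CPU is set: CPU 6 gives 64 and CPU 22 gives 4194304. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the core-to-CPU mapping (CPU = 2 x core with SMT on) is inferred from the log lines only, not taken from CoreCycler's source.

```python
def logical_cpu_for_core(core: int, smt_enabled: bool = True) -> int:
    """Map a physical core index to its first logical CPU.

    Assumption inferred from the log ("Core 3 (CPU 6)", "Core 11 (CPU 22)"):
    with SMT enabled, core N corresponds to logical CPU 2 * N.
    """
    if core < 0:
        raise ValueError("core index must be non-negative")
    return core * 2 if smt_enabled else core


def affinity_mask(logical_cpu: int) -> int:
    """Return a bitmask with only the bit for the given logical CPU set."""
    if logical_cpu < 0:
        raise ValueError("logical CPU index must be non-negative")
    return 1 << logical_cpu


# Reproduce the two affinity values that appear in the log fragment.
for core in (3, 11):
    cpu = logical_cpu_for_core(core)
    print(f"Core {core} (CPU {cpu}) -> affinity {affinity_mask(cpu)}")
```

This can help when cross-checking a log by hand: if the logged affinity does not equal `1 << CPU`, the stress thread was not pinned to the core named in the same log line.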

-> Output:

Starting the CoreCycler...
Press CTRL+C to abort
--------------------------------------------------------------------------------
----------- CoreCycler v0.9.5.0alpha2 started at 2023-08-18 21:38:18 -----------
--------------------------------------------------------------------------------
Log Level set to: ..................... 2 [Writing debug messages to log file]
Stress test program: .................. PRIME95
Selected test mode: ................... SSE
Logical/Physical cores: ............... 32 logical / 16 physical cores
Hyperthreading / SMT is: .............. ON
Selected number of threads: ........... 1
Assign both cores to stress thread: ... OFF
Runtime per core: ..................... 6 MINUTES
Suspend periodically: ................. ENABLED
Restart for each core: ................ OFF
Test order of cores: .................. DEFAULT (ALTERNATE)
Number of iterations: ................. 10000
Selected FFT size: .................... HUGE (8960K - 32768K)

--------------------------------------------------------------------------------
The log files for this run are stored in:
C:\corecycler-master\logs\
 - CoreCycler:   CoreCycler_2023-08-18_21-38-14_PRIME95_SSE.log
 - Prime95:      Prime95_2023-08-18_21-38-14_SSE_HUGE_FFT_8960K-32768K.txt
--------------------------------------------------------------------------------

21:38:21 - Iteration 1
----------------------------------
21:38:21 - Set to Core 0 (CPU 0)
           Running for 6 minutes...
21:44:28 - Completed the test on Core 0 (CPU 0)
21:44:28 - Set to Core 8 (CPU 16)
           Running for 6 minutes...
21:50:57 - Completed the test on Core 8 (CPU 16)
21:50:57 - Set to Core 1 (CPU 2)
           Running for 6 minutes...
21:57:26 - Completed the test on Core 1 (CPU 2)
21:57:26 - Set to Core 9 (CPU 18)
           Running for 6 minutes...
22:03:29 - Completed the test on Core 9 (CPU 18)
22:03:29 - Set to Core 2 (CPU 4)
           Running for 6 minutes...
22:09:36 - Completed the test on Core 2 (CPU 4)
22:09:36 - Set to Core 10 (CPU 20)
           Running for 6 minutes...
22:15:55 - Completed the test on Core 10 (CPU 20)
22:15:55 - Set to Core 3 (CPU 6)
           Running for 6 minutes...
00:42:35 - Completed the test on Core 3 (CPU 6)
00:42:35 - Set to Core 11 (CPU 22)
           Running for 6 minutes...
00:48:41 - Completed the test on Core 11 (CPU 22)
01:00:46 - Set to Core 4 (CPU 8)
           Running for 6 minutes...
01:06:56 - Completed the test on Core 4 (CPU 8)
......
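A freeze like the one reported here shows up in the log as two consecutive timestamps that are far apart. The sketch below scans log lines for such gaps. It is only an illustration: the function name, the regular expression, and the threshold are assumptions, and the line format is taken from the fragments pasted in this issue, not from a documented CoreCycler log format.

```python
import re
from datetime import datetime, timedelta

# Matches "22:15:55 - ..." as well as debug lines such as
# "                 + 22:16:06 - Checking CPU usage: ...".
TIMESTAMP = re.compile(r"^\s*(?:\+\s*)?(\d{2}:\d{2}:\d{2}) - ")


def find_gaps(lines, threshold_seconds=60):
    """Return (previous, current, seconds) for each gap above the threshold.

    The log only carries the time of day, so a negative difference is
    treated as a single rollover past midnight.
    """
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            delta = current - previous
            if delta < timedelta(0):
                delta += timedelta(days=1)
            seconds = int(delta.total_seconds())
            if seconds > threshold_seconds:
                gaps.append((previous.strftime("%H:%M:%S"), match.group(1), seconds))
        previous = current
    return gaps


# Lines copied from the debug log fragment in this issue.
SAMPLE = [
    "                 + 22:16:06 - Getting new log file entries",
    "                 + 22:16:06 - Checking CPU usage: 27441.55%",
    "                 + ",
    "                 + 00:42:30 - Tick 2 of max 36",
    "                 +            Remaining max runtime: -8435s",
    "                 + 00:42:30 - Suspending the stress test process for 1000 milliseconds",
]

for start, end, seconds in find_gaps(SAMPLE):
    print(f"Gap of {seconds}s between {start} and {end}")
```

For the sample lines this reports a single gap of 8784 seconds (about 2 hours 26 minutes). The same kind of subtraction explains the logged `Remaining max runtime: -8435s`: the core was set at 22:15:55, the tick at 00:42:30 comes 8795 seconds later, and 360 - 8795 = -8435.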

sp00n commented 10 months ago

This very much looks like issue #15. The Windows Terminal window has a "feature" where all execution stops once you select anything within it; a simple click is enough for this. Another click or a key press then removes the selection and returns to normal processing.

There doesn't seem to be a way to disable this using Batch or PowerShell; only the user themselves can disable the "Quick Editing" feature (see the links in the issue mentioned above).
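For reference, outside of Batch and plain PowerShell commands, the Win32 console API exposes Quick Edit as an input-mode flag (`ENABLE_QUICK_EDIT_MODE`, changed via `GetConsoleMode` / `SetConsoleMode` together with `ENABLE_EXTENDED_FLAGS`). The following is a minimal sketch of clearing that flag for the current console from Python via `ctypes`. It is not something CoreCycler does in the version discussed here, the function names are hypothetical, and whether it would have prevented this particular freeze is untested.

```python
import sys

# Constants from the Win32 console API.
STD_INPUT_HANDLE = -10
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080


def mode_without_quick_edit(mode: int) -> int:
    """Clear the Quick Edit bit and set the extended-flags bit.

    ENABLE_EXTENDED_FLAGS has to be set for a change to the
    Quick Edit bit to take effect.
    """
    return (mode & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS


def disable_quick_edit() -> bool:
    """Try to disable Quick Edit for the console attached to this process.

    Returns True on success, False on non-Windows systems or on failure.
    """
    if sys.platform != "win32":
        return False

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]

    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # e.g. input is redirected and not a console
    return bool(kernel32.SetConsoleMode(handle, mode_without_quick_edit(mode.value)))


print("Quick Edit disabled:", disable_quick_edit())
```

The change only applies to the console the process is attached to and does not alter the user's saved console defaults.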

Grzywax commented 10 months ago

I saw that issue before posting. The point is that after starting the script at 21h38 I did not touch my PC. I came back around 00h42 and discovered that it was frozen / stuck on one core. Also, the last line before it froze says: Checking CPU usage: 27441.55%.

Could it be something else?

P.S. Of course there is a chance that I walked up to the PC around that time and don't remember it. I was watching an episode of Foundation in one go, and I'm pretty sure it was 23h00 when I finished it. So I couldn't have been at the PC at 22h16... but like I said, it was Friday evening after a long week at work... maybe my memory is playing tricks on me ;]

sp00n commented 10 months ago

At least it has all the indications of it. No log output or activity from CoreCycler whatsoever, and then after being re-enabled it parses everything that happened in the meantime.

The CPU usage is indeed weird, and it may indeed point to some other problem, but I have no idea what could cause this spike, then freeze the script, and then re-awaken it on user interaction, other than the mentioned Quick Edit feature.

That being said, there are reports that Windows just "does" this from time to time, even if Quick Edit is disabled. So it may just have been one of these coincidences with no apparent reason. 🤷
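To put the odd reading in perspective: on this 32-logical-CPU system, one fully loaded stress thread accounts for roughly 100 / 32 = 3.125% of total CPU time, which matches the 3.12% to 3.15% values elsewhere in the log. A value of 27441.55% cannot be a valid share. The sketch below flags such readings. The function names are hypothetical, and it assumes the logged figure is a percentage of total system CPU time, which the log suggests but does not state.

```python
import re

USAGE = re.compile(r"Checking CPU usage: ([0-9]+(?:\.[0-9]+)?)%")


def expected_single_thread_usage(logical_cpus: int) -> float:
    """Share of total CPU time used by one fully busy thread, in percent."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return 100.0 / logical_cpus


def implausible_readings(lines):
    """Return all logged CPU usage values outside the 0..100 percent range."""
    readings = []
    for line in lines:
        match = USAGE.search(line)
        if match:
            value = float(match.group(1))
            if not 0.0 <= value <= 100.0:
                readings.append(value)
    return readings


# Lines copied from the debug log fragment in this issue.
SAMPLE = [
    "                 + 22:15:54 - Checking CPU usage: 3.12%",
    "                 + 22:16:06 - Checking CPU usage: 27441.55%",
    "                 + 00:42:32 - Checking CPU usage: 3.15%",
]

print("Expected for one thread on 32 logical CPUs:", expected_single_thread_usage(32))
print("Implausible readings:", implausible_readings(SAMPLE))
```

A check like this only identifies the reading as invalid; it says nothing about what produced it.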

Grzywax commented 8 months ago

So yeah... basically all those issues slowly go away the more my CO gets optimized. It seems that other cores were causing the script to fail. The funny thing is that I can now run core-cycler for days without it throwing an error on any of the cores, but every few days I get a random reboot/blue screen. So one or more cores are still set too high in CO, but core-cycler is no longer enough to detect which one. I guess it would take a super light, fast task that boosts clocks to their max to give a bad result... I hate Ryzen 7000. It tempts you with CO potential, but getting it perfectly tuned is like fighting windmills.

sp00n commented 3 weeks ago

Closing this, as it's impossible for me to reproduce. If you see this happening again, let me know.