sp00n / corecycler

Script to test single core stability, e.g. for PBO & Curve Optimizer on AMD Ryzen or overclocking/undervolting on Intel processors

Core errors question #81

Open init5-SF opened 3 weeks ago

init5-SF commented 3 weeks ago

Hello there, thanks for the amazing tool, it's super detailed and catches errors that are usually missed by other software.

I have a quick question regarding the output of the tool: what is the best way to deal with the core(s) that have errors? E.g. if I have per-core Curve Optimizer values and CoreCycler showed an error on core 3, which is set to -29 in the Curve Optimizer, should I increase the value to something like -25 and re-test? [image] I am asking because CoreCycler found errors on nearly half of the cores, which is concerning ☹️

For reference, I am running

Ryzen 5900X + PBO
PPT: 190
TDC: 125
EDC: 170

The per-core Curve Optimizer values were chosen by Ryzen Master, I didn't decide any of them. The majority of the cores are set to -29, a couple of cores are set to -11 and -7, that's all.

Also what's the possibility of seeing false positives? (if any)

That would be it, thanks again!

sp00n commented 3 weeks ago

2-4 CO steps is a good adjustment value, depending on how quickly or how finely you want to proceed. The possibility for false positives is there, but only for "not using enough CPU power" errors. These can happen if some other program "steals" the core currently being tested, or if the Windows thread scheduler somehow decides to act up. I've also had reports where disabling C-States in the BIOS fixed this issue. Generally this error is there to detect if a test has frozen, but it can lead to false positives. Any error other than this one is a "real" error that was reported by the stress test program (like yours in the image above).

Another possible source for not really false positives, but confusion, is an unstable memory overclock. You have to be sure that your RAM is running stable, otherwise any error that is thrown might also be caused by the RAM triggering it, not the CPU.

Lastly, you could check out the new 0.10.0.0 alpha version (pre-release), which adds an Automatic Test Mode. It includes the automatic adjustment of the CO values when an error happens during testing. It's still pretty fresh and not well tested though. The /configs/default.config.ini file contains the settings and relevant comments on how to use them.
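As a rough illustration of the "adjust by a few CO steps and re-test" idea above, here is a minimal Python sketch. It is not CoreCycler's implementation: the names `relax_co`, `tune_core` and `fake_test` are hypothetical, and the stability threshold in `fake_test` is made up purely so the example runs.

```python
from typing import Callable


def relax_co(value: int, step: int = 3, ceiling: int = 0) -> int:
    """Move a negative Curve Optimizer offset toward zero by `step`, capped at `ceiling`."""
    return min(value + step, ceiling)


def tune_core(start: int, passes_test: Callable[[int], bool],
              step: int = 3, ceiling: int = 0) -> int:
    """Relax the offset until the (hypothetical) test passes or the ceiling is reached."""
    value = start
    while not passes_test(value) and value < ceiling:
        value = relax_co(value, step, ceiling)
    return value


def fake_test(co_value: int) -> bool:
    """Toy stand-in for a real stress test: pretend the core is stable at -24 or milder."""
    return co_value >= -24


print(relax_co(-29, step=4))              # one manual step: -29 becomes -25
print(tune_core(-29, fake_test, step=2))  # keeps relaxing until the toy test passes
```

A smaller step takes more test rounds but ends closer to the most aggressive value that still passes, which is the "how quickly or how detailed" trade-off mentioned above.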

init5-SF commented 3 weeks ago

hey, thanks a lot for all the details!

I have been fiddling around with the CO values manually. I don't think my RAM is involved: I've re-run the tool multiple times and it faults on the same cores every time, and if this was RAM I think it would fault on random cores. As for my RAM profile, I don't have anything crazy set, just the normal motherboard XMP and no manual overclocking.
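One way to keep track of that "same cores every time" observation is to tally the failing cores across runs. The following is only a bookkeeping sketch with made-up data; the function names are hypothetical, and a consistent pattern is a hint, not proof that the RAM is uninvolved.

```python
from collections import Counter
from typing import Iterable


def error_counts(runs: Iterable[Iterable[int]]) -> Counter:
    """Count how many runs each core failed in."""
    return Counter(core for run in runs for core in set(run))


def cores_failing_every_run(runs: list) -> set:
    """Return the cores that reported an error in every single run."""
    if not runs:
        return set()
    return set.intersection(*(set(run) for run in runs))


# Made-up example: three runs, each listing the cores that errored.
runs = [{3, 5, 8}, {3, 5, 8}, {3, 5, 8, 10}]

print(sorted(cores_failing_every_run(runs)))  # cores that fail consistently
print(error_counts(runs))                     # per-core failure tally
```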

For instance, some cores that faulted at -29 have been changed to -25 and now they no longer error.

I am actually using v0.10.0.0alpha2, so it's the latest, I'll definitely test the Automatic Test Mode and let you know how it goes!

EDIT: Does CoreCycler actually apply the CO values, or does it just give me the recommended value? If it does apply them, will they conflict with the ones I have in the motherboard settings, or will the BIOS settings reflect the new values applied by CoreCycler?

sp00n commented 3 weeks ago

The CO values will be temporarily applied until the next reboot. That's one thing still missing from the documentation / comments that I'm going to add.

init5-SF commented 3 weeks ago

I've run CoreCycler with automatic testing enabled, but I have a question regarding the core behavior. I set it to run only 2 iterations. Some cores errored out on the first run, got their CO modified by CoreCycler, then passed the iteration. On the 2nd iteration they errored out again and CoreCycler increased their CO again.

Is this normal? I mean if a core passes the stress test once, why would it fail on later runs?

sp00n commented 3 weeks ago

Yes, absolutely. One passed stress test iteration is basically nothing. At least if you want some sort of stability, otherwise you could just stick with the Ryzen Master stability test. For my personal level of stability I strive for a (combined) passed duration of about 12 hours per core for my final stability test. So for my (and your) 12 core processor that would be 12*12 = 144 hours of stress testing (it was easier when we were still running with an all-core overclock/undervolt 😌). For me it was an iterative process though, e.g. I had some cores still fail while others ran just fine for many consecutive runs, so I deemed these as stable while figuring out the values for the final ones.
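The arithmetic above, plus a toy model of why a single pass says little, can be sketched in Python. The probability part is an illustrative assumption that is not from this thread: it pretends a marginal core fails each iteration independently with a fixed chance, which real hardware need not do. The function names are hypothetical.

```python
def total_test_hours(cores: int, hours_per_core: float) -> float:
    """Total stress-test time when every core gets its own testing budget."""
    return cores * hours_per_core


def chance_of_clean_run(p_fail_per_iteration: float, iterations: int) -> float:
    """Chance that a marginal core passes `iterations` in a row (toy independence model)."""
    return (1.0 - p_fail_per_iteration) ** iterations


print(total_test_hours(12, 12))                # 144 hours for a 12 core CPU
print(round(chance_of_clean_run(0.2, 1), 2))   # 0.8: one pass is very likely even if marginal
print(round(chance_of_clean_run(0.2, 10), 2))  # 0.11: ten passes in a row are much less likely
```

Under that assumption, a core that fails only one iteration in five would still pass a single iteration most of the time, which is why it can pass once and then error out on the next run.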

Of course this personal level and therefore the time needed can vary wildly. If you "just" game on that machine and are fine with a freak crash from time to time, then you can use these crashes to further fine-tune your CO values, and so a couple of hours for stress testing might be enough. But if you want a higher level of stability before you start to really use the system with the CO values, more time is needed up front (as an unlucky crash can corrupt your Windows installation and/or file system).

In the /configs directory there are a couple of presets. The y-cruncher ones seem to find the obvious errors pretty quickly on Ryzen systems, so they might speed up the initial process. However, these do not contain the Automatic Test Mode settings, so you would need to add the relevant settings yourself to activate it.