FFT Size defaults to "Huge" but for me "Large" crashes much quicker

verybadsoldier commented 3 years ago

I made some tests with an intentionally too low curve value and Prime95 crashes for me much faster when using Large as opposed to Huge. Large only lasts about 30 seconds for me while Huge lived for like 2 or 3 minutes.

So I guess, at least for me, Large is the better value. Or are there other advantages to using Huge?

sp00n commented 3 years ago

After testing with Large and a couple of successful runs, when I added the Huge preset I noticed a couple more errors, so I decided to set the Huge preset as the default. The idea behind the Huge preset is that these very large FFT sizes produce less heat than smaller ones and thus the boost clock can go slightly higher, revealing more stability issues for edge cases in the process. But I have no statistical data on which preset might throw an error faster than another, only my own observations.

And realistically you should test with all available FFT sizes and all available modes (SSE/AVX/AVX2) anyway.

verybadsoldier commented 3 years ago

> The idea behind the Huge preset is that these very large FFT sizes produce less heat than smaller ones and thus the boost clock can go slightly higher, revealing more stability issues for edge cases in the process.

I think that can be the case, but it doesn't have to be. With a different clock you also get a different voltage (due to the clock/voltage curve applied), so it is simply a different situation. Testing all possible situations (all FFT sizes, SSE/AVX/AVX2) and running each for 12h to 24h on up to 16 cores is not feasible, as it would take weeks or even months. So I think the only option is to limit what you are testing.

Some weeks ago I tested and compared several FFT sizes (1T, SSE) with a handful of runs.

Values indicate runtime in minutes until an error occurred.
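To put a rough number on "weeks or even months", here is a minimal sketch of the arithmetic. It assumes only the four FFT presets and three modes named in this thread (the tool may offer more) and that cores are tested one at a time; the function name is made up for illustration.

```python
from itertools import product

# Assumptions for illustration: only the presets and modes mentioned in
# this thread are counted; the actual tool may offer more.
FFT_PRESETS = ["Smallest", "Small", "Large", "Huge"]
MODES = ["SSE", "AVX", "AVX2"]
CORES = 16


def full_matrix_days(hours_per_run: float) -> float:
    """Days needed to run every (preset, mode) pair on every core, one core at a time."""
    combos = len(list(product(FFT_PRESETS, MODES)))
    return combos * CORES * hours_per_run / 24


for hours in (12, 24):
    print(f"{hours} h per run: {full_matrix_days(hours):.0f} days")
```

Under these assumptions the full matrix takes roughly three to six months, which is why the test set has to be limited.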

Large: 4, 27, 1, 5, 0, 3
Small: 10, 14, 20, 10, 4
Smallest: 55 (no crash), 46, 40 (no crash)

This might be specific to my CPU, no idea. Maybe other people see different values. For now I will just assume that FFT size Large is the worst case for me and that a CPU that is stable on Large will also be stable on Small, although I am not sure that is necessarily true.

And for some reason (at least for me) Large seems to reveal more errors than Huge.
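The runs listed above can be summarized with a few lines of Python. This is only a sketch over the handful of values from this comment, so it is far too small a sample to prove anything. Runs that ended without a crash are kept separately because their true time to error is unknown; the dictionary and function names are made up for illustration.

```python
from statistics import median

# Minutes until the first error, copied from the runs listed above.
# Runs that ended without a crash are right-censored (the time to error
# is only known to be longer), so they are stored separately.
failures = {
    "Large": [4, 27, 1, 5, 0, 3],
    "Small": [10, 14, 20, 10, 4],
    "Smallest": [46],
}
no_crash = {"Large": [], "Small": [], "Smallest": [55, 40]}


def summarize(preset: str) -> dict:
    """Count runs and errors and report the median time to error for one preset."""
    times = failures[preset]
    return {
        "runs": len(times) + len(no_crash[preset]),
        "errors": len(times),
        "median_min_to_error": median(times),
    }


for preset in failures:
    print(preset, summarize(preset))
```

The median for Smallest is based on a single failing run and ignores the two runs that never crashed, so it understates how long that preset survived.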

sp00n commented 3 years ago

> Testing all possible situations (all FFT sizes, SSE/AVX/AVX2) and running each for 12h to 24h on up to 16 cores is not feasible, as it would take weeks or even months.

Well yeah, that's the downside of stress testing a Curve Optimizer setting, you can only test a single core at a time, thus to reach the same level of confidence as you'd have in a 12 hour Prime95-stable all-core overclock, you need to test each core for these 12 hours (in total, doesn't necessarily need to be en bloc). There's just no way around this if you aim for a certain degree of stability.
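Since the 12 hours per core do not have to be run in one block, the bookkeeping could look roughly like this. The session list is hypothetical and would have to be filled in by hand or from your own notes; nothing here reads the tool's log files.

```python
from collections import defaultdict

TARGET_HOURS = 12  # per-core goal mentioned above

# Hypothetical session notes: (core index, hours tested without error).
sessions = [(0, 5.0), (1, 6.0), (0, 7.5), (1, 2.0), (2, 12.0)]


def remaining_hours(sessions, cores, target=TARGET_HOURS):
    """Hours each core still needs to reach the target, summing split sessions."""
    done = defaultdict(float)
    for core, hours in sessions:
        done[core] += hours
    return {core: max(0.0, target - done[core]) for core in cores}


print(remaining_hours(sessions, cores=range(4)))
```

With the example notes, cores 0 and 2 have reached the target, core 1 still needs 4 hours, and core 3 has not been tested at all.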

I've had cores throw an error after more than 14 nights of running just fine during stress testing. That's just the nature of how this works for single core testing.

verybadsoldier commented 3 years ago

I am afraid there is no perfect 100% solution to this (yet?). I will go for testing just Large on each of my 16 cores for 12h which still needs at least 8 days. Then let's see what happens in daily usage. To account for this incomplete test I am thinking of just increasing the curve offset by a value of 1 or 2 on each core to just add a bit of stability after the test.
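As a quick check of that plan, here is a small sketch. It assumes the cores are tested strictly one after another and that the curve values are negative offsets, so "increasing by 1 or 2" moves them toward zero. The offsets in the example are hypothetical, not measured values.

```python
CORES = 16
HOURS_PER_CORE = 12

# One core is tested at a time, so the runs are sequential.
total_days = CORES * HOURS_PER_CORE / 24
print(f"Minimum test time: {total_days:.0f} days")


def add_margin(offsets, margin=2):
    """Raise each per-core offset by `margin` steps (assumed: less undervolt)."""
    return [value + margin for value in offsets]


# Hypothetical offsets that passed the test, shown for four cores only.
tested = [-20, -15, -25, -10]
print(add_margin(tested))
```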

I guess we still have to figure out how best to deal with the Curve Optimizer as a new method of over/undervolting. Maybe better strategies are available now, or maybe they will come later.

Btw. Big thanks for this tool! It makes testing a lot easier than the other tools I have tried!

n19htmare commented 3 years ago

The tool is great and I am also trying to figure out how to test without spending more weeks running tests. I really like the sheet you have here, too. Did you create it with a script run on the logs, or was it done manually?

sp00n commented 3 years ago

> The tool is great and I am also trying to figure out how to test without spending more weeks running tests. I really like the sheet you have here, too. Did you create it with a script run on the logs, or was it done manually?

That's all manual.

sp00n / corecycler

FFT Size defaults to "Huge" but for me "Large" crashes much quicker #14