vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0

enhancement: option to smooth temperature data #176

Status: Open. jdchristensen opened this issue 2 years ago.

jdchristensen commented 2 years ago

On my X1 Extreme Gen 4, the CPU package and core temperatures often spike briefly and then immediately drop back down. This makes thinkfan turn the fans on, but by the time they have ramped up, the spike is already gone. I know about the bias option, but the spikes are too large for it to handle without also suppressing the increases I do want the fan to respond to. I propose adding an option -w weight that smooths the readings by forming a weighted average of each new reading with the previously accumulated value. Here weight is a floating-point number between 0 and 1, and we would calculate:

  newvalue = (1.0-weight)*oldvalue + weight*newsensorreading

With weight=1.0 we just use newsensorreading, as before. With e.g. weight=0.7, the reading is smoothed by averaging it against the previous accumulated value (not the previous raw reading).
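A minimal sketch of the proposed filter (an exponentially weighted moving average) in Python, assuming the readings arrive as a plain list of °C values. The function name smooth is hypothetical and not part of thinkfan:

```python
def smooth(readings, weight):
    """Sketch of the proposed -w filter (hypothetical, not thinkfan code).

    Each output is (1 - weight) * previous_smoothed + weight * new_reading,
    i.e. it mixes the new reading with the previous *smoothed* value, not the
    previous raw reading. weight=1.0 passes readings through unchanged.
    """
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must be in (0, 1]")
    smoothed = []
    value = None
    for reading in readings:
        if value is None:
            # Seed the filter with the first reading so it doesn't start at 0.
            value = float(reading)
        else:
            value = (1.0 - weight) * value + weight * reading
        smoothed.append(value)
    return smoothed
```

For example, smooth([39, 41, 39, 95, 40, 39], 0.3) damps the 95 °C reading to about 56.1 °C, but the smoothed value then decays gradually (about 51.3, then 47.6). A smaller weight means stronger smoothing and also more lag.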

IgnatBeresnev commented 2 years ago

I can confirm the same spikes on a P1 Gen 4 (i7-11800H).

Frank-Steiner commented 2 years ago

I came here to ask for something similar and found this :) So I second such an option. I'm seeing the same kind of spikes on my Lenovo T580 with an i7-8550U, due to its Turbo Boost behaviour.

vmatare commented 2 years ago

It's certainly an interesting proposal, but the upcoming 2.0 release is already pretty full with some long-standing bugfixes and another major documentation overhaul for multi-fan support. So I think I'll take a closer look at this for 2.1...

vmatare commented 2 years ago

I just thought a little more about @jdchristensen's statement that the -b option doesn't have the desired effect. I'm not entirely sure what kind of temperature changes we're talking about here, or what exactly the desired behavior is and why. Since you say "spikes", I'm thinking of short, sudden increases like those modern Intel processors often report, so let's say you have:

temp: 50, 51, 65, 78, 60, 62

With -b-6 you arrive at:

temp: 50, 51, 57, 61, 60, 61 °C
sleep:   5   5   2   5   5    s

effectively canceling out the spike. With -b-3 you'd get:

50, 51, 61, 71, 60, 61 (same sleep times)

That would keep part of the spike. In fact, I'd consider the behavior with -b-6 more desirable, since in reality you don't care about the spike at all: it is completely absorbed by the heatsink, whose heat capacity is probably several orders of magnitude greater than that of the tiny die.

So I guess my question is: why not use something like -b-6 or even -b-8 to effectively ignore everything except slow, consistent heat-up? Or the other way round: what temperature profile would make a high negative bias cancel out a desired increase? After all, if temperatures drop again by themselves, isn't it a good thing that we haven't turned on the fan?

Edit: not entirely sure ;-)
Edit 2: also keep in mind that thinkfan reduces the sleep time to 2 seconds when delta_temp > 2 °C.

vmatare commented 2 years ago

Also note that in modern laptops the fan's heat exchanger never sits directly on the CPU's heatsink; it is connected to it by a heatpipe, which further increases the heat capacity and delays the effect of active cooling. So unless I'm overlooking something fundamental, I'm fairly certain there's no point in reacting at all if temperatures drop again before we have even started reacting.

jdchristensen commented 2 years ago

@vmatare I agree that the goal is not to turn the fan on at all when the spike is brief. But I don't understand your example bias calculations, or how they follow from the formula current_tmax = current_tmax + delta_t * BIAS / 10. Can you explain?

Also, the spikes I'm talking about are much more dramatic. At one-second intervals, I see temps like 39, 41, 39, 95, 40, 39. It's that 95 I want to ignore, so the bias needed would be huge.

vmatare commented 2 years ago

Yeah, while calculating the examples I noticed that the formula in the manpage doesn't tell the whole story. The actual behavior is a bit more complex. However, in the simple example you gave it works exactly as the formula says, e.g. with -b-9 you'd get:

real temps:   39, 41, 39, 95, 40, 39
biased temps: 39, 41, 39, 45, 40, 39

So the only interesting part is the jump from 39 to 95, i.e. delta_t = 56, which gives 95 + 56 * (-9) / 10 = 44.6, or about 45.

HOWEVER:

The behavior is this weird because it was designed to also allow exaggerating temperature increases with a positive bias. Nowadays this seems to be needed less often, but older CPUs could have a very slow load-to-temperature response because their temperature sensors were much farther from the die.

So, most importantly: have you tried running thinkfan with e.g. -b-8 and actually noticed any undesirable behavior? If you didn't, I'd be interested to know what else you found lacking in the docs (besides the incomplete description of the biasing logic).

vmatare commented 2 years ago

BTW, I also noticed an error in the example I gave above. The actual behavior should be more like this:

With -b-6:

real temps:   50, 51, 65, 78, 60, 62
biased temps: 50, 51, 57, 70, 60, 62

With -b-8:

real temps:   50, 51, 65, 78, 60, 62
biased temps: 50, 51, 54, 68, 60, 62

The difference is that the slow decay of the bias offset doesn't kick in; instead, the normal biasing formula is applied on each increase because delta_t > 2. I guess it's pretty obvious that the documentation for the -b option could use an overhaul ;-)
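To make the biasing behavior easier to discuss, here is a simplified model in Python. It is reconstructed only from the examples in this thread, not from thinkfan's source. It assumes the bias applies only when a reading rises by more than 2 °C over the previous raw reading, using the manpage formula with delta_t taken as that rise, and that any other change drops the bias immediately. The function name apply_bias is hypothetical:

```python
def apply_bias(readings, bias, threshold=2.0):
    """Simplified model of thinkfan's -b biasing (reconstructed from the
    examples in this thread; NOT thinkfan's actual code).

    Assumption: when a reading rises by more than `threshold` degrees over the
    previous raw reading, it becomes raw + delta_t * bias / 10. Any smaller
    rise, or any drop, passes the raw value through unchanged.
    The slow decay of the bias offset mentioned in the thread is not modeled.
    """
    biased = []
    previous = None
    for raw in readings:
        if previous is not None and raw - previous > threshold:
            delta_t = raw - previous
            biased.append(raw + delta_t * bias / 10)
        else:
            biased.append(float(raw))
        previous = raw  # compare against the previous *raw* reading
    return biased
```

Rounded to whole degrees, apply_bias([50, 51, 65, 78, 60, 62], -6) gives 50, 51, 57, 70, 60, 62 and the -8 variant gives 50, 51, 54, 68, 60, 62, matching the corrected tables. apply_bias([39, 41, 39, 95, 40, 39], -9) gives 39, 41, 39, 45, 40, 39. A positive bias value would instead exaggerate sharp rises.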

jdchristensen commented 2 years ago

@vmatare Thanks for the further explanation. I think this comment of yours shows that there is still a fundamental problem with how the bias works:

If a temperature drops even by -1 °C, the bias for that sensor is eliminated immediately, so if you had e.g. 39, 95, 94, 40, with -b-9 you'd get 39, 45, 94, 40. Maybe that sometimes creates sudden and undesired fan speedups.

Because that will certainly happen.

A second issue is that I don't understand your other examples. Also, with -b-9, shouldn't a temperature change from 40 to 43 over five seconds be translated to 40, 38.5? That's what I meant when I said that a bias large enough to suppress spikes appears to also suppress moderate increases. But I guess this is somehow handled by the undocumented mechanism you mentioned?

In summary, however the bias works, it seems very hard to explain and doesn't really do what one wants. A smoothing filter, which is trivial to implement and explain, seems better suited to handling spikes. In a sense, it models how the heatsink itself smooths out the die temperatures.
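The heatsink analogy can be made concrete under a simplifying assumption: treat the heatsink as a single first-order thermal lag with time constant tau (the heatpipe mentioned above adds further lag, which this ignores). For such a lag sampled every Δt seconds, the weight in the proposed formula corresponds to 1 - exp(-Δt/tau). Since thinkfan shortens its sleep time to 2 seconds on large changes, a fixed weight would smooth differently depending on the current interval, while deriving the weight from a time constant does not. The names tau_s, weight_for_time_constant and lag_filter are hypothetical:

```python
import math


def weight_for_time_constant(tau_s, interval_s):
    """Weight that makes the proposed filter behave like a first-order lag
    with time constant tau_s when readings arrive interval_s seconds apart."""
    if tau_s <= 0 or interval_s <= 0:
        raise ValueError("tau_s and interval_s must be positive")
    return 1.0 - math.exp(-interval_s / tau_s)


def lag_filter(samples, tau_s):
    """samples: iterable of (interval_s, reading) pairs, where interval_s is
    the time since the previous reading (ignored for the first sample)."""
    value = None
    out = []
    for interval_s, reading in samples:
        if value is None:
            value = float(reading)  # seed with the first reading
        else:
            w = weight_for_time_constant(tau_s, interval_s)
            value = (1.0 - w) * value + w * reading
        out.append(value)
    return out
```

With a hypothetical 10 s time constant this gives a weight of about 0.39 at a 5 s interval and about 0.18 at a 2 s interval. Two 1 s steps and one 2 s step toward the same reading end at the same value, which a fixed weight would not guarantee.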

IgnatBeresnev commented 2 years ago

If it helps, I can collect some data, e.g. append the CPU temperature and fan speed to a text file every second. Maybe that will give some insight into the spikes.

If you want me to run a specific command, let me know and I will.
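A minimal logger along those lines might look like this in Python. It assumes the thinkpad_acpi driver's /proc/acpi/ibm/fan interface (which reports a "speed:" line) and a thermal zone file that reports millidegrees Celsius. thermal_zone0 is only a placeholder and may not be the CPU package sensor on a given machine. The function names and the output file name are hypothetical:

```python
import time
from pathlib import Path

# Assumed paths; adjust for your machine.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")  # millidegrees Celsius
FAN_PATH = Path("/proc/acpi/ibm/fan")  # provided by the thinkpad_acpi driver


def parse_millidegrees(text):
    """Convert a sysfs temperature string like '45000\\n' to degrees Celsius."""
    return int(text.strip()) / 1000.0


def parse_fan_speed(text):
    """Return the RPM from the 'speed:' line of /proc/acpi/ibm/fan, or None."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "speed":
            return int(value.strip())
    return None


def log(outfile="temps.txt", interval_s=1.0, samples=None):
    """Append 'timestamp<TAB>temp_C<TAB>fan_rpm' lines; runs forever if samples is None."""
    with open(outfile, "a") as f:
        n = 0
        while samples is None or n < samples:
            temp = parse_millidegrees(TEMP_PATH.read_text())
            rpm = parse_fan_speed(FAN_PATH.read_text())
            f.write(f"{time.time():.1f}\t{temp:.1f}\t{rpm}\n")
            f.flush()  # keep the file current in case the logger is killed
            n += 1
            time.sleep(interval_s)
```

Calling log(samples=600) would record about ten minutes of tab-separated timestamp, temperature and fan speed lines.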