whizzscooters / android_kernel_oneplus_sm8250

[Enhancement] Reduce thermal throttling of non-CPU hardware #12

Open hometue opened 3 years ago

hometue commented 3 years ago

Devfreq handles throttling for non-CPU hardware. Since latency has been reported to increase as temperature increases, we should disable this setting; it might resolve the issue. Unsure if we can test to see what the effects are, but this is similar to CONFIG_CPU_THERMAL anyway, just for non-CPU hardware.

hometue commented 3 years ago

We may need to push this out to all devices. Since we are about to make a new kernel release anyway, we can take the chance to include other optimizations in it.

hometue commented 3 years ago

This is linked to the increased frequency of stop-start issues. As mentioned, because the device throttles non-CPU hardware, latency suffers as well. I have recorded such occurrences: after Android's overheating notification appears, average latency, which is 20 ms before throttling (on Wi-Fi), frequently doubles (inconsistently, maybe 30% of the time), and I observed 100+ ms once (1 frame only). I tested for only about 20 minutes, so the issue would likely be much worse in a more realistic scenario, on a 4G network instead of Wi-Fi and running for a long time.

Possible that GPU is being throttled too.

Normal: msedge_9I5udlXBPz

Throttle: msedge_cQud5Hl8it msedge_JadQewRVJI

@MelvinFMQ @rehohoho

MelvinFMQ commented 3 years ago

@hometue do note that the latency reported on citadel is the max latency within the previous 500ms. It is probably the correct metric anyway, as we are more concerned with spiky latency than with the average.
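For anyone reading the graphs, here is a minimal sketch of what a "max latency within the previous 500ms" metric does to the numbers. This is not citadel's code; the class name, method names, and sample values are all made up for illustration.

```python
from collections import deque


class WindowedMaxLatency:
    """Track the maximum latency seen within the last `window_ms` milliseconds.

    Hypothetical helper for illustration only, not citadel's implementation.
    """

    def __init__(self, window_ms=500):
        self.window_ms = window_ms
        # Monotonic deque of (timestamp_ms, latency_ms); latencies strictly decrease
        # from left to right, so the leftmost entry is always the current max.
        self._samples = deque()

    def add(self, timestamp_ms, latency_ms):
        # Older samples that are not larger than the new one can never be the max again.
        while self._samples and self._samples[-1][1] <= latency_ms:
            self._samples.pop()
        self._samples.append((timestamp_ms, latency_ms))
        self._evict(timestamp_ms)

    def _evict(self, now_ms):
        # Drop samples that are `window_ms` old or older.
        while self._samples and self._samples[0][0] <= now_ms - self.window_ms:
            self._samples.popleft()

    def current_max(self, now_ms):
        self._evict(now_ms)
        return self._samples[0][1] if self._samples else None


# Made-up samples: one 110 ms spike among ~20 ms readings.
w = WindowedMaxLatency(window_ms=500)
w.add(0, 20)
w.add(100, 110)
w.add(200, 25)
print(w.current_max(200))  # the single spike dominates the reported value
```

So a single spiky frame keeps the reported value high for the whole window, even if the average barely moves.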

hometue commented 3 years ago

Got it. What about the ping that Cortex uses to decide whether to stop if it is above 1000ms: is that the current ping or the max ping within 500ms?

MelvinFMQ commented 3 years ago

It works like a dead man's switch: by default, it will stop within 600ms unless it gets a reply.
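A rough sketch of the dead man's switch idea, assuming a simple "time since last reply" check. This is not Cortex's actual code; the class, its methods, and the injectable clock are hypothetical, and only the 600ms timeout comes from the description above.

```python
import time


class DeadMansSwitch:
    """Signal a stop unless a reply arrived within the last `timeout_s` seconds.

    Hypothetical illustration only, not the Cortex implementation.
    """

    def __init__(self, timeout_s=0.6, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_reply = clock()

    def on_reply(self):
        # Every reply re-arms the switch.
        self._last_reply = self._clock()

    def should_stop(self):
        # The default outcome is "stop"; only a recent reply prevents it.
        return self._clock() - self._last_reply >= self.timeout_s


# Usage with a fake clock (seconds) to show the behaviour.
now = [0.0]
switch = DeadMansSwitch(timeout_s=0.6, clock=lambda: now[0])
now[0] = 0.5
switch.on_reply()            # reply arrives in time
now[0] = 1.0
print(switch.should_stop())  # False: only 0.5 s since the last reply
now[0] = 1.2
print(switch.should_stop())  # True: 0.7 s without a reply
```

The point of this design is that no latency threshold is being compared at all: a late or lost reply simply fails to re-arm the switch in time.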

hometue commented 3 years ago

@MelvinFMQ Update on the issue: disabling the setting seems to fix the issue. There are still some lag spikes, but in general the latency has decreased. Unsure if the lag spikes are as bad or better (since they are outliers, the data on them is statistically difficult to compare). However, given that latency even at higher temperatures behaves better than without the change, I like to think that the lag spikes should be better too. (In general, the mean is lower, and the amount of time spent in "spiked" behavior has reduced as well.)
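One way to make the before/after comparison less hand-wavy is to compute the mean and the fraction of samples spent "spiked" for each run. A minimal sketch, assuming a spike means at least double the 20 ms baseline mentioned earlier; the function name, the threshold choice, and the sample numbers below are made up, not recorded data.

```python
from statistics import mean


def summarize_latency(samples_ms, baseline_ms=20.0, spike_factor=2.0):
    """Summarize one run of latency samples (hypothetical helper).

    A sample counts as "spiked" when it is at least `spike_factor` times
    `baseline_ms`; both defaults are assumptions, not measured thresholds.
    """
    if not samples_ms:
        raise ValueError("no samples")
    threshold = spike_factor * baseline_ms
    spiked = sum(1 for s in samples_ms if s >= threshold)
    return {
        "mean_ms": mean(samples_ms),
        "spike_fraction": spiked / len(samples_ms),
        "max_ms": max(samples_ms),
    }


# Made-up numbers purely to show the shape of the comparison.
throttled = [20, 45, 22, 41, 21, 110, 20, 43, 19, 20]
unthrottled = [20, 21, 19, 44, 20, 22, 20, 21, 19, 20]
print(summarize_latency(throttled))
print(summarize_latency(unthrottled))
```

The max is still a single outlier and hard to compare between runs, but the mean and the spike fraction are the two quantities described above.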

However, a problem with this is that the phone now overheats like mad. It ran all the way up to 75 deg C before Android decided it had had enough of my shenanigans and shut down all application processes, even Cortex. Can't really argue with that; this is less throttling and more the safety switches that shut things off above a critical temperature. I don't like the idea of the phone's temperature skyrocketing; soon we can stop using the phone as just the brains of the operation and maybe start using it to brew a nice cup of coffee or tea. It doesn't sound safe for the batteries.

What this means is that the fix, if still possible, has become a much harder one. (Well, relatively: given that this was a one-line change, anything is a lot harder.) I have some ideas for this, and given that this impacts reliability, I am still going to focus on tackling it. It shouldn't be that hard (will keep this updated).

MelvinFMQ commented 3 years ago

OK, as discussed IRL: if there is no throttling at all below 75 deg C, it might be what we want. We just have to trust the cooling system to keep the temperature below that.

hometue commented 3 years ago

Preliminary test: in the shade while it is raining, the temperature settles at about 69-70 deg C (just the phone on the fan, not connected to any electronics). Further testing needed.

hometue commented 3 years ago

@MelvinFMQ @rehohoho Currently this is a wontfix. We are waiting on 2 things: Rui En's migration to TensorFlow and subsequent use of the GPU, and Peng Lang's new mount, before we determine whether it runs too hot and causes reliability issues by repeatedly closing the app.