sonyxperiadev / bug_tracker

Empty repository that is used as a bugtracker for Open Devices project
52 stars 13 forks source link

[Tama][Apollo][AOSP11][4.14] wled_ovp uses 50+% CPU when the screen is on #757

Open faenil opened 2 years ago

faenil commented 2 years ago

Platform: Tama Device: Apollo Kernel version: 4.14 Android version: AOSP11 r46 Software binaries version: v9a

Previously working on n/a

Description

When the screen is on, top shows high CPU usage by wled_ovp

Symptoms UI is stuttery and laggy, perhaps also due to the high CPU usage of wled_ovp

How to reproduce

Additional context Top excerpt:

   PID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS 
   195 root         RT   0    0    0    0 S 61.6   0.0   4:11.87 [irq/94-wled_ovp]
  5856 shell        20   0  10G 4.8M 3.4M R  4.6   0.1   0:00.69 top
   420 root          0 -20    0    0    0 R  3.0   0.0   0:12.55 [kworker/u17:1]
  1120 system       18  -2  16G 407M 359M S  1.3  11.0   0:22.06 system_server
  1681 u0_a122      20   0  14G 212M 152M S  1.0   5.7   0:10.43 com.android.systemui
   358 root         20   0    0    0    0 I  1.0   0.0   0:02.98 [kworker/u16:10]
  5705 root         20   0    0    0    0 R  0.6   0.0   0:00.93 [kworker/u16:2]

Not much in dmesg, beside the radio service failing to start every 1 sec

MarijnS95 commented 2 years ago

Poor white LEDs.

(OVP stands for Over Voltage Protection. Let's pray the voltage limit is simply set too low)

faenil commented 2 years ago

PR up :)

MartinX3 commented 2 years ago

Woohoo

Auto-String-Detection means somewhere manual defined ones are wrong?

MarijnS95 commented 2 years ago

Let's link back to the PR here too for those not seeing GH's automated back-link: https://github.com/sonyxperiadev/kernel/pull/2499

faenil commented 2 years ago

Auto-String-Detection means somewhere manual defined ones are wrong?

Yeah that was exactly my thought too... I cross checked against stock 4.9 and sodp 4.19 dts and drivers and could not find anything clearly wrong, but noticed that they all use auto string detection, so I went for it.

It would be great to understand the root cause, indeed.

My current understanding of this driver is limited, so I appreciate any detailed pointer, else it will take my a few weeks before I put together enough spare time to go through it in detail 😀

Let's link back to the PR here too for those not seeing GH's automated back-link

Thanks for the feedback, will do so from now on 👍

MarijnS95 commented 2 years ago

I cross checked against stock 4.9 and sodp 4.10 dts and drivers and could not find anything clearly wrong, but noticed that they all use auto string detection, so I went for it.

That's totally scary. Fwiw, perhaps you meant 4.14 dts or some other version? I don't think we ever shipped on 4.10.

Would be great to paste some GH permalinks to the DTs enabling autodetection, to back up the reason for making such a PR. Cross-check if it's Apollo or Tama-wide, etc.

No need to dive into the driver for this, it's all configured from DT. But if you feel like playing around, delete it and import the much improved version from mainline, my patches should land in 5.17 that finally make it usable.

faenil commented 2 years ago

That's totally scary. Fwiw, perhaps you meant 4.14 dts or some other version? I don't think we ever shipped on 4.10.

Sorry, I meant 4.19 (I checked 4.9 and 4.19 to spot differences with 4.14), edited the post, thanks

Would be great to paste some GH permalinks to the DTs enabling autodetection, to back up the reason for making such a PR. Cross-check if it's Apollo or Tama-wide, etc.

Will do :+1: Of course it should not be applied to a different set of devices. I have already checked what DT block it was applied to in the other versions and I added it to the same block, but will post permalinks ;)

No need to dive into the driver for this, it's all configured from DT.

Yea, but since the variable names changed, especially some booleans, I also tried to have a quick look at the driver to make sure they were referring to same things, checking some of the register addresses matched etc

But if you feel like playing around, delete it and import the much improved version from mainline, my patches should land in 5.17 that finally make it usable.

Great, sounds like a plan!

faenil commented 2 years ago

Would be great to paste some GH permalinks to the DTs enabling autodetection, to back up the reason for making such a PR. Cross-check if it's Apollo or Tama-wide, etc.

Stock 4.9: https://github.com/sonyxperiadev/kernel-copyleft/blob/364b61ee85425f131d19a944ab5819bcb0b038c8/arch/arm64/boot/dts/qcom/pmi8998.dtsi#L585

SODP 4.19: https://github.com/sonyxperiadev/kernel/blob/85e4c4974990e0b7c1fdd65af7af2e5308f9c0af/arch/arm64/boot/dts/qcom/pmi8998.dtsi#L550

faenil commented 2 years ago

@MarijnS95 I have ported your 5.17 wled driver changes to sodp 4.14 (plus the other missing changes, except those requiring newer kernel APIs/functions such as backlight_get_brightness()) :+1:

Some of the bugs you fixed were also things that got me confused when I had a quick look at the sodp 4.14 driver (things like iterating over num-strings disregarding the enabled strings). Good to see a confirmation that they were actually bugs :+1:

With auto string detection disabled the issues reported in this thread is still present. I will investigate more as I get the chance. I am curious to learn more.

Regarding auto string detection feature, if my understanding is correct it is always good to have it so that it can take care of hardware faults, but I do agree it is not the right fix for this particular issue, so I want to dig more :)

MarijnS95 commented 2 years ago

@faenil great to hear that about the mainline WLED driver! (Though I think you meant 5.17?).

I won't be able to reply to everything for a while nor elaborate much, but I should have said that the mainline driver won't fix the "too many strings are enabled so you get OVP" bug, that still really seems to be a configuration issue.

If we've seriously used string-detection on previous kernels for apollo (I have not followed the links yet), then let's indeed keep that approach.

Though otoh, try enabling less strings (2 instead of 3) and maybe the ovps disappear, meaning that only 2 WLEDs are connected - and that should be the same for all phones, there's supposedly no panel lottery for Apollo.

EDIT: Don't forget to look at node modifiers that set different properties for this driver, you can find these under &pmi8998_wled.

faenil commented 2 years ago

@MarijnS95 yes, I meant 5.17 👍 No, I did not expect it to fix the ovp bug, just reported the test result as confirmation, but yes it was expected indeed :) I mainly hoped the fixes from the mainline driver would perhaps highlight a dt configuration issue. Need to have a better look at the logs to see if it is printing any additional info. If not, I will just see what values the auto-string-detection is settling on (need to look at the source, but I assume it tries all combinations from a hardcoded set and then settles on the ones that don't cause OVP to trigger?) and see if it makes sense to use those as default in the dts.

Yeah, I did look at the node modifiers.

Thanks for your advice, I'll tinker with it a bit and let you know.

If we've seriously used string-detection on previous kernels for apollo (I have not followed the links yet), then let's indeed keep that approach.

Thanks :+1: after enabling auto string detection (both on 4.14 and 5.17 drivers) the backlight is now flickering at every boot (I guess that's the auto-string-detection running its logic) while the Sony logo is on. That's not ideal, so I first would like to dig a bit more to see if there's anything we're missing, beside enabling auto string detection :+1: Like, it might be on Stock the dts values are correct and auto-sring-detection is only enabled to take care of hardware faults (and so it doesn't run at every boot) while here we might be using it to workaround a broken dt config (causing it to run at every boot -> flickering)

faenil commented 2 years ago

I tested with auto-string-detection on 5.17 wled driver, and unfortunately something is not working.

The dts is the same as sodp 4.14, but while with 4.14 driver the screen flickers at boot and then there are no OVPs, with 5.17 driver the screen turns off. After adding a few debug logs, it seems the auto string detection algo on 5.17 only runs fine for the first time (detecting that only strings 0 and 1 are valid). However faults keep getting triggered so the auto-string-detection algo is run again and those subsequent times it fails, reporting no valid sinks and leading to the display being turned off.

Note: both 4.14 and 5.17 drivers report only string 0 and 1 as valid (it seems string 2 is not, even though it's in the dts config), at least on my unit.

Need to investigate more.

MarijnS95 commented 2 years ago

Given the flickering and wasting startup time, I'd shoot for enabling just 2 strings and - as Alin mentioned - there's supposedly no panel lottery on these devices so there should be no variation on this rule. Still feel that this should be overridden just Apollo and Akari (and Akatsuki should have it disabled for obvious reasons), pmi8998.dtsi may work for our case but is just plain wrong in the global sense.

I haven't actually tested autodection on the mainline driver (neither before nor after my patches), perhaps it never worked or one of my patches broke it?

faenil commented 2 years ago

Thanks @MarijnS95, all good points. Yeah, more investigation is needed.

Interesting that the flag was enabled on a global level in the copyleft kernel (unless Akatsuki is using a different archive with that flag turned off)

MarijnS95 commented 2 years ago

Interesting that the flag was enabled on a global level in the copyleft kernel (unless Akatsuki is using a different archive with that flag turned off)

That's the big downside of downstream carrying a single (usually outdated and desynced) copy of various trees per device - code can just be modified at will since it "only runs on this specific device" - so to answer your question: yes Akatsuki (and I think even Apollo and Akari?) each have their own archive(s, if you count all the software versions). In contrast, mainline - and SODP to a diminishing extent - are generic and run on many different devices.