zen-kernel / zen-kernel

Zen Patched Kernel Sources

High cpu usage on LG gram 17-ZD90Q #307

Open victorballester7 opened 1 year ago

victorballester7 commented 1 year ago

Hello,

I have an LG-17ZD90Q (with an Intel i7-1260P) running Arch Linux with KDE. Since boot-up, one of my cores is always at 100% even when idle, so the fan is always at full speed. At first I thought it could be a BIOS problem, but I dual-boot Windows and there the problem seems to disappear. So it's definitely something related to Linux.

I tried various kernels: linux-zen (my preferred one), linux, and linux-lts, but none of them solved the issue.

One day I did something (probably updated all packages) and the fan slowed down, as did the CPU usage... That was surprising, but on the next boot the problem reappeared. :(

Also, I cannot suspend my laptop: when I try, it freezes... I think this is related to the CPU usage too, because the time the problem magically disappeared, I could suspend the laptop without any problems.

I thought the problem could come from improper handling of Intel 12th-gen CPUs in the Linux 6.x kernels, but those kernels are supposed to handle even 13th gen well, which makes me doubt it... But I'll leave the problem here, just in case.

Since this problem is also related to the large number of ACPI interrupts, some people suggest masking, at boot, the gpeXX that shows a high interrupt count in the output of

grep -r enabled /sys/firmware/acpi/interrupts
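To pick out the noisy GPE without reading the raw grep output, a small sketch like the following can rank them. It assumes each /sys/firmware/acpi/interrupts/gpeXX file starts with its event count (the number the grep above prints after each file name); the function name top_gpes is only illustrative.

```bash
#!/usr/bin/env bash
# Sketch: rank ACPI GPEs by how often they have fired since boot.
# Assumes each gpeXX file begins with its event count. The directory is a
# parameter so the function can also be pointed at a saved copy.

top_gpes() {
    local dir="${1:-/sys/firmware/acpi/interrupts}"
    local limit="${2:-5}"
    local f count
    for f in "$dir"/gpe[0-9A-F]*; do    # matches gpe6E etc., not gpe_all
        [ -r "$f" ] || continue         # glob may have matched nothing
        read -r count _ < "$f"          # first field is the event count
        printf '%s %s\n' "$count" "${f##*/}"
    done | sort -rn | head -n "$limit"
}

top_gpes "$@"
```

Running it (e.g. bash top_gpes.sh, a hypothetical file name) prints the five busiest GPEs as "count name". To test a suspect without rebooting, root can write disable to its file (for example echo disable > /sys/firmware/acpi/interrupts/gpe6F, where gpe6F is only a placeholder) and enable to undo it; the boot-time equivalent is the acpi_mask_gpe= kernel parameter. Either way, everything else wired to that GPE stops working too.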

But that's not a solution!!! It may reduce the high CPU usage, but other important interrupts that should be handled 'instantaneously' (for example, key presses) also get masked... As a result, keys such as the brightness-up key or Caps Lock did not work properly.

Any help solving this annoying issue is welcome.

damentz commented 1 year ago

So let's address the elephant in the room: pretty much, if you're not using a laptop from a Linux vendor such as System76, or from a vendor known for great Linux support (Lenovo ThinkPad / Dell Precision/XPS), you're going to get odd issues like this. From what you've said, it definitely sounds like booting into Linux (de)activates something that changes the fan curve's behavior.

You can try changing the OS name or the OSI to match whatever LG's BIOS wants: https://wiki.archlinux.org/title/DSDT. This first requires you to pull down the DSDT, decompile it, and inspect which operating systems it's looking for and with what values. Just as an example, I found a wiki page where all the possible combinations were tested; you can see for yourself that it takes trial and error to resolve issues caused by a bad BIOS implementation: https://wiki.archlinux.org/title/Talk:ASUS_E403SA
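The first step the DSDT wiki page describes (dump, decompile, look for OS checks) could look roughly like this. Assumptions: the table is readable at /sys/firmware/acpi/tables/DSDT (root only), the iasl disassembler from Arch's acpica package is installed, and the decompiled checks have the form _OSI ("Windows 2015"). The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list the OS strings the firmware checks with _OSI.
# Assumes root (to read the DSDT) and `iasl` (Arch package: acpica).

osi_strings() {
    # Print each distinct string passed to _OSI ("...") in a .dsl file.
    grep -o '_OSI *("[^"]*")' "$1" | sed 's/.*"\(.*\)".*/\1/' | sort -u
}

dump_dsdt() {
    # Copy and disassemble the DSDT in a fresh temp dir; print the .dsl path.
    local work
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        cat /sys/firmware/acpi/tables/DSDT > dsdt.dat || exit 1
        iasl -d dsdt.dat > /dev/null || exit 1   # writes dsdt.dsl here
    ) || return 1
    echo "$work/dsdt.dsl"
}

case "${1:-}" in
    --dump) dsl=$(dump_dsdt) && osi_strings "$dsl" ;;
    "")     echo "usage: $0 --dump | $0 FILE.dsl" >&2 ;;
    *)      osi_strings "$1" ;;
esac
```

Run as root with --dump, it prints one OS string per line; those are the candidate values for the acpi_osi= kernel parameter the wiki page discusses. Passing an already decompiled .dsl file skips the dump step.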

And regarding interrupts, maybe irqbalance may help? https://archlinux.org/packages/extra/x86_64/irqbalance/

That's really it though, you most likely are dealing with a bug that is caused by LG. Not much you can do aside from maybe contact their support, but since they don't support Linux you're probably SOL.

damentz commented 1 year ago

Had to do a bit more research for this, but it looks like integration with the Thread Director feature in Alder Lake and newer Intel CPUs is still not merged into mainline. Here's the latest revision of the patch set, which integrates process classification so the scheduler can steer which work goes to which CPU: https://lkml.kernel.org/lkml/20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com/

With that in mind, I recommend you disable the E-cores since the scheduler really doesn't know what to do in this case, and you'll get unusual and random performance whenever it flips tasks between the efficiency and performance cores.

victorballester7 commented 1 year ago

Thank you @damentz for your answers. (sorry I closed the issue for a second erroneously)

I tried several OS names in the kernel parameters, but nothing worked. I did not try the option of recompiling the DSDT and so on, as it seems a little complicated to do in a short time (I don't really understand much of it) and I have more to lose than to gain... I will try it when I have more time for trial and error.

Your second comment seems more encouraging. Do you know (if it's possible) how I can find out when the patch will be merged into mainline?

Finally, I don't know how to disable the E-cores, and I couldn't find any Arch Wiki article related to that. Could you please help me with this? Thank you very much, I really appreciate your assistance.

damentz commented 1 year ago

You can write a bash script to disable the E cores. Something like:

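#!/bin/bash

# Refuse to run without root: writing to sysfs requires it.
if [ "$(id -u)" -ne 0 ]; then
    echo "You must be root to run this script!" >&2
    exit 1
fi

# Example range only: assumes logical CPUs 8-15 are the E-cores.
for i in $(seq 8 15); do
    echo "Disabling core $i"
    echo 0 > "/sys/devices/system/cpu/cpu$i/online"
done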

The values 8 and 15 are just examples, and the range is inclusive: this assumes cores 0 to 7 are P-cores and 8-15 are E-cores. You can probably figure out which of your cores are E-cores by looking at the output of /proc/cpuinfo.
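Instead of guessing the range, recent kernels on hybrid Intel CPUs expose which logical CPUs belong to each core type: /sys/devices/cpu_core/cpus for P-cores and /sys/devices/cpu_atom/cpus for E-cores, as a list such as 8-15. A variant of the script above that reads that list might look like this; it assumes those files exist on your kernel (they don't on non-hybrid CPUs), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: take the E-cores offline using the CPU list the kernel exposes
# for hybrid Intel CPUs. Assumes /sys/devices/cpu_atom/cpus exists
# (e.g. "8-15"); if it does not, nothing is changed.

expand_cpu_list() {
    # Turn a kernel CPU list such as "8-11,14" into one number per line.
    local part
    local IFS=','
    for part in $1; do
        if [[ "$part" == *-* ]]; then
            seq "${part%-*}" "${part#*-}"
        else
            echo "$part"
        fi
    done
}

offline_e_cores() {
    local list_file="${1:-/sys/devices/cpu_atom/cpus}"
    local cpus cpu
    if [ ! -r "$list_file" ]; then
        echo "No $list_file: not a hybrid CPU, or kernel too old" >&2
        return 1
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "You must be root to run this script!" >&2
        return 1
    fi
    cpus=$(<"$list_file")
    for cpu in $(expand_cpu_list "$cpus"); do
        echo "Disabling core $cpu"
        echo 0 > "/sys/devices/system/cpu/cpu$cpu/online"
    done
}

offline_e_cores "$@"
```

Writing 1 back to the same online files (or rebooting) brings the cores back.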

Your second comment seems more encouraging. Do you know (if it's possible) how I can find out when the patch will be merged into mainline?

Probably not anytime soon. Also, since we also maintain Project-C in the scheduler, merging it in might take more effort than expected, or be difficult to do without breaking the alternative schedulers. It'll probably be best to wait for the kernel version that ships it natively.

victorballester7 commented 1 year ago

Okay, I'll try that. Thank you.

And I also think the best I can do is wait for a new kernel version. Hope it doesn't take too long... :(

AlleNeri commented 1 year ago

I think I have the same problem with the 6.3.2 kernel release: monitoring the interrupts with watch -n1 cat /proc/interrupts, I noticed that the acpi interrupt count increases very quickly.
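Watching the raw counters works, but a per-second rate is easier to compare between a good and a bad boot. A minimal sketch, assuming the ACPI line in /proc/interrupts ends with the word acpi (as on typical x86 systems); it sums the per-CPU columns of that line, samples twice, and divides by the interval. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: average interrupts per second for one /proc/interrupts line.

irq_total() {
    # Sum the per-CPU counters of every line whose last field equals $1.
    local name="$1" file="${2:-/proc/interrupts}"
    awk -v name="$name" '$NF == name {
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
    } END { print total + 0 }' "$file"
}

irq_rate() {
    # Interval must be a positive whole number of seconds.
    local name="${1:-acpi}" interval="${2:-1}" before after
    before=$(irq_total "$name")
    sleep "$interval"
    after=$(irq_total "$name")
    echo $(( (after - before) / interval ))
}

irq_rate "$@"
```

For example, bash irq_rate.sh acpi 5 (hypothetical file name) prints the average number of acpi interrupts per second over five seconds, a single number that can be compared across boots and kernel versions.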

And I also think the best I can do is wait for a new kernel version. Hope it doesn't take too long... :(

I'm also waiting for the new kernel release.

victorballester7 commented 1 year ago

I think I have the same problem with the 6.3.2 kernel release

What do you mean? Didn't you have that problem with older kernel versions? If so, I'll try downgrading the kernel.

In my case, something really weird happens from time to time (it has happened twice): for some reason I don't know, my laptop boots up and works as it should, without the high CPU usage and so on... But in both cases, whenever I reboot, the problem reappears... If it happens again, I don't know what to track in the 'good' case so I can later compare it with the 'bad' case. Obviously the number of interrupts will differ, but that doesn't help me find the source of the error.

victorballester7 commented 1 year ago

I found a possible workaround to the problem:

  1. Boot into Linux as usual (plugged in or not; I don't think it matters).
  2. Immediately after logging into your desktop, suspend the laptop (I do this by closing the lid).
  3. Wait a few seconds (1 or 2) and wake it up again.
  4. The kworkers shouldn't appear now.

It seems that this workaround prevents the kworkers from starting at boot. Once the laptop has been suspended, they don't appear again, for some unknown reason.

AlleNeri commented 1 year ago

I solved it by disabling a BIOS option called Trusted Platform Module or Trusted Computing (the name varies from BIOS to BIOS). I hope it helps.

EDIT: I had been using the suggested workaround for a while and it worked, but the problem came back with this error message, which is how I found the solution.

damentz commented 1 year ago

@victorballester7 can you try and/or confirm if the TPM is the issue on yours too?

heftig commented 1 year ago

Reminds me of https://lore.kernel.org/linux-integrity/20230620-flo-lenovo-l590-tpm-fix-v1-1-16032a8b5a1d%40bezdeka.de/.

Does booting with tpm_tis.interrupts=0 help?
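Assuming GRUB is the bootloader, one way to make the parameter persistent is to add it to /etc/default/grub. This is a config fragment, not a script; the existing value shown ("loglevel=3 quiet") is only Arch's default, so append to whatever your file already contains.

```bash
# /etc/default/grub (excerpt): keep your existing parameters and append
# the new one at the end of the quoted value.
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet tpm_tis.interrupts=0"

# Then regenerate the GRUB config and reboot:
#   grub-mkconfig -o /boot/grub/grub.cfg
# After rebooting, check that the parameter is active:
#   grep -o 'tpm_tis.interrupts=0' /proc/cmdline
```

For a one-off test, the parameter can instead be appended to the kernel line by pressing e at the GRUB menu, which changes nothing permanently.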

victorballester7 commented 1 year ago

@AlleNeri, the workaround also stopped working for me... It only lasted 1 or 2 weeks...

I tried both @AlleNeri's and @heftig's methods, but nothing works... Still plenty of kworkers hogging my CPU... :-(

I believe I disabled the TPM correctly in the BIOS, here (image below):

[Image: BIOS screen showing the TPM setting (IMG_20230705_221747)]

Anyway, really appreciate your comments!

AlleNeri commented 1 year ago

I just found the solution last night; it's working, and I hope it keeps working. Since we don't have the same BIOS, I can't help you further, @victorballester7.

victorballester7 commented 1 year ago

No worries @AlleNeri!

damentz commented 6 months ago

Anything new in the last few major kernels? A lot has changed, including the introduction of EEVDF in 6.6.

victorballester7 commented 6 months ago

No... I just tried removing the mask for the interrupt that causes the high CPU usage from the GRUB file, but the fans still go crazy. Maybe there's a better solution... For now, the only thing I'm sacrificing is the very laggy brightness-control key, and a KDE widget easily works around that. So it's not that bad.