linrunner / TLP

TLP - Optimize Linux Laptop Battery Life
https://linrunner.de/tlp
GNU General Public License v2.0

tlp suspend not applying AC settings? #698

Closed sagarbehere closed 11 months ago

sagarbehere commented 11 months ago

[x] I've read and accepted the Bug Reporting Howto
[ ] I've provided all required tlp-stat outputs via Gist (see below)

I'm not sure if this is a bug or just my misunderstanding. So I'm erring on the side of caution and opening a bug report.

Describe the bug

On my laptop (Macbook Air 6,2, aka early-2014, running Ubuntu 22.04 LTS with kernel 5.15.0-43) I've set PCIE_ASPM_ON_AC = default and PCIE_ASPM_ON_BAT = powersave. With this configuration, while on battery, the laptop's NVMe drive fails immediately after resuming from suspend, with a message in the kernel logs like

nvme unable to change power state from d3cold to d0, device inaccessible

This does not happen if I configure PCIE_ASPM_ON_BAT = default. So I concluded that maybe PCIE_ASPM_ON_BAT = powersave is somehow too aggressive for my particular drive causing it to not "wake up" when the laptop resumes from suspend while on Battery power. This conclusion was reinforced by the fact that if the laptop is plugged in to AC power, suspend and resume works just fine. No NVMe errors.
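To compare the two settings reliably, it helps to check the kernel log for the failure after each resume. The following is a minimal sketch; the function name nvme_resume_failed is made up for illustration, and the pattern is based only on the message quoted above.

```shell
#!/bin/sh
# Sketch: detect the NVMe resume failure quoted above in kernel log text.
# Reads log lines on stdin; returns 0 if the failure message is present.
# The function name is hypothetical; the pattern follows the quoted message.
nvme_resume_failed() {
    grep -qi 'nvme.*unable to change power state from d3cold to d0'
}

# Typical use on a systemd machine (kernel messages of the current boot;
# reading the journal may require sudo or membership in a log-reading group):
#   journalctl -k -b | nvme_resume_failed && echo "NVMe failed to resume"
```

Running it once after a resume with PCIE_ASPM_ON_BAT = powersave and once with default should show whether the setting really makes the difference.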

I do want to use PCIE_ASPM_ON_BAT = powersave while operating the laptop on battery. Without it, the laptop does not reach the PC7 state and the battery drains too quickly. So I thought: maybe if I set PCIE_ASPM_ON_BAT = default just before suspending, the resume problem will be solved and the NVMe drive will "wake up" properly.

I went to /lib/systemd/system-sleep/ to create a pre-suspend hook and found that a hook file already exists there: /lib/systemd/system-sleep/tlp. It already contains the commands tlp suspend and tlp resume, which apparently get executed before suspend and after resume, respectively. I couldn't find any documentation for what these commands do, except for this manpage. According to that manpage, tlp suspend should already apply the AC power settings before the laptop suspends, i.e. the PCIE_ASPM_ON_AC setting (which I have set to 'default', and with which the NVMe drive should "wake up" properly after resume). But I don't think this is happening: the NVMe drive does not wake up properly. However, if I modify the pre line to

pre) echo default > /sys/module/pcie_aspm/parameters/policy ; sleep 2 ; tlp suspend ;;

Then everything works as expected. I hypothesize that this modified pre) line changes the PCIe ASPM policy from 'powersave' to 'default' just before the laptop suspends, which lets the NVMe drive resume properly.
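For context, the modified pre) line above sits inside a systemd system-sleep hook, which systemd calls with pre or post as the first argument. The sketch below shows the workaround in that shape (without the sleep 2). It is not the hook file shipped by TLP; the POLICY_FILE and TLP_CMD variables are hypothetical and exist only so the logic can be tried without root.

```shell
#!/bin/sh
# Sketch of a system-sleep hook carrying the workaround from this report.
# systemd calls each hook with $1 = pre|post and $2 = the sleep action.
# POLICY_FILE and TLP_CMD are illustrative names, not TLP settings.
POLICY_FILE="${POLICY_FILE:-/sys/module/pcie_aspm/parameters/policy}"
TLP_CMD="${TLP_CMD:-tlp}"

sleep_hook() {
    case "$1" in
        pre)
            # Relax ASPM before the machine sleeps, then let TLP do its part.
            echo default > "$POLICY_FILE"
            "$TLP_CMD" suspend
            ;;
        post)
            # TLP takes over again after wake-up.
            "$TLP_CMD" resume
            ;;
    esac
}

# A real hook file would end with:  sleep_hook "$@"
```

Keeping the write to the policy file ahead of tlp suspend mirrors the order used in the modified line.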

Now, after resuming (while on battery power), I expected cat /sys/module/pcie_aspm/parameters/policy to still show [default], because that is what the pre-suspend hook set it to. That does not happen: after resuming, it shows powersave?! I assume this is because the post) tlp resume ;; line re-applies the tlp configuration and sets PCIE_ASPM_ON_BAT = powersave. This is consistent with what it should do according to the manpage. But by the same logic, the manpage says that the pre) tlp suspend line should apply the AC power settings, so my echo default > /sys/module/pcie_aspm/parameters/policy ; line should not be needed. What gives? Am I understanding this correctly?
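The policy file lists all available policies and marks the active one with square brackets, so the active value can be extracted rather than read by eye. A minimal sketch follows; the function name active_aspm_policy and its optional path argument are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: print the active ASPM policy, i.e. the entry in square brackets.
# The optional argument lets the parsing be tried on a copy of the file.
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' \
        "${1:-/sys/module/pcie_aspm/parameters/policy}"
}

# Usage: record the value before suspend and compare it after resume.
#   active_aspm_policy
```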

Maybe tlp suspend is not, in fact, applying AC settings just before suspending?

Expected behavior

While on battery power, before suspending, the line tlp suspend in /lib/systemd/system-sleep/tlp should cause PCIE_ASPM settings to be default as configured with PCIE_ASPM_ON_AC = default. After resuming (while still on battery power), the line tlp resume in the same file should cause PCIE_ASPM setting to be powersave as configured with PCIE_ASPM_ON_BAT = powersave. The latter seems(?) to be happening, but the former is not.

linrunner commented 11 months ago

Hi,

As a matter of fact, I removed applying AC settings during suspend in 1.4 [1], because it mainly delayed the suspend by 1-2 seconds without significant benefit. The actual behavior during suspend is described in the official documentation [2].

What's really funny is the manpage link you quote. tlp-sleep.service was removed in 1.3, and since then the related manpage has been included neither in TLP nor in the Debian or Ubuntu packages [3]. You can easily check by typing man tlp-sleep.service. I can't tell you why the Debian page shows this zombie.

Bottom line: I understand this issue as a request to add a workaround for your hardware/kernel problem. TLP is full of such workarounds :-(. But you're in luck: since I've already included [4] for a similar problem with AHCI Runtime PM, I'll add the solution for you as well.

[1] https://github.com/linrunner/TLP/blob/main/changelog#L170
[2] https://linrunner.de/tlp/introduction.html#event-driven-architecture
[3] https://github.com/linrunner/TLP/blob/main/changelog#L394
[4] https://github.com/linrunner/TLP/commit/56bb6ce2a11ff3f2424be6452af73f7a1cabf158

linrunner commented 11 months ago

Btw: what I will not include is your sleep 2. If you absolutely need that, you have to help yourself by using PCIE_ASPM_ON_BAT="default". Sorry.

sagarbehere commented 11 months ago

@linrunner Thank you for that explanation. At least now I understand what is happening and why it is happening that way.

It is perfectly fine that you will not include sleep 2. It probably does not need to be there. I just added it there in case it was somehow possible that the laptop went into suspend before the echo default > /sys/module/pcie_aspm/parameters/policy actually takes effect. I thought adding a small delay would ensure that the setting always takes effect before the suspend happens.

Is it correct to believe that after I download and use the 1.6 release (whenever that happens), then I don't need to edit the /lib/systemd/system-sleep/tlp file any more, because version 1.6 will apply the necessary changes before suspending?

By the way, is there documentation for what the tlp suspend and tlp resume commands do in the file /lib/systemd/system-sleep/tlp (I assume those are commands)? In the tlp man page, I see commands like 'start', 'bat', 'ac' etc. but no 'suspend' and 'resume'.

linrunner commented 11 months ago

> Is it correct to believe that after I download and use the 1.6 release (whenever that happens), then I don't need to edit the /lib/systemd/system-sleep/tlp file any more, because version 1.6 will apply the necessary changes before suspending?

Of course. Please stay tuned because I'll need you to test the feature on your system. What distro do you run?

tlp suspend and tlp resume are internal commands, intentionally undocumented for users. If the explanation above [2] doesn't suffice, you'll have to look at the shell code: https://github.com/linrunner/TLP/blob/main/tlp.in#L417

sagarbehere commented 11 months ago

> Of course. Please stay tuned because I'll need you to test the feature on your system. What distro do you run?

I'd be happy to test the feature. I run Ubuntu 22.04 LTS with kernel 5.15.0-43 on a Macbook Air 6,2 aka "Early-2014"

linrunner commented 11 months ago

Here you are: https://download.linrunner.de/packages/

The change writes default to /sys/module/pcie_aspm/parameters/policy before suspend (regardless of what is configured for PCIE_ASPM_ON_AC).

Please test the suspend / resume cycle and post the output of

sudo tlp-stat

after resume. Via https://gist.github.com/ please.

sagarbehere commented 11 months ago

@linrunner Thank you. This seems to work as expected. The output of sudo tlp-stat is in this gist.

Here are a few things I had to do. Please let me know if I should not have done them.

  1. Downloaded the .deb files to a folder

  2. Removed the existing (version 1.5.0) tlp and tlp-rdw with sudo apt remove tlp tlp-rdw

  3. Installed the new deb files by going to the download folder in terminal and sudo dpkg -i *.deb

  4. Confirmed that tlp-stat -s then showed the installed version as TLP 1.6.0-alpha.0

  5. It seemed that my carefully crafted tlp settings were replaced by default settings. I saw a file named /etc/tlp.conf.dpkg-old which seemed to have the settings I previously had. So I did sudo cp /etc/tlp.conf.dpkg-old /etc/tlp.conf and then sudo systemctl restart tlp

  6. Separately, after step 3 above, tlpui would crash when I started it. To make it work, I had to copy two files as follows (but maybe you don't care about tlpui):

    sudo cp /usr/lib/python3/dist-packages/tlpui/defaults/tlp-1_5.conf /usr/lib/python3/dist-packages/tlpui/defaults/tlp-1_6.conf
    sudo cp /usr/lib/python3/dist-packages/tlpui/configschema/1_5.json /usr/lib/python3/dist-packages/tlpui/configschema/1_6.json

  7. After I did all of this, I confirmed that PCIE_ASPM_ON_BAT = powersave was in effect by running cat /sys/module/pcie_aspm/parameters/policy and seeing the output default performance [powersave] powersupersave

  8. Then I suspended the laptop, waited for some time, and resumed it. Everything came up as expected and there were no NVMe disk related errors. Hurray!

  9. After resuming, cat /sys/module/pcie_aspm/parameters/policy continued to show the output default performance [powersave] powersupersave, which is how it should be with my tlp settings.
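Step 5 above (restoring the previous settings from the .dpkg-old copy) can be sketched as a small helper that shows the differences first and keeps a backup. The function name restore_dpkg_old and the .bak suffix are hypothetical choices, and the helper assumes diff is installed.

```shell
#!/bin/sh
# Sketch: restore a config file from the .dpkg-old copy left by dpkg.
# Function name and .bak suffix are illustrative, not part of TLP or dpkg.
restore_dpkg_old() {
    conf="${1:-/etc/tlp.conf}"
    old="$conf.dpkg-old"
    [ -f "$old" ] || { echo "no $old found" >&2; return 1; }
    # Show what will change; diff exits non-zero when the files differ.
    diff -u "$conf" "$old" || true
    cp "$conf" "$conf.bak"   # keep the freshly installed version
    cp "$old" "$conf"
}

# Usage (as root), followed by the restart used in step 5:
#   restore_dpkg_old /etc/tlp.conf && systemctl restart tlp
```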

Please let me know if you need anything else :)

linrunner commented 11 months ago

Thank you very much for testing.

> 5. It seemed that my carefully crafted tlp settings were replaced by default settings.

This is not a feature of TLP or the package, but simply dpkg's way of dealing with configuration files in /etc. Whether the configuration is kept or not depends on the answer you give to the corresponding question of dpkg during installation.

You may as well move your individual settings lines to a file in /etc/tlp.d/. After that you will no longer be bothered with questions.
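As a rough illustration of the drop-in approach, the sketch below writes the two ASPM settings from this report into a separate file under /etc/tlp.d/. The file name 50-aspm.conf and the function name write_aspm_dropin are hypothetical; the directory argument exists only so the helper can be tried outside /etc.

```shell
#!/bin/sh
# Sketch: keep personal ASPM settings in a drop-in instead of /etc/tlp.conf.
# The file name 50-aspm.conf is a hypothetical choice.
write_aspm_dropin() {
    dir="${1:-/etc/tlp.d}"
    cat > "$dir/50-aspm.conf" <<'EOF'
# Personal PCIe ASPM settings (values taken from this report)
PCIE_ASPM_ON_AC="default"
PCIE_ASPM_ON_BAT="powersave"
EOF
}

# Usage (as root), then apply the settings:
#   write_aspm_dropin && tlp start
```

Because /etc/tlp.conf itself stays unmodified, dpkg has no local changes to ask about on the next upgrade.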

linrunner commented 11 months ago

Hi @sagarbehere: TLP 1.6 Beta 1 is out and contains the fix: https://github.com/linrunner/TLP/issues/700