redhat-performance / tuned

Tuning Profile Delivery Mechanism for Linux
GNU General Public License v2.0

Nvidia not suspending when using tuned #649

Open wangwillian0 opened 1 week ago

wangwillian0 commented 1 week ago

Update: The issue seems to be the laptop's Nvidia card, which never suspends when using tuned. When using PPD, /sys/bus/pci/devices/0000:01:00.0/power/runtime_status reports the card as suspended just seconds after boot.
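For anyone who wants to repeat this check under both daemons, here is a minimal Python sketch that reads the same runtime_status file. The PCI address 0000:01:00.0 is the one from this report and will differ on other machines; the function names and the sysfs_root parameter are hypothetical and exist only so the code can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def read_runtime_status(device: str = "0000:01:00.0",
                        sysfs_root: str = "/sys") -> Optional[str]:
    """Return the runtime PM status of a PCI device, or None if unreadable.

    The default address is the one quoted in this issue (an assumption
    for any other machine; check lspci for the real address).
    """
    status_file = (Path(sysfs_root) / "bus" / "pci" / "devices"
                   / device / "power" / "runtime_status")
    try:
        return status_file.read_text().strip()
    except OSError:
        # Device not present, or runtime PM not exposed for it.
        return None


def is_suspended(device: str = "0000:01:00.0",
                 sysfs_root: str = "/sys") -> bool:
    """True only when the kernel reports the device as 'suspended'."""
    return read_runtime_status(device, sysfs_root) == "suspended"
```

Calling is_suspended() a minute or so after boot under PPD and then under tuned should make the difference described above visible.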

I'm using Fedora 40, 6.9.5-200.fc40.x86_64.


I'm trying the tuned-ppd compatibility layer (tuned 2.23.0) and expected its profiles to have an effect similar to those of power-profiles-daemon, but the power consumption is always worse than with the default power-saver profile in power-profiles-daemon.

I can get around 10W with the power-saver mode in PPD if I wait 3-5 min at idle, but consistently +15W with the powersave mode in tuned. I also tried custom profiles to get as close as possible to what I thought PPD was doing, without success:

[main]
summary=Attempt to imitate power-profiles-daemon power-saver mode

[cpu]
governor=powersave
energy_perf_bias=power
energy_performance_preference=power

[acpi]
platform_profile=low-power

I'm measuring power with /sys/class/power_supply/BAT0/power_now after a reboot. The CPU and ACPI settings seem to be the same in PPD and tuned, judging by /sys/firmware/acpi/platform_profile and /sys/devices/system/cpu/cpu0/cpufreq/*.
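A single read of power_now is noisy, so a comparison like this is easier to reproduce with an average over several samples. Below is a rough Python sketch of that measurement. It assumes the battery is BAT0, as in this report, and that power_now is reported in microwatts, as the kernel's power-supply class documents; the function names, sample count, and interval are hypothetical.

```python
import time
from pathlib import Path


def read_power_watts(battery: str = "BAT0", sysfs_root: str = "/sys") -> float:
    """Read the instantaneous battery power draw in watts.

    Assumes power_now holds an integer number of microwatts.
    """
    power_file = Path(sysfs_root) / "class" / "power_supply" / battery / "power_now"
    return int(power_file.read_text().strip()) / 1_000_000


def average_power_watts(samples: int = 10, interval_s: float = 1.0,
                        battery: str = "BAT0", sysfs_root: str = "/sys") -> float:
    """Average several power_now readings taken interval_s seconds apart."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    readings = []
    for i in range(samples):
        readings.append(read_power_watts(battery, sysfs_root))
        if i < samples - 1:
            time.sleep(interval_s)  # no sleep needed after the last sample
    return sum(readings) / len(readings)
```

Running average_power_watts() on battery after the same idle period under each daemon gives figures comparable to the 10W and +15W numbers quoted above.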

yarda commented 1 week ago

Interesting. We recently merged some video plugin updates that should add PPD functionality related to graphics cards, but it seems that's not enough in your case. Do you know what needs to be set to suspend it correctly?

wangwillian0 commented 1 week ago

The issue was actually with the audio plugin. Writing 0 to /sys/module/snd_hda_intel/parameters/power_save_controller is not compatible with laptops that have Nvidia GPUs (TLP has more documentation about this). In my case specifically, it breaks the card's ability to suspend until a reboot, even if I set it back to the original value of 1. This might be the reason my test was incorrect (or I was just testing it incorrectly). I also tested on an Ubuntu laptop, where the file's value kept changing back to 1 automatically; I'm not sure what was causing that.
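To see which state the parameter is in before and after tuned applies a profile, something like the following Python sketch can help. It only reads the file and never writes to it. Because the kernel may print boolean module parameters as Y/N rather than 1/0, the parser accepts both forms; that tolerance, the function names, and the sysfs_root parameter are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def parse_bool_param(text: str) -> Optional[bool]:
    """Interpret a kernel boolean module parameter value.

    Accepts 1/0 as well as Y/N (any case); returns None for anything else.
    """
    value = text.strip().lower()
    if value in {"1", "y"}:
        return True
    if value in {"0", "n"}:
        return False
    return None


def controller_power_save_enabled(sysfs_root: str = "/sys") -> Optional[bool]:
    """Report whether snd_hda_intel's power_save_controller is enabled.

    Returns None if the module is not loaded or the file is unreadable.
    """
    param = (Path(sysfs_root) / "module" / "snd_hda_intel"
             / "parameters" / "power_save_controller")
    try:
        return parse_bool_param(param.read_text())
    except OSError:
        return None
```

A result of False after tuned starts would match the setting described above; since the effect reportedly persists until a reboot, it makes sense to check once right after boot and again after the profile is applied.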

Because it will likely draw more power or not work at all on laptops with Nvidia GPUs, I think this should be considered just as dangerous as USB auto-suspension and, therefore, disabled by default. I have created a pull request at https://github.com/redhat-performance/tuned/pull/650.