ublue-os / hwe

Fedora variants with support for ASUS devices, Nvidia devices, and Surface laptops
https://universal-blue.org/images/hwe
Apache License 2.0
164 stars 35 forks

Possibility to add supergfxctl (or something similar) to the nvidia image? #61

ludvigng closed 1 year ago

ludvigng commented 1 year ago

Hello!

Would it be a good idea to layer the supergfxctl package onto the nvidia image? Supergfxctl was developed by the asus-linux project as a way to disable the nvidia device in dual-GPU laptops, to achieve better power savings than are usually possible using only Nvidia's own methods. This is especially relevant on older Nvidia cards, which lack the RTD3 power management available in newer GPUs; on my machine it means the difference between 6 and 10 hours of battery life. Compared to other similar projects, it works without rebooting the machine; logging out and back in is usually enough. Despite being developed as part of the asus-linux project, it's supposed to work on most computers regardless of manufacturer.

Supergfxctl is currently not available in the Fedora repos, but there's a COPR available: https://copr.fedorainfracloud.org/coprs/lukenukem/asus-linux/

Other similar projects:

Personally, I think including one of these tools could be very beneficial for laptop users, especially those with pre-Turing hardware. However, since most of them aren't available in the official repos, ublue would have to decide whether it wants to include unofficial packages from unverified sources such as COPR, so I understand if this is outside the project's aims for now. :)

xynydev commented 1 year ago

Could this affect non-dual-gpu users? If not, this could be a good idea. From a quick look at the repos, EnvyControl seems like the best option.

ludvigng commented 1 year ago

I don't think it should cause any problems with non-dual GPU setups. A quick skim of envycontrol seems to suggest it checks for the presence of other GPUs first (line 266 in envycontrol.py), but that looks like it's only used when switching to the dedicated nvidia mode? I don't know what happens if you manually disable the nvidia GPU when running on a single GPU.

It also seems to want to regenerate the initramfs (line 342). Could that cause trouble with the way Silverblue is set up?

Either way, I think it would have to be triggered by the user to mess anything up, and at least with supergfxctl you can append supergfxd.mode= to the kernel command line to force it into a certain mode on reboot. If a user disables the nvidia GPU on a single-GPU system, I suppose they could get the GPU back to the default that way. Just having either package installed on the system shouldn't affect anything AFAICT.
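Since the override mentioned above is an ordinary kernel argument, one way to check whether it is set on the current boot is to read /proc/cmdline. The Python sketch below is only an illustration: the helper names are hypothetical, it is not supergfxd's own parsing code, and it does not handle quoted values containing spaces. On rpm-ostree systems such as Silverblue, kernel arguments are normally added or removed with rpm-ostree kargs.

```python
from typing import Optional


def cmdline_param(cmdline: str, key: str) -> Optional[str]:
    """Return the value of `key=value` on a kernel command line, or None.

    Tokens are split on whitespace; if the key appears more than once,
    this sketch keeps the last occurrence. Quoted values are not handled.
    """
    value: Optional[str] = None
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        if name == key and sep:
            value = val
    return value


def read_supergfxd_mode(path: str = "/proc/cmdline") -> Optional[str]:
    """Hypothetical helper: look up supergfxd.mode on the running kernel's command line."""
    with open(path, encoding="utf-8") as f:
        return cmdline_param(f.read(), "supergfxd.mode")
```

read_supergfxd_mode() returns None when the option is absent, which is the situation supergfxd reports as "supergfxd.mode not set, ignoring" in the logs posted later in this thread.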

castrojo commented 1 year ago

Thanks for pointing this out! (Thinking out loud, but wouldn't it be neat if asus-linux.org had images .... )

joshua-stone commented 1 year ago

@ludvigng We've started a build which includes supergfxctl:

https://github.com/ublue-os/nvidia/pull/77

You can test this change by rebasing with the following:

rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/silverblue-nvidia:pr-77-37-525
lilkidsuave commented 1 year ago

This build had icon issues, and on my laptop with a mux it doesn't work as I had hoped (Integrated mode raises power usage).

bsherman commented 1 year ago

A little more feedback: testing on my Dell XPS 15, muxless nvidia 3050ti + iGPU...

supergfxd/supergfxctl seems to work as expected.

The daemon is not enabled by default (and probably shouldn't be), but it starts with sudo systemctl enable --now supergfxd.

I installed supergfxctl-gex using Extension Manager and (after starting the supergfxd service) it happily let me switch between integrated/hybrid modes, requiring only a logout/login, which is the behavior I expect on a muxless machine like this.

I didn't test power differences on battery, but I believe supergfx was behaving as expected based on my observations and previous experience.

I believe actual power savings (or lack thereof) vary with the specific hardware.

bsherman commented 1 year ago

I did a reboot and ran on battery for a few minutes in each mode, integrated/hybrid.

Did rough eyeball monitoring with: watch cat /sys/class/power_supply/BAT0/voltage_now

I don't see much difference in power usage, but with the newer power management defaults for Ampere, that doesn't surprise me. I'd expect more savings on an older GPU. I have access to an older XPS 15 with a 1650 (Turing) GPU, so maybe it would show different results.

Long and short of it, though... adding supergfxctl doesn't hurt me as a user or builder of downstream custom images... especially if it's disabled by default. It effectively does nothing when disabled or in hybrid mode.
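A note on the measurement above: voltage_now only reports battery voltage, so it is at best an indirect signal of power draw. The Linux power_supply sysfs class also exposes power_now (microwatts) on some drivers, or current_now (microamps) alongside voltage_now (microvolts), from which a discharge rate in watts can be estimated. A rough Python sketch, assuming the battery is BAT0 as in the command above and that the driver exposes one of those attribute sets; the function names are hypothetical, and because current sign conventions differ between drivers, the sketch takes the absolute value.

```python
import os
import time
from typing import List, Optional


def _read_int(path: str) -> Optional[int]:
    """Read an integer sysfs attribute, returning None if it is missing or unreadable."""
    try:
        with open(path, encoding="ascii") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def battery_power_watts(bat_dir: str = "/sys/class/power_supply/BAT0") -> Optional[float]:
    """Estimate instantaneous battery power in watts from power_supply attributes."""
    power_uw = _read_int(os.path.join(bat_dir, "power_now"))  # microwatts
    if power_uw is not None:
        return abs(power_uw) / 1e6
    current_ua = _read_int(os.path.join(bat_dir, "current_now"))  # microamps
    voltage_uv = _read_int(os.path.join(bat_dir, "voltage_now"))  # microvolts
    if current_ua is None or voltage_uv is None:
        return None
    # microamps * microvolts = 1e-12 W
    return abs(current_ua) * voltage_uv / 1e12


def average_power(samples: int = 30, interval_s: float = 2.0,
                  bat_dir: str = "/sys/class/power_supply/BAT0") -> Optional[float]:
    """Average several readings, sleeping interval_s seconds between them."""
    readings: List[float] = []
    for i in range(samples):
        watts = battery_power_watts(bat_dir)
        if watts is not None:
            readings.append(watts)
        if i < samples - 1:
            time.sleep(interval_s)
    return sum(readings) / len(readings) if readings else None
```

Running average_power() once per graphics mode while on battery, with the same brightness and workload, gives a roughly comparable average-watts figure for Integrated versus Hybrid.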

bsherman commented 1 year ago

On a machine with no nvidia card at all (in this case a desktop PC with amdgpu):

$ sudo systemctl status supergfxd.service 
● supergfxd.service - SUPERGFX
     Loaded: loaded (/usr/lib/systemd/system/supergfxd.service; enabled; preset: disabled)
     Active: active (running) since Mon 2023-03-27 13:46:53 CDT; 13s ago
   Main PID: 3375 (supergfxd)
      Tasks: 17 (limit: 38149)
     Memory: 5.4M
        CPU: 9ms
     CGroup: /system.slice/supergfxd.service
             └─3375 /usr/bin/supergfxd

Mar 27 13:46:58 anduril supergfxd[3375]: WARN: get_runtime_status: Could not find dGPU
--- snip ---
$ sudo supergfxctl -S
Graphics mode change error.
Please check `journalctl -b -u supergfxd`, and `systemctl status supergfxd`
GFX fail: get_runtime_status: Could not find dGPU
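The "Could not find dGPU" error above is what one would expect on a machine with no Nvidia GPU. As a rough illustration of that kind of check (not supergfxd's actual detection logic, which is implemented in its own code and is more involved), the sketch below scans sysfs for PCI devices with Nvidia's vendor ID 0x10de (the 10DE in the "Found dgpu 10DE:1F91" log line further down) and the PCI "display controller" base class 0x03, which covers both VGA controllers and the 3D controllers that laptop dGPUs often appear as. The function name is hypothetical.

```python
import os
from typing import List

NVIDIA_VENDOR_ID = 0x10DE
DISPLAY_BASE_CLASS = 0x03  # PCI base class "display controller" (VGA, 3D, ...)


def find_nvidia_display_devices(pci_root: str = "/sys/bus/pci/devices") -> List[str]:
    """Return PCI addresses of Nvidia display-class devices found under pci_root."""
    found: List[str] = []
    try:
        entries = sorted(os.listdir(pci_root))
    except OSError:
        return found
    for addr in entries:
        dev = os.path.join(pci_root, addr)
        try:
            # sysfs stores these as hex strings such as "0x10de" and "0x030000"
            with open(os.path.join(dev, "vendor"), encoding="ascii") as f:
                vendor = int(f.read(), 16)
            with open(os.path.join(dev, "class"), encoding="ascii") as f:
                pci_class = int(f.read(), 16)
        except (OSError, ValueError):
            continue
        # The top byte of the 24-bit class code is the base class; this skips
        # the GPU's companion audio/USB functions, which use other classes.
        if vendor == NVIDIA_VENDOR_ID and (pci_class >> 16) == DISPLAY_BASE_CLASS:
            found.append(addr)
    return found
```

An empty list here corresponds roughly to the amdgpu-only desktop case shown above.
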
bsherman commented 1 year ago

A bit more testing... I can get an Nvidia 1660 Ti running in an eGPU on an Intel iGPU-only laptop...

Supergfx runs there too. Similar to my laptop with a 3050 Ti dGPU, it detects the possible modes "Integrated" (iGPU only) and "Hybrid", and it appears to do nothing unless I choose Integrated mode.

Again, it does nothing if the service is not enabled.

[ ~]$ sudo systemctl status supergfxd
● supergfxd.service - SUPERGFX
     Loaded: loaded (/usr/lib/systemd/system/supergfxd.service; enabled; preset: disabled)
     Active: active (running) since Mon 2023-03-27 14:37:48 CDT; 2min 45s ago
   Main PID: 879 (supergfxd)
      Tasks: 9 (limit: 18784)
     Memory: 5.3M
        CPU: 1.739s
     CGroup: /system.slice/supergfxd.service
             └─879 /usr/bin/supergfxd

Mar 27 14:37:48 sagecactus supergfxd[879]: INFO: do_rescan: Rescanning PCI bus
Mar 27 14:37:50 sagecactus supergfxd[1166]: Job for nvidia-powerd.service failed because the control process exited with error code.
Mar 27 14:37:50 sagecactus supergfxd[1166]: See "systemctl status nvidia-powerd.service" and "journalctl -xeu nvidia-powerd.service" for de>
Mar 27 14:37:50 sagecactus supergfxd[879]: WARN: true nvidia-powerd.service failed: Some(1)
Mar 27 14:37:50 sagecactus supergfxd[879]: INFO: set_runtime_pm: Set PM on "/sys/devices/pci0000:00/0000:00:1d.0/0000:04:00.0/0000:05:01.0/>
Mar 27 14:37:50 sagecactus supergfxd[879]: INFO: set_runtime_pm: Set PM on "/sys/devices/pci0000:00/0000:00:1d.0/0000:04:00.0/0000:05:01.0/>
Mar 27 14:37:50 sagecactus supergfxd[879]: INFO: set_runtime_pm: Set PM on "/sys/devices/pci0000:00/0000:00:1d.0/0000:04:00.0/0000:05:01.0/>
Mar 27 14:37:50 sagecactus supergfxd[879]: INFO: set_runtime_pm: Set PM on "/sys/devices/pci0000:00/0000:00:1d.0/0000:04:00.0/0000:05:01.0/>
Mar 27 14:37:50 sagecactus supergfxd[879]: INFO: reload: Reloaded gfx mode: Hybrid
Mar 27 14:37:51 sagecactus supergfxd[879]: INFO: Notify: dGPU status = Active
[ ~]$ supergfxctl -s
[Integrated, Hybrid]
[ ~]$ lspci -tv
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +-02.0  Intel Corporation UHD Graphics 620
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-08.0  Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
           +-1c.0-[01]--
           +-1c.6-[03]----00.0  Intel Corporation Wireless 8265 / 8275
           +-1d.0-[04-3c]----00.0-[05-3c]--+-00.0-[06]----00.0  Intel Corporation JHL6240 Thunderbolt 3 NHI (Low Power) [Alpine Ridge LP 2016]
           |                               +-01.0-[07-3b]----00.0-[08-3b]----01.0-[09]--+-00.0  NVIDIA Corporation TU116 [GeForce GTX 1660 Ti]
           |                               |                                            +-00.1  NVIDIA Corporation TU116 High Definition Audio Controller
           |                               |                                            +-00.2  NVIDIA Corporation TU116 USB 3.1 Host Controller
           |                               |                                            \-00.3  NVIDIA Corporation TU116 USB Type-C UCSI Controller
           |                               \-02.0-[3c]----00.0  Intel Corporation JHL6240 Thunderbolt 3 USB 3.1 Controller (Low Power) [Alpine Ridge LP 2016]
           +-1d.2-[3d]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
           +-1f.0  Intel Corporation Sunrise Point LPC Controller/eSPI Controller
           +-1f.2  Intel Corporation Sunrise Point-LP PMC
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
           +-1f.4  Intel Corporation Sunrise Point-LP SMBus
           \-1f.6  Intel Corporation Ethernet Connection (4) I219-V
joshua-stone commented 1 year ago

If you're testing this with Secure Boot enabled and find that the nvidia modules aren't loading, you will have to enroll the test Secure Boot key:

$ sudo mokutil --import /etc/pki/akmods/certs/akmods-nvidia.der

Disabling Secure Boot temporarily is also an option.

bsherman commented 1 year ago

Tests from an XPS 15 with an Nvidia 1650 (Turing)...

Normal status after manually enabling the service. nvidia-powerd fails "by Nvidia design" on this GPU generation, but we can see that supergfxd does ensure power management is enabled on the GPU in Hybrid mode, which is the default.

$ sudo systemctl status supergfxd.service 
● supergfxd.service - SUPERGFX
     Loaded: loaded (/usr/lib/systemd/system/supergfxd.service; enabled; preset: disabled)
     Active: active (running) since Mon 2023-03-27 15:21:16 CDT; 2min 38s ago
   Main PID: 2656 (supergfxd)
      Tasks: 13 (limit: 13769)
     Memory: 5.5M
        CPU: 52ms
     CGroup: /system.slice/supergfxd.service
             └─2656 /usr/bin/supergfxd

Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: Found dgpu 10DE:1F91 at "0000:01:00.0"
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: supergfxd.mode not set, ignoring
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: create_modprobe_conf: writing /etc/modprobe.d/supergfxd.conf
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: do_rescan: Rescanning PCI bus
Mar 27 15:21:19 hawkeyeninja supergfxd[2680]: Job for nvidia-powerd.service failed because the control process exited with error code.
Mar 27 15:21:19 hawkeyeninja supergfxd[2680]: See "systemctl status nvidia-powerd.service" and "journalctl -xeu nvidia-powerd.service" for >
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: WARN: true nvidia-powerd.service failed: Some(1)
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: set_runtime_pm: Set PM on "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0" to Auto
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: reload: Reloaded gfx mode: Hybrid
Mar 27 15:21:19 hawkeyeninja supergfxd[2656]: INFO: Notify: dGPU status = Active
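The set_runtime_pm line above presumably corresponds to the kernel's per-device runtime power management, which sysfs exposes under each PCI device's power/ directory: control ("auto" permits runtime suspend, "on" keeps the device powered) and runtime_status (for example "active" or "suspended"). A small read-only Python sketch for checking those values, assuming the dGPU address from the log (0000:01:00.0 on this machine); the helper name is hypothetical, and whether the GPU actually reaches "suspended" depends on the driver and GPU generation.

```python
import os
from typing import Dict, Optional


def _read_text(path: str) -> Optional[str]:
    """Read a sysfs attribute as stripped text, or None if it cannot be read."""
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip()
    except OSError:
        return None


def runtime_pm_state(pci_addr: str,
                     pci_root: str = "/sys/bus/pci/devices") -> Dict[str, Optional[str]]:
    """Return the runtime PM 'control' and 'runtime_status' values for one PCI device."""
    power_dir = os.path.join(pci_root, pci_addr, "power")
    return {
        "control": _read_text(os.path.join(power_dir, "control")),
        "runtime_status": _read_text(os.path.join(power_dir, "runtime_status")),
    }
```

On the machine above, runtime_pm_state("0000:01:00.0") should report control as "auto" if the setting in the log took effect.
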

It can switch modes between integrated and hybrid:

$ supergfxctl -m integrated
Graphics mode changed to integrated. Required user action is: logout

I can report that, on this machine, in "integrated" mode I generally see a lower battery discharge rate, close to 1 W lower on average.

So it seems good for those who can benefit, or who want to try it.

ludvigng commented 1 year ago

Oh wow, thanks @joshua-stone for providing the testing image, and @lilkidsuave and @bsherman for testing this on a couple of different machines!

Unfortunately I can't test this on real hardware yet. I've just started working on my thesis, so I don't want to mess with my working computer right now. But as @bsherman showed, it should have no effect just by being installed unless the user explicitly enables it, which makes sense, since only a subset of users are likely to benefit from it (only people with pre-Ampere dual-GPU setups, I think).

@lilkidsuave have you tried any of the other alternatives mentioned at the top? If they work for you, then those might be better from a general standpoint.

bsherman commented 1 year ago

I can comment on some of the other alternatives mentioned, and I discussed this with @joshua-stone in discord...

bumblebee and optimus-manager are pretty much non-options, either for not supporting Fedora or being outdated.

EnvyControl is close, and I've used it myself on Fedora Workstation. But in its current state, it's not Silverblue compatible.

I'm also a bit more confident in the safety checks provided by supergfxctl... it doesn't do anything if it doesn't believe the system is compatible with its capabilities.

Given that supergfxd is disabled by default when installed, I've given my support for it as a reasonable tool to pre-install on ublue-os/nvidia images. Users would have the option to use it to disable the onboard nvidia dGPU, but they don't have to. In fact, EnvyControl could be added by a user with a Silverblue-compatible patch, and this would not conflict with a disabled supergfxd. So it seems like a decent path forward.

lilkidsuave commented 1 year ago

So I managed to get it to work and not draw 60 W, but now I'm stuck in integrated mode. Logging out gets stuck on a black screen, so I can't change modes. I tried restarting, but the mode was still integrated.

lilkidsuave commented 1 year ago

EnvyControl didn't work for me; it did the same thing this used to.

lilkidsuave commented 1 year ago

I found the ticket https://gitlab.com/asus-linux/supergfxctl/-/issues/70

ludvigng commented 1 year ago

You should be able to press CTRL+ALT+F3 to open a text console, log in, and use the supergfxctl command to switch between dGPU/iGPU modes. I think supergfxctl -m hybrid is what you're looking for.

joshua-stone commented 1 year ago

Are the changes so far sufficient for merging despite the upstream issues mentioned, or are there some tweaks that should be made?

lilkidsuave commented 1 year ago

The bug should probably be fixed, but it is functional.

lilkidsuave commented 1 year ago

The GUI is there and the program mostly works; it should be ready soon.

lilkidsuave commented 1 year ago

Imma have to check that. Maybe that's what's causing the black screen issues.

joshua-stone commented 1 year ago

It looks like there's a Gnome extension available, however I'm hesitant to include it when the maintainer mentions broken Gnome 44 compatibility. I think I'll leave it out as an optional component for users to install.

https://extensions.gnome.org/extension/5344/supergfxctl-gex/

ludvigng commented 1 year ago

That's fair, since you can still use the CLI interface regardless.

joshua-stone commented 1 year ago

I've updated the README to reference the extension. In the future I'll investigate including it once there's Gnome 44 support so we can include the extension in both F37 and F38.

castrojo commented 1 year ago

There's a tool on pip called gnome-extensions-cli that lets you programmatically install and enable extensions. You could use that as a convenience shortcut; here's how I'm using it: https://github.com/ublue-os/bluefin/blob/44bfa32e8f8a788cee035e772d179b4b426e143f/etc/justfile#L26

bsherman commented 1 year ago

It looks like there's a Gnome extension available

And a KDE Plasmoid :-)

https://github.com/Jhyub/supergfxctl-plasmoid/blob/master/README.md

But I agree with not adding either of them yet. Aside from stability of the gnome extension/plasmoid... maybe we'll get some feedback on the base tool before adding them.

lilkidsuave commented 1 year ago

That's the difference I was looking for (I couldn't figure out the difference between EnvyControl and supergfxctl).

After this is all said and done, I can go back to KDE (I'm not the biggest fan of Gnome).

lilkidsuave commented 1 year ago

EnvyControl only has a Gnome GUI.

relbus22 commented 1 year ago

Hi guys, speaking of GPUs and power management, there is also System76's power management tool: https://www.reddit.com/r/Fedora/comments/npon3m/system_76_power_management_in_fedora_34/ if things don't work out with supergfxctl.

lilkidsuave commented 1 year ago

I expect it to go fine. System76's power management does a lot more than GPUs though, so you would need to compare it against Gnome's power management. Looks interesting though.

ludvigng commented 1 year ago

AFAIK this conflicts with power-profiles-daemon, which will mean the standard gnome power profiles won't work.

lilkidsuave commented 1 year ago

You could just replace power-profiles-daemon.

ludvigng commented 1 year ago

Of course you can, but then you're starting to replace core gnome functionality. It's a bigger step than just adding power management for older nvidia cards.

lilkidsuave commented 1 year ago

Agreed