pop-os / cosmic-comp

Compositor for the COSMIC desktop environment
GNU General Public License v3.0

Incorrect GPU selection for session #913

Open flukejones opened 2 weeks ago

flukejones commented 2 weeks ago

I have a new laptop with an AMD and an NVIDIA GPU. The AMD GPU is the iGPU. However, there are a few issues:

  1. The Nvidia GPU is before the AMD on the PCI bus

    65:00.0 VGA compatible controller: NVIDIA Corporation AD106M [GeForce RTX 4070 Max-Q / Mobile] (rev a1)
    66:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Strix [Radeon 880M / 890M] (rev c1)
  2. The Nvidia GPU is marked as boot_vga

  3. This causes the dGPU to be in constant use and never suspend. It will suspend if I switch to a TTY and the desktops are stopped.
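
For anyone wanting to check the same two points on their own machine, here is a minimal sketch that lists each DRM card with its PCI address and `boot_vga` flag. It assumes the usual sysfs layout under `/sys/class/drm`; the function name `list_boot_vga` is made up for illustration, and the `boot_vga` file may simply be absent for some devices, in which case the flag is reported as `None`.

```python
from pathlib import Path


def list_boot_vga(sysfs_drm="/sys/class/drm"):
    """Return {card_name: (pci_address, boot_vga)} for each DRM card node.

    boot_vga is True/False when the kernel exposes the attribute,
    or None when the file is missing for that device.
    """
    result = {}
    for card in sorted(Path(sysfs_drm).glob("card[0-9]*")):
        # Skip connector entries such as card0-eDP-1.
        if "-" in card.name:
            continue
        device = card / "device"
        flag_file = device / "boot_vga"
        if flag_file.exists():
            flag = flag_file.read_text().strip() == "1"
        else:
            flag = None
        # The "device" symlink resolves to the PCI device directory,
        # whose name is the PCI address (e.g. 0000:65:00.0).
        pci_address = device.resolve().name
        result[card.name] = (pci_address, flag)
    return result
```

Printing the result of `list_boot_vga()` shows at a glance which card the firmware marked as the boot device.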

The issue isn't solely cosmic-comp; Gnome is affected too. However, working around it with nvidia_drm.modeset=0 prevents Gnome from grabbing the card but makes cosmic crash, because it tries to use card0, which now doesn't have DRM ability.

Related issue on drm/amd

Quackdoc commented 2 weeks ago

A temporary fix should be the COSMIC_RENDER_DEVICE=/dev/dri/renderD1## env var? I believe COSMIC_RENDER_AUTO_ASSIGN=y may also be warranted.

flukejones commented 2 weeks ago

Unfortunately, that does not help. It still selects the incorrect GPU.

superm1 commented 1 week ago

Since the same thing came up on mutter I did have a proposal there for them to better pick things. Maybe cosmic can do something similar.

superm1 commented 1 week ago

FWIW in mutter there was a suggestion of a udev rule to automatically tag the primary. I think something similar could work in cosmic.

flukejones commented 1 week ago

I can confirm that using the udev rule changes as @superm1 suggests in the link above does indeed fix the issues I've experienced in Gnome.

I also discovered that the Nvidia 560 driver was causing further issues that may have masked some things, so I will retest cosmic with the 550 driver.

flukejones commented 1 week ago

Umm, yeah.. I have to start cosmic with COSMIC_RENDER_DEVICE=/dev/dri/renderD128 cosmic-session, and with Nvidia driver 550 the dGPU suspends. But there are too many things going on for me to adequately track while also trying to work. Some notes:

It seems like a whole lot of Linux land is built on certain assumptions that this laptop has kicked the door in on.

superm1 commented 1 week ago

Umm, yeah.. Have to start cosmic with COSMIC_RENDER_DEVICE=/dev/dri/renderD128 cosmic-session

I guess if a similar approach to Gnome is used, you would need to come up with a way to associate each render device with its PCI ID, then find the DRM card that has eDP connected on the same PCI ID, to decide which render node to use.
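
A rough sketch of that association, purely as an illustration and not what cosmic-comp or mutter actually does: walk sysfs, find a connector named like `cardN-eDP-*` whose `status` is `connected`, resolve its card to a PCI device, and pick the render node that resolves to the same PCI device. The function name `render_node_for_internal_panel` and the `/dev/dri` prefix are assumptions of this example.

```python
from pathlib import Path


def render_node_for_internal_panel(sysfs_drm="/sys/class/drm", dev_dir="/dev/dri"):
    """Return the render node of the GPU driving a connected eDP panel, or None."""
    drm = Path(sysfs_drm)

    # Map each PCI device directory to its render node name (renderD128, ...).
    render_by_device = {}
    for node in drm.glob("renderD*"):
        render_by_device[(node / "device").resolve()] = node.name

    # Connector entries are named like card1-eDP-1 and expose a "status" file.
    for connector in sorted(drm.glob("card*-eDP-*")):
        status_file = connector / "status"
        if not status_file.exists():
            continue
        if status_file.read_text().strip() != "connected":
            continue
        # The card name is the part before the first dash, e.g. "card1".
        card = drm / connector.name.split("-", 1)[0]
        pci_device = (card / "device").resolve()
        render = render_by_device.get(pci_device)
        if render is not None:
            return f"{dev_dir}/{render}"
    return None
```

The returned path could then be used as the value for COSMIC_RENDER_DEVICE when starting the session by hand. Systems without an eDP panel, or with the panel wired to the dGPU, would need a different heuristic.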

hgaiser commented 1 week ago

I'm having a similar issue on my laptop, however my NVIDIA dGPU is not marked as boot_vga. I get the following:

$ fuser -v /dev/dri/render*
                     USER        PID ACCESS COMMAND
/dev/dri/renderD128: hgaiser   26880 F...m Xwayland
                     hgaiser   27064 F...m cosmic-panel
                     hgaiser   27068 F...m cosmic-workspac
                     hgaiser   27079 F...m xdg-desktop-por
                     hgaiser   27184 F...m cosmic-ext-appl
                     hgaiser   27547 F...m wezterm-gui
                     hgaiser   27805 F...m firefox
/dev/dri/renderD129: hgaiser   26865 F.... cosmic-comp

Interestingly there are other cosmic binaries that correctly use the AMD GPU, only cosmic-comp uses the NVIDIA dGPU.

I added export COSMIC_RENDER_DEVICE=/dev/dri/renderD128 to /usr/bin/start-cosmic but it doesn't seem to change anything.

@flukejones where did you add that variable? Or did you run cosmic from the command line?

EDIT: Interestingly, nvtop does claim cosmic-comp runs on the AMD iGPU device, not the NVIDIA device.

Drakulix commented 1 week ago

I'm having a similar issue on my laptop, however my NVIDIA dGPU is not marked as boot_vga. I get the following:

$ fuser -v /dev/dri/render*
                     USER        PID ACCESS COMMAND
/dev/dri/renderD128: hgaiser   26880 F...m Xwayland
                     hgaiser   27064 F...m cosmic-panel
                     hgaiser   27068 F...m cosmic-workspac
                     hgaiser   27079 F...m xdg-desktop-por
                     hgaiser   27184 F...m cosmic-ext-appl
                     hgaiser   27547 F...m wezterm-gui
                     hgaiser   27805 F...m firefox
/dev/dri/renderD129: hgaiser   26865 F.... cosmic-comp

This shouldn't be an issue. cosmic-comp does nothing but hold an open DRM device (which we need in order to check whether any outputs are connected to it). It doesn't initialize any resources that keep the GPU alive if no applications access it and no output is connected to it.

hgaiser commented 1 week ago

Thanks for the quick reply, that makes sense. Do you have any tips on how to debug why my dGPU doesn't go idle when I'm only using the eDP output (laptop screen)?

Drakulix commented 6 days ago

Thanks for the quick reply, that makes sense. Do you have any tips on how to debug why my dGPU doesn't go idle when I'm only using the eDP output (laptop screen)?

That depends on your dGPU and specific system. Since you seem to be using an NVIDIA GPU, your options are quite limited, given that the driver is a black box. All you have to go by is this bit of documentation: http://us.download.nvidia.com/XFree86/Linux-x86_64/560.35.03/README/dynamicpowermanagement.html

Some systems might also have buggy firmware. For example, I have a personal machine that leaves the dGPU hanging in an active state if an external output was connected to it at any point during the current boot. If I restart it afterwards, it powers down the GPU as expected. I figured this out by observation, as there isn't much else you can do on a system with closed-source firmware.
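
One generic way to do that kind of observation is to poll the kernel's runtime power management files for the dGPU's PCI device. The sketch below assumes the standard `power/control` and `power/runtime_status` attributes under `/sys/bus/pci/devices`; the helper name `runtime_pm_status` is hypothetical, and the PCI address has to be supplied by the caller (for example from `lspci -D`).

```python
from pathlib import Path


def runtime_pm_status(pci_address, sysfs_pci="/sys/bus/pci/devices"):
    """Read runtime PM state for one PCI device.

    Returns a dict with "control" (typically "auto" or "on") and
    "runtime_status" (e.g. "active" or "suspended"); a value is None
    when the corresponding file does not exist.
    """
    power = Path(sysfs_pci) / pci_address / "power"

    def read(name):
        path = power / name
        return path.read_text().strip() if path.exists() else None

    return {
        "control": read("control"),
        "runtime_status": read("runtime_status"),
    }
```

For the laptop in the original report this might be called as `runtime_pm_status("0000:65:00.0")`, assuming PCI domain 0000. A `control` value of `on` means runtime suspend is disabled for that device, so the GPU would never power down regardless of what the compositor does.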