pop-os / cosmic-comp

Compositor for the COSMIC desktop environment
GNU General Public License v3.0

NVIDIA instability with fullscreen apps #419

Closed Tipcat-98 closed 2 months ago

Tipcat-98 commented 7 months ago

I'm not sure if this is a big priority at the moment, but since no other issues like it have been reported yet, I figured I should make one. The monitor output can freeze, with the image stuck on whatever was displayed at the time of the freeze, while logging the error below. The only way to fix this seems to be restarting the computer.

This mainly happens when you start an XWayland application in fullscreen mode, or switch one to fullscreen mode. Starting in windowed borderless or windowed mode is usually safe, though continuously resizing the window with Super + Shift + arrow keys can, on rare occasions, trigger the same error. I haven't observed this behaviour in any native Wayland application yet, but I don't have many of those to test. It's mainly Firefox and emulation software that I run natively on Wayland.

The moment of the freeze

Apr 12 16:12:19 tipcat cosmic-comp[2426]: using legacy fbadd
Apr 12 16:12:19 tipcat kernel: nvidia-modeset: ERROR: Invalid request parameters, planePitch or rmObjectSizeInBytes, passed during surface registration
Apr 12 16:12:19 tipcat kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset. Error code: -22

System Info:
Cosmic: Updated to latest git as of today 14:30 (CEST)
OS: Pop!_OS 22.04 LTS x86_64
Kernel: 6.8.0-76060800daily20240311-g
DE: GNOME 42.5
CPU: Intel i7-6700 (8) @ 4.000GHz
GPU: NVIDIA GeForce RTX 2060 Rev. A
GPU Driver: 550.67
Memory: 32024MiB
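For anyone reading the kernel log above: the trailing "Error code: -22" can be decoded with a few lines of Python. This is only a minimal sketch for triage. The regular expression and the helper name `decode_modeset_failure` are illustrative assumptions and not part of cosmic-comp or the NVIDIA driver; the only fact relied on is that kernel code conventionally reports errors as negated errno values.

```python
import errno
import os
import re

# Illustrative pattern for the nvidia-drm failure line quoted in this issue.
# \s+ tolerates one or more spaces before "Error code".
MODESET_FAILURE = re.compile(
    r"\[GPU ID (0x[0-9a-fA-F]+)\] Failed to apply atomic modeset\.\s+"
    r"Error code: (-?\d+)"
)


def decode_modeset_failure(line):
    """Return (gpu_id, errno_name, description) for a matching line, else None."""
    match = MODESET_FAILURE.search(line)
    if match is None:
        return None
    gpu_id = match.group(1)
    # Kernel code conventionally returns negated errno values,
    # so -22 corresponds to errno 22.
    number = abs(int(match.group(2)))
    name = errno.errorcode.get(number, "UNKNOWN")
    return gpu_id, name, os.strerror(number)


line = (
    "Apr 12 16:12:19 tipcat kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] "
    "ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset. "
    "Error code: -22"
)
print(decode_modeset_failure(line))
```

The errno name for 22 is EINVAL, which is consistent with the "Invalid request parameters" message logged by nvidia-modeset just before it.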

jacobgkau commented 7 months ago

Sounds like this may be related to explicit sync. Explicit sync support in upstream XWayland was only merged three days ago: https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/967

cosmic-comp will need to support explicit sync (Mutter had an MR a few weeks ago, and KWin merged theirs two days ago). An updated NVIDIA driver will also be required. I've heard the 555 beta coming next month will add support, so hopefully that line will reach non-beta status before COSMIC does. If not, I'm tempted to say we'd ship the NVIDIA beta rather than shipping COSMIC Epoch to general users without that fix, but that decision would need to be made later.

ids1024 commented 7 months ago

Yeah. https://github.com/pop-os/cosmic-comp/pull/411 has an implementation of the explicit sync protocol, with some things that still need to be fixed before merging it. When the 555 beta driver is released next month, we can see if that helps with some XWayland issues. This also requires unreleased XWayland changes.

However... that should only help with things involving rendering corruption. Not a freeze and error like this.

Invalid request parameters, planePitch or rmObjectSizeInBytes, passed during surface registration

Is the client here running on the Nvidia GPU, or the Intel GPU? Is it somehow trying to do direct scanout of an Intel buffer on the Nvidia card? That isn't expected to work, and will probably result in pitch alignment issues (among other things) which might be what this is about.
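To illustrate what a pitch alignment mismatch could look like, here is a minimal sketch. The alignment values (64 and 256 bytes) and all function names are hypothetical, chosen only for the example; they are not taken from the Intel or NVIDIA drivers, and this is not the check that nvidia-modeset actually performs.

```python
def aligned_pitch(width_px, bytes_per_pixel, alignment):
    """Round the tightly packed row size up to the next multiple of `alignment`."""
    tight = width_px * bytes_per_pixel
    return (tight + alignment - 1) // alignment * alignment


def pitch_is_acceptable(pitch, alignment):
    """True if `pitch` is a whole multiple of the required alignment."""
    return pitch % alignment == 0


# Hypothetical alignment requirements, for illustration only.
EXPORTER_ALIGNMENT = 64   # assumed alignment used by the GPU that allocated the buffer
SCANOUT_ALIGNMENT = 256   # assumed alignment required by the GPU doing scanout

# A 1366-pixel-wide buffer at 4 bytes per pixel, allocated by the exporter.
pitch = aligned_pitch(1366, 4, EXPORTER_ALIGNMENT)

print(pitch, pitch_is_acceptable(pitch, SCANOUT_ALIGNMENT))
print(aligned_pitch(1366, 4, SCANOUT_ALIGNMENT))
```

With these made-up numbers, the exporter's pitch is valid for its own alignment but not for the scanout side, so a buffer allocated for one device would be rejected if handed directly to the other.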

Tipcat-98 commented 7 months ago

My iGPU should be disabled, so I don't think it's doing anything with it.

Tipcat-98 commented 7 months ago

Some additional info that maybe should've been in the main post: this doesn't make the computer inoperable. If you have multiple monitors, it simply freezes the image on the monitor that the app was open on.

Even with one monitor you can still execute commands, just blindly.

ids1024 commented 7 months ago

At least for OpenGL clients, glxinfo | grep "renderer string" should show what graphics card is being used under X.

Not sure what the problem would be if it's only using the Nvidia GPU. But if the driver complains about planePitch and it happens with fullscreen windows or sometimes when resizing, that does sound like something related to direct scanout.

Tipcat-98 commented 7 months ago

Output from glxinfo | grep "renderer string":

OpenGL renderer string: NVIDIA GeForce RTX 2060/PCIe/SSE2
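The same check can be scripted if it needs repeating across several machines. This is a minimal sketch that assumes glxinfo is installed and that its output contains an "OpenGL renderer string:" line as shown above; the function names are illustrative.

```python
import shutil
import subprocess


def parse_renderer(glxinfo_output):
    """Extract the value of the 'OpenGL renderer string' line, or None if absent."""
    for line in glxinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("OpenGL renderer string:"):
            return stripped.split(":", 1)[1].strip()
    return None


def current_renderer():
    """Run glxinfo (if available) and return the renderer string, else None."""
    if shutil.which("glxinfo") is None:
        return None
    result = subprocess.run(
        ["glxinfo"], capture_output=True, text=True, check=False
    )
    return parse_renderer(result.stdout)


# Parsing the line reported in this thread.
sample = "OpenGL renderer string: NVIDIA GeForce RTX 2060/PCIe/SSE2"
print(parse_renderer(sample))
```

Calling `current_renderer()` inside the session reports which GPU X clients render on. As noted earlier in the thread, this only covers OpenGL clients under X.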

Tipcat-98 commented 7 months ago

Can confirm that my Intel GPU is disabled in the BIOS.

Tipcat-98 commented 5 months ago

I've tested Terraria with the native SDL Wayland driver and the same instability is present there. Renaming the issue to account for this.

garcia-s commented 5 months ago

Hello, I have the exact same problem. I was about to report it; thank god I searched first.

Funny thing is, it only happens when I full screen yt videos, which I almost never do. Everything else works extremely well.

PC Info:

CPU: AMD Ryzen 3200g
OS: Fedora 40
Kernel: 6.8.11-300.fc40
GPU: NVIDIA GTX 1660Ti
DRIVER: NVIDIA 550.78 provided by akmod-nvidia-open

The AMD Ryzen APU is also disabled in the BIOS.

Here are the logs from journalctl:

jun 04 18:06:58 fedora cosmic-panel[78714]: com.system76.CosmicAppList: Error getting config: com.system76.CosmicAppList [GetKey("enable_drag_source", Os { code: 2, kind: NotFound, message: "No such file or directory" })]
jun 04 18:06:58 fedora cosmic-panel[78714]: com.system76.CosmicAppList: Error getting config: com.system76.CosmicAppList [GetKey("enable_drag_source", Os { code: 2, kind: NotFound, message: "No such file or directory" })]
jun 04 18:07:57 fedora sssd_kcm[46405]: Shutting down (status = 0)
jun 04 18:07:57 fedora systemd[1]: sssd-kcm.service: Deactivated successfully.
jun 04 18:07:57 fedora systemd[1]: sssd-kcm.service: Consumed 1.249s CPU time.
jun 04 18:07:57 fedora audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=sssd-kcm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
jun 04 18:09:14 fedora PackageKit[77690]: daemon quit
jun 04 18:09:14 fedora systemd[1]: packagekit.service: Deactivated successfully.
jun 04 18:09:14 fedora audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=packagekit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
jun 04 18:09:14 fedora systemd[1]: packagekit.service: Consumed 2.414s CPU time.
jun 04 18:09:25 fedora cosmic-comp[78610]: using legacy fbadd
jun 04 18:09:25 fedora kernel: nvidia-modeset: ERROR: Invalid request parameters, planePitch or rmObjectSizeInBytes, passed during surface registration
jun 04 18:09:25 fedora kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22

gmpinder commented 3 months ago

I'm experiencing the same issue on Fedora Atomic 40 using the COPR build. I've got an NVIDIA 3080 Ti and I'm unable to play any fullscreen game or fullscreen any YouTube video. I get the exact same symptom: the picture is frozen on screen and I can't see anything, but I can still interact. I end up having to restart to get things working again.

Tipcat-98 commented 2 months ago

Since Smithay/smithay#1501 this is fixed for me.