pop-os / nvidia-graphics-drivers

Pop!_OS NVIDIA Graphics Drivers

NVIDIA 555.42.02 BETA #205

Open mmstick opened 1 month ago

mmstick commented 1 month ago

https://www.nvidia.co.uk/download/driverResults.aspx/224793/en-us

ids1024 commented 1 month ago

We'll presumably also need to update https://github.com/pop-os/egl-wayland to make use of explicit sync for Wayland EGL. I assume Nvidia will tag a release of egl-wayland when the stable 555 driver is released.

Nvidia's driver doesn't seem to initialize an EGL display correctly for X11 or Wayland on my machine. It might be something with my Pop install, since I have tested several different driver configurations here.

Is it working for others?

mmstick commented 1 month ago

egl-wayland updates: https://github.com/pop-os/egl-wayland/pull/3

They haven't tagged a new release yet, but there is a patch for explicit sync.

ids1024 commented 1 month ago

Looks like I was missing __NV_PRIME_RENDER_OFFLOAD=1. Without that, the Nvidia Wayland EGL backend won't initialize, so it falls back to Mesa, or fails if __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json is set.

Maybe Nvidia doesn't support EGL on X11 (as opposed to GLX)...
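For anyone retesting, the two environment variables mentioned above can be set per process rather than globally. The following is a minimal sketch, not part of the driver or of Pop!_OS tooling: the helper names `nvidia_offload_env` and `run_with_offload` are hypothetical, and it only prepares the environment; it does not check whether the Nvidia EGL backend actually initializes.

```python
import os
import subprocess

# Path quoted in the comment above; assumed to exist on the test machine.
NVIDIA_EGL_VENDOR_JSON = "/usr/share/glvnd/egl_vendor.d/10_nvidia.json"


def nvidia_offload_env(base=None, force_nvidia_vendor=False):
    """Return a copy of an environment with the offload variable set.

    base: mapping to start from (defaults to the current process environment).
    force_nvidia_vendor: also restrict glvnd to the Nvidia EGL vendor file,
    so a failed initialization is reported instead of falling back to Mesa.
    """
    env = dict(os.environ if base is None else base)
    env["__NV_PRIME_RENDER_OFFLOAD"] = "1"
    if force_nvidia_vendor:
        env["__EGL_VENDOR_LIBRARY_FILENAMES"] = NVIDIA_EGL_VENDOR_JSON
    return env


def run_with_offload(argv, force_nvidia_vendor=False):
    """Run a command (hypothetical helper) with the variables applied."""
    env = nvidia_offload_env(force_nvidia_vendor=force_nvidia_vendor)
    return subprocess.run(argv, env=env).returncode
```

As a usage example, `run_with_offload(["eglinfo"], force_nvidia_vendor=True)` would launch `eglinfo` (assuming that tool is installed) with both variables set, which should make a failed Nvidia EGL initialization visible rather than hidden by the Mesa fallback.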

XV-02 commented 1 month ago

I'm not positive, as I cannot recreate it, but I did see an issue where I could "scroll" an external display. When I reverted the driver to the current release, I no longer had the problem. When I re-added 555, I couldn't cause it again.

https://github.com/pop-os/nvidia-graphics-drivers/assets/98765732/6d6d09fd-0290-4b47-a505-49c10d95c558

To describe the video (In case it's not fully clear):

  1. The displays are aligned so that the external display is top-aligned, and shares a screen edge with the right-hand side of the internal display. The external is considerably smaller in resolution than the internal.
  2. I start with my cursor on the external display.
  3. I move the cursor to the bottom of the display, and keep moving. The display "scrolls" down. I move the cursor upwards, and again, the display "scrolls" up.
  4. I move the cursor to the internal (larger) display.
  5. I move the cursor down. Once the cursor meets/passes the point where the bottom of the external is aligned with the internal, the external scrolls down.
  6. I move the cursor up in the internal display. The cursor stops at the level of the top-most displayed pixels on the external. The external does not scroll. I cannot move the cursor further up.
  7. I move the cursor back to the external, and move the cursor up. The display scrolls again. I cannot access the top of the internal display again.

I'm currently trying to rule out any other packages as the cause.

leviport commented 3 weeks ago

Everything seems to be checking out with this driver so far, besides one potential regression. With a 10-series desktop card, some machines have trouble resuming from suspend after being left in suspend for a while. We recreated this in this 555 version, and we are currently trying to recreate it in the released 550 driver to determine whether it's a regression or not.

XV-02 commented 3 weeks ago

It looks like the failure to resume for desktops with 10 series cards is - in fact - a regression.

XV-02 commented 3 weeks ago

I've pulled (and truncated) logs from the resume process: 10xx_resume_555_fail.log

There's a kernel bug that crops up, and it specifically calls out nvidia as part of its output.

Right now, I'm checking against the new (6.9.3) kernel PR, in case there is some mismatch between this PR and our current shipped kernel.

XV-02 commented 3 weeks ago

Well, it looks like the issue of a non-responsive system on resume is still present, although 6.9.3 does sometimes allow a successful resume. We'll either need to dig down ourselves, for which some guidance would be helpful, or we'll need to wait.

My instinct is that this isn't fully a driver issue. It might be a gnome/configuration issue.

XV-02 commented 3 weeks ago

I think we should either try a newer Nvidia beta driver for 555 or wait. The resume issues on the 10xx series GPUs are a blocker for releasing this, and there appear to be issues coming back from suspend regardless of DE.

I can log in to Cosmic via cosmic-greeter reliably, but the compositor doesn't seem to be operating correctly, as I get issues like the following screenshot: screenshot-2024-06-10-23-25-00

Using GDM, even with the newer kernel PR, the vast majority of the time I see either a black screen with a mouse cursor or the same non-responsive black screen as before.