pop-os / nvidia-graphics-drivers

Pop!_OS NVIDIA Graphics Drivers
NVIDIA 550.40.08 #201

Closed mmstick closed 7 months ago

mmstick commented 8 months ago

https://www.nvidia.com/download/driverResults.aspx/218153/en-us/

For review purposes. We can choose to release this separately in a different repository if we want to release it.

gabriele2000 commented 8 months ago

DKMS doesn't build the module

Terminal log

```
Loading new nvidia-550.40.07 DKMS files...
Building for 6.6.10-76060610-generic
Building for architecture x86_64
Building initial module for 6.6.10-76060610-generic
ERROR (dkms apport): kernel package linux-headers-6.6.10-76060610-generic is not supported
Error! Bad return status for module build on kernel: 6.6.10-76060610-generic (x86_64)
Consult /var/lib/dkms/nvidia/550.40.07/build/make.log for more information.
dpkg: error processing package nvidia-dkms-550 (--configure):
 installed nvidia-dkms-550 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver-550:
 nvidia-driver-550 depends on nvidia-dkms-550 (>= 550.40.07); however:
  Package nvidia-dkms-550 is not configured yet.
dpkg: error processing package nvidia-driver-550 (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for gnome-menus (3.36.0-1ubuntu3) ...
Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for dbus (1.12.20-2ubuntu4.1) ...
Processing triggers for dbus-broker (29-4build1) ...
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
Processing triggers for desktop-file-utils (0.26-1ubuntu3) ...
Processing triggers for initramfs-tools (0.140ubuntu13.4) ...
update-initramfs: Generating /boot/initrd.img-6.6.10-76060610-generic
kernelstub.Config : INFO Looking for configuration...
kernelstub : INFO System information:
    OS:..................Pop!_OS 22.04
    Root partition:....../dev/nvme0n1p3
    Root FS UUID:........012303dc-82fb-4ce0-8f08-62160f8e3263
    ESP Path:............/boot/efi
    ESP Partition:......./dev/nvme0n1p1
    ESP Partition #:.....1
    NVRAM entry #:.......-1
    Boot Variable #:.....0000
    Kernel Boot Options:.i915.mitigations=off intel_pstate=disable mitigations=off systemd.show_status=false loglevel=0 quiet
    Kernel Image Path:.../boot/vmlinuz-6.6.10-76060610-generic
    Initrd Image Path:.../boot/initrd.img-6.6.10-76060610-generic
    Force-overwrite:.....False
kernelstub.Installer : INFO Copying Kernel into ESP
kernelstub.Installer : INFO Copying initrd.img into ESP
kernelstub.Installer : INFO Setting up loader.conf configuration
kernelstub.Installer : INFO Making entry file for Pop!_OS
kernelstub.Installer : INFO Backing up old kernel
kernelstub.Installer : INFO Making entry file for Pop!_OS
Errors were encountered while processing:
 nvidia-dkms-550
 nvidia-driver-550
E: Sub-process /usr/bin/dpkg returned an error code (1)
gabriele@msi-gp72m:~$
```

make.log

```
gabriele@msi-gp72m:~$ cat /var/lib/dkms/nvidia/550.40.07/build/make.log
DKMS make.log for nvidia-550.40.07 for kernel 6.6.10-76060610-generic (x86_64)
gio 25 gen 2024, 22:10:48, CET
make[1]: Entering directory '/usr/src/linux-headers-6.6.10-76060610-generic'
make --no-print-directory -C /usr/src/linux-headers-6.6.10-76060610-generic \
-f /usr/src/linux-headers-6.6.10-76060610-generic/Makefile modules
make -f ./scripts/Makefile.build obj=/var/lib/dkms/nvidia/550.40.07/build need-builtin=1 need-modorder=1
/var/lib/dkms/nvidia/550.40.07/build/Kbuild:233: /var/lib/dkms/nvidia/550.40.07/build/header-presence-tests.mk: No such file or directory
make[3]: *** No rule to make target '/var/lib/dkms/nvidia/550.40.07/build/header-presence-tests.mk'. Stop.
make[2]: *** [/usr/src/linux-headers-6.6.10-76060610-generic/Makefile:1919: /var/lib/dkms/nvidia/550.40.07/build] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.6.10-76060610-generic'
make: *** [Makefile:85: modules] Error 2
gabriele@msi-gp72m:~$
```
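The line that matters in the make.log above is the missing `header-presence-tests.mk` include; the `Error 2` lines after it are follow-on failures. As a rough sketch (not part of the original report), a few lines of Python can pull that kind of diagnostic out of a longer log. The function name `find_missing_includes` and the regular expression are illustrative assumptions; the sample text is copied from the log above.

```python
import re

# Matches GNU make diagnostics of the form
#   <makefile>:<line>: <path>: No such file or directory
MISSING_INCLUDE = re.compile(
    r"^(?P<makefile>\S+):(?P<line>\d+): (?P<missing>\S+): No such file or directory$"
)


def find_missing_includes(log_text):
    """Return (makefile, line_number, missing_path) for each missing-include line."""
    hits = []
    for raw in log_text.splitlines():
        match = MISSING_INCLUDE.match(raw.strip())
        if match:
            hits.append(
                (match.group("makefile"), int(match.group("line")), match.group("missing"))
            )
    return hits


# Three lines taken from the make.log quoted in this thread.
SAMPLE = """\
make -f ./scripts/Makefile.build obj=/var/lib/dkms/nvidia/550.40.07/build need-builtin=1 need-modorder=1
/var/lib/dkms/nvidia/550.40.07/build/Kbuild:233: /var/lib/dkms/nvidia/550.40.07/build/header-presence-tests.mk: No such file or directory
make[3]: *** No rule to make target '/var/lib/dkms/nvidia/550.40.07/build/header-presence-tests.mk'. Stop.
"""

if __name__ == "__main__":
    for makefile, line, missing in find_missing_includes(SAMPLE):
        print(f"{makefile}:{line} includes missing file {missing}")
```

On a real system the text would come from reading `/var/lib/dkms/nvidia/550.40.07/build/make.log` instead of the embedded sample.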

XV-02 commented 7 months ago

I'm seeing issues with native Cosmic applications and some others such as the Lapce flatpak being entirely unresponsive in Pop with this PR on desktop.

This is true in both Cosmic DE and Gnome, in either Wayland or X mode. Without the Nvidia driver installed, on the same hardware, I see no such issues. I found this on a Spark with a 1080 graphics card. Previously we were looking to the Beta driver to address an existing Nvidia bug in Wayland sessions, in which response times were being measured in seconds per frame on 10xx series hardware. That general responsiveness appears resolved despite the issues with specific applications.

On a Serw13 with a 4070, native Cosmic applications seemed to function without issues, but Lapce was equally unusable in integrated, hybrid, or dedicated graphics modes in a different way. Lapce would jump almost entirely off the display (to the left) whenever it received a mouse input, and horizontal window size would collapse to about twice the width of that application's close button. However, I could move and resize Lapce successfully using the Cosmic extensions for Gnome.

It looks like the driver may need a little longer to mature before we can push it. Alternatively, these symptoms may point to broader issues.

gabriele2000 commented 7 months ago

> response times were being measured in seconds per frame on 10xx series hardware. That general responsiveness appears resolved despite the issues with specific applications.

Heh, I remember that a week after that problem appeared, maybe two, there was a patch for every Cosmic application that fixed the issue, although it introduced a rendering issue for a lot of elements. As of two days ago that problem is fixed too, since wgpu got patched.

Basically, Nvidia's claims were false. Thanks to @mmstick's fix (later that day), I was able at some point to see that not only was the problem still there after the driver update, but I also had the classic standby issues (and other issues, such as flatpaks being unresponsive, like you said) that you often get after a new Nvidia driver release.

It's a beta, sure, but beta doesn't mean "broken", it means "it works with some minor issues".

mmstick commented 7 months ago

It's possible that a required fix is in https://github.com/pop-os/egl-wayland/pull/2

XV-02 commented 7 months ago

The egl-wayland Nvidia commits are resolving most of my issues. I don't think the other behaviours I'm seeing are specific to this Nvidia update. I'll keep watching for any other differences/regressions between 550 and 545.

RayJW commented 7 months ago

This driver is currently unusable on my system. I've attached a log extract of what's happening on boot. Basically I just get a grey screen, and when switching to a TTY session I get spammed with the line `Feb 07 23:28:39 device kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000900] Flip event timeout on head 1` a few times after logging in, before it becomes usable. The same happens when running `sudo systemctl restart gdm3`, and the graphical session is still unusable.
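To gauge how often the flip timeout fires and on which head, the kernel log can be tallied rather than read by eye. The following is a minimal sketch, not something from the original report: the function name `count_flip_timeouts` and the regular expression are assumptions built around the single log line quoted above, and the second sample line is a made-up unrelated entry included only to show that it is ignored.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted in this thread.
FLIP_TIMEOUT = re.compile(
    r"\[GPU ID (?P<gpu>0x[0-9a-fA-F]+)\] Flip event timeout on head (?P<head>\d+)"
)


def count_flip_timeouts(journal_text):
    """Count flip-event timeouts per (gpu_id, head) pair."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = FLIP_TIMEOUT.search(line)
        if match:
            counts[(match.group("gpu"), int(match.group("head")))] += 1
    return counts


QUOTED_LINE = (
    "Feb 07 23:28:39 device kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] "
    "*ERROR* [nvidia-drm] [GPU ID 0x00000900] Flip event timeout on head 1"
)
# The middle line is hypothetical filler, not taken from the report.
SAMPLE = "\n".join([QUOTED_LINE, "Feb 07 23:28:40 device kernel: unrelated message", QUOTED_LINE])

if __name__ == "__main__":
    for (gpu, head), n in count_flip_timeouts(SAMPLE).items():
        print(f"GPU {gpu} head {head}: {n} flip timeouts")
```

In practice the input text would be the kernel messages for the current boot, for example the output of `journalctl -k -b` saved to a file.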

All updates are applied, including the newly released libnvidia-egl-wayland1 version: 1:1.1.13-2pop1~1707162632~22.04~c5241b5.

XV-02 commented 7 months ago

@RayJW can you attach system specifications for us? Particularly what GPU you have installed?

RayJW commented 7 months ago

> @RayJW can you attach system specifications for us? Particularly what GPU you have installed?

Sure! I have a Ryzen 9 3900X with a GTX 1080 Ti, anything else that could be of help?

XV-02 commented 7 months ago

@RayJW We don't have a GTX 1080 Ti in our lab. However, our standard GTX 1080 performed without issues. If you haven't already, please try the following steps:

1) Remove the Nvidia driver, and reboot to see if the card functions under the open-source Nouveau driver. First, from a TTY, run `sudo apt purge ~nnvidia` (that "~nnvidia" with two "n"s is not a typo :smile: ). This will remove the Nvidia driver and associated configuration files. Then run `sudo apt autoremove`, which will remove any packages that were installed as dependencies of the Nvidia driver but are no longer needed.

2) Reboot the system, and see if you can successfully reach a graphical user session. If you can:

3) Try reinstalling the driver. Run `sudo apt install nvidia-driver-550`, which will install the Nvidia driver from this pull request, and try rebooting again.

The Nvidia driver is complicated, and I have seen issues where purging and reinstalling the driver has been the solution, so those are good first steps.
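The purge-and-reinstall steps above can be sketched as a small script. This is an illustration only: the commands are the ones quoted in the steps, while the names `PURGE_STEPS`, `REINSTALL_STEPS`, and `run_steps` are hypothetical. It defaults to a dry run that only prints the commands, and the reboots between phases are left to the user.

```python
import subprocess

# Commands quoted from the steps above. A reboot is expected after each phase
# and is deliberately not automated here.
PURGE_STEPS = [
    ["sudo", "apt", "purge", "~nnvidia"],
    ["sudo", "apt", "autoremove"],
]
REINSTALL_STEPS = [
    ["sudo", "apt", "install", "nvidia-driver-550"],
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for argv in steps:
        line = " ".join(argv)
        shown.append(line)
        print(line)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)
    return shown


if __name__ == "__main__":
    run_steps(PURGE_STEPS)      # phase 1, then reboot and check for a graphical session
    run_steps(REINSTALL_STEPS)  # phase 2, then reboot again
```

Passing the arguments as a list means no shell is involved, so the `~nnvidia` pattern reaches `apt` unchanged.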

XV-02 commented 7 months ago

Broadly, on our lab hardware, this driver is behaving as expected. There is one obvious issue though: `nvidia-smi` reports the driver version as 550.40.07, not 550.40.08.
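A version mismatch like this one is easy to check mechanically. The sketch below assumes the version text comes from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one version per GPU; the helper names `reported_versions` and `check_driver_version` are illustrative, and the sample string stands in for real command output.

```python
def reported_versions(smi_output):
    """Return the distinct driver versions in the query output (one line per GPU)."""
    return sorted({line.strip() for line in smi_output.splitlines() if line.strip()})


def check_driver_version(smi_output, expected):
    """Return (ok, versions); ok is True only if every GPU reports `expected`."""
    versions = reported_versions(smi_output)
    return versions == [expected], versions


if __name__ == "__main__":
    sample = "550.40.07\n"  # stand-in for the real command output
    for expected in ("550.40.08", "550.40.07"):
        ok, versions = check_driver_version(sample, expected)
        print(f"expected {expected}: {'match' if ok else 'mismatch'} (reported {versions})")
```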

It looks to be behaving without issue on 10xx series, 20xx series, and 40xx series GPUs in my desktop testing. On the laptop front, I've only really looked at a 16xx series system, so that might leave a 40xx series laptop to test unless someone else on QA has already covered that base.

Finally, something I have noticed recently, which is a nuisance and not an actual driver problem, is that the output of `nvidia-smi` is now wider than our default width for a freshly spawned gnome-terminal window in floating mode. I don't know if that width is something we had configured ourselves or not, but I thought I'd mention it as a quality-of-life thing in case it was and had slipped through the cracks at some point.

mmstick commented 7 months ago

550.40.07 is correct. I must have typo'd the .08

RayJW commented 7 months ago

@XV-02 Sorry, it seems like that fixed it, although I was so sure I already tried that. I did however only purge ".*nvidia.*" so maybe the ~nnvidia did the trick. I can confirm the system seems to be working fine now and the session seems to be working without issues so far!