ublue-os / bazzite

Bazzite is a custom image built upon Fedora Atomic Desktops that brings the best of Linux gaming to all of your devices - including your favorite handheld.
https://bazzite.gg
Apache License 2.0

sddm (login screen) crash on boot [NVIDIA] #789

Open duhow opened 4 months ago

duhow commented 4 months ago

Describe the bug

After powering on the PC, the boot splash screen appears, followed by a text terminal. There is no login screen, just a blinking cursor (`_`). Sometimes white garbage pixels show up. According to dmesg, sddm-greeter crashed:

```
[   17.773336] show_signal_msg: 51 callbacks suppressed
[   17.773339] sddm-greeter[4519]: segfault at 178 ip 00007fd1fe815a80 sp 00007ffc43c04520 error 4 in libnvidia-eglcore.so.545.29.06[7fd1fde27000+bc3000] likely on CPU 4 (core 0, socket 0)
[   17.773348] Code: fe e8 d4 d5 0b 00 eb 8a 66 90 48 8b 78 10 4c 89 fe ff d2 e9 7a ff ff ff 66 90 55 48 89 fd 53 48 83 ec 08 48 8b 87 e0 03 00 00 <48> 39 b8 78 01 00 00 0f 84 eb 00 00 00 48 8b bd e8 03 00 00 48 85
[   18.390725] evdi: [I] (card3) Closed by Task 4990 ((sd-close)) of process 4990 ((sd-close))
```
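The segfault record can be decoded mechanically. The sketch below is bash; `decode_segfault` is a hypothetical helper, and its regex assumes the x86-64 `show_signal_msg` format seen in this thread. It subtracts the library's load base from `ip` to get an offset that does not change with address-space randomisation, and decodes the x86 page-fault `error` bits. For the line above it reports a user-mode read of address 0x178 at offset 0x9eea80 in `libnvidia-eglcore.so.545.29.06`. The `Code:` bytes agree: `48 8b 87 e0 03 00 00` is `mov 0x3e0(%rdi),%rax` and the faulting `48 39 b8 78 01 00 00` is `cmp %rdi,0x178(%rax)`, so a fault at 0x178 is consistent with a NULL pointer in `%rax`.

```shell
#!/usr/bin/env bash
# Sketch: decode one kernel "segfault at ..." line (x86-64 format assumed):
#   NAME[PID]: segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]
# decode_segfault is a hypothetical helper, not part of any tool.
decode_segfault() {
  local line=$1 re
  re='([^ ]+)\[[0-9]+]: segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp [0-9a-f]+ error ([0-9]+) in ([^[]+)\[([0-9a-f]+)\+([0-9a-f]+)]'
  if [[ ! $line =~ $re ]]; then
    echo "no segfault record found" >&2
    return 1
  fi
  local comm=${BASH_REMATCH[1]} addr=${BASH_REMATCH[2]} ip=${BASH_REMATCH[3]}
  local err=${BASH_REMATCH[4]} lib=${BASH_REMATCH[5]} base=${BASH_REMATCH[6]}

  # Offset of the faulting instruction inside the library mapping.
  local offset=$(( 0x$ip - 0x$base ))

  # x86 page-fault error code: bit 0 = protection violation (else page not
  # present), bit 1 = write (else read), bit 2 = user mode (else kernel).
  local kind="page not present" access="read" mode="kernel"
  if (( err & 1 )); then kind="protection violation"; fi
  if (( err & 2 )); then access="write"; fi
  if (( err & 4 )); then mode="user"; fi

  printf '%s: %s %s at 0x%s (%s) in %s+0x%x\n' \
    "$comm" "$mode" "$access" "$addr" "$kind" "$lib" "$offset"
}
```

Running it on the crash duhow posted later in the thread (`sddm-greeter[4359]`, base `7f6842427000`) gives the same offset, 0x9eea80, which suggests the same code path in the driver fails on each boot.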

If I switch to another TTY, log in, and run `sudo systemctl restart sddm`, everything works fine again.
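The manual workaround can be scripted as a rough sketch. It assumes you are already logged in on a text console (for example after switching with Ctrl+Alt+F3) and that the crash appears in the current boot's kernel log as shown above. The function names are hypothetical, and this only automates the restart; it does not fix the underlying crash.

```shell
#!/usr/bin/env bash
# Sketch of the workaround from this report: restart sddm only if
# sddm-greeter segfaulted during the current boot. Names are hypothetical.

# Succeeds if kernel log text on stdin contains an sddm-greeter segfault.
greeter_segfaulted() {
  grep -qE 'sddm-greeter\[[0-9]+]: segfault at'
}

restart_sddm_if_greeter_crashed() {
  # -k: kernel messages only; -b 0: current boot.
  if journalctl -k -b 0 --no-pager | greeter_segfaulted; then
    echo "sddm-greeter crashed this boot; restarting sddm" >&2
    sudo systemctl restart sddm
  else
    echo "no sddm-greeter segfault in this boot's kernel log" >&2
  fi
}
```

After sourcing the file, run `restart_sddm_if_greeter_crashed` from the TTY.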

What did you expect to happen?

Boot into the login screen.

Output of rpm-ostree status

```
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:0c779386100258771e5947e913348d072935e2683f8301ca5c256d4b8f44bbb0
                  Version: 39.20240116.0 (2024-02-20T17:03:54Z)
            LocalPackages: sunshine-0.21.0-1.x86_64
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"' 

  ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:48c4a0b0e2837e6e1317d889bd2886f964e6b0c396b35756a135ace5a3257135
                  Version: 39.20240116.0 (2024-02-19T07:26:50Z)
            LocalPackages: sunshine-0.21.0-1.x86_64
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"'
```

Hardware

ASUSTeK Z170-A, Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz, NVIDIA Corporation GM204 [GeForce GTX 970]

Extra information or context

```
[ 9.088846] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input23
[ 9.088951] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input24
[ 9.089015] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input25
[ 9.089067] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input26
[ 9.377694] input: HDA Intel PCH Front Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input27
[ 9.390375] e1000e: Intel(R) PRO/1000 Network Driver
[ 9.390381] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 9.392890] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 9.410589] input: HDA Intel PCH Rear Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input28
[ 9.410688] input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1f.3/sound/card0/input29
[ 9.429834] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 9.429869] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[ 9.435854] input: HDA Intel PCH Line Out Front as /devices/pci0000:00/0000:00:1f.3/sound/card0/input30
[ 9.435933] input: HDA Intel PCH Line Out Surround as /devices/pci0000:00/0000:00:1f.3/sound/card0/input31
[ 9.435988] input: HDA Intel PCH Line Out CLFE as /devices/pci0000:00/0000:00:1f.3/sound/card0/input32
[ 9.436038] input: HDA Intel PCH Front Headphone as /devices/pci0000:00/0000:00:1f.3/sound/card0/input33
[ 9.460894] usbcore: registered new interface driver btusb
[ 9.462879] Bluetooth: hci0: RTL: examining hci_ver=0a hci_rev=000b lmp_ver=0a lmp_subver=8761
[ 9.463858] Bluetooth: hci0: RTL: rom_version status=0 version=1
[ 9.463861] Bluetooth: hci0: RTL: loading rtl_bt/rtl8761bu_fw.bin
[ 9.463956] i2c i2c-0: 2/4 memory slots populated (from DMI)
[ 9.469026] Bluetooth: hci0: RTL: loading rtl_bt/rtl8761bu_config.bin
[ 9.470060] Bluetooth: hci0: RTL: cfg_sz 6, total sz 30210
[ 9.471428] i2c i2c-0: Successfully instantiated SPD at 0x52
[ 9.471758] i2c i2c-0: Successfully instantiated SPD at 0x53
[ 9.535472] evdi: [I] (card3) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.535506] evdi: [I] (card3) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.545392] evdi: [I] (card2) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.545437] evdi: [I] (card2) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.548299] evdi: [I] (card1) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.548362] evdi: [I] (card1) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.553261] input: PC Speaker as /devices/platform/pcspkr/input/input34
[ 9.606735] evdi: [I] (card0) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.606771] evdi: [I] (card0) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.627501] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[ 9.630869] Bluetooth: hci0: RTL: fw version 0xdfc6d922
[ 9.744090] asus_wmi: ASUS WMI generic driver loaded
[ 9.846699] asus_wmi: Initialization: 0x0
[ 9.846734] asus_wmi: BIOS WMI version: 0.9
[ 9.846775] asus_wmi: SFUN value: 0x0
[ 9.846778] eeepc-wmi eeepc-wmi: Detected ASUSWMI, use DCTS
[ 9.852057] input: Eee PC WMI hotkeys as /devices/platform/eeepc-wmi/input/input35
[ 9.926289] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 2c:4d:54:69:68:87
[ 9.926296] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[ 9.926358] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: FFFFFF-0FF
[ 9.938004] evdi: [I] (card0) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938034] evdi: [I] (card0) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938267] evdi: [I] (card1) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938289] evdi: [I] (card1) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938542] evdi: [I] (card2) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938565] evdi: [I] (card2) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938800] evdi: [I] (card3) Opened by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.938826] evdi: [I] (card3) Closed by Task 602 (plymouthd) of process 602 (plymouthd)
[ 9.947825] fbcon: Taking over console
[ 10.028516] Console: switching to colour frame buffer device 128x48
[ 10.115530] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[ 10.115534] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[ 10.115536] RAPL PMU: hw unit of domain package 2^-14 Joules
[ 10.115537] RAPL PMU: hw unit of domain dram 2^-14 Joules
[ 10.889340] intel_tcc_cooling: Programmable TCC Offset detected
[ 11.114280] NET: Registered PF_QIPCRTR protocol family
[ 12.404478] nvidia: module license 'NVIDIA' taints kernel.
[ 12.404487] Disabling lock debugging due to kernel taint
[ 12.404491] nvidia: module license taints kernel.
[ 12.885367] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 12.885371] Bluetooth: BNEP filters: protocol multicast
[ 12.885376] Bluetooth: BNEP socket layer initialized
[ 12.887282] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 12.887308] Bluetooth: MGMT ver 1.22
[ 12.888910] iTCO_vendor_support: vendor-support=0
[ 12.893676] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[ 12.897855] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=4, TCOBASE=0x0400)
[ 12.898319] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[ 12.980778] ee1004 0-0052: 512 byte EE1004-compliant SPD EEPROM, read-only
[ 12.980823] ee1004 0-0053: 512 byte EE1004-compliant SPD EEPROM, read-only
[ 12.990046] Bluetooth: hci0: Bad flag given (0x1) vs supported (0x0)
[ 13.178267] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
[ 13.225994] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 545.29.06 Thu Nov 16 01:59:08 UTC 2023
[ 13.362116] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 13.552710] nvidia-uvm: Loaded the UVM driver, major device number 510.
[ 13.573281] intel_rapl_common: Found RAPL domain package
[ 13.573287] intel_rapl_common: Found RAPL domain core
[ 13.573289] intel_rapl_common: Found RAPL domain dram
[ 13.943575] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 545.29.06 Thu Nov 16 01:47:29 UTC 2023
[ 14.247557] tun: Universal TUN/TAP device driver, 1.6
[ 14.347652] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 14.620109] evdi: [I] (card3) Opened by Task 3193 (systemd-logind) of process 3193 (systemd-logind)
[ 14.884709] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 4
[ 14.884817] Console: switching to colour dummy device 80x25
[ 14.885404] nvidia 0000:01:00.0: vgaarb: deactivate vga console
[ 14.914086] evdi: [I] (card2) Opened by Task 3193 (systemd-logind) of process 3193 (systemd-logind)
[ 14.924939] evdi: [I] (card1) Opened by Task 3193 (systemd-logind) of process 3193 (systemd-logind)
[ 14.931637] evdi: [I] (card0) Opened by Task 3193 (systemd-logind) of process 3193 (systemd-logind)
[ 14.948517] fbcon: nvidia-drmdrmfb (fb0) is primary device
[ 14.948660] Console: switching to colour frame buffer device 320x90
[ 14.948675] nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
[ 15.321524] evdi: [I] (card1) Closed by Task 4686 ((sd-close)) of process 4686 ((sd-close))
[ 15.322469] evdi: [I] (card2) Closed by Task 4685 ((sd-close)) of process 4685 ((sd-close))
[ 15.323478] evdi: [I] (card0) Closed by Task 4687 ((sd-close)) of process 4687 ((sd-close))
[ 16.467900] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 17.691600] Bluetooth: RFCOMM TTY layer initialized
[ 17.691609] Bluetooth: RFCOMM socket layer initialized
[ 17.691614] Bluetooth: RFCOMM ver 1.11
[ 17.773336] show_signal_msg: 51 callbacks suppressed
[ 17.773339] sddm-greeter[4519]: segfault at 178 ip 00007fd1fe815a80 sp 00007ffc43c04520 error 4 in libnvidia-eglcore.so.545.29.06[7fd1fde27000+bc3000] likely on CPU 4 (core 0, socket 0)
[ 17.773348] Code: fe e8 d4 d5 0b 00 eb 8a 66 90 48 8b 78 10 4c 89 fe ff d2 e9 7a ff ff ff 66 90 55 48 89 fd 53 48 83 ec 08 48 8b 87 e0 03 00 00 <48> 39 b8 78 01 00 00 0f 84 eb 00 00 00 48 8b bd e8 03 00 00 48 85
[ 18.390725] evdi: [I] (card3) Closed by Task 4990 ((sd-close)) of process 4990 ((sd-close))
[ 33.071907] systemd-journald[785]: /var/log/journal/50067b1bf54549b6904b23fd5606d284/user-1000.journal: Journal file uses a different sequence number ID, rotating.
```
janoxakes commented 4 months ago

Same issue here; `sudo systemctl restart sddm` also let me log in.

Hardware: ASRock B250M-HDV, Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz, NVIDIA GeForce RTX 3070/PCIe/SSE2

Here is the output of `rpm-ostree status`. The pinned image at index 1 (2024-02-14) used to work, but it stopped working a couple of days ago; the one at index 2 (2024-02-05) is now the only one that works without workarounds.

```
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:b2e9d91aee99cb448274955238093f129e7871aa6506c25e20ca6b735ac31c2f
                  Version: 39.20240116.0 (2024-02-25T07:26:47Z)
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"' 

  ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:cb9e8310f7180042c48a42ddb3519a337c49fccb4b74b54d81644c905ce12683
                  Version: 39.20240116.0 (2024-02-14T09:33:57Z)
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"' 
                   Pinned: yes

  ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:3fc5f0ab5d9349c9edbf6b03a6e33a5701174233a10d85d5c6bbe49cab404766
                  Version: 39.20240116.0 (2024-02-05T09:03:16Z)
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"' 
                   Pinned: yes
```
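To see at a glance which deployment "index 1" and "index 2" refer to, the `rpm-ostree status` output can be reduced to one line per deployment. This awk-based sketch assumes the layout shown in this thread (each deployment begins with an origin line starting with `ostree-`, and `●` marks the booted one); `list_deployments` is a hypothetical helper, not an rpm-ostree feature.

```shell
#!/usr/bin/env bash
# Sketch: summarise `rpm-ostree status` text on stdin, one line per deployment:
#   INDEX[*] BUILD-DATE SHORT-DIGEST [pinned]
# "*" marks the booted deployment. list_deployments is a hypothetical helper
# that assumes the layout shown in this thread; it is not an rpm-ostree feature.
list_deployments() {
  awk '
    BEGIN { idx = -1 }
    function flush() {
      if (idx >= 0)
        printf "%d%s %s %s%s\n", idx, (booted ? "*" : ""), date, digest, (pinned ? " pinned" : "")
    }
    # Each deployment starts at its origin line; a leading bullet marks the booted one.
    $1 ~ /^ostree-/ || $2 ~ /^ostree-/ {
      flush()
      idx++; booted = ($1 == "●"); date = "?"; digest = "?"; pinned = 0
      next
    }
    $1 == "Digest:"  { digest = substr($2, 8, 12) }  # skip the 7-char sha256: prefix
    $1 == "Version:" { date = substr($3, 2, 10) }    # (2024-02-25T07:26:47Z) -> 2024-02-25
    $1 == "Pinned:"  { pinned = ($2 == "yes") }
    END { flush() }
  '
}
```

Usage: `rpm-ostree status | list_deployments`. For the status above it should print `0* 2024-02-25 b2e9d91aee99`, `1 2024-02-14 cb9e8310f718 pinned` and `2 2024-02-05 3fc5f0ab5d93 pinned`. These indices should line up with the ones `ostree admin pin` takes (for example `sudo ostree admin pin 2`).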
TriMoon commented 4 months ago

For what it's worth, this issue caught my attention too, as another unfortunate NVIDIA user... but luckily I don't experience the described problem on my machine. I checked https://www.nvidia.com/Download/driverResults.aspx/214100/en-us/ and it looks like the 545.29.02 driver supports these GPUs (an excerpt from the full list):

GeForce RTX 30 Series (Notebooks):
GeForce RTX 3080 Ti Laptop GPU, GeForce RTX 3080 Laptop GPU, GeForce RTX 3070 Ti Laptop GPU, GeForce RTX 3070 Laptop GPU, GeForce RTX 3060 Laptop GPU, GeForce RTX 3050 Ti Laptop GPU, GeForce RTX 3050 Laptop GPU

GeForce RTX 30 Series:
GeForce RTX 3090 Ti, GeForce RTX 3090, GeForce RTX 3080 Ti, GeForce RTX 3080, GeForce RTX 3070 Ti, GeForce RTX 3070, GeForce RTX 3060 Ti, GeForce RTX 3060, GeForce RTX 3050

GeForce 900 Series:
GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950
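To compare against a support list, the driver version actually loaded can be read from the kernel log; the logs in this report show 545.29.06. The sketch below is bash; `nvidia_driver_version` is a hypothetical helper, and it assumes the proprietary module's `NVRM: loading NVIDIA UNIX x86_64 Kernel Module` line format seen in duhow's dmesg output.

```shell
#!/usr/bin/env bash
# Sketch: print the NVIDIA kernel module version from kernel log text on stdin.
# nvidia_driver_version is a hypothetical helper. It assumes the proprietary
# module's "NVRM: loading NVIDIA UNIX <arch> Kernel Module <version>" line.
nvidia_driver_version() {
  sed -nE 's/.*NVRM: loading NVIDIA UNIX [^ ]+ Kernel Module ([0-9.]+).*/\1/p' |
    head -n 1   # first match only, in case the input covers several boots
}
```

Usage: `journalctl -k -b 0 --no-pager | nvidia_driver_version`. On a running system, `cat /proc/driver/nvidia/version` shows the loaded module version directly.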

Anyhow:

@duhow Can you check whether the problem persists without sunshine? :thinking:

@janoxakes Can you try enrolling the new MOK key using the `ujust` command on the newer images? IIRC they used a different akmods MOK key back then.

Maybe both of you need to enroll this (new) MOK key, in case you are booting with Secure Boot enabled... :woman_shrugging:
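Whether MOK enrollment can matter at all depends on the Secure Boot state, which `mokutil` can report. A minimal sketch, assuming the key path that the `just enroll-secure-boot-key` recipe imports in this thread; the function names are hypothetical, and the claim that unsigned modules are not refused with Secure Boot off is an assumption about typical Fedora-based kernels.

```shell
#!/usr/bin/env bash
# Sketch: check Secure Boot state and whether the ublue akmods key is enrolled.
# Function names are hypothetical. The key path is the one imported by the
# `just enroll-secure-boot-key` recipe shown in this thread.
AKMODS_KEY=/etc/pki/akmods/certs/akmods-ublue.der

# Succeeds if `mokutil --sb-state` output on stdin reports Secure Boot enabled.
sb_enabled() {
  grep -q '^SecureBoot enabled'
}

check_mok_setup() {
  if ! mokutil --sb-state | sb_enabled; then
    # Assumption: with Secure Boot off, the kernel does not normally refuse
    # unsigned modules, so MOK enrollment should not affect the NVIDIA driver.
    echo "Secure Boot is disabled; MOK enrollment is unlikely to matter"
    return 0
  fi
  # --test-key reports whether this DER certificate is already enrolled.
  mokutil --test-key "$AKMODS_KEY"
}
```

Independently of this check, an `NVRM: loading NVIDIA UNIX x86_64 Kernel Module` line in the kernel log, as in duhow's full dmesg output, shows that the kernel module did load on that boot, which makes a module-signing problem less likely.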

duhow commented 4 months ago

While I do have sunshine installed, I haven't used it, and I don't think it starts during the boot process... Surprisingly, with no changes other than updating to the latest version, ~now it seems to be working?~

📝 EDIT: Nah, after rebooting the issue happened again.

```
[   15.428435] show_signal_msg: 7 callbacks suppressed
[   15.428437] sddm-greeter[4359]: segfault at 178 ip 00007f6842e15a80 sp 00007ffd30774220 error 4 in libnvidia-eglcore.so.545.29.06[7f6842427000+bc3000] likely on CPU 3 (core 3, socket 0)
[   15.428449] Code: fe e8 d4 d5 0b 00 eb 8a 66 90 48 8b 78 10 4c 89 fe ff d2 e9 7a ff ff ff 66 90 55 48 89 fd 53 48 83 ec 08 48 8b 87 e0 03 00 00 <48> 39 b8 78 01 00 00 0f 84 eb 00 00 00 48 8b bd e8 03 00 00 48 85
[   16.441422] evdi: [I] (card3) Closed by Task 4983 ((sd-close)) of process 4983 ((sd-close))
```

```
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:e7bf4f3cc4ae67e49f81cf1d6a4f57a90656757cde71c02be4e267eff689b418
                  Version: 39.20240116.0 (2024-02-24T02:02:28Z)
            LocalPackages: sunshine-0.21.0-1.x86_64
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"' 

  ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia:latest
                   Digest: sha256:0c779386100258771e5947e913348d072935e2683f8301ca5c256d4b8f44bbb0
                  Version: 39.20240116.0 (2024-02-20T17:03:54Z)
            LocalPackages: sunshine-0.21.0-1.x86_64
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"' 
```
janoxakes commented 4 months ago

@TriMoon, I have Secure Boot disabled because I had similar (related?) issues in mid-January. I followed the steps for enrolling the Secure Boot key anyway back then, and it shows as already enrolled. AFAIK, if it was due to Secure Boot, restarting sddm wouldn't solve anything, right?

```
@fedora:~$ just enroll-secure-boot-key 
echo 'Enter password "ublue-os" if prompted after your user password.'
Enter password "ublue-os" if prompted after your user password.
sudo mokutil --timeout -1
[sudo] password for jano: 
sudo mokutil --import /etc/pki/akmods/certs/akmods-ublue.der
SKIP: /etc/pki/akmods/certs/akmods-ublue.der is already enrolled
echo 'When you reboot your computer, follow the instructions to start MOK util'
When you reboot your computer, follow the instructions to start MOK util
echo 'by pressing a key, then enroll the secure boot key and enter "ublue-os" as the password'
by pressing a key, then enroll the secure boot key and enter "ublue-os" as the password
```
TriMoon commented 4 months ago

> AFAIK, if it was due to secure boot, restarting sddm wouldn't solve anything, right?

I've met so many illogical bugs in my life that I stopped looking for logical reasons when it comes to bugs in programs :rofl: But anyhow, it was just something you might want to try.

It seems that didn't solve the problem, so I have no further suggestions... (This uBlue/ostree stuff is also new to me, so...) :woman_shrugging:

janoxakes commented 3 months ago

@duhow It suddenly started working fine for me today. I hadn't used the system for over a week, so one of the updates during that time may have fixed it.

Did it work for you?

deverton commented 2 months ago

I've run into this as well, though for me it has moved on from sddm crashing to pretty much all of the KDE apps segfaulting inside the NVIDIA driver. Killing the kdewayland process and then running `systemctl restart sddm` seems to bring it back.

```
Apr 12 18:41:57 fedora kernel: kded5[4896]: segfault at 178 ip 00007fae3a4fc140 sp 00007ffc1d4ea5c0 error 4 in libnvidia-glcore.so.550.67[7fae39800000+1d27000] likely on CPU 3 (core 3, socket 0)
```
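With crashes spreading from sddm-greeter to other KDE processes (here `kded5`, in `libnvidia-glcore.so.550.67` on a newer driver, again faulting at address 0x178), it can help to tally which processes crash in which library. A rough bash sketch; `summarise_segfaults` is hypothetical and assumes each record has a timestamp or syslog prefix and a process name without spaces.

```shell
#!/usr/bin/env bash
# Sketch: tally segfaults in kernel log text on stdin by process and library.
# summarise_segfaults is a hypothetical helper. It assumes each record has a
# prefix (timestamp or syslog header), a process name without spaces, and
# ends with "in LIBRARY[BASE+SIZE]".
summarise_segfaults() {
  sed -nE 's/.* ([^ []+)\[[0-9]+]: segfault at .* in ([^ []+)\[.*/\1 \2/p' |
    awk '{ n[$0]++ } END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2   # most frequent first
}
```

Usage: `journalctl -k -b 0 --no-pager | summarise_segfaults`; replace `-b 0` with `-b -1` to inspect the previous boot.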