ublue-os / bazzite

Bazzite is a custom image built upon Fedora Atomic Desktops that brings the best of Linux gaming to all of your devices - including your favorite handheld.
https://bazzite.gg
Apache License 2.0

[NVIDIA - STABLE] Getting kicked back to SDDM Greeter after login #1704

Open NekroSomnia opened 2 weeks ago

NekroSomnia commented 2 weeks ago

Describe the bug

After login, I get kicked back to the greeter/lock screen located on TTY1. If I try to log in again, the desktop won't load and the screen stays black.

What did you expect to happen?

I expected a desktop that doesn't kick me out after a minute.

Output of rpm-ostree status

State: idle
Deployments:
● ostree-unverified-registry:ghcr.io/ublue-os/bazzite-nvidia:stable
                   Digest: sha256:158bbced9de484d9e6a3acca9534be77d0becab8e7d4d828a75880024ef53340
                  Version: 40.20240922.0 (2024-09-23T05:05:17Z)
                Initramfs: regenerate

  ostree-unverified-registry:ghcr.io/ublue-os/bazzite-nvidia:stable
                   Digest: sha256:e447992949d4508d573ddce67fd2669aef87cc98efe2fe44312db54d052b5aeb
                  Version: 40.20240914.0 (2024-09-15T21:06:47Z)
                Initramfs: regenerate

Hardware

No response

Extra information or context

After getting kicked back to the greeter, I either have to reboot using a shell on TTY3 or higher, or kill the desktop session on TTY2 manually using loginctl kill-session {ID}

I've noticed that the SDDM session usually gets destroyed after a successful login (when this issue doesn't happen). That is not the case when I get booted out of my session: the greeter session persists with the state "online", while the desktop session on TTY2 has the state "closing".

After killing the "defective" session, another login attempt results in a working desktop without any interruptions or unexpected lock screens.
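
The manual workaround described above can be sketched as a pair of shell helpers. This is only a sketch under assumptions: the function names session_states and closing_sessions are made up for illustration, and it assumes the session ID is the first column of loginctl list-sessions --no-legend. It only reports sessions stuck in the "closing" state and leaves the actual loginctl kill-session call to you.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of Bazzite.

# Print "ID STATE" for every session known to systemd-logind.
# Assumes the first column of `loginctl list-sessions` is the session ID.
session_states() {
    loginctl list-sessions --no-legend | awk '{ print $1 }' |
    while read -r id; do
        printf '%s %s\n' "$id" "$(loginctl show-session "$id" -p State --value)"
    done
}

# Read "ID STATE" lines on stdin and print only the IDs in state "closing".
closing_sessions() {
    awk '$2 == "closing" { print $1 }'
}

# Usage, from a shell on TTY3 or higher:
#   session_states | closing_sessions
#   loginctl kill-session <ID>    # for each ID printed above
```

Keeping the filter separate from the loginctl calls makes it easy to check the output by eye before killing anything.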

Attached are 2 photos: one (quite blurry but readable) image of the output from loginctl list-sessions after a failed login, and one of the output after a working login. BotchedLogin_Censored WorkingLogin_Censored

Moshugan commented 2 weeks ago

I have the exact same issue. Same Bazzite version too (same rpm-ostree status output). The strange thing about it is that the desktop might not crash if Steam is running! If Steam is not running, then it will certainly crash sooner rather than later. The Discover app might also contribute to the crashing.

Other weird behaviors include the inability to update certain "freedesktop platform" parts. Discover didn't want to update certain parts. Running a system update did something to them, but it still outputs weird warnings that I don't know what to do about.

error_01 error_02 error_03 error_04

Moshugan commented 2 weeks ago

Okay, the Steam thing might be a coincidence. I did not do any of those things that you did, but I am successfully running Bazzite right now and my loginctl output is the same:

error_05

I'm not knowledgeable enough to understand this. I have no idea why it's working now but not at other times.

mrdev023 commented 2 weeks ago

Same problem

State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia-open:stable
                   Digest: sha256:7714620ce66e84806949720204c07da491b6b31bc6304a27ca620893bf1508b9
                  Version: 40.20240922.0 (2024-09-23T05:09:18Z)

  ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-nvidia-open:stable
                   Digest: sha256:0038d9bf78189ecf1505eeeaa22b7bf47fc39a8bdd2527c6b9a043ba4f99e14c
                  Version: 40.20240921.1 (2024-09-22T12:39:50Z)

Moshugan commented 2 weeks ago

For some inexplicable reason my recent logins have been without issues.

alec-petros commented 2 weeks ago

I am experiencing the same issue, though I've had mixed success in killing the tty2 session, returning to the sddm greeter and logging in again. Not sure if this is relevant, as this appears to happen even on a successful load, but I see this in plasmashell logs immediately following boot:

Sep 30 23:02:35 bazzite plasmashell[6040]: KPackageStructure of KPluginMetaData(pluginId:"dev.jhyub.supergfxctl", fileName: "/usr/share/plasma/plasmoids/dev.jhyub.supergfxctl/metadata.json") does not match requested format "Plasma/Applet"
Sep 30 23:02:35 bazzite plasmashell[6040]: kde.plasmashell: Aborting shell load: The activity manager daemon (kactivitymanagerd) is not running.
Sep 30 23:02:35 bazzite plasmashell[6040]: kde.plasmashell: If this Plasma has been installed into a custom prefix, verify that its D-Bus services dir is known to the system for the daemon to be activatable.
Sep 30 23:02:36 bazzite plasmashell[6040]: kde.plasmashell: Aborting shell load: The activity manager daemon (kactivitymanagerd) is not running.
Sep 30 23:02:36 bazzite plasmashell[6040]: kde.plasmashell: If this Plasma has been installed into a custom prefix, verify that its D-Bus services dir is known to the system for the daemon to be activatable.

Also possibly relevant: so far for me, a boot immediately following an update / new rpm-ostree deploy tends to work correctly, and this bug occurs on subsequent boots of the same deploy. I've noticed this offhand over the past couple of weeks of this issue popping up. I haven't tested this theory extensively yet, but last night I started rebooting from tty3, as mentioned in the initial report, to see if it would boot correctly. I got this crash-to-sddm bug about six or seven times in a row. Then, out of curiosity, I did an rpm-ostree update of an irrelevant package (the Spotify client) to trigger a new deploy, and the next reboot was successful.

NekroSomnia commented 1 week ago

> Also possibly relevant, it seems that so far for me, a boot immediately following an update / new rpm-ostree deploy will tend to work correctly

I've noticed something similar, although it seems like a regular reboot will do the trick too.

I also noticed, from this and this Reddit post, that the issue might be related to the combination of Ryzen and NVIDIA.

If everyone affected uses an AMD CPU with an NVIDIA GPU, we might be on to something: maybe an incompatibility, maybe a red herring, but certainly something.

Moshugan commented 1 week ago

It happened to me again, totally randomly. Immediately after boot I checked loginctl list-sessions, and at first tty2 looked normal. Then I updated some flatpaks in Discover and watched a show on Netflix in Chrome for a little while. Then it suddenly threw me to the greeter. I immediately opened tty4 and checked loginctl list-sessions, which gave me this:

0015 yoshi 1

I was able to do a new login that worked following NekroSomnia's directions.

> > Also possibly relevant, it seems that so far for me, a boot immediately following an update / new rpm-ostree deploy will tend to work correctly
>
> I've noticed something similar, although it seems like a regular reboot will do the trick too.
>
> I also noticed, from this and this Reddit post, that the issue might be related to the combination of Ryzen and NVIDIA.
>
> If everyone affected uses an AMD CPU with an NVIDIA GPU, we might be on to something: maybe an incompatibility, maybe a red herring, but certainly something.

I also do have a Ryzen 5 3600 CPU and a GeForce RTX 3070 GPU. Thanks for those posts! I hope this issue gets noticed by the devs.

NekroSomnia commented 1 week ago

> I was able to do a new login that worked following NekroSomnia's directions.

Glad that helped :D

I've exported my journal via journalctl --since today > ~/Desktop/journalctl-export.log and will go through it once I have the time. It's a long log file, so that will take a while, but it might shed some light on the issue.
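
Since the exported log is long, one way to narrow it down is to keep only lines from the processes involved in login. A minimal sketch, assuming the relevant messages mention sddm, kwin_wayland, plasmashell, or setroubleshoot (the pattern list is a guess based on this thread, and login_crash_lines is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch only: the pattern list is an assumption, adjust it as needed.

# Read journal lines on stdin and keep those that mention the
# login-related processes discussed in this thread.
login_crash_lines() {
    grep -E 'sddm|kwin_wayland|plasmashell|setroubleshoot'
}

# Usage (current boot only, written next to the full export):
#   journalctl -b --no-pager | login_crash_lines > ~/Desktop/journalctl-filtered.log
```

The full export is still worth keeping, because the filter may drop lines from other services that turn out to matter.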

> I also do have a Ryzen 5 3600 CPU and a GeForce RTX 3070 GPU.

That's good to know. I hope we are onto something here.

NekroSomnia commented 1 week ago

Little update: I had a quick look at the log this morning and found the following lines right before I get disconnected from the active session:

Oct 02 10:35:04 COMPUTER.DOMAIN.NAME setroubleshoot[8761]: SELinux is preventing kwin_wayland from 'read, write' accesses on the chr_file nvidia-modeset.

and

Oct 02 10:35:05 COMPUTER.DOMAIN.NAME sddm-helper-start-wayland[8342]: "kwin_wayland_drm: Presentation failed! Invalid argument\n"
Oct 02 10:35:05 COMPUTER.DOMAIN.NAME sddm-helper-start-wayland[8342]: "kwin_core: Applying output config failed!\n"
Oct 02 10:35:05 COMPUTER.DOMAIN.NAME sddm-helper-start-wayland[8342]: "kwin_wayland_drm: Presentation failed! Permission denied\n"

Note that the log hints at running sealert -l fced9120-1a43-4615-b5c3-66eae81adbc2 for more information. So I did. This is the output:

SELinux is preventing kwin_wayland from 'read, write' accesses on the chr_file nvidia-modeset.

*****  Plugin device (91.4 confidence) suggests   ****************************

If you want to allow kwin_wayland to have read write access on the nvidia-modeset chr_file
Then you need to change the label on nvidia-modeset to a type of a similar device.
Do
# semanage fcontext -a -t SIMILAR_TYPE 'nvidia-modeset'
# restorecon -v 'nvidia-modeset'

*****  Plugin catchall (9.59 confidence) suggests   **************************

If you believe that kwin_wayland should be allowed read write access on the nvidia-modeset chr_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'kwin_wayland' --raw | audit2allow -M my-kwinwayland
# semodule -X 300 -i my-kwinwayland.pp

Additional Information:
Source Context                system_u:system_r:xdm_t:s0-s0:c0.c1023
Target Context                system_u:object_r:device_t:s0
Target Objects                nvidia-modeset [ chr_file ]
Source                        kwin_wayland
Source Path                   kwin_wayland
Port                          <Unknown>
Host                          COMPUTER.DOMAIN.NAME
Source RPM Packages           
Target RPM Packages           
SELinux Policy RPM            <Unknown>
Local Policy RPM              <Unknown>
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     COMPUTER.DOMAIN.NAME
Platform                      Linux COMPUTER.DOMAIN.NAME
                              6.9.12-205.fsync.fc40.x86_64 #1 SMP
                              PREEMPT_DYNAMIC Thu Aug 22 20:33:26 UTC 2024
                              x86_64
Alert Count                   326
First Seen                    2024-08-05 23:46:34 CEST
Last Seen                     2024-10-02 10:43:14 CEST
Local ID                      fced9120-1a43-4615-b5c3-66eae81adbc2

Raw Audit Messages
type=AVC msg=audit(1727858594.192:9961): avc:  denied  { read write } for  pid=6984 comm="maliit-keyboard" name="nvidia-modeset" dev="devtmpfs" ino=1458 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:device_t:s0 tclass=chr_file permissive=0

Hash: kwin_wayland,xdm_t,device_t,chr_file,read,write

I have never had to troubleshoot anything to do with SELinux, but as far as I understand this, SELinux is blocking the Wayland compositor (kwin_wayland) from reading and writing the nvidia-modeset device, which would explain the failed DRM presentation messages (DRM = Direct Rendering Manager, not Digital Rights Management).

Once I've made a backup of my drive, I'll try to figure out how to allow that without setting SELinux to permissive mode.
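
To see which process and which labels are involved without reading the whole sealert report, the key=value fields of the raw AVC line can be pulled out with a small helper. This is a sketch, and avc_field is a hypothetical name. It assumes the field values contain no spaces, which holds for the message above. Applied to that message, comm yields maliit-keyboard rather than kwin_wayland, and tcontext shows the device node carrying the generic device_t type.

```shell
#!/usr/bin/env bash
# Sketch only: avc_field is a made-up helper, not an SELinux tool.

# Read raw AVC lines on stdin and print the value of one key=value field.
# $1 is the field name, for example comm, scontext, or tcontext.
# Assumes values contain no spaces; surrounding double quotes are removed.
avc_field() {
    grep -oE "(^| )$1=[^ ]*" | cut -d= -f2- | tr -d '"'
}

# Usage:
#   sudo ausearch -m AVC -ts today | avc_field comm | sort | uniq -c
#   sudo ausearch -m AVC -ts today | avc_field tcontext | sort | uniq -c
#   ls -Z /dev/nvidia-modeset    # current label of the device node
```

If ls -Z shows a different label on /dev/nvidia-modeset than on the other /dev/nvidia* nodes, that would be consistent with the relabeling suggestion from the device plugin above, though nobody in this thread has confirmed that.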

NekroSomnia commented 1 week ago

It seems like I accidentally fixed my issue.

I had to reset my CMOS yesterday after installing more RAM, since I got some weird POST issues (too many sticks, too high a frequency, the memory controller wasn't having it). Now the issue seems to be gone. I replicated the BIOS settings I had before, just to see if the issue would pop up again. But no, it seems like my problems just vanished.

I should be happy about that, but the fact that I don't know what caused the problems annoys me to no end.

Moshugan commented 1 week ago

> It seems like I accidentally fixed my issue.
>
> I had to reset my CMOS yesterday after installing more RAM, since I got some weird POST issues (too many sticks, too high a frequency, the memory controller wasn't having it). Now the issue seems to be gone. I replicated the BIOS settings I had before, just to see if the issue would pop up again. But no, it seems like my problems just vanished.
>
> I should be happy about that, but the fact that I don't know what caused the problems annoys me to no end.

Aha!! So it might be some kind of issue related to CMOS? You know what, I've recently had a problem where the clock has been wrong every time I've booted Windows 10! The clock on Linux has been right from bootup, but I guess it just syncs immediately over the network, unlike Windows, where I had to manually choose to sync the clock. I've suspected that there's something wrong with the CMOS battery but haven't gotten around to doing anything about it yet. So if a dying CMOS battery is causing the wrong time on Windows, then maybe it's causing this issue on Bazzite?

BTW, thank you very much for doing all this work on this issue!

tarus13 commented 2 days ago

I’d like to add to this conversation. I have an NVIDIA 4060 and an i5 13400 and am experiencing the same issue. It’s completely random, and only shutting down and restarting the machine resolves it.