Closed zeroepoch closed 10 months ago
That's weird. In the latest release the console framebuffer option was added:
$ tail -1 /etc/modprobe.d/nvidia-modeset.conf
options nvidia-drm modeset=1 fbdev=1
This binds a new console framebuffer using the Nvidia kernel module, so regardless of that boot parameter the console is taken over and the framebuffer driver is replaced. Besides, the workaround has not been needed for quite some time.
Can you check your kernel command line for other spurious stuff (cat /proc/cmdline)? Also, did you customize the /etc/modprobe.d/nvidia-modeset.conf file?
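The two checks asked for here can be scripted. This is only a rough sketch, not part of the package: the function names and the sample command line are made up for illustration, and the options line is the one quoted earlier in this thread. It assumes initcall_blacklist takes a comma-separated list and that modprobe treats - and _ in module names as equivalent.

```python
import shlex


def parse_cmdline(cmdline):
    """Map each kernel parameter to its value; bare flags such as 'quiet' map to None.

    If a parameter is repeated, this sketch keeps only the last occurrence.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def blacklisted_initcalls(cmdline):
    """Return the comma-separated initcall_blacklist entries as a list."""
    value = parse_cmdline(cmdline).get("initcall_blacklist")
    return value.split(",") if value else []


def modprobe_options(conf_text, module):
    """Collect 'options <module> key=value ...' settings from modprobe.d text."""
    wanted = module.replace("-", "_")  # modprobe treats '-' and '_' alike
    settings = {}
    for line in conf_text.splitlines():
        fields = line.split()
        # Comment lines and unrelated directives do not start with 'options'.
        if len(fields) < 3 or fields[0] != "options":
            continue
        if fields[1].replace("-", "_") != wanted:
            continue
        for item in fields[2:]:
            key, _, value = item.partition("=")
            settings[key] = value
    return settings


# Hypothetical command line; the options line is the one quoted in this thread.
sample_cmdline = "ro rhgb quiet initcall_blacklist=simpledrm_platform_driver_init"
sample_conf = "options nvidia-drm modeset=1 fbdev=1"

print(blacklisted_initcalls(sample_cmdline))
print(modprobe_options(sample_conf, "nvidia_drm"))
```

On a real system the inputs would be the contents of /proc/cmdline and /etc/modprobe.d/nvidia-modeset.conf instead of the sample strings.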
Let me rephrase: regardless of efifb, vesafb or simpledrm using your console, the driver is replaced, so the boot option that tells the kernel to use efifb instead of simpledrm should be completely useless.
My system is a little weird and I'm not sure if it's my BIOS (x570 TUF Gaming), Monitor (Monoprice 4k IPS/HDR), or GPU (3090 Ti), but I only see the BIOS output and Grub menu when I do a cold boot. A reboot results in a blank screen until the desktop is shown. If I enable CSM then I don't have this odd problem, but then ReSize BAR doesn't work. This is actually the same for Windows, where I see nothing until the tail end of the login process, so it has nothing to do with Linux that I can see. For this reason I collected logs both with a cold boot, where I think the EFI framebuffer is more properly initialized, and a reboot, where it's blank at boot (monitor turns off for a short bit).
The main difference I can see in the case where initcall_blacklist=simpledrm_platform_driver_init is missing (cold boot or reboot) is that the following line shows up a few times.
[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800] Failed to apply atomic modeset. Error code: -22
In either case, with or without simpledrm initialized, I see the following in the reboot case (blank screen).
[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800] Flip event timeout on head 0
Neither error shows up in the cold boot case with the kernel option added.
As mentioned earlier, with initcall_blacklist=simpledrm_platform_driver_init it always ends up at the desktop, but the reboot case does take a little longer. With this kernel option omitted it sometimes stops at the login screen, sometimes logs in but all input is dead, and sometimes I can't log in manually (X dies). In these "broken" cases I collected the logs by SSH'ing from my laptop to the desktop.
I attached the journalctl output for gdm-x-session for one of the bad boots. What's probably most relevant here is this:
/usr/libexec/gdm-x-session[1994]: (EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.
/usr/libexec/gdm-x-session[1994]: (EE) NVIDIA(0): Failing initialization of X screen
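For comparing boots, the messages quoted in this thread can be counted in saved journalctl output. A minimal sketch, assuming the log has already been saved as text; the labels and the function name are made up for illustration, and the search strings are copied from the lines quoted above.

```python
import re
from collections import Counter

# Search strings copied from the log lines quoted in this thread;
# the dictionary keys are hypothetical labels.
PATTERNS = {
    "atomic_modeset_failed": re.compile(r"Failed to apply atomic modeset"),
    "flip_event_timeout": re.compile(r"Flip event timeout on head \d+"),
    "modesetting_permission": re.compile(r"Failed to acquire modesetting permission"),
}


def count_modeset_errors(log_text):
    """Count how many lines of the log match each known error message."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Sample input built from the lines quoted in this thread.
sample_log = "\n".join([
    "[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800] Failed to apply atomic modeset. Error code: -22",
    "[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800] Flip event timeout on head 0",
    "/usr/libexec/gdm-x-session[1994]: (EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.",
    "/usr/libexec/gdm-x-session[1994]: (EE) NVIDIA(0): Failing initialization of X screen",
])

for label, count in count_modeset_errors(sample_log).items():
    print(label, count)
```

Running it over one log per boot (cold boot, reboot, with and without the kernel option) gives a quick side-by-side of which errors appear where.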
Adding back initcall_blacklist=simpledrm_platform_driver_init when nvidia-kmod-common updates will be a little annoying, but at least it provides a stable workaround. If others are seeing improved compatibility with this kernel option removed then it makes sense to keep the logic you have now.
@scaronni not sure what changed. Either kernel 6.6 or, my more likely hunch, they fixed some related bug in 545.29.06. Anyway, I'm not seeing these modesetting issues in journalctl on either a cold boot or a reboot (where the screen blanks until the desktop loads). It also boots slightly faster (I think?) without the initcall disabled. Happy to have it resolved so I can leave the grub options as intended, with initcall_blacklist=simpledrm_platform_driver_init removed. I'll go ahead and close this issue out.
Thanks for the feedback, I'm glad to hear it was solved with the update; there was not much I could do otherwise.
Previously I had initcall_blacklist=simpledrm_platform_driver_init added in /etc/default/grub, and a recent update of nvidia-kmod-common removes this option when the driver is updated. I couldn't figure out why logging in on boot was so unreliable, with the session sometimes even failing to load when logging in manually. I couldn't get VTs to work either. Once I added back initcall_blacklist=simpledrm_platform_driver_init everything started to work as expected: VTs work and it logs in automatically and reliably. I'm curious why this kernel option was removed and whether others are seeing the same issue.