selkies-project / docker-nvidia-glx-desktop

KDE Plasma Desktop container designed for Kubernetes, supporting OpenGL EGL and GLX, Vulkan, and Wine/Proton for NVIDIA GPUs through WebRTC and HTML5, providing an open-source remote cloud/HPC graphics or game streaming platform.
https://github.com/selkies-project/docker-nvidia-glx-desktop/pkgs/container/nvidia-glx-desktop
Mozilla Public License 2.0

List of NVIDIA drivers with issues #41

Closed (maxpain closed this issue 1 year ago)

maxpain commented 1 year ago

Hello. I'm trying to run this container in my home Kubernetes cluster on Talos Linux with an RTX 4090 GPU. NVIDIA driver: 535.86.05.

root@csgo-0:/tmp# cat /home/user/.local/share/xorg/Xorg.0.log
[  3108.301] _XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
[  3108.301] 
X.Org X Server 1.21.1.3
X Protocol Version 11, Revision 0
[  3108.301] Current Operating System: Linux csgo-0 6.1.35-talos #1 SMP PREEMPT_DYNAMIC Wed Jun 28 13:58:51 UTC 2023 x86_64
[  3108.301] Kernel command line: talos.platform=metal talos.config=none console=ttyS0 console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 mitigations=off cpufreq.default_governor=performance
[  3108.301] xorg-server 2:21.1.3-2ubuntu2.5 (For technical support please see http://www.ubuntu.com/support) 
[  3108.301] Current version of pixman: 0.40.0
[  3108.301]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[  3108.301] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  3108.301] (==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Tue Jul 25 13:36:43 2023
[  3108.301] (==) Using config file: "/etc/X11/xorg.conf"
[  3108.301] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  3108.301] (==) ServerLayout "Layout0"
[  3108.301] (**) |-->Screen "Screen0" (0)
[  3108.301] (**) |   |-->Monitor "Monitor0"
[  3108.301] (**) |   |-->Device "Device0"
[  3108.301] (**) |-->Input Device "Keyboard0"
[  3108.301] (**) |-->Input Device "Mouse0"
[  3108.301] (**) Option "AutoAddGPU" "false"
[  3108.301] (==) Automatically adding devices
[  3108.301] (==) Automatically enabling devices
[  3108.301] (**) Not automatically adding GPU devices
[  3108.301] (==) Automatically binding GPU devices
[  3108.301] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  3108.301] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (==) FontPath set to:
        /usr/share/fonts/X11/misc,
        built-ins
[  3108.301] (==) ModulePath set to "/usr/lib/xorg/modules"
[  3108.301] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  3108.301] (WW) Disabling Keyboard0
[  3108.301] (WW) Disabling Mouse0
[  3108.301] (II) Loader magic: 0x55f31992c020
[  3108.301] (II) Module ABI versions:
[  3108.301]    X.Org ANSI C Emulation: 0.4
[  3108.301]    X.Org Video Driver: 25.2
[  3108.301]    X.Org XInput driver : 24.4
[  3108.301]    X.Org Server Extension : 10.0
[  3108.303] (EE) systemd-logind: failed to get session: Launch helper exited with unknown return code 1
[  3108.303] (II) xfree86: Adding drm device (/dev/dri/card0)
[  3108.303] (II) Platform probe for /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0
[  3108.304] (--) PCI:*(1@0:0:0) 10de:2684:10de:165b rev 161, Mem @ 0x93000000/16777216, 0x4000000000/34359738368, 0x4800000000/33554432, I/O @ 0x00006000/128, BIOS @ 0x????????/524288
[  3108.304] (II) LoadModule: "glx"
[  3108.305] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  3108.305] (II) Module glx: vendor="X.Org Foundation"
[  3108.305]    compiled for 1.21.1.3, module version = 1.0.0
[  3108.305]    ABI class: X.Org Server Extension, version 10.0
[  3108.305] (II) LoadModule: "nvidia"
[  3108.305] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  3108.305] (II) Module nvidia: vendor="NVIDIA Corporation"
[  3108.305]    compiled for 1.6.99.901, module version = 1.0.0
[  3108.305]    Module class: X.Org Video Driver
[  3108.305] (II) NVIDIA dlloader X Driver  535.86.05  Fri Jul 14 20:26:08 UTC 2023
[  3108.305] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  3108.305] (II) Loading sub module "fb"
[  3108.305] (II) LoadModule: "fb"
[  3108.305] (II) Module "fb" already built-in
[  3108.305] (II) Loading sub module "wfb"
[  3108.305] (II) LoadModule: "wfb"
[  3108.305] (II) Loading /usr/lib/xorg/modules/libwfb.so
[  3108.305] (II) Module wfb: vendor="X.Org Foundation"
[  3108.305]    compiled for 1.21.1.3, module version = 1.0.0
[  3108.305]    ABI class: X.Org ANSI C Emulation, version 0.4
[  3108.305] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  3108.305] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[  3108.305] (==) NVIDIA(0): RGB weight 888
[  3108.305] (==) NVIDIA(0): Default visual is TrueColor
[  3108.305] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[  3108.305] (**) NVIDIA(0): Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced"
[  3108.305] (**) NVIDIA(0): Option "ProbeAllGpus" "False"
[  3108.305] (**) NVIDIA(0): Option "BaseMosaic" "False"
[  3108.305] (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration" "True"
[  3108.305] (**) NVIDIA(0): Option "HardDPMS" "False"
[  3108.305] (**) NVIDIA(0): Option "ConnectedMonitor" "DFP"
[  3108.305] (**) NVIDIA(0): Enabling 2D acceleration
[  3108.305] (**) NVIDIA(0): ConnectedMonitor string: "DFP"
[  3108.305] (II) Loading sub module "glxserver_nvidia"
[  3108.305] (II) LoadModule: "glxserver_nvidia"
[  3108.305] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[  3108.309] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[  3108.309]    compiled for 1.6.99.901, module version = 1.0.0
[  3108.309]    Module class: X.Org Server Extension
[  3108.309] (II) NVIDIA GLX Module  535.86.05  Fri Jul 14 20:27:17 UTC 2023
[  3108.309] (II) NVIDIA: The X server supports PRIME Render Offload.
[  3108.322] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
[  3108.322] (--) NVIDIA(0):     DFP-0 (boot)
[  3108.322] (--) NVIDIA(0):     DFP-1
[  3108.322] (--) NVIDIA(0):     DFP-2
[  3108.322] (--) NVIDIA(0):     DFP-3
[  3108.322] (--) NVIDIA(0):     DFP-4
[  3108.322] (--) NVIDIA(0):     DFP-5
[  3108.322] (--) NVIDIA(0):     DFP-6
[  3108.322] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
[  3108.322] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 4090 (AD102-A) at PCI:1:0:0
[  3108.322] (II) NVIDIA(0):     (GPU-0)
[  3108.322] (--) NVIDIA(0): Memory: 25153536 kBytes
[  3108.322] (--) NVIDIA(0): VideoBIOS: 95.02.20.00.01
[  3108.322] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): connected
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): 600.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (**) NVIDIA(GPU-0): Mode Validation Overrides for LNX PiKVM (DFP-0):
[  3108.365] (**) NVIDIA(GPU-0):     NoMaxSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoVirtualSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoMaxPClkCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoEdidMaxPClkCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoHorizSyncCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoVertRefreshCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoExtendedGpuCapabilitiesCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoTotalSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoDualLinkDVICheck
[  3108.365] (**) NVIDIA(GPU-0):     NoDisplayPortBandwidthCheck
[  3108.365] (**) NVIDIA(GPU-0):     AllowNon3DVisionModes
[  3108.365] (**) NVIDIA(GPU-0):     AllowNonEdidModes
[  3108.365] (**) NVIDIA(GPU-0):     AllowNonHDMI3DModes
[  3108.365] (**) NVIDIA(GPU-0):     NoEdidHDMI2Check
[  3108.365] (**) NVIDIA(GPU-0):     AllowDpInterlaced
[  3108.366] (EE) NVIDIA(GPU-0): Unable to add conservative default mode "nvidia-auto-select".
[  3108.366] (EE) NVIDIA(GPU-0): Unable to add "nvidia-auto-select" mode to ModePool.
[  3108.366] (WW) NVIDIA(0): No valid modes for "DFP-0:1920x1080R"; removing.
[  3108.366] (WW) NVIDIA(0): 
[  3108.366] (WW) NVIDIA(0): Unable to validate any modes; falling back to the default mode
[  3108.366] (WW) NVIDIA(0):     "nvidia-auto-select".
[  3108.366] (WW) NVIDIA(0): 
[  3108.366] (WW) NVIDIA(0): No valid modes for "DFP-0:nvidia-auto-select"; removing.
[  3108.366] (EE) NVIDIA(0): Unable to use default mode "nvidia-auto-select".
[  3108.366] (EE) NVIDIA(0): Failing initialization of X screen
[  3108.427] (II) UnloadModule: "nvidia"
[  3108.427] (II) UnloadSubModule: "glxserver_nvidia"
[  3108.427] (II) Unloading glxserver_nvidia
[  3108.427] (II) UnloadSubModule: "wfb"
[  3108.427] (EE) Screen(s) found, but none have a usable configuration.
[  3108.427] (EE) 
Fatal server error:
[  3108.427] (EE) no screens found(EE) 
[  3108.427] (EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
[  3108.427] (EE) Please also check the log file at "/home/user/.local/share/xorg/Xorg.0.log" for additional information.
[  3108.427] (EE) 
[  3108.427] (EE) Server terminated with error (1). Closing log file.
maxpain commented 1 year ago

@ehfd could you help me, please?

ehfd commented 1 year ago

Upgrade your driver. If you are on the 535 or 550 branch, use the latest minor release of that branch. Versions earlier than 535.113.01 (535 branch) or 550.67 (550 branch) have bugs.
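A minimal sketch of that version check, in Python. The per-branch minimums are copied from the comment above; the dictionary and function names are hypothetical, and branches the comment does not mention are reported as unknown rather than guessed. The live query assumes `nvidia-smi` is on the PATH.

```python
# Sketch: compare an NVIDIA driver version with the minimum release this
# thread recommends for its branch. Names here are illustrative only.
import subprocess
from typing import Optional, Tuple

# Minimum recommended release per major branch, as stated in the comment above.
MIN_RECOMMENDED = {
    535: (535, 113, 1),
    550: (550, 67),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '535.86.05' into (535, 86, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version: str) -> Optional[bool]:
    """True/False for the 535 and 550 branches, None for any other branch."""
    parsed = parse_version(version)
    minimum = MIN_RECOMMENDED.get(parsed[0])
    if minimum is None:
        return None  # the thread gives no threshold for this branch
    return parsed >= minimum  # tuple comparison orders versions numerically


def installed_driver_version() -> str:
    """Query the first GPU's driver version with nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0]
```

For example, `meets_minimum(installed_driver_version())` would return `False` on the 535.86.05 driver from the original report. Comparing integer tuples avoids the string-ordering trap where "535.86" would sort after "535.113".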

maxpain commented 1 year ago

> VIDEO_PORT to DP-0 perhaps.

I tried DP-0, DP-1, DP-2, and DFP. Only "none" works.

ehfd commented 1 year ago

VIDEO_PORT to DP-0 perhaps.

"none" is not optimal. What's your environment?

maxpain commented 1 year ago

@ehfd Kubernetes cluster, nvidia-container-toolkit, NVIDIA device plugin, Talos Linux, RTX 4090 with NVIDIA driver 535.86.05.

ehfd commented 1 year ago

A similar issue occurs with the EGL desktop. This may be an issue with driver 535.

maxpain commented 1 year ago

@ehfd Hmm, I don't have any issues with the EGL desktop on 535.

maxpain commented 1 year ago

Most likely because the EGL desktop variant uses Xvfb, not Xorg.

ehfd commented 1 year ago

I can reproduce the error. For now, do NOT upgrade to NVIDIA 535 yet.

ehfd commented 1 year ago

With NVIDIA 535.86.05 and Option "ModeDebug" "True" added to /etc/X11/xorg.conf for debugging, the key message is "GPU extended capability check failed."
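To find that message quickly in a long ModeDebug log, a small Python scanner can list every rejected mode and whether its rejection included the extended-capability failure. This is a sketch that assumes the log line format shown in this thread; the function name is hypothetical.

```python
# Sketch: scan an Xorg log written with Option "ModeDebug" "True" and report,
# for each mode the NVIDIA driver marked invalid, whether the
# "GPU extended capability check failed." message appeared for that mode.
import re
from typing import Dict, Iterable, Optional

VALIDATING = re.compile(r'Validating Mode "([^"]+)"')
INVALID = re.compile(r'Mode "([^"]+)" is invalid\.')
EXT_CAP = "GPU extended capability check failed."


def rejected_modes(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each invalid mode name to True if the extended-capability check failed."""
    result: Dict[str, bool] = {}
    current: Optional[str] = None  # mode currently being validated
    ext_cap_failed = False         # whether EXT_CAP appeared for that mode
    for line in lines:
        match = VALIDATING.search(line)
        if match:
            current, ext_cap_failed = match.group(1), False
            continue
        if EXT_CAP in line:
            ext_cap_failed = True
        match = INVALID.search(line)
        if match and match.group(1) == current:
            result[current] = ext_cap_failed
    return result
```

Usage, with the log path from this report: `with open("/home/user/.local/share/xorg/Xorg.0.log") as log: print(rejected_modes(log))`. On the 535.86.05 logs above, every mode would map to `True`.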

[  2711.450] (II) NVIDIA(GPU-0): --- Building ModePool for DFP-1 ---
[  2711.450] (**) NVIDIA(GPU-0): Mode Validation Overrides for DFP-1:
[  2711.450] (**) NVIDIA(GPU-0):     NoMaxSizeCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoVirtualSizeCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoMaxPClkCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoEdidMaxPClkCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoHorizSyncCheck
[  2711.450] (**) NVIDIA(GPU-0):     NoVertRefreshCheck
[  2711.451] (**) NVIDIA(GPU-0):     NoExtendedGpuCapabilitiesCheck
[  2711.451] (**) NVIDIA(GPU-0):     NoTotalSizeCheck
[  2711.451] (**) NVIDIA(GPU-0):     NoDualLinkDVICheck
[  2711.451] (**) NVIDIA(GPU-0):     NoDisplayPortBandwidthCheck
[  2711.451] (**) NVIDIA(GPU-0):     AllowNon3DVisionModes
[  2711.451] (**) NVIDIA(GPU-0):     AllowNonEdidModes
[  2711.451] (**) NVIDIA(GPU-0):     AllowNonHDMI3DModes
[  2711.451] (**) NVIDIA(GPU-0):     NoEdidHDMI2Check
[  2711.451] (**) NVIDIA(GPU-0):     AllowDpInterlaced
(OMITTED)
[  2711.454] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1080_60":
[  2711.454] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[  2711.454] (WW) NVIDIA(GPU-0):     1920 x 1080 @ 60 Hz
[  2711.454] (WW) NVIDIA(GPU-0):       Pixel Clock      : 138.50 MHz
[  2711.454] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 1968
[  2711.454] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2000, 2080
[  2711.454] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1080, 1083
[  2711.454] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1088, 1111
[  2711.454] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.454] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.454] (WW) NVIDIA(GPU-0):     Viewport
[  2711.454] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.454] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.454] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.454] (WW) NVIDIA(GPU-0):     Mode "1920x1080_60" is invalid.
[  2711.470] (WW) NVIDIA(GPU-0):   Validating Mode "1280x800_60":
[  2711.470] (WW) NVIDIA(GPU-0):     Mode Source: X Server
[  2711.470] (WW) NVIDIA(GPU-0):     1280 x 800 @ 60 Hz
[  2711.470] (WW) NVIDIA(GPU-0):       Pixel Clock      : 71.00 MHz
[  2711.470] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1280, 1328
[  2711.470] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 1360, 1440
[  2711.470] (WW) NVIDIA(GPU-0):       VRes, VSyncStart :  800,  803
[  2711.470] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal :  809,  823
[  2711.470] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.470] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.470] (WW) NVIDIA(GPU-0):     Viewport
[  2711.470] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.470] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.470] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.470] (WW) NVIDIA(GPU-0):     Mode "1280x800_60" is invalid.
[  2711.471] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1200_60":
[  2711.471] (WW) NVIDIA(GPU-0):     Mode Source: X Server
[  2711.471] (WW) NVIDIA(GPU-0):     1920 x 1200 @ 60 Hz
[  2711.471] (WW) NVIDIA(GPU-0):       Pixel Clock      : 154.00 MHz
[  2711.471] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 1968
[  2711.471] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2000, 2080
[  2711.471] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1200, 1203
[  2711.471] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1209, 1235
[  2711.471] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.471] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.471] (WW) NVIDIA(GPU-0):     Viewport
[  2711.471] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.471] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.471] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.471] (WW) NVIDIA(GPU-0):     Mode "1920x1200_60" is invalid.
[  2711.472] (WW) NVIDIA(GPU-0):   Validating Mode "800x600_60":
[  2711.472] (WW) NVIDIA(GPU-0):     Mode Source: NVIDIA Predefined
[  2711.472] (WW) NVIDIA(GPU-0):     800 x 600 @ 60 Hz
[  2711.472] (WW) NVIDIA(GPU-0):       Pixel Clock      : 40.00 MHz
[  2711.472] (WW) NVIDIA(GPU-0):       HRes, HSyncStart :  800,  840
[  2711.472] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal :  968, 1056
[  2711.472] (WW) NVIDIA(GPU-0):       VRes, VSyncStart :  600,  601
[  2711.472] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal :  605,  628
[  2711.472] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H +V
[  2711.472] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.472] (WW) NVIDIA(GPU-0):     Viewport
[  2711.472] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.472] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.472] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.472] (WW) NVIDIA(GPU-0):     Mode "800x600_60" is invalid.
[  2711.472] (WW) NVIDIA(GPU-0):
[  2711.472] (EE) NVIDIA(GPU-0): Unable to add conservative default mode "nvidia-auto-select".
[  2711.472] (EE) NVIDIA(GPU-0): Unable to add "nvidia-auto-select" mode to ModePool.
[  2711.472] (WW) NVIDIA(0): No valid modes for "DFP-1:1920x1080R"; removing.
[  2711.472] (WW) NVIDIA(0):
[  2711.472] (WW) NVIDIA(0): Unable to validate any modes; falling back to the default mode
[  2711.472] (WW) NVIDIA(0):     "nvidia-auto-select".
[  2711.472] (WW) NVIDIA(0):
[  2711.472] (WW) NVIDIA(0): No valid modes for "DFP-1:nvidia-auto-select"; removing.
[  2711.472] (EE) NVIDIA(0): Unable to use default mode "nvidia-auto-select".
[  2711.472] (EE) NVIDIA(0): Failing initialization of X screen

Attachments: Xorg.0.log, xorg.conf.log

ehfd commented 1 year ago

Works up to 530.41.03.

X.Org X Server 1.21.1.4
X Protocol Version 11, Revision 0
Current Operating System: Linux xgl-test 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-153-generic root=UUID=b74b4d9b-e7b1-4dc6-be2e-bf94365e04ed ro maybe-ubiquity
xorg-server 2:21.1.4-2ubuntu1.7~22.04.1 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.40.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Wed Aug  2 04:39:32 2023
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "2560x1600_60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     2560 x 1600 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 268.50 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 2560, 2608
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2640, 2720
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart : 1600, 1603
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1609, 1646
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 2560x1600+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        1
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          1
[746936.617] (II) NVIDIA(GPU-0):     Mode "2560x1600_60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "1280x800d60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     1280 x 800 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 134.25 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 1280, 1304
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 1320, 1360
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart :  800,  801
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal :  804,  823
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[746936.617] (II) NVIDIA(GPU-0):       Extra            : DoubleScan
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 1280x800+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        2
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          2
[746936.617] (II) NVIDIA(GPU-0):     Mode "1280x800d60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "2560x1600_60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     2560 x 1600 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 348.50 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 2560, 2760
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 3032, 3504
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart : 1600, 1603
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1609, 1658
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 2560x1600+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        1
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          1
[746936.617] (II) NVIDIA(GPU-0):     Mode "2560x1600_60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.617] (II) NVIDIA(GPU-0):   Validating Mode "1280x800d60":
[746936.617] (II) NVIDIA(GPU-0):     Mode Source: X Server
[746936.617] (II) NVIDIA(GPU-0):     1280 x 800 @ 60 Hz
[746936.617] (II) NVIDIA(GPU-0):       Pixel Clock      : 174.25 MHz
[746936.617] (II) NVIDIA(GPU-0):       HRes, HSyncStart : 1280, 1380
[746936.617] (II) NVIDIA(GPU-0):       HSyncEnd, HTotal : 1516, 1752
[746936.617] (II) NVIDIA(GPU-0):       VRes, VSyncStart :  800,  801
[746936.617] (II) NVIDIA(GPU-0):       VSyncEnd, VTotal :  804,  829
[746936.617] (II) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[746936.617] (II) NVIDIA(GPU-0):       Extra            : DoubleScan
[746936.617] (II) NVIDIA(GPU-0):     Viewport                 1280x800+0+0
[746936.617] (II) NVIDIA(GPU-0):       Horizontal Taps        2
[746936.617] (II) NVIDIA(GPU-0):       Vertical Taps          2
[746936.617] (II) NVIDIA(GPU-0):     Mode "1280x800d60" is valid.
[746936.617] (II) NVIDIA(GPU-0):
[746936.618] (II) NVIDIA(GPU-0): --- Done building ModePool for DFP-2 ---
[746936.618] (II) NVIDIA(GPU-0):
[746936.618] (II) NVIDIA(GPU-0): Frequency information for DFP-2:
[746936.618] (II) NVIDIA(GPU-0):   HorizSync   : 28.000-55.000 kHz
[746936.618] (II) NVIDIA(GPU-0):   VertRefresh : 43.000-72.000 Hz
[746936.618] (II) NVIDIA(GPU-0):     (HorizSync from Conservative Defaults)
[746936.618] (II) NVIDIA(GPU-0):     (VertRefresh from Conservative Defaults)

It also works in 525.60.13:

X.Org X Server 1.21.1.4
X Protocol Version 11, Revision 0
Current Operating System: Linux xgl-test 5.4.0-148-generic #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-5.4.0-148-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro
xorg-server 2:21.1.4-2ubuntu1.7~22.04.1 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.40.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Wed Aug  2 04:44:04 2023
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
ehfd commented 1 year ago

I've emailed the NVIDIA driver team. Waiting for a response.

ehfd commented 1 year ago

TO OUR USERS:

Please send an email to linux-bugs@nvidia.com saying that you are a user of https://github.com/selkies-project/docker-nvidia-glx-desktop and that you are also affected by the issue below. This is the only way to speed up the bug fix in the drivers; if this issue is not fixed, this repository may not be usable with later drivers.

We have reproduced an issue, also faced by all of our users on the 535.86.05 drivers, where the "NoExtendedGpuCapabilitiesCheck" token of the "ModeValidation" option in xorg.conf is not honored on GeForce GPUs.

This is a new issue that did not exist in 530.xx, 525.xx, or any earlier drivers, and it is reproducible for every user running headless setups on GeForce GPUs (so far, all of the 10xx, 20xx, and 30xx series).

How to reproduce: set ConnectedMonitor to a port with no monitor attached (e.g. DP-0) to enable XRandR, and use Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced" so that the modes pass validation.

We have also turned on Option "ModeDebug" "True" for debugging.
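The reproduction setup above can be sketched as a small Python helper that prints the corresponding xorg.conf `Device` section. The option names, the `Device0` identifier, and the ModeValidation tokens come from this thread; placing all three options in the `Device` section and defaulting to `DP-0` are assumptions, so check the NVIDIA driver README for where each option is accepted. The helper name is hypothetical.

```python
# Sketch: render an xorg.conf "Device" section for the headless reproduction.
# ModeValidation tokens are copied verbatim from the report above.
MODE_VALIDATION_TOKENS = [
    "NoMaxPClkCheck", "NoEdidMaxPClkCheck", "NoMaxSizeCheck", "NoHorizSyncCheck",
    "NoVertRefreshCheck", "NoVirtualSizeCheck", "NoExtendedGpuCapabilitiesCheck",
    "NoTotalSizeCheck", "NoDualLinkDVICheck", "NoDisplayPortBandwidthCheck",
    "AllowNon3DVisionModes", "AllowNonHDMI3DModes", "AllowNonEdidModes",
    "NoEdidHDMI2Check", "AllowDpInterlaced",
]


def device_section(connected_monitor: str = "DP-0", mode_debug: bool = True) -> str:
    """Return the Device section text; connected_monitor names an empty port."""
    options = {
        "ConnectedMonitor": connected_monitor,             # pretend a monitor is attached
        "ModeValidation": ", ".join(MODE_VALIDATION_TOKENS),
        "ModeDebug": "True" if mode_debug else "False",    # verbose mode validation log
    }
    lines = ['Section "Device"', '    Identifier "Device0"', '    Driver "nvidia"']
    lines += [f'    Option "{key}" "{value}"' for key, value in options.items()]
    lines.append("EndSection")
    return "\n".join(lines)


print(device_section())
```

Generating the section from one token list keeps the long ModeValidation string consistent between the config file and a bug report.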

Result:

[  2711.454] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1080_60":
[  2711.454] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[  2711.454] (WW) NVIDIA(GPU-0):     1920 x 1080 @ 60 Hz
[  2711.454] (WW) NVIDIA(GPU-0):       Pixel Clock      : 138.50 MHz
[  2711.454] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 1968
[  2711.454] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2000, 2080
[  2711.454] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1080, 1083
[  2711.454] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1088, 1111
[  2711.454] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[  2711.454] (WW) NVIDIA(GPU-0):     DualHead Mode: No
[  2711.454] (WW) NVIDIA(GPU-0):     Viewport
[  2711.454] (WW) NVIDIA(GPU-0):       Horizontal Taps
[  2711.454] (WW) NVIDIA(GPU-0):       Vertical Taps
[  2711.454] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  2711.454] (WW) NVIDIA(GPU-0):     Mode "1920x1080_60" is invalid.
[  2711.454] (WW) NVIDIA(GPU-0):

This behavior contradicts the driver README documentation and therefore has to be fixed.

------

On a separate note, there is another, non-blocking issue that existed long before the NVIDIA 535 drivers: HDMI and DVI ports are stuck at a 165.0 MHz maximum pixel clock, and the "NoMaxPClkCheck" token of "ModeValidation" and related options are never honored. This includes the virtual DVI ports on supported Tesla/Datacenter GPUs, where the resolution is capped at 2560 x 1600 at 60 Hz. As a result, headless GPUs with the "ConnectedMonitor" option set to an HDMI or DVI port cannot use modes above 1920x1200 at 60 Hz.
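To see why the 165.0 MHz cap blocks modes above 1920x1200 at 60 Hz, the refresh rate can be recomputed from each modeline (pixel clock divided by HTotal times VTotal) and the pixel clock compared with the cap. This Python sketch uses only timings copied from the logs in this issue; the class and constant names are hypothetical.

```python
# Sketch: derive refresh rates from modeline totals and test each mode's pixel
# clock against the 165.0 MHz TMDS maximum reported in the logs above.
from dataclasses import dataclass

TMDS_MAX_PCLK_MHZ = 165.0  # "maximum pixel clock" reported for the HDMI/DVI ports


@dataclass
class Mode:
    name: str
    pclk_mhz: float  # "Pixel Clock" from the log
    htotal: int      # "HTotal": pixels per line, including blanking
    vtotal: int      # "VTotal": lines per frame, including blanking

    def refresh_hz(self) -> float:
        # refresh = pixel clock / (pixels per line * lines per frame)
        return self.pclk_mhz * 1e6 / (self.htotal * self.vtotal)

    def fits_tmds_limit(self, limit_mhz: float = TMDS_MAX_PCLK_MHZ) -> bool:
        return self.pclk_mhz <= limit_mhz


# Timings copied from the mode validation logs quoted in this issue.
MODES = [
    Mode("1920x1080_60", 138.50, 2080, 1111),
    Mode("1920x1200_60", 154.00, 2080, 1235),
    Mode("1920x1440_60", 234.00, 2600, 1500),
    Mode("2560x1440_60", 241.50, 2720, 1481),
]

for mode in MODES:
    verdict = "fits" if mode.fits_tmds_limit() else "exceeds"
    print(f"{mode.name}: {mode.refresh_hz():.2f} Hz, {verdict} {TMDS_MAX_PCLK_MHZ} MHz")
```

All four modes run at roughly 60 Hz, but only the two at or below 1920x1200 stay under 165 MHz, which matches the rejections in the RTX 3090 log.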

[2363014.704] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:33:0:0
[2363014.704] (--) NVIDIA(0):     DFP-0
[2363014.704] (--) NVIDIA(0):     DFP-1
[2363014.704] (--) NVIDIA(0):     DFP-2
[2363014.704] (--) NVIDIA(0):     DFP-3
[2363014.704] (--) NVIDIA(0):     DFP-4
[2363014.704] (--) NVIDIA(0):     DFP-5
[2363014.705] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
[2363014.707] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 3090 (GA102-A) at PCI:33:0:0
[2363014.707] (II) NVIDIA(0):     (GPU-0)
[2363014.707] (--) NVIDIA(0): Memory: 25165824 kBytes
[2363014.707] (--) NVIDIA(0): VideoBIOS: 94.02.42.40.34
[2363014.707] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: connected
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[2363014.711] (--) NVIDIA(GPU-0): DFP-0 Name Aliases:
[2363014.711] (--) NVIDIA(GPU-0):   DFP
[2363014.711] (--) NVIDIA(GPU-0):   DFP-0
[2363014.711] (--) NVIDIA(GPU-0):   DPY-0
[2363014.711] (--) NVIDIA(GPU-0):   HDMI-0
[2363014.712] (--) NVIDIA(GPU-0):   HDMI-0
[2363014.712] (--) NVIDIA(GPU-0):   Connector-3
[2363014.712] (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
[2363014.712] (--) NVIDIA(GPU-0):

[2363014.714] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1440_60":
[2363014.714] (WW) NVIDIA(GPU-0):     Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0):     1920 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0):       Pixel Clock      : 234.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 2048
[2363014.714] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2256, 2600
[2363014.714] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[2363014.714] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0):     mode timings.
[2363014.714] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0):     Mode "1920x1440_60" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0):   Validating Mode "1920x1440_75":
[2363014.714] (WW) NVIDIA(GPU-0):     Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0):     1920 x 1440 @ 75 Hz
[2363014.714] (WW) NVIDIA(GPU-0):       Pixel Clock      : 297.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 1920, 2064
[2363014.714] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2288, 2640
[2363014.714] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[2363014.714] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0):     mode timings.
[2363014.714] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0):     Mode "1920x1440_75" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0):   Validating Mode "2560x1440_60":
[2363014.714] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[2363014.714] (WW) NVIDIA(GPU-0):     2560 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0):       Pixel Clock      : 241.50 MHz
[2363014.714] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 2560, 2608
[2363014.714] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 2640, 2720
[2363014.714] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1440, 1443
[2363014.714] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1448, 1481
[2363014.714] (WW) NVIDIA(GPU-0):       Sync Polarity    : +H -V
[2363014.714] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0):     mode timings.
[2363014.714] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0):     Mode "2560x1440_60" is invalid.

This second issue also contradicts the README documentation, but it originated long before the 535.xx drivers.
ehfd commented 1 year ago

https://forums.developer.nvidia.com/t/linux-535-xx-does-not-honor-modevalidation-making-headless-randr-usage-with-connectedmonitor-impossible/264034

ehfd commented 1 year ago

@maxpain https://forums.developer.nvidia.com/t/if-you-have-a-problem-please-read-this-first/27131

Could you (and everyone else affected) provide an nvidia-bug-report.log.gz after hitting the error when running Xorg, either here or in the NVIDIA forum post above?

The more reports, the better.

ehfd commented 1 year ago

NVIDIA has added this issue to their internal tracker.

ehfd commented 1 year ago

From @xhejtman in the Discord:

What is the issue with NVIDIA drivers and no resolutions being available? We just tested the 535 drivers on an A10 GPU and all resolutions are available. Is that specific to desktop cards?

Perhaps it is, or the newer driver release fixed things. CC @maxpain

ehfd commented 1 year ago

Good news: NVIDIA said they found the source of the issue and they will ship the fix in the next release. Now, we have to pray that all of the issues have indeed been properly fixed.

bongole commented 1 year ago

Maybe this issue was fixed in 535.129.03 and 545.29.02. I tested the drivers on Ubuntu 22.04 with an RTX 4060 Ti.

ehfd commented 1 year ago
535.129.03 release highlights since 535.113.01:
Fixed a bug that could cause modes to fail validation when Option "ModeValidation" "NoExtendedGpuCapabilitiesCheck" is specified in xorg.conf.
Fixed a bug that could cause GPU memory utilization to be reported incorrectly for Multi-Instance GPU (MIG) partitions on Grace Hopper systems.
Fixed a bug that intermittently caused the display to freeze when resuming from suspend on some Ada GPUs.
Fixed a bug which could cause some DisplayPort monitors to flicker.
Fixed a bug that could cause monitors to flicker when the performance state changes on Turing GPUs.

545.29.02 release highlights since 535.113.01:

Added experimental HDMI 10 bits per component support; enable by loading nvidia-modeset with hdmi_deepcolor=1.
Added support for the CTM, DEGAMMA_LUT, and GAMMA_LUT DRM-KMS CRTC properties. These are used by features such as the “Night Light” feature in GNOME and the “Night Color” feature in KDE, when they are used as Wayland compositors.
Added support for GeForce and Workstation GPUs to the open kernel modules. Please see the “Open Linux Kernel Modules” chapter in the README for details.
Added initial experimental support for runtime D3 (RTD3) power management on Desktop GPUs. Please see the ‘PCI-Express Runtime D3 (RTD3) Power Management’ chapter in the README for more details.
Added support for the EGL_ANDROID_native_fence_sync EGL extension and the VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT and VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT Vulkan external handle types when the nvidia-drm kernel module is loaded with the modeset=1 parameter.
Added experimental support for framebuffer consoles provided by nvidia-drm. On kernels that implement drm_fbdev_generic_setup and drm_aperture_remove_conflicting_pci_framebuffers, nvidia-drm will install a framebuffer console when loaded with both modeset=1 and fbdev=1 kernel module parameters. This will replace the Linux boot console driven by a system framebuffer driver such as efifb or vesafb.
Note that when an nvidia-drm framebuffer console is enabled, unloading nvidia-drm will cause the screen to turn off.
Updated nvidia-installer to allow installing the driver while an existing NVIDIA driver is already loaded.
Added support for virtual reality displays, such as the SteamVR platform, on Wayland compositors that support DRM leasing. Support requires xwayland version 22.1.0 and wayland-protocols version 1.22, or later. Tested on sway, minimum version 1.7 with wlroots version 0.15, and also on Kwin, minimum version 5.24.
Note: Before xwayland 23.2, there is a known issue with HDMI displays where the headset will fail to start a second time after closing SteamVR. This can be worked around by unplugging and replugging in the headset.
Fixed a bug that prevented VRR (Variable Refresh Rate) from working with Wayland.
Added support to the NVIDIA VDPAU driver for running in Xwayland. Please refer to the “Xwayland support in VDPAU” section of the README for further details.
Added libnvidia-gpucomp.so to the driver package. This is a helper library used for GPU shader compilation.
Removed libnvidia-vulkan-producer.so from the driver package. This helper library is no longer needed by the Wayland WSI.
Fixed a bug that intermittently caused the display to freeze when resuming from suspend on some Ada GPUs.
Fixed a bug that could cause monitors to flicker when the performance state changes on Turing GPUs.
Added support for HDR signaling via the HDR_OUTPUT_METADATA and Colorspace per-connector DRM properties when nvidia-drm is loaded with the modeset=1 parameter.
Added support for PRIME render offload to Vulkan Wayland WSI.
Fixed a bug that could cause modes to fail validation when Option "ModeValidation" "NoExtendedGpuCapabilitiesCheck" is specified in xorg.conf.
Fixed a bug which could cause some DisplayPort monitors to flicker.

That seems to be the case, @bongole. I will check whether all edge cases were addressed.

ehfd commented 1 year ago

@bongole What's the environment that made it work? Is it this container?

bongole commented 1 year ago

@ehfd

I tested the command below on a bare-metal Ubuntu 22.04 server with an RTX 4060 Ti.

docker run --gpus all -it --rm --tmpfs /dev/shm:rw \
  -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 \
  -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc \
  -e BASIC_AUTH_PASSWORD=mypasswd -e ENABLE_HTTPS_WEB=true \
  --network host ghcr.io/selkies-project/nvidia-glx-desktop:latest

OS Info:

$ uname -a
Linux gpu-server 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

$ nvidia-smi
Thu Nov  9 11:15:09 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off | 00000000:01:00.0 Off |                  N/A |
| 32%   29C    P0              29W / 165W |      4MiB / 16380MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

ehfd commented 1 year ago

I cannot confirm this on 535.129.03 because my test node is currently broken. Reports from anyone running that version would be appreciated.

ehfd commented 1 year ago

A kind user has also confirmed the fix with version 535.129.03 for me. Issue resolved.

Conclusion: if you face this issue, use display driver versions >= 535.129.03 (535 branch) or >= 545.29.02 (545 branch), or <= 530.xx. Don't use the headless drivers, because they lack certain libraries.

ehfd commented 6 months ago

NVIDIA 550 drivers <= 550.5x have issues with Vulkan. Use 550.67 or higher.
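The version guidance in this thread depends on the driver branch: a plain numeric cutoff such as ">= 535.129.03" would wrongly accept 545.23.06, which is newer but still affected. The rules can be encoded per branch as below. This is a minimal sketch that assumes a POSIX shell and a `sort` that supports `-V` (version sort, as in GNU coreutils). The function names `version_ge` and `classify_driver` are hypothetical and are not part of the container. Branches the thread does not mention are reported as `unknown` rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-glx-desktop): classify an NVIDIA
# driver version against the guidance collected in this issue thread.
# Assumes `sort -V` (version sort) is available, e.g. from GNU coreutils.

# version_ge A B: succeeds when version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# classify_driver VERSION prints one of:
#   ok                  - a version recommended in the thread
#   mode-validation-bug - 535.x below 535.129.03, or 545.x below 545.29.02
#   vulkan-bug          - 550.x below 550.67
#   unknown             - malformed input or a branch the thread does not cover
classify_driver() {
  local ver="$1"
  local branch="${ver%%.*}"   # major branch, e.g. 535 from 535.129.03

  # Reject empty or non-numeric branches.
  case "$branch" in
    ''|*[!0-9]*) echo unknown; return ;;
  esac

  case "$branch" in
    535) if version_ge "$ver" 535.129.03; then echo ok; else echo mode-validation-bug; fi ;;
    545) if version_ge "$ver" 545.29.02; then echo ok; else echo mode-validation-bug; fi ;;
    550) if version_ge "$ver" 550.67; then echo ok; else echo vulkan-bug; fi ;;
    *)   if [ "$branch" -le 530 ]; then echo ok; else echo unknown; fi ;;
  esac
}
```

For example, `classify_driver 535.86.05` (the driver from the original report) prints `mode-validation-bug`. On a host with the driver installed, the installed version can be passed in with `classify_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`.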