Closed msmurphy closed 2 hours ago
If I were to guess, I'd say there's probably an issue with your nvidia-container-toolkit.
Other than that, maybe the container is running too old a version of something...
Try debugging the container toolkit first by making sure you can run GPU-related tasks in a container.
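A minimal sketch of such a check, assuming either podman or docker is installed and the stock `ubuntu` image can be pulled. The function name `gpu_smoke_test` is made up for illustration; the two `run` invocations follow the sample workloads in NVIDIA's container toolkit documentation, and the podman one relies on a CDI spec already being present on the host.

```shell
# Run nvidia-smi inside a throwaway container using whichever runtime is present.
# A working toolkit should print the usual nvidia-smi table; an error here points
# at the toolkit or the CDI spec rather than at the zwift image itself.
gpu_smoke_test() {
  if command -v podman >/dev/null 2>&1; then
    # podman selects GPUs through CDI device names
    podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi
  elif command -v docker >/dev/null 2>&1; then
    # docker uses the --gpus flag provided via the NVIDIA runtime integration
    docker run --rm --gpus all ubuntu nvidia-smi
  else
    echo "neither podman nor docker found" >&2
    return 127
  fi
}
```

Calling `gpu_smoke_test` on the host and getting the GPU table back would suggest the toolkit side is fine and the problem is elsewhere.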
Yeah, so it turns out that every time you upgrade your drivers you need to regenerate the CDI spec file with the NVIDIA container toolkit. The issue I'm having is that the toolkit segfaults when I try to generate the file with the command below. Since I'm on a Pop!_OS-specific version of the toolkit, my guess is that the toolkit is out of date for the newest NVIDIA drivers.
mike@pop-os:~$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/dri/card2 as /dev/dri/card2
WARN[0000] Could not locate /dev/dri/controlD66: pattern /dev/dri/controlD66 not found
INFO[0000] Selecting /dev/dri/renderD129 as /dev/dri/renderD129
INFO[0000] Selecting /var/run/nvidia-persistenced/socket as /var/run/nvidia-persistenced/socket
WARN[0000] Could not locate /var/run/nvidia-fabricmanager/socket: pattern /var/run/nvidia-fabricmanager/socket not found
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
INFO[0000] Using driver version 560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvoptix.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-wayland-client.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-tls.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-rtcore.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-opencl.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-nvvm.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-ngx.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-ml.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-gpucomp.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-glsi.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-glcore.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-fbc.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-encode.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-eglcore.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-cfg.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvidia-allocator.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libnvcuvid.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libcudadebugger.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libcuda.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libGLX_nvidia.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libGLESv2_nvidia.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.560.35.03
INFO[0000] found 64-bit driver lib: /lib/x86_64-linux-gnu/libEGL_nvidia.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-tls.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-opticalflow.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-opencl.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-nvvm.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-ml.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-gpucomp.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-glvkspirv.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-glsi.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-glcore.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-fbc.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-encode.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvidia-eglcore.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libnvcuvid.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libcuda.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libGLX_nvidia.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libGLESv2_nvidia.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.560.35.03
INFO[0000] found 32-bit driver lib: /lib/i386-linux-gnu/libEGL_nvidia.so.560.35.03
INFO[0000] Selecting /dev/nvidia-modeset as /dev/nvidia-modeset
INFO[0000] Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2c pc=0x512c22]
goroutine 1 [running]:
github.com/sirupsen/logrus.(*Logger).Logf(0xc0003fd990?, 0x1?, {0x6afa84?, 0x1?}, {0xc00019a868?, 0x203000?, 0x203000?})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/sirupsen/logrus@v1.9.0/logger.go:152 +0x22
github.com/sirupsen/logrus.(*Logger).Warnf(...)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/sirupsen/logrus@v1.9.0/logger.go:178
github.com/NVIDIA/nvidia-container-toolkit/internal/lookup.library.Locate({0x0, {0x6fd0c8, 0xc00012a190}, {0x6fd670, 0xc000144000}}, {0x6a857d, 0x14})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/internal/lookup/library.go:60 +0x1ab
github.com/NVIDIA/nvidia-container-toolkit/internal/discover.(*mounts).Mounts(0xc00011c360)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/internal/discover/mounts.go:71 +0x52b
github.com/NVIDIA/nvidia-container-toolkit/internal/discover.list.Mounts({{0xc000124280?, 0xc000138480?, 0xc00019ae78?}})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/internal/discover/list.go:59 +0xa9
github.com/NVIDIA/nvidia-container-toolkit/internal/discover.list.Mounts({{0xc00012d800?, 0xc000138500?, 0x4?}})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/internal/discover/list.go:59 +0xa9
github.com/NVIDIA/nvidia-container-toolkit/internal/discover.list.Mounts({{0xc000124340?, 0xc00012a3c0?, 0x80a2078500000000?}})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/internal/discover/list.go:59 +0xa9
github.com/NVIDIA/nvidia-container-toolkit/internal/edits.FromDiscoverer({0x6fdaa0, 0xc00011e6a8})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/internal/edits/edits.go:57 +0xc2
github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate.command.generateSpec({0x69c876?}, {0x0, 0x0}, {0xc000016180, 0x13}, {0x6fd918, 0xc00007e8a0})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/cmd/nvidia-ctk/cdi/generate/generate.go:265 +0xdca
github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate.command.run({0x69bb76?}, 0xc0000e54f0?, 0xc0000b4460)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/cmd/nvidia-ctk/cdi/generate/generate.go:136 +0x13f
github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate.command.build.func2(0xc0000e54a0?)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/cmd/nvidia-ctk/cdi/generate/generate.go:74 +0x27
github.com/urfave/cli/v2.(*Command).Run(0xc000014c60, 0xc000012840)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:163 +0x5bb
github.com/urfave/cli/v2.(*App).RunAsSubcommand(0xc0000cf040, 0xc000012780)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:434 +0xc8a
github.com/urfave/cli/v2.(*Command).startApp(0xc000014b40, 0xc000012780)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:278 +0x713
github.com/urfave/cli/v2.(*Command).Run(0xc000012540?, 0x3?)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:94 +0xba
github.com/urfave/cli/v2.(*App).RunContext(0xc0000ced00, {0x6fddd0?, 0xc00001a0b0}, {0xc000012080, 0x4, 0x4})
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:313 +0xb48
github.com/urfave/cli/v2.(*App).Run(...)
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/gopath/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:224
main.main()
/build/nvidia-container-toolkit-K2baJ7/nvidia-container-toolkit-1.12.1/cmd/nvidia-ctk/main.go:85 +0x445
mike@pop-os:~$
Yeah, looks like that might be the case. I guess you should revert the NVIDIA drivers in the meantime, or temporarily run zwift on another GPU if you're on a multi/dual GPU setup.
Or, you could try installing and maintaining zwift yourself through steam/proton or wine-bottles or lutris or ...
Was able to generate the file after upgrading the container toolkit. Had to add this entry in /etc/apt/preferences.d/pop-default-settings to prioritize the NVIDIA repository over the Pop!_OS one. Went from version 1.12 to 1.16. My guess is that 1.12 is incompatible with the latest driver. I haven't had a chance to try the zwift container yet; will test after work.
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
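For anyone scripting this, here is a rough sketch that prints the same stanza. The helper name `apt_pin_stanza` is hypothetical. Note that `Package: *` pins every package from that origin, and a priority above 1000 makes apt prefer that origin even when it means a downgrade.

```shell
# Print an APT preferences stanza that pins all packages from one origin.
#   $1 = repository origin (hostname), $2 = pin priority
apt_pin_stanza() {
  printf 'Package: *\nPin: origin %s\nPin-Priority: %s\n' "$1" "$2"
}

# Emit the stanza from the comment above; redirect or append it to the
# preferences file yourself after reviewing it.
apt_pin_stanza nvidia.github.io 1002

# Assumed follow-up on the host (not run here):
#   sudo apt update
#   apt-cache policy nvidia-container-toolkit   # candidate should now come from nvidia.github.io
#   sudo apt install nvidia-container-toolkit
#   sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Checking `apt-cache policy` before installing is a cheap way to confirm the pin took effect.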
This 100% worked and I can run zwift in the container now. It also fixed the performance issues I was having. I highly recommend that anyone running Pop!_OS make this change.
Checklist
DEBUG=1 zwift
Describe the issue
It's looking for the wrong nvidia library. I recently updated from driver 555 to 560. Running via podman or docker fails.
It outputs that it's looking for libEGL_nvidia.so.550.54.14. This doesn't exist on my machine anymore.
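A quick way to check for this kind of mismatch, sketched under the assumption that the stale library name comes from a CDI spec at /etc/cdi/nvidia.yaml (the path used with `nvidia-ctk cdi generate` in the comments). The function name `cdi_spec_matches_driver` is made up for illustration.

```shell
# Succeed if the CDI spec file ($1) mentions driver libraries for version ($2),
# i.e. contains a path ending in ".so.<version>".
cdi_spec_matches_driver() {
  grep -qF ".so.$2" "$1"
}

# Example use on the host (assumes nvidia-smi is installed and working):
#   driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
#   cdi_spec_matches_driver /etc/cdi/nvidia.yaml "$driver" \
#     || echo "CDI spec does not reference driver $driver; regenerate it"
```

If the spec still lists libraries from an older driver, regenerating it after the driver upgrade is the likely fix.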
Distribution Details
OS: Pop!_OS 22.04 LTS x86_64
Kernel: 6.9.3-76060903-generic
Shell: bash 5.1.16
Resolution: 3440x1440, 2560x1440
DE: GNOME 42.9
WM: Mutter
Terminal: alacritty
CPU: AMD Ryzen 9 9950X (32) @ 5.752GHz
GPU: AMD ATI 17:00.0 Device 13c0
GPU: NVIDIA GeForce RTX 2070 SUPER
Memory: 3797MiB / 31105MiB
Reproduction steps