k3s-io / k3s

[Release-1.31] - Nvidia operator not working correctly #11088

Closed: manuelbuil closed this issue 3 weeks ago

manuelbuil commented 1 month ago

Backport fix for Nvidia operator not working correctly

VestigeJ commented 3 weeks ago

Environment Details

Reproduced using VERSION=v1.31.1+k3s1
Validated using COMMIT=221ab22ca911b548d7278afb0df7fca17d2fe596

Infrastructure

p3.2xlarge instance type
00:1e.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)

sudo nvidia-smi
Mon Oct 21 23:02:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-SXM2-16GB           Off |   00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0             25W /  300W |       1MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Note: the process list changes very quickly if you test with the vector-add image.
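A short-lived workload such as the vector-add sample can appear in and vanish from the process table between manual `nvidia-smi` runs. A minimal sketch of a polling helper follows; `poll_for` and the `POLL_INTERVAL` variable are hypothetical names introduced here, and the process name in the usage comment is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: re-run a command until its output contains PATTERN,
# giving up after TRIES attempts. Returns 0 on a match, 1 otherwise.
# POLL_INTERVAL (seconds between attempts, default 1) is an assumed knob;
# fractional values rely on a sleep that accepts them, e.g. GNU coreutils.
poll_for() {
  pattern=$1
  tries=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Run the remaining arguments as the command and search its output.
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}

# Example (assumes the sample shows up under a name containing "vectorAdd"):
#   POLL_INTERVAL=0.2 poll_for vectorAdd 100 sudo nvidia-smi
```

Start the helper in one terminal before applying the pod manifest in another, so the polling loop is already running when the container starts.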

Node(s) CPU architecture, OS, and version:

Linux 6.4.0-150600.23.17-default x86_64 GNU/Linux
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP6"

Cluster Configuration:

NAME              STATUS   ROLES                       AGE   VERSION
ip-1-1-1-23       Ready    control-plane,etcd,master   53m   v1.31.1+k3s-221ab22c

Config.yaml:

node-external-ip: 1.1.1.23
token: YOUR_TOKEN_HERE
write-kubeconfig-mode: 644
debug: true
cluster-init: true
embedded-registry: true

Reproduction && Validation

```
$ curl https://get.k3s.io --output install-"k3s".sh
$ sudo chmod +x install-"k3s".sh
$ sudo groupadd --system etcd && sudo useradd -s /sbin/nologin --system -g etcd etcd
$ sudo modprobe ip_vs_rr
$ sudo modprobe ip_vs_wrr
$ sudo modprobe ip_vs_sh
$ sudo printf "vm.panic_on_oom=0 \nvm.overcommit_memory=1 \nkernel.panic=10 \nkernel.panic_ps=1 \nkernel.panic_on_oops=1 \n" > ~/90-kubelet.conf
$ sudo cp 90-kubelet.conf /etc/sysctl.d/
$ sudo systemctl restart systemd-sysctl
$ sudo zypper ar https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
$ sudo zypper modifyrepo --enable nvidia-container-toolkit-experimental
$ sudo zypper --gpg-auto-import-keys install -y nvidia-container-toolkit
$ VERSION=v1.31.1+k3s1
$ sudo INSTALL_K3S_VERSION=$VERSION INSTALL_K3S_EXEC=server ./install-k3s.sh
$ kg runtimeclass
$ vim nvidia-pod.yaml
$ k apply -f nvidia-pod.yaml
$ nvidia-ctk cdi list  # interestingly, this still shows 0 devices
$ sudo nvidia-ctk cdi list
INFO[0000] Found 0 CDI devices
$ sudo zypper addrepo --refresh 'https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/' NVIDIA
$ sudo zypper --gpg-auto-import-keys refresh
$ sudo zypper install -y nvidia-gl-G06 nvidia-video-G06 nvidia-compute-utils-G06
$ sudo reboot
$ vim cuda-add.yaml
$ k apply -f cuda-add.yaml
# Note: this vector-add image still works on newer drivers and different OSes
# (it was more brittle in the past), but its process only flashes briefly
# across the nvidia-smi output, so keep a watch on the process list.
$ k delete -f cuda-add.yaml
$ k apply -f pytorch-gpu.yaml
$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
$ COMMIT=221ab22ca911b548d7278afb0df7fca17d2fe596
$ sudo INSTALL_K3S_COMMIT=$COMMIT INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
$ kgp -A  # all running pods and the node should remain healthy
```

**Results:**

Before, on the existing release v1.31.1+k3s1 (output truncated to only the NVIDIA-related entries):

$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml

```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
  SystemdCgroup = true
```

After installing the newest commit, config.toml shows additional nvidia-cdi entries:

```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
  SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia-cdi"]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia-cdi".options]
  BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime.cdi"
  SystemdCgroup = true
```

The following seems to be required now but isn't documented well on the k3s side yet:

$ cat operator.yaml

```
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator
  createNamespace: true
  valuesContent: |-
    toolkit:
      env:
      - name: CONTAINERD_SOCKET
        value: /run/k3s/containerd/containerd.sock
```

$ cat cuda-add.yaml

```
apiVersion: v1
kind: Pod
metadata:
  name: test-cuda-vector-add
spec:
  restartPolicy: "OnFailure"
  runtimeClassName: "nvidia"
  terminationGracePeriodSeconds: 15
  containers:
  - name: vectoradd-cuda
    image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
    resources:
      limits:
        nvidia.com/gpu: 1
```

$ cat nvidia-pod.yaml

```
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
```

$ cat pytorch-gpu.yaml

```
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-test
spec:
  runtimeClassName: nvidia
  containers:
  - name: pytorch-container
    image: pytorch/pytorch:latest  # Use the latest PyTorch image
    command: ["/bin/bash", "-c", "sleep infinity"]  # Keeps the container running
    resources:
      limits:
        nvidia.com/gpu: 1  # If using GPUs, request a GPU
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
```

$ k exec --stdin --tty pytorch-test -- /bin/bash

```
root@pytorch-test:/workspace# mount | grep -i nvidia
/dev/xvda3 on /usr/lib64/libnvidia-egl-gbm.so.1.1.1 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib64/libnvidia-egl-wayland.so.1.1.13 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /etc/vulkan/icd.d/nvidia_icd.json type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /etc/vulkan/implicit_layer.d/nvidia_layers.json type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/share/nvidia/nvoptix.bin type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/share/glvnd/egl_vendor.d/10_nvidia.json type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib64/xorg/modules/drivers/nvidia_drv.so type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
tmpfs on /proc/driver/nvidia type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=555,inode64)
tmpfs on /etc/nvidia/nvidia-application-profiles-rc.d type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=555,inode64)
/dev/xvda3 on /usr/bin/nvidia-smi type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/bin/nvidia-debugdump type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/bin/nvidia-persistenced type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/bin/nvidia-cuda-mps-control type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/bin/nvidia-cuda-mps-server type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-ml.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-opencl.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-gpucomp.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-nvvm.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-glcore.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-tls.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-glsi.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-fbc.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libGLX_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libEGL_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/xvda3 on /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.560.35.03 type xfs (ro,nosuid,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
devtmpfs on /dev/nvidiactl type devtmpfs (ro,nosuid,noexec,size=4096k,nr_inodes=7833885,mode=755,inode64)
devtmpfs on /dev/nvidia-uvm type devtmpfs (ro,nosuid,noexec,size=4096k,nr_inodes=7833885,mode=755,inode64)
devtmpfs on /dev/nvidia-uvm-tools type devtmpfs (ro,nosuid,noexec,size=4096k,nr_inodes=7833885,mode=755,inode64)
devtmpfs on /dev/nvidia-modeset type devtmpfs (ro,nosuid,noexec,size=4096k,nr_inodes=7833885,mode=755,inode64)
devtmpfs on /dev/nvidia0 type devtmpfs (ro,nosuid,noexec,size=4096k,nr_inodes=7833885,mode=755,inode64)
proc on /proc/driver/nvidia/gpus/0000:00:1e.0 type proc (ro,nosuid,nodev,noexec,relatime)
```
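The before/after comparison of config.toml comes down to whether both the `nvidia` and `nvidia-cdi` runtimes are registered. A minimal sketch of automating that check is below; `list_runtimes` and `has_runtimes` are hypothetical helper names, and the pattern assumes runtime names are double-quoted exactly as in the excerpts shown in this comment.

```shell
# Hypothetical helper: print the quoted runtime names registered in a
# containerd config.toml, one per line. Only section headers of the form
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes."NAME"]
# are matched, so the ".options" sub-tables are skipped.
list_runtimes() {
  sed -n 's/^[[:space:]]*\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\.runtimes\."\([^"]*\)"][[:space:]]*$/\1/p' "$1"
}

# Hypothetical helper: succeed only if every runtime named after the
# config path is present in that config.
has_runtimes() {
  config=$1
  shift
  for name in "$@"; do
    list_runtimes "$config" | grep -qx -- "$name" || return 1
  done
  return 0
}

# Example (the file is usually root-readable only, so run as root or copy it first):
#   has_runtimes /var/lib/rancher/k3s/agent/etc/containerd/config.toml nvidia nvidia-cdi \
#     && echo "both runtimes present"
```

On the v1.31.1+k3s1 output above this check would fail for `nvidia-cdi`, and on the commit build it would pass for both names.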