Closed: paulfantom closed this issue 3 years ago.
It looks like some of the logs got cleaned up and it's confusing containerd. Do you have anything that might be trying to rotate the pod logs out from under containerd?
You might try running k3s-killall.sh and then rm -rf /var/log/pods, followed by starting K3s again. This will of course terminate all running pods, but it might fix whatever containerd is struggling with.
There is nothing I can think of that would rotate logs. This is a fresh Ubuntu 20.04 instance with only k3s and the nvidia drivers installed.
I did remove everything from /var/log/pods as well as cleared the whole /var/lib/rancher/k3s/agent and /var/lib/kubelet (I actually did this a few times, in different orders and steps). It did not help.
I also get other error messages from k3s which most likely relate to the issue:
Nov 04 11:21:08 metal01 k3s[930011]: E1104 11:21:08.364474 930011 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nvidia-device-plugin-ctr\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nvidia-device-plugin-ctr pod=nvidia-device-plugin-daemonset-twrj5_kube-system(07b07c46-45aa-4b4d-b30d-06054a939784)\"" pod="kube-system/nvidia-device-plugin-daemonset-twrj5" podUID=07b07c46-45aa-4b4d-b30d-06054a939784
Nov 04 11:21:08 metal01 k3s[930011]: I1104 11:21:08.573763 930011 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201"
Nov 04 11:21:09 metal01 k3s[930011]: W1104 11:21:09.140176 930011 manager.go:1176] Failed to process watch event {EventType:0 Name:/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/ff954893be5edf196f2ccdddd950da868060bd8de2cf7aa839894d6964835b23 WatchSource:0}: task ff954893be5edf196f2ccdddd950da868060bd8de2cf7aa839894d6964835b23 not found: not found
Nov 04 11:21:09 metal01 k3s[930011]: W1104 11:21:09.140229 930011 watcher.go:95] Error while processing event ("/sys/fs/cgroup/devices/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201: no such file or directory
Nov 04 11:21:09 metal01 k3s[930011]: W1104 11:21:09.140293 930011 watcher.go:95] Error while processing event ("/sys/fs/cgroup/memory/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201: no such file or directory
Nov 04 11:21:09 metal01 k3s[930011]: W1104 11:21:09.140318 930011 watcher.go:95] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201: no such file or directory
Nov 04 11:21:09 metal01 k3s[930011]: W1104 11:21:09.140389 930011 watcher.go:95] Error while processing event ("/sys/fs/cgroup/pids/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/pids/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201: no such file or directory
Nov 04 11:21:09 metal01 k3s[930011]: W1104 11:21:09.140460 930011 watcher.go:95] Error while processing event ("/sys/fs/cgroup/blkio/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/kubepods/pod07b07c46-45aa-4b4d-b30d-06054a939784/64d8ac27c5d1500c469dc62279c24d1b536382579973328e1fd3e96a64ed2201: no such file or directory
Stopping k3s does not stop the pods that are running. This is by design. What those logs are telling you is that you have deleted locations out from under running (or at least, created) pods. As suggested at https://github.com/k3s-io/k3s/issues/4391#issuecomment-960513374, to stop everything that can be started by k3s you should run k3s-killall.sh. But since it appears that you have deleted locations that pods expect to exist, you would likely be better off invoking k3s-uninstall.sh and starting over.
Stopping k3s does not stop the pods that are running.
I know :) Nodes were drained before stopping k3s.
But since it appears that you have deleted locations that pods expect to exist you would likely be better off invoking k3s-uninstall.sh and starting over.
That's what I did, and that's why the instance is fresh: I reinstalled the whole node.
Just to be clear, this is not my first rodeo with Kubernetes, and I tried multiple options before writing this issue. The "have you tried turning it off and on again" solution was the first thing I tried :)
The full chain of events on my side:
1) Upgrade from 1.21 to 1.22, resulting in all pods failing to start due to a missing containerd-shim binary. This is because I was using the nvidia runtime as default with runc v1.
2) A few tests to figure out what is going on (deleting directories, restarting k3s, etc.). This is when I found https://github.com/k3s-io/k3s/issues/4070
3) Removing the custom containerd config.toml.tmpl and using the default configuration shipped with k3s.
4) Node drained, k3s restarted. All containers starting up apart from the ones using the nvidia runtime, due to the issue described here.
5) Testing a few different configurations of the nvidia-device-plugin pod, but the issue described here seems to be preventing the pod from starting up.
6) Node teardown to discard issues related to stale configurations. Installing only k3s and the nvidia drivers on the new node.
7) Issue still persists.
Right now my gut is telling me that this may be something in the nvidia runtime itself and its integration with k3s. I tried running the pod with the default runtimeClassName and it works just fine (albeit without GPU access). However, setting runtimeClassName: nvidia and recreating the pod leads to the errors about log files and cgroups shown above.
What is surprising to me is the fact that in 1.21 everything works just fine, while 1.22 breaks completely for workloads needing an nvidia GPU.
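For context, the only spec-level difference between the two tests is the runtimeClassName field. A minimal sketch of the kind of pod spec being switched between the two runtimes (the pod name is a placeholder and the image is just the device-plugin image mentioned above, not the exact manifest used):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # hypothetical name, for illustration only
spec:
  runtimeClassName: nvidia        # omit this line to fall back to the default runc runtime
  containers:
  - name: test
    image: nvidia/k8s-device-plugin:v0.10.0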
Hmm, I can't see why it would be required, but we did drop the v1 runtime in 1.22 since it's been deprecated for a while: https://github.com/k3s-io/k3s/pull/3903
The Nvidia plugin should work fine without it unless for some reason you had configured the legacy runtime type in your custom containerd toml?
https://github.com/k3s-io/k3s/issues/3105#issuecomment-906672797
I'm experiencing exactly the same issue: in 1.21.6 everything works correctly. 1.22.3 adds automatic nvidia-container-runtime detection, but every deployment requesting the nvidia runtime class is crash-looping:
- gopro:deployment/gopro-vcr: container samba in error: &ContainerStateWaiting{Reason:CreateContainerError,Message:failed to get sandbox container task: no running task found: task d751f121e2ec5bee9b43b4c9698d43d31d7cb6cc68ccc59ae0c0b72200b16890 not found: not found,}
- gopro:pod/gopro-vcr-6c4ccfb587-x98k6: container samba in error: &ContainerStateWaiting{Reason:CreateContainerError,Message:failed to get sandbox container task: no running task found: task d751f121e2ec5bee9b43b4c9698d43d31d7cb6cc68ccc59ae0c0b72200b16890 not found: not found,}
- gopro:deployment/gopro-vcr: container samba is backing off waiting to restart
- gopro:pod/gopro-vcr-6c4ccfb587-x98k6: container samba is backing off waiting to restart
> [gopro-vcr-6c4ccfb587-x98k6 samba] failed to try resolving symlinks in path "/var/log/pods/gopro_gopro-vcr-6c4ccfb587-x98k6_a1325b13-67ab-401a-8919-2bb207641fc0/samba/1.log": lstat /var/log/pods/gopro_gopro-vcr-6c4ccfb587-x98k6_a1325b13-67ab-401a-8919-2bb207641fc0/samba/1.log: no such file or directory
- gopro:deployment/gopro-vcr failed. Error: container samba is backing off waiting to restart.
@hlacik can you attach the logs from k3s starting up on your node (specifically the runtime detection bit), along with the containerd configuration toml that it is generating?
@brandond
jtsna Ready control-plane,master 5h39m v1.22.3+k3s1
jtsnb Ready <none> 5h38m v1.22.3+k3s1
config.toml
root@jtsna-2111:/var/lib/rancher/k3s/agent/etc/containerd# cat config.toml
[plugins.opt]
path = "/var/lib/rancher/k3s/agent/containerd"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
sandbox_image = "rancher/mirrored-pause:3.1"
[plugins.cri.containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins.cri.cni]
bin_dir = "/var/lib/rancher/k3s/data/86a8c46cd5fe617d1c1c90d80222fa4b7e04e7da9b3caace8af4daf90fc5a699/bin"
conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia"]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia".options]
BinaryName = "/usr/bin/nvidia-container-runtime"
k3s.service log
-- Logs begin at Wed 2020-04-01 19:23:42 CEST, end at Thu 2021-11-11 22:29:56 CET. --
Nov 11 18:23:25 jtsna-2111 systemd[1]: Starting Lightweight Kubernetes...
Nov 11 18:23:25 jtsna-2111 sh[4246]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Nov 11 18:23:25 jtsna-2111 sh[4258]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Starting k3s v1.22.3+k3s1 (61a2aab2)"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Cluster bootstrap already complete"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Database tables and indexes are up to date"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Kine available at unix://kine.sock"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.carpc.local>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.588795 4297 server.go:581] external host was not specified, using 172.16.15.8
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Waiting for API server to become available"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.640314 4297 server.go:175] Version: v1.22.3+k3s1
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --author>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig ->
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="To join node to cluster: k3s agent -s https://172.16.15.8:6443 -t ${NODE_TOKEN}"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="Run: k3s kubectl"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.681567 4297 shared_informer.go:240] Waiting for caches to sync for node_authorizer
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.818950 4297 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesB>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.819022 4297 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimRe>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.822841 4297 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesB>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.822893 4297 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimRe>
Nov 11 18:23:28 jtsna-2111 k3s[4297]: W1111 18:23:28.897222 4297 genericapiserver.go:455] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:28 jtsna-2111 k3s[4297]: I1111 18:23:28.899801 4297 instance.go:278] Using reconciler: lease
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="certificate CN=jtsna signed by CN=k3s-server-ca@1636645919: notBefore=2021-11-11 15:51:59 +0000 UTC notAfter=2022-11-11 17:23:28 +0000 UTC"
Nov 11 18:23:28 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:28+01:00" level=info msg="certificate CN=system:node:jtsna,O=system:nodes signed by CN=k3s-client-ca@1636645919: notBefore=2021-11-11 15:51:59 +0000 UTC notAfter=2022-11-11 17:>
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Module overlay was already loaded"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: I1111 18:23:29.044782 4297 rest.go:130] the default service ipfamily for this cluster is: IPv4
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Module br_netfilter was already loaded"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.089015 4297 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_max' to 131072"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Set sysctl 'net/ipv4/conf/all/forwarding' to 1"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Found nvidia container runtime at /usr/bin/nvidia-container-runtime"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root >
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.850324 4297 genericapiserver.go:455] Skipping API authentication.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.855986 4297 genericapiserver.go:455] Skipping API authorization.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.944179 4297 genericapiserver.go:455] Skipping API certificates.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.953629 4297 genericapiserver.go:455] Skipping API coordination.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.448543 4297 genericapiserver.go:455] Skipping API networking.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.460480 4297 genericapiserver.go:455] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:29+01:00" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/k3s/cont>
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.618490 4297 genericapiserver.go:455] Skipping API rbac.authorization.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.618534 4297 genericapiserver.go:455] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.624428 4297 genericapiserver.go:455] Skipping API scheduling.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.624468 4297 genericapiserver.go:455] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.640436 4297 genericapiserver.go:455] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.648652 4297 genericapiserver.go:455] Skipping API flowcontrol.apiserver.k8s.io/v1alpha1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.669939 4297 genericapiserver.go:455] Skipping API apps/v1beta2 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.670005 4297 genericapiserver.go:455] Skipping API apps/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.677042 4297 genericapiserver.go:455] Skipping API admissionregistration.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:29 jtsna-2111 k3s[4297]: I1111 18:23:29.690538 4297 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesB>
Nov 11 18:23:29 jtsna-2111 k3s[4297]: I1111 18:23:29.690589 4297 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimRe>
Nov 11 18:23:29 jtsna-2111 k3s[4297]: W1111 18:23:29.705227 4297 genericapiserver.go:455] Skipping API apiregistration.k8s.io/v1beta1 because it has no resources.
Nov 11 18:23:30 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:30+01:00" level=error msg="runtime core not ready"
Nov 11 18:23:30 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:30+01:00" level=info msg="Containerd is now running"
Nov 11 18:23:30 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:30+01:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
Nov 11 18:23:30 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:30+01:00" level=info msg="Handling backend connection request [jtsna]"
Nov 11 18:23:30 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:30+01:00" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --c>
Nov 11 18:23:30 jtsna-2111 k3s[4297]: Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
Nov 11 18:23:30 jtsna-2111 k3s[4297]: Flag --cni-bin-dir has been deprecated, will be removed along with dockershim.
Nov 11 18:23:30 jtsna-2111 k3s[4297]: Flag --cni-conf-dir has been deprecated, will be removed along with dockershim.
Nov 11 18:23:30 jtsna-2111 k3s[4297]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before bei>
Nov 11 18:23:30 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:30+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Nov 11 18:23:30 jtsna-2111 k3s[4297]: I1111 18:23:30.692377 4297 server.go:436] "Kubelet version" kubeletVersion="v1.22.3+k3s1"
Nov 11 18:23:30 jtsna-2111 k3s[4297]: I1111 18:23:30.757884 4297 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:35+01:00" level=error msg="runtime core not ready"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: time="2021-11-11T18:23:35+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: W1111 18:23:35.790140 4297 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.798286 4297 server.go:687] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.800477 4297 container_manager_linux.go:280] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.800713 4297 container_manager_linux.go:285] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: Container>
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.804549 4297 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.804611 4297 container_manager_linux.go:320] "Creating device plugin manager" devicePluginEnabled=true
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.805290 4297 state_mem.go:36] "Initialized new in-memory state store"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.807245 4297 kubelet.go:418] "Attempting to sync node with API server"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.807356 4297 kubelet.go:279] "Adding static pod path" path="/var/lib/rancher/k3s/agent/pod-manifests"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.808649 4297 kubelet.go:290] "Adding apiserver pod source"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.811299 4297 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.823719 4297 kuberuntime_manager.go:245] "Container runtime initialized" containerRuntime="containerd" version="v1.5.7-k3s2" apiVersion="v1alpha2"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.831055 4297 server.go:1213] "Started kubelet"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.833142 4297 server.go:149] "Starting to listen" address="0.0.0.0" port=10250
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.835980 4297 server.go:409] "Adding debug handlers to kubelet server"
Nov 11 18:23:35 jtsna-2111 k3s[4297]: I1111 18:23:35.842247 4297 secure_serving.go:266] Serving securely on 127.0.0.1:6444
@brandond the generated config.toml seems fine to me. This is what I was using on 1.21.6; I was adding it manually via config.toml.tmpl:
root@jtsnb-2111:~# cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
[plugins.opt]
path = "/var/lib/rancher/k3s/agent/containerd"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
sandbox_image = "rancher/pause:3.1"
[plugins.cri.containerd]
disable_snapshot_annotations = true
snapshotter = "overlayfs"
[plugins.cri.cni]
bin_dir = "/var/lib/rancher/k3s/data/e265ce840ebe0eaaebfc0eba8cac0a94057c6bccadc5a194b2db1b07e65f63a0/bin"
conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
# BEGIN nvidia-container-runtime
[plugins.cri.containerd.runtimes.nvidia]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
# END nvidia-container-runtime
and it was working. The generated config seems identical except for the quotes around the runtime name, which I think makes no difference.
I also want to note that this is the same OS (Ubuntu 20.04 on arm64). I removed 1.21.5 via k3s-uninstall.sh and installed a fresh 1.22.3, so I can confirm it has nothing to do with OS configuration/packages.
This is the RuntimeClass with handler nvidia:
apiVersion: node.k8s.io/v1 # RuntimeClass is defined in the node.k8s.io API group
kind: RuntimeClass
metadata:
  name: nvidia # The name the RuntimeClass will be referenced by
  # RuntimeClass is a non-namespaced resource
handler: nvidia # The name of the corresponding CRI configuration
which I am using in deployments when I want to use the nvidia-container-runtime:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vcr
spec:
  selector:
    matchLabels:
      app: vcr
  template:
    metadata:
      labels:
        app: vcr
    spec:
      imagePullSecrets:
      - name: registry-pcr-docker
      terminationGracePeriodSeconds: 10
      runtimeClassName: nvidia
      containers:
      - name: vcr-0
        image: vcr
        args:
        - --rhost=dev-basler
        - --key=basler:0
        - --chunk_duration=60
        volumeMounts:
        - mountPath: /videos
          name: videos
        - mountPath: /tmp/argus_socket
          name: argus
      volumes:
      - name: videos
        persistentVolumeClaim:
          claimName: videos
      - name: argus
        hostPath:
          path: /tmp/argus_socket
Small log update: I am getting the following events on nvidia-device-plugin v0.10.0 pod start (using the nvidia runtime class):
Normal Scheduled 95m default-scheduler Successfully assigned kube-system/nvidia-device-plugin-daemonset-h4v5c to metal01
Normal Pulling 92m kubelet Pulling image "nvidia/k8s-device-plugin:v0.10.0"
Normal Pulled 88m kubelet Successfully pulled image "nvidia/k8s-device-plugin:v0.10.0" in 3m35.07358951s
Warning Failed 88m kubelet Error: failed to get sandbox container task: no running task found: container not created: not found
Warning Failed 88m kubelet Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:364: creating new parent process caused: container_linux.go:2005: running lstat on namespace path "/proc/4076575/ns/ipc" caused: lstat /proc/4076575/ns/ipc: no such file or directory: unknown
Normal Pulled 88m (x2 over 88m) kubelet Container image "nvidia/k8s-device-plugin:v0.10.0" already present on machine
Normal Created 88m (x2 over 88m) kubelet Created container nvidia-device-plugin-ctr
Warning Failed 88m kubelet Error: sandbox container "437bb0ac4e63c34e8a678754a1ac4dd71d72cf2ab27a5555aed5b46f193f849b" is not running
Warning BackOff 88m (x7 over 88m) kubelet Back-off restarting failed container
Normal SandboxChanged 88m (x9 over 88m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedSync 83m kubelet error determining status: rpc error: code = NotFound desc = an error occurred when try to find sandbox: not found
Normal Pulled 83m kubelet Container image "nvidia/k8s-device-plugin:v0.10.0" already present on machine
Normal Created 83m kubelet Created container nvidia-device-plugin-ctr
Warning Failed 83m kubelet Error: sandbox container "ea086ddd278fac26f2d55d82f0f4cafb41e7582796a50d199c1971896b0a886b" is not running
Normal SandboxChanged 78m (x278 over 83m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning BackOff 73m (x523 over 83m) kubelet Back-off restarting failed container
Normal Created 70m kubelet Created container nvidia-device-plugin-ctr
Warning Failed 70m kubelet Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:402: getting the final child's pid from pipe caused: EOF: unknown
Normal Pulled 70m (x2 over 70m) kubelet Container image "nvidia/k8s-device-plugin:v0.10.0" already present on machine
Warning BackOff 10m (x2792 over 70m) kubelet Back-off restarting failed container
Normal SandboxChanged 50s (x3385 over 70m) kubelet Pod sandbox changed, it will be killed and re-created.
I can confirm I'm having the same issue here. Works fine on v1.21.6, but does not work on v1.22.3.
I suspect that perhaps the nvidia device plugin isn't compatible with containerd 1.5?
@kralicky have you tried this out at all?
I have seen the exact errors @paulfantom has and I believe this is related to the clone3/seccomp updates that are in the latest containerd. The workaround for now is to make all pods which use the nvidia container runtime privileged. It is possible that this is an issue on nvidia's end but I am not 100% sure if that is the case.
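For anyone unsure what that workaround looks like in practice, it is just the securityContext on the container. A minimal sketch of the relevant part of the pod spec, using the container and image names from the events earlier in this thread (not an official manifest):
spec:
  runtimeClassName: nvidia
  containers:
  - name: nvidia-device-plugin-ctr
    image: nvidia/k8s-device-plugin:v0.10.0
    securityContext:
      privileged: true   # temporary workaround for the clone3/seccomp issue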
I can confirm that the workaround suggested by @kralicky is working. :+1:
@kralicky we recently saw the clone3/seccomp update issue on updated docker packages on Ubuntu (see nvidia-container-runtime#157). We have published updated packages for the NVIDIA Container Toolkit (including the nvidia-container-runtime) to our experimental package repositories and will be promoting these to stable in the near future.
As an alternative to running the containers as privileged, you could update the nvidia-container-toolkit to at least 1.6.0-rc.2.
I can confirm that with 1.6.0~rc.3-1 the issue is gone. As such I am closing this bug report. Thank you everyone for your feedback and for helping me solve this issue! :100: :+1:
Just in case someone tries with k3s v1.23.8+k3s2: after installing the driver and nvidia-container-toolkit, put only this inside the config.toml:
sudo mkdir /var/lib/rancher/k3s/agent/etc/containerd
sudo vim /var/lib/rancher/k3s/agent/etc/containerd/config.toml
(Copied from https://github.com/NVIDIA/k8s-device-plugin#configure-containerd)
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
And of course add the nvidia-device-plugin:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml
Add a RuntimeClass:
apiVersion: node.k8s.io/v1 # RuntimeClass is defined in the node.k8s.io API group
kind: RuntimeClass
metadata:
  name: nvidia # The name the RuntimeClass will be referenced by
  # RuntimeClass is a non-namespaced resource
handler: nvidia # The name of the corresponding CRI configuration
Add a gpu pod:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: gpus
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
Providing a containerd config template is only necessary if you want to change the default runtime. If you're using runtimeClassName, all you should need to do is install the runtime package for your OS, then restart K3s.
@brandond Now I've tried to run the pod without the runtimeClassName, using the containerd config template above.
The pod crash-loops with:
Error: failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
I would assume that:
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
this setting makes nvidia the default runtime and should be enough, but somehow runtimeClassName: nvidia does more, like setting the correct binary path for nvidia-smi.
I spent a lot of time on this and I finally managed to get it to work. @FischerLGLN got close, but the issue is that you are not supposed to modify /var/lib/rancher/k3s/agent/etc/containerd/config.toml; you are supposed to use the ....config.toml.tmpl according to https://rancher.com/docs/k3s/latest/en/advanced/
This is what I did to get it to work:
# Install gpg key from nvidia
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install Drivers. Use the latest drivers!
apt-get update && apt-get install -y nvidia-driver-515-server nvidia-container-toolkit nvidia-modprobe
reboot
# Check whether GPU recognized. You might have to restart the node after the driver installation to get this working
nvidia-smi
# Download template from k3d project
sudo wget https://k3d.io/v5.4.1/usage/advanced/cuda/config.toml.tmpl -O /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
# Install nvidia plugin. This is optional. You can simply pass in the env variables to pass GPU access to pods. However, this is a nice way to debug whether you have access to the GPUs on containerd
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml
Try running the nvidia plugin and check the logs. If you see this on the GPU node, then you have to modify the .tmpl file further.
2022/07/25 18:14:19 Initializing NVML.
2022/07/25 18:14:19 Failed to initialize NVML: could not load NVML library.
2022/07/25 18:14:19 If this is a GPU node, did you set the docker default runtime to `nvidia`
This used to work previously, but with K3s v1.23+ I had issues. You will have to modify /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl and add:
[plugins.cri.containerd.runtimes.runc.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
and modify
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runtime.v1.linux"
to
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
This should get everything running. Then either use the nvidia plugin to define resources, or, if like me you want to share the GPU across multiple pods, just add these env variables to your pod:
NVIDIA_VISIBLE_DEVICES: all
NVIDIA_DRIVER_CAPABILITIES: all
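For reference, those variables go into the container's env block of the pod spec. A minimal sketch (container name and image are placeholders; the image is just one already used in this thread):
spec:
  containers:
  - name: gpu-app                   # hypothetical container name
    image: nvidia/cuda:11.0-base
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "all"
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: "all"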
Modifying the containerd config template is not necessary. K3s will automatically add runtimes to the containerd config if the nvidia binaries are present on the node when k3s is started. All you need to do is use the RuntimeClass and Pod specs shown in https://github.com/k3s-io/k3s/issues/4391#issuecomment-1181707242
@brandond hmm I'm happy to test again but I can guarantee you that it doesn't work out of the box
Which part of it doesn't work? The runtime binary detection and addition of runtimes to the containerd config, or your runtimeclass/pod spec making use of it?
Adding the runtimes to the containerd config.
I had the RuntimeClass and the pod configured in a fresh installation. The pod started, but without the actual GPU exposed, until I added the config.toml.tmpl file.
The current code checks /usr/bin and /usr/local/nvidia/toolkit for the nvidia-container-runtime and nvidia-container-runtime-experimental binaries. I can confirm that k3s finds and adds runtimes for these if they are present. Note that it does NOT change the default runtime and does NOT add a RuntimeClass for you; it is up to you to create one with the correct name and reference it from your pod.
Can you verify the version of k3s you're using, and that you're using the expected binary paths?
[root@centos01 ~]# ln -s /usr/bin/true /usr/bin/nvidia-container-runtime
[root@centos01 ~]# curl -ksL get.k3s.io | sh -
[INFO] Finding release for channel stable
[INFO] Using v1.24.3+k3s1 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.24.3+k3s1/sha256sum-amd64.txt
[INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.24.3+k3s1/k3s
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Creating /usr/local/bin/kubectl symlink to k3s
[INFO] Creating /usr/local/bin/crictl symlink to k3s
[INFO] Creating /usr/local/bin/ctr symlink to k3s
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink from /etc/systemd/system/multi-user.target.wants/k3s.service to /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
[root@centos01 ~]# cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
[plugins.opt]
path = "/var/lib/rancher/k3s/agent/containerd"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
enable_unprivileged_ports = true
enable_unprivileged_icmp = true
sandbox_image = "rancher/mirrored-pause:3.6"
[plugins.cri.containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins.cri.cni]
bin_dir = "/var/lib/rancher/k3s/data/1d787a9b6122e3e3b24afe621daa97f895d85f2cb9cc66860ea5ff973b5c78f2/bin"
conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes.runc.options]
SystemdCgroup = false
[plugins.cri.containerd.runtimes."nvidia"]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia".options]
BinaryName = "/usr/bin/nvidia-container-runtime"
I spent a lot of time on this and I finally managed to get it to work. [...] just add these ENV variables to your pod: NVIDIA_VISIBLE_DEVICES: all, NVIDIA_DRIVER_CAPABILITIES: all
It works for me. Thank you.
This used to work previously but with K3S v1.23+ I had issues. You will have to modify /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl and add:
[plugins.cri.containerd.runtimes.runc.options] BinaryName = "/usr/bin/nvidia-container-runtime"
Don't do this; it will change the default runtime to use the nvidia binary instead of runc. As described above, you should be creating a RuntimeClass and setting the runtimeClassName on pods that you want to use the nvidia container runtime.
@brandond are there any docs on how it should be done and since which k3s version?
Here's what I did to get this working on an Ubuntu node. You should be able to follow similar instructions to get it working on any other distro on any currently supported release of K3s:
apt install -y nvidia-container-runtime cuda-drivers-fabricmanager-515 nvidia-headless-515-server
curl -ksL get.k3s.io | sh -
grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
kubectl apply -f https://gist.githubusercontent.com/brandond/33e49bf094712f926c95d683d515ac95/raw/nvidia.yaml
Results:
root@ip-172-31-27-127:~# kubectl logs nbody-gpu-benchmark --tail=10
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5
> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10 iterations: 91.529 ms
= 183.299 billion interactions per second
= 3665.981 single-precision GFLOP/s at 20 flops per interaction
I'm not sure how much of this we should cover in our docs though, as this is all owned by the various Nvidia projects; the only difference necessary to follow their instructions for K3s is the addition of the runtimeClass, since we don't replace the default.
Thanks. What I currently still don't understand is why you recommend changing the "upstream" manifests to add runtimeClassName: nvidia instead of changing the default runtime (which seems easier to me down the road).
Changing the default system runtime based on the autodetected presence of the nvidia container runtime binary is potentially more disruptive.
If the container runtime were made the default, but other packages (such as the libraries, kernel module, and so on) are not properly installed, then the node will be unable to run any pods.
Additionally, it is usually only desired to run some pods with the nvidia runtime; for all of the other pods in the system that aren't going to use the GPU, the default runtime is fine. Anyone running GPU pods is already going to be deploying nvidia-specific configuration to their cluster. Asking users to add a field to the pod spec to request the nvidia runtime does not seem overly burdensome.
Thanks for your help @brandond! You are right, not all pods need/use a GPU, but I think if you don't request it, it will not be used... but I clearly need to check on this a bit more. I'm just searching for the "best-practice" setup, and one that also requires the least amount of changes in the manifests/helm charts, regardless of whether I'm deploying to our on-prem k3s cluster, our customers' clusters, or the cloud...
You're correct, pods that don't request a GPU won't get one, but if you change the default runtime to nvidia-container-runtime then everything will run with that, instead of with runc, which may lead to unexpected changes in behavior.
I think you should be able to inject the runtimeClass fairly easily with tools like kustomize, if you're looking at ways to do that using tooling instead of manual editing of manifests?
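In case it helps, here is a sketch of what such a kustomize patch could look like (the deployment name and resource file are placeholders, and this assumes a reasonably recent kustomize that supports inline JSON6902 patches under the patches field):
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml                  # the upstream manifest, unmodified
patches:
- target:
    kind: Deployment
    name: my-gpu-workload          # hypothetical name of the upstream deployment
  patch: |-
    - op: add
      path: /spec/template/spec/runtimeClassName
      value: nvidia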
Side question: is there any way to request the nvidia runtime in a pod spec and still be able to share a single GPU with multiple pods?
K3s will automatically add runtimes to the containerd config if the nvidia binaries are present on the node when k3s is started.
Without a pre-installed NVIDIA Container Toolkit and GPU driver, I followed the gpu-operator (v22.9.0) installation guide on k3s (v1.24.3+k3s1) and deployed the GPU operator successfully, but when I ran the samples from https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#running-sample-gpu-applications they failed. I had to add runtimeClassName: nvidia to the pod spec, so I wonder how these samples could have run successfully without runtimeClassName: nvidia.
I wonder how these samples could have run successfully without runtimeClassName: nvidia.
They will not run as-is on K3s. You either need to explicitly specify the nvidia runtime class, or modify the containerd config template to use the nvidia container runtime for all pods.
For those who want to set the default runtime to nvidia, here is what works with k3s v1.24.6+k3s1 using containerd 1.6.8-k3s1:
Check that the nvidia runtime was detected as @brandond described above. If yes, get the default config.toml.tmpl from https://github.com/k3s-io/k3s/blob/master/pkg/agent/templates/templates_linux.go and change it to have
[plugins.cri.containerd]
default_runtime_name = "nvidia"
similar to what nvidia also describes in the k8s-device-plugin docs. Here is the template I use: config.toml.tmpl
For those who want to set the default runtime to nvidia, here is what works with k3s v1.24.6+k3s1 using containerd 1.6.8-k3s1: [...]
Hello, I tried this on Ubuntu 22.04 with k3s (Kubernetes 1.25) and it does not work. Containerd version is 1.6.9.
It does seem to properly identify the runtimes, though:
[plugins.opt]
path = "/var/lib/rancher/k3s/agent/containerd"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
enable_unprivileged_ports = true
enable_unprivileged_icmp = true
sandbox_image = "rancher/mirrored-pause:3.6"
[plugins.cri.containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins.cri.containerd.runtimes."nvidia"]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia".options]
BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
[plugins.cri.containerd.runtimes."nvidia-experimental"]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia-experimental".options]
BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime-experimental"
Error thrown by the nvidia-device-plugin pod:
Warning Failed 99s (x5 over 3m13s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: initialization error: driver error: failed to process request: unknown
Can you try my steps documented at https://github.com/k3s-io/k3s/issues/4391#issuecomment-1233314825 instead?
Can you try my steps documented at #4391 (comment) instead?
So I just did a clean agent install on the GPU node, following your steps. I get the same config file generated. Also, instead of using the helm chart install of nvidia-device-plugin, I used your URL. Once the pod starts on the node with the GPU, I receive the same error.
I tried running nvidia-smi within plain containerd; here is the output:
christopher@k8s-gpu:~$ sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0-base cuda-11.0-base nvidia-smi
Thu Nov 3 22:14:02 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 0% 37C P8 12W / 151W | 2MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
As soon as it's in Kubernetes, it doesn't seem to work.
ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0-base cuda-11.0-base nvidia-smi
That tag doesn't seem to exist: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=11.0-base
Did you mean 11.0.3-base? Even if so, that image doesn't seem to contain the nvidia-smi binary that you're trying to run:
brandond@dev01:~$ docker run --rm -it docker.io/nvidia/cuda:11.0.3-base nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown.
So I'm not sure how these steps are working for you at all. Additionally, 11.0 is quite old; your command shows that your driver is actually using cuda 11.7, and 11.8 is the current release.
Can you try checking the output of the nbody-gpu-benchmark pod, as shown in my example, instead of running other tests using deprecated examples and commands?
The image 11.0-base may have been pulled a while back. I just tried with 11.0.3 and it worked fine as well. Edit: also tried with 11.8 and it also worked fine.
I tried running the nbody-gpu-benchmark; it's unschedulable because the nvidia-device-plugin pod is unable to complete. nbody is looking for nvidia.com/gpu:
Warning FailedScheduling 8s default-scheduler 0/4 nodes are available: 4 Insufficient nvidia.com/gpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
If it is of interest: I actually abandoned the approach of setting the default runtime to nvidia (although that worked) again and went with @brandond's recommendation of explicitly setting runtimeClassName since longhorn had problems running under the nvidia runtime... Although that is extra work for my GPU workloads and I had to find workarounds to specify the runtimeClassName in flyte, this definitely seems to be the route to go.
@larivierec
the image 11.0-base may have been previously pulled awhile back.
Hmm, that'd have to be from quite a while ago then. What version of K3s are you currently using?
the nvidia-device-plugin pod is unable to complete.
Why not?
I just tried with 11.0.3 and it worked fine as well. edit: also tried with 11.8 and also worked fine.
How is this working? Which specific image are you using? Neither docker.io/nvidia/cuda:11.0.3-base nor any of the newer versions I've tried appear to contain the nvidia-smi binary.
Hmmm, with regards to 11.X I don't know; maybe I made modifications to the system containerd? The k3s version is the latest stable: 1.25. The DaemonSet pod doesn't start because of the error I linked above, sadly 😓
Edit: I was also using the ubuntu22.04 images; I don't know if that makes a difference.
@brandond thanks for the help yesterday. It turns out the binary that k3s was setting in the config.toml (under /usr/local/nvidia) was not the one that I installed with the package manager (/usr/bin/nvidia-container-runtime).
Cheers :beer:
@larivierec if I remember correctly, the /usr/local/nvidia runtime is installed by the gpu operator, and will be selected over the package manager-installed runtime if it exists. If you're still using the gpu operator it might try to install its own runtime again, so look out for that.
That would make sense. I had installed it previously; however, when I removed it a while back it didn't seem to clean itself up!
Thanks for the heads up.
Hello, personally it works for me with the following setup (Ubuntu 22.04, Nvidia drivers 515, and K3s 1.25):
As @brandond said, we need to create a runtimeClass resource:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
Then, I prefer to deploy the helm chart directly from the Nvidia repo:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm upgrade -i nvdp nvdp/nvidia-device-plugin --namespace nvdp --create-namespace --version 0.12.3 --set=runtimeClassName=nvidia
I hope this will help.
Environmental Info: K3s Version:
1.22.3-rc4+k3s1
Node(s) CPU architecture, OS, and Version:
Linux metal01 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
1 control-plane node, 3 agents
Describe the bug:
nvidia-device-plugin is crashlooping with the following errors:
Steps To Reproduce:
Expected behavior:
Nvidia device plugin is not crashlooping
Actual behavior:
Nvidia plugin is crashlooping and GPU is not usable.
Additional context / logs:
I upgraded the cluster from 1.21, where the GPU was using runc v1 and everything worked fine with a custom containerd config. After upgrading and wiping the whole node, I was presented with issues regarding NVML initialization. After following what was described in https://github.com/k3s-io/k3s/issues/4070 I got to a state where the container cannot be started due to the log messages mentioned earlier. Other pods on that node using the default runtimeClass are working just fine.
At this point I am not sure if this is an issue on my side, on the nvidia-plugin side, or in k3s, so any help would be appreciated.
My deployment manifests are available at https://github.com/thaum-xyz/ankhmorpork/tree/master/base/kube-system/device-plugins
Backporting