Open weixiujuan opened 2 years ago
Same problem here. Did you solve it?
Hi, I downgraded Kubernetes to v1.20 and it works fine. Did you solve it?
We have the same issue on Kubernetes v1.18.6.
Same error here. I think the cause is "--container-runtime-endpoint=/var/run/containerd/containerd.sock --cgroup-driver=systemd": using containerd as the container runtime triggers this problem. I will try to solve it.
Same problem here. Did you solve it?
I changed the k8s cgroup driver from systemd to cgroupfs and it works well. Do not use --cgroup-driver=systemd. The config looks like this:
env:
  - name: LOG_LEVEL
    value: "5"
  - name: EXTRA_FLAGS
    value: "--logtostderr=false --container-runtime-endpoint=/var/run/containerd/containerd.sock"
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
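For anyone comparing their own EXTRA_FLAGS value against the one above, here is a minimal Python sketch that parses a flag string and drops one flag from it. It is only an illustration: `parse_flags` and `drop_flag` are hypothetical helper names, not part of gpu-manager, and the sketch assumes every flag is written as `--name=value`, as in this thread.

```python
import shlex


def parse_flags(extra_flags: str) -> dict:
    """Split a flag string such as EXTRA_FLAGS into a {name: value} mapping."""
    flags = {}
    for token in shlex.split(extra_flags):
        # "--cgroup-driver=systemd" -> name "cgroup-driver", value "systemd"
        name, _, value = token.partition("=")
        flags[name.lstrip("-")] = value
    return flags


def drop_flag(extra_flags: str, name: str) -> str:
    """Return the flag string without the flag called `name` (hypothetical helper)."""
    kept = [
        token
        for token in shlex.split(extra_flags)
        if token.partition("=")[0].lstrip("-") != name
    ]
    return " ".join(kept)


# Flag string quoted earlier in this thread as the suspected cause.
before = (
    "--container-runtime-endpoint=/var/run/containerd/containerd.sock "
    "--cgroup-driver=systemd"
)
after = drop_flag(before, "cgroup-driver")

print(parse_flags(before).get("cgroup-driver"))
print(after)
```

The resulting string can then be pasted into the `value` of `EXTRA_FLAGS`. Whether removing the flag is enough on a given cluster is not confirmed here; the comment above also changed the cgroup driver on the Kubernetes side.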
Please help me solve this problem. The information is as follows. Thank you.
The restricted GPU configuration is as follows:
The running algorithm program reports the following error:
gpu-manager.INFO log contents are as follows:
gpu-manager.WARNING log contents are as follows:
W0607 15:56:44.887813 626706 manager.go:290] Find orphaned pod tainerd.service
The gpu-manager.ERROR and gpu-manager.FATAL logs contain no errors.
My gpu-manager.yaml is as follows: