Closed cheungsuifai closed 1 year ago
I think you should report this to https://github.com/NVIDIA/nvidia-container-runtime
Sorry to raise the issue here. I have already gone through the nvidia-container-runtime issues and noticed that you are involved in some of them. In one of those threads, it seems you set up a u7s cluster with GPU support, based on the NVIDIA container runtime with `no-cgroups = true`.
Is that true?
I planned to test GPU availability on u7s. Since my GPU is an NVIDIA Tesla T4, I tried to deploy the NVIDIA device plugin DaemonSet. To do so, I switched the containerd runtime from the original crun to NVIDIA's (v1.13).
After that, the Pod failed to start with the log below:
My guess is that with cgroup v2 enabled, `/sys/fs/cgroup/devices` is no longer mounted. So I set `no-cgroups = true` in `/etc/nvidia-container-runtime/config.toml`. But the "mountpoint for cgroup not found" problem is still there.
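
For reference, the change I made amounts to roughly the following. This is a minimal sketch, assuming the stock layout of `config.toml`, where the option sits in the `[nvidia-container-cli]` table; all other keys are omitted here and left at whatever the package installed.

```toml
# /etc/nvidia-container-runtime/config.toml (fragment)
# Assumption: the option belongs to the [nvidia-container-cli] table,
# as in the stock config, where it ships commented out as
# "#no-cgroups = false".

[nvidia-container-cli]
# Ask nvidia-container-cli not to manage device cgroups itself.
no-cgroups = true
```

If the key ends up outside that table (for example at the top of the file), I would not expect it to be picked up, so the placement is worth double-checking.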