Open vsoch opened 7 months ago
Also this seems to be an issue with creating (maybe?) more pods than the size of my resources (cpu, etc) can support. I deployed a smaller pytorch workflow ref and it worked! @AkihiroSuda this is SO cool it's rocking my socks!! :socks: This is what we wanted to get working many months ago and I'm over the moon it's starting to! :moon:
These just look like images for the kubelet or control plane (not any applications) and, interestingly, there aren't any subuid entries in the file here:
Please check the files on the host. You probably have 65536 ids there.
Yes the uid/gid for the host virtual machine (not inside of docker compose) looks OK.
Please try increasing 65536 there to a larger number
Sure! I've never done that on my host. How large should it be?
Depends, but at least 1185200044 for your image
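To make the arithmetic behind that number concrete, here is a minimal sketch of the check, assuming a simplified single-level rootless mapping in which container ID 0 maps to the host user and container IDs 1..N map onto the N subordinate IDs granted in `/etc/subuid` or `/etc/subgid`. The helper names (`parse_subid`, `max_mappable_id`, `can_map`) and the sample entry for `user` are hypothetical and only for illustration; a nested setup such as a node running inside a rootless container may impose further limits that this does not model.

```python
def parse_subid(text):
    """Parse /etc/subuid-style lines ("name:start:count") into tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, start, count = line.split(":")
        entries.append((name, int(start), int(count)))
    return entries


def max_mappable_id(entries, user):
    """Largest container ID that can be mapped for `user`.

    Assumption: container ID 0 is the host user itself, and IDs 1..N
    consume the N subordinate IDs, so the maximum equals the total count.
    """
    return sum(count for name, _start, count in entries if name == user)


def can_map(entries, user, container_id):
    """True if `container_id` fits inside the subordinate ranges of `user`."""
    return 0 <= container_id <= max_mappable_id(entries, user)


# Hypothetical entry with the common default of 65536 subordinate IDs.
sample = "user:100000:65536\n"
entries = parse_subid(sample)
print(can_map(entries, "user", 65536))
print(can_map(entries, "user", 1185200044))  # the UID/GID from the lchown error
```

Under this model, a file owned by UID 1185200044 in the image only becomes mappable once the count field in both `/etc/subuid` and `/etc/subgid` is at least that large.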
for UID 1185200044, GID 1185200044: lchown /var/lib/containerd/tmpmounts/containerd-mount180165242/opt/conda/pkgs/pytorch-1.0.0-py3.6_cuda10.0.130_cudnn7.4.1_1/bin: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid)
I'm (fairly successfully) running different kinds of pods in usernetes, but I just hit this error:
I'm not sure the hint matches the actual problem; I was wondering if there are too many containers running? I figured out I could run
make shell
to get into one of the nodes, and then I found a way to see containerd images: These just look like images for the kubelet or control plane (not any applications) and, interestingly, there aren't any subuid entries in the file here:
Is there a bug here / something we can do to get it to work?