mengdong closed this issue 4 years ago
When allocating the master and workers on GPU nodes on GKE, I notice that it works when the master and workers are on the same node. However, if a worker is allocated to a different GPU node, it gets stuck in the ContainerCreating stage forever.
Can you run kubectl describe to get the pod info and show the output here?
kubectl describe
I realize it is due to a separate persistent volume issue.