SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
#4240 introduced a bug when provisioning multi-node clusters with a custom image that does not have sudo installed:
sky launch -c test --num-nodes 4 --cloud kubernetes --image-id nvcr.io/nvidia/nemo:24.05.01 -- echo hi
This would fail with a "sudo not found" error.
As an optimization, #4240 ran the privilege check on only the head node, but the check must run in all pods to make sure the sudo alias is set up correctly. This PR fixes that.
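To illustrate the idea of running the check in every pod instead of only the head, here is a minimal sketch. It is not the code from this PR: `PRIVILEGE_CHECK_CMD`, `check_privileges_on_all_pods`, `pods_failing_check`, and the `run_in_pod` callable are all hypothetical names, and the shell check shown is an assumed stand-in for whatever the real privilege check does.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical check (an assumption, not SkyPilot's actual command):
# exits 0 if the pod runs as root or has sudo on PATH, 1 otherwise.
PRIVILEGE_CHECK_CMD = (
    'if [ "$(id -u)" -eq 0 ] || command -v sudo >/dev/null 2>&1; '
    'then exit 0; else exit 1; fi'
)


def check_privileges_on_all_pods(
    pod_names: List[str],
    run_in_pod: Callable[[str, str], int],
) -> Dict[str, int]:
    """Run the privilege check in every pod, not just the head pod.

    `run_in_pod(pod_name, command)` is a hypothetical callable that runs
    `command` inside the named pod and returns its exit code.
    """
    if not pod_names:
        return {}
    # Run the checks concurrently so that covering all pods stays cheap.
    with ThreadPoolExecutor(max_workers=len(pod_names)) as pool:
        exit_codes = list(
            pool.map(lambda name: run_in_pod(name, PRIVILEGE_CHECK_CMD),
                     pod_names))
    return dict(zip(pod_names, exit_codes))


def pods_failing_check(results: Dict[str, int]) -> List[str]:
    """Return the names of pods whose check exited non-zero, sorted."""
    return sorted(name for name, code in results.items() if code != 0)
```

Running the checks in parallel keeps the cost close to that of the head-only check while still covering the worker pods, which is where the failure appeared.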
Tested with sky launch -c test --num-nodes 4 --cloud kubernetes --image-id nvcr.io/nvidia/nemo:24.05.01 -- echo hi