We identified that apt update is the slowest operation, and triggering it after setup takes a long time, so we now inline it in the container init args so that Kubernetes runs it in parallel.
Supersedes #4261 and #4270.
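For reviewers, here is a minimal sketch of the idea of inlining apt update into the container's startup args. It is not the code in this PR: the helper name `build_container_spec`, the marker file `/tmp/apt-update-complete`, the log path, and the choice to background the command are all assumptions made for illustration.

```python
def build_container_spec(image: str, keepalive_cmd: str = "sleep infinity") -> dict:
    """Hypothetical helper: build a container entry for a pod spec.

    The apt index refresh is placed in the container's own startup
    command, so every pod begins it as soon as its container starts
    instead of waiting for a later setup step to trigger it.
    """
    init_script = (
        # Assumption: run in the background and drop a marker file so a
        # later setup step can wait for completion before installing.
        "(apt-get update > /tmp/apt-update.log 2>&1; "
        "touch /tmp/apt-update-complete) & "
        # Keep the container alive after kicking off the update.
        f"{keepalive_cmd}"
    )
    return {
        "name": "node",
        "image": image,
        "command": ["/bin/bash", "-c"],
        "args": [init_script],
    }


spec = build_container_spec("nvcr.io/nvidia/nemo:24.05.01")
print(spec["args"][0])
```

Because the command lives in each pod's spec, the 100 containers start their updates independently rather than being driven one step at a time from the client.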
Testing on nemo image:
sky launch -y -c test --num-nodes 100 --cloud kubernetes --image-id nvcr.io/nvidia/nemo:24.05.01
Master branch: 19:56.21 total
This branch: 15:26.56 total
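To put the two nemo timings side by side, the snippet below converts them to seconds and computes the difference. It assumes the figures are in `minutes:seconds` form (as printed by a shell `time` summary); `to_seconds` is a helper introduced here only for illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' wall-clock string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


master = to_seconds("19:56.21")  # master branch, nemo image
branch = to_seconds("15:26.56")  # this branch, nemo image
saved = master - branch

print(f"saved {saved:.2f}s ({saved / master:.1%} less wall-clock time)")
# saved 269.65s (22.5% less wall-clock time)
```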
Testing on default image:
sky launch -y -c test --num-nodes 100 --cloud kubernetes
This branch: 4:13.41 total
Tested (run the relevant ones):
bash format.sh