SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Moved setup and ray start into the Kubernetes pod args so that they run asynchronously.
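To illustrate the idea, here is a minimal sketch of a pod manifest whose container runs the setup commands and ray start from its args. It is not the code in this PR: the function name build_pod_manifest, the container name, and the example commands are hypothetical, and the trailing sleep infinity is an assumption to keep the container alive after ray start returns.

```python
def build_pod_manifest(name: str, image: str, setup_cmds: list[str],
                       ray_start_cmd: str) -> dict:
    """Return a pod manifest (as a dict) that runs setup inside the container args.

    Hypothetical helper for illustration only; not SkyPilot's actual code.
    """
    # `set -e` stops at the first failing command, so a setup error ends the
    # container and the failure shows up in the container's logs.
    script = "\n".join(["set -e", *setup_cmds, ray_start_cmd, "sleep infinity"])
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "ray-node",  # assumed container name
                "image": image,
                "command": ["/bin/bash", "-c"],
                # The whole setup script is passed as a single argument to bash.
                "args": [script],
            }],
        },
    }


manifest = build_pod_manifest(
    name="example-head",
    image="example/image:latest",
    setup_cmds=["pip install -r requirements.txt"],
    ray_start_cmd="ray start --head",
)
```

Because the commands run as part of container startup, pod creation returns without waiting for setup to finish, which is why their output ends up in the container logs.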
TODO:
[x] Better observability: the setup logs now appear in kubectl logs instead of provision.log as before, which makes debugging harder for users. (We now make a best-effort attempt to print the failed logs if we detect a failure during setup.)
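A rough sketch of what a best-effort log fetch could look like, assuming kubectl is on the PATH. The helper names (kubectl_logs_cmd, fetch_failed_logs), the default namespace, and the tail length are hypothetical and are not taken from this PR.

```python
import subprocess


def kubectl_logs_cmd(pod: str, namespace: str = "default", tail: int = 100) -> list[str]:
    """Build the kubectl command that prints the last `tail` log lines of a pod."""
    return ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}"]


def fetch_failed_logs(pod: str, namespace: str = "default", tail: int = 100,
                      runner=subprocess.run) -> str:
    """Best effort: return the pod logs, or a short placeholder if fetching fails.

    `runner` is injectable so the function can be exercised without a cluster.
    """
    try:
        result = runner(kubectl_logs_cmd(pod, namespace, tail),
                        capture_output=True, text=True, timeout=30, check=False)
    except (OSError, subprocess.TimeoutExpired) as e:
        # kubectl missing or the call hung: never let log fetching raise.
        return f"<could not fetch logs for {pod}: {e}>"
    if result.returncode != 0:
        return f"<could not fetch logs for {pod}: {result.stderr.strip()}>"
    return result.stdout
```

The function never raises on the failure paths it handles, so calling it while reporting a setup error cannot mask the original error.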
Tested (run the relevant ones):
[ ] Code formatting: bash format.sh
[ ] Any manual or new tests for this PR (please specify below)
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh