This PR limits the number of concurrently running jobs based on the jobs controller's available memory, and limits the number of concurrent sky launch calls made by the jobs controller based on its CPU count.
I followed SkyServe's implementation and apply the CPU limit only to concurrent launches, since IIRC sky.launch is more compute-bound than memory-bound. Likewise, the memory limit applies only to the number of concurrent jobs, since Ray jobs consume more memory.
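As a rough illustration of the two limits described above, here is a minimal sketch. The constants and function names are hypothetical placeholders chosen for this example, not the values or helpers used in the PR. The controller's CPU count and total memory are passed in as arguments so the sketch stays self-contained.

```python
# Hypothetical tuning constants (assumptions for illustration only).
LAUNCHES_PER_CPU = 4   # assumed number of concurrent launches allowed per CPU
JOB_MEMORY_MB = 350    # assumed memory footprint of one running job, in MB


def max_concurrent_launches(cpu_count: int,
                            launches_per_cpu: int = LAUNCHES_PER_CPU) -> int:
    """CPU-bound limit: cap concurrent launches by the controller's CPU count."""
    return max(1, cpu_count * launches_per_cpu)


def max_concurrent_jobs(total_memory_mb: int,
                        job_memory_mb: int = JOB_MEMORY_MB) -> int:
    """Memory-bound limit: cap running jobs by the controller's total memory."""
    return max(1, total_memory_mb // job_memory_mb)


def can_start_launch(active_launches: int, cpu_count: int) -> bool:
    """Return True if one more launch fits under the CPU-based limit."""
    return active_launches < max_concurrent_launches(cpu_count)


def can_start_job(active_jobs: int, total_memory_mb: int) -> bool:
    """Return True if one more job fits under the memory-based limit."""
    return active_jobs < max_concurrent_jobs(total_memory_mb)
```

Keeping the two checks separate mirrors the split described above: launches are gated only by CPU, and running jobs only by memory. Both limits are floored at 1 so a very small controller can still make progress.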
Tested (run the relevant ones):
[ ] Code formatting: bash format.sh
[ ] Any manual or new tests for this PR (please specify below)
Fixes #4243.
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh