Three smoke tests are failing on latest master: test_job_pipeline, test_managed_jobs, and test_managed_jobs_pipeline_failed_setup (a repro sketch follows the queue output below). This seems to have been introduced in #4169. Notice that the job name is not correctly displayed:
$ sky jobs queue --refresh
Fetching managed job statuses...
Managed jobs
In progress tasks: 1 RUNNING
ID TASK NAME RESOURCES SUBMITTED TOT. DURATION JOB DURATION #RECOVERIES STATUS
4 - t-managed-jobs-2e-2 1x[CPU:1+] 3 mins ago 3m 55s 2m 45s 0 RUNNING
3 - t-managed-jobs-2e-1 1x[CPU:1+] 4 mins ago 1m 49s 52s 0 CANCELLED
2 2 - 4 mins ago 8s - 0 CANCELLED
↳ 0 a 1x[CPU:2.0+][Spot] 4 mins ago 8s - 0 CANCELLED
↳ 1 b 1x[CPU:2.0] - - - 0 CANCELLED
↳ 2 eval1 1x[CPU:2+] - - - 0 CANCELLED
↳ 3 eval2 1x[CPU:2+] - - - 0 CANCELLED
1 1 - 5 mins ago 3m 18s 15s 0 FAILED_SETUP
↳ 0 a 2x[CPU:2.0+] 5 mins ago 1m 26s 7s 0 SUCCEEDED
↳ 1 b 1x[CPU:2.0+] 3 mins ago 1m 7s 0 FAILED_SETUP
↳ 2 eval1 1x[CPU:2.0] - - - 0 CANCELLED
↳ 3 eval2 1x[CPU:2.0] - - - 0 CANCELLED
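For reference, a minimal sketch for re-running only the three failing tests is below. The test file path and the --generic-cloud option are assumptions about the current smoke-test layout and conftest flags, so adjust them to the checkout being tested.

# Repro sketch: re-run just the failing smoke tests.
# The test file path and the '--generic-cloud' flag are assumptions about
# the current test layout / conftest options, not verified here.
import sys
import pytest

FAILING_TESTS = [
    'tests/test_smoke.py::test_job_pipeline',
    'tests/test_smoke.py::test_managed_jobs',
    'tests/test_smoke.py::test_managed_jobs_pipeline_failed_setup',
]
sys.exit(pytest.main(FAILING_TESTS + ['--generic-cloud', 'aws']))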
Printing out the job info dict, we got the spec field holding the task name, suggesting that some database parsing error may have been introduced.
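To inspect the raw fields behind the table, something like the following can dump the per-job dicts. It assumes sky.jobs.queue(refresh=True) is the Python counterpart of `sky jobs queue --refresh` and returns the same job info dicts.

# Sketch: dump the raw job info dicts backing the queue table.
# Assumption: sky.jobs.queue(refresh=True) mirrors `sky jobs queue --refresh`.
import pprint
import sky

for job in sky.jobs.queue(refresh=True):
    # Check which key actually carries the job/task name vs. the spec.
    pprint.pprint(job)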
Version & Commit info:
sky -c: 46ea0d87ffa8bdcf05a32c938e2002924abb3c87
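Since this looks like a parsing issue in the managed-jobs state layer, a schema-agnostic dump of the controller-side database can show whether the stored rows are already wrong rather than just the CLI formatting. The DB path below is an assumption about where the managed-jobs state lives on the jobs controller; no table or column names are assumed.

# Sketch: schema-agnostic dump of the managed-jobs state DB on the controller.
# The path is an assumption; adjust to wherever the state DB actually lives.
import os
import sqlite3

db_path = os.path.expanduser('~/.sky/spot_jobs.db')  # assumed location
conn = sqlite3.connect(db_path)
cur = conn.cursor()
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
for table in tables:
    cols = [c[1] for c in cur.execute(f'PRAGMA table_info({table})')]
    print(f'== {table}: {cols}')
    for row in cur.execute(f'SELECT * FROM {table} LIMIT 5'):
        # Column name -> value, so a shifted or misparsed field stands out.
        print(dict(zip(cols, row)))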