skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.81k stars 513 forks

[Jobs] Cancelling managed jobs can take a long time #4296

Open Michaelvll opened 2 weeks ago

Michaelvll commented 2 weeks ago

When there are many RUNNING and PENDING jobs, cancelling the PENDING managed jobs can take a long time because we cancel them one by one. We should speed this up.
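To illustrate the difference between one-by-one and concurrent cancellation, here is a minimal sketch. It is not SkyPilot code: `cancel_one` is a hypothetical stand-in for whatever a single cancellation costs, and the thread pool is just one possible way to overlap the waits, assuming each cancellation is mostly I/O-bound and the cancellations are independent of each other.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def cancel_one(job_id: int) -> int:
    """Hypothetical placeholder for cancelling a single job.

    The sleep stands in for the latency of one cancellation request;
    it is an assumption, not a measured value.
    """
    time.sleep(0.05)
    return job_id


def cancel_sequential(job_ids: Iterable[int]) -> List[int]:
    """Cancel jobs one by one; total time grows linearly with the job count."""
    return [cancel_one(job_id) for job_id in job_ids]


def cancel_concurrent(job_ids: Iterable[int], max_workers: int = 8) -> List[int]:
    """Cancel jobs through a thread pool so the per-job waits overlap.

    Results come back in the same order as the input IDs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cancel_one, job_ids))
```

Whether this helps in practice depends on where the time actually goes. If the cancellations contend on a shared lock or database, batching them into a single request may be a better fit than adding threads.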

Version & Commit info: