skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.82k stars 513 forks

[Core] Cancel 1000 jobs can take 5-10 mins #4293

Closed Michaelvll closed 1 week ago

Michaelvll commented 2 weeks ago

Cancelling 1000 jobs on an unmanaged cluster can take 5-10 mins. We should speed this up by cancelling jobs in parallel instead of one at a time.
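
A rough sketch of the idea, not SkyPilot's actual code: assuming each cancellation is an independent, I/O-bound call, a thread pool can issue them concurrently. `cancel_job` here is a hypothetical stand-in that just sleeps to mimic per-job latency, and `max_workers=32` is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cancel_job(job_id: int) -> int:
    """Hypothetical stand-in for cancelling one job; sleeps to mimic I/O latency."""
    time.sleep(0.01)
    return job_id


def cancel_jobs_parallel(job_ids, max_workers=32):
    """Cancel jobs concurrently and return the cancelled IDs in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; a worker's exception is re-raised
        # when its result is reached during iteration.
        return list(pool.map(cancel_job, job_ids))


if __name__ == "__main__":
    cancelled = cancel_jobs_parallel(range(1000))
    print(f"cancelled {len(cancelled)} jobs")
```

If the real per-job cost is mostly waiting on a process or network call, threads should overlap that wait; if it is CPU-bound, this would likely not help much.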

Version & Commit info: