skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.8k stars 510 forks

Autostop should consider the setup as not idle #715

Open Michaelvll opened 2 years ago

Michaelvll commented 2 years ago

Our autostop only considers the run section when deciding whether a cluster is idle. The setup section can take a very long time, so the cluster can be stopped while the setup is still running.

WoosukKwon commented 2 years ago

@Michaelvll Just curious: does this problem still persist? To me, it seems like the autostop flag is set only after the setup finishes. https://github.com/sky-proj/sky/blob/f13c5280215417a5662a65637209d3248dcd3a8a/sky/execution.py#L156-L161

concretevitamin commented 2 years ago

I think the issue arises when there's a concurrent sky autostop command, which will not consider the ongoing user-level setup as "busy".
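
To make the scenario concrete, here is a minimal sketch of the two idleness checks being discussed. It is not SkyPilot's actual code: the JobStatus values, the Job class, and both function names are hypothetical and exist only to illustrate why a setup-unaware check would report a cluster as idle while the setup is still in progress.

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    # Hypothetical statuses, assumed for illustration only.
    SETTING_UP = "SETTING_UP"  # user-level setup section is running
    RUNNING = "RUNNING"        # run section is running
    SUCCEEDED = "SUCCEEDED"    # job has finished


@dataclass
class Job:
    job_id: int
    status: JobStatus


def is_idle_run_only(jobs):
    """Behavior described in the issue: only the run section counts as busy."""
    return not any(job.status is JobStatus.RUNNING for job in jobs)


def is_idle_with_setup(jobs):
    """Requested behavior: an ongoing setup also keeps the cluster busy."""
    busy = {JobStatus.SETTING_UP, JobStatus.RUNNING}
    return not any(job.status in busy for job in jobs)


# A cluster whose only job is still in its setup section.
jobs = [Job(job_id=1, status=JobStatus.SETTING_UP)]
print(is_idle_run_only(jobs))    # setup is ignored, so the cluster looks idle
print(is_idle_with_setup(jobs))  # setup counts as busy, so it is not idle
```

Under these assumptions, a concurrent autostop that relies on the first check could stop the cluster mid-setup, whereas the second check would keep it alive until the setup and run sections have both finished.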

WoosukKwon commented 2 years ago

@concretevitamin Got it. Makes sense. Thanks!

