spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

[feature request] Option for instance-initiated termination #47

Closed vadimkantorov closed 5 years ago

vadimkantorov commented 5 years ago

This is useful for preprocessing scripts with On-Demand instances.

The AWS syntax is at: https://stackoverflow.com/questions/10541363/self-terminating-aws-ec2-instance

apls777 commented 5 years ago

Do you want a feature to terminate the instance in N hours? At the moment you can simply put the shutdown -h +120 command in the instance's commands parameter (not the container commands) to shut it down in 2 hours, for example.
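A minimal sketch of that delayed shutdown, for illustration only. shutdown takes its offset in minutes, so 2 hours becomes +120. The helper names shutdown_offset and shutdown_in_hours are hypothetical, and the use of sudo assumes the commands do not already run as root.

```shell
#!/bin/sh
# Convert a number of hours into the minute offset that shutdown expects,
# e.g. 2 hours becomes "+120".
shutdown_offset() {
  echo "+$(( $1 * 60 ))"
}

# Hypothetical helper: schedule a halt N hours from now.
# Assumes sudo is needed; drop it if the commands already run as root.
shutdown_in_hours() {
  sudo shutdown -h "$(shutdown_offset "$1")"
}
```

Calling shutdown_in_hours 2 from the instance commands would then be equivalent to shutdown -h +120.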

vadimkantorov commented 5 years ago

Will shutdown terminate the instance, or just keep it allocated in a turned-off state? (That was my prior understanding.)

vadimkantorov commented 5 years ago

According to the answer linked below, by default the instance is stopped on shutdown, not terminated. The good news is that a stopped instance is still free; the bad news is that we need to check once in a while whether the instance has stopped itself and only then terminate it. I think terminate-on-shutdown is a better fit for long-running preprocessing jobs that don't need to keep stopped instances hanging around.

https://superuser.com/questions/587727/does-shutdown-h-now-on-a-linux-aws-instance-stop-or-terminate-the-instance
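The "check once in a while, then terminate" workaround could be sketched roughly as follows, run from a machine with the AWS CLI configured (credentials and a default region are assumed). The function names and the POLL_SECONDS variable are hypothetical; the describe-instances and terminate-instances subcommands are standard AWS CLI calls. There is no error handling, so a failed API call would keep the loop polling.

```shell
#!/bin/sh
# Print the current state name of an instance (e.g. running, stopped).
instance_state() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}

# Terminate the instance only if it has already stopped itself.
terminate_if_stopped() {
  state="$(instance_state "$1")"
  if [ "$state" = "stopped" ]; then
    aws ec2 terminate-instances --instance-ids "$1"
  else
    echo "instance $1 is $state, not terminating"
  fi
}

# Poll until the instance reports "stopped", then terminate it.
# POLL_SECONDS is a hypothetical knob; it defaults to 5 minutes.
poll_and_terminate() {
  until [ "$(instance_state "$1")" = "stopped" ]; do
    sleep "${POLL_SECONDS:-300}"
  done
  aws ec2 terminate-instances --instance-ids "$1"
}
```

This is exactly the babysitting that a terminate-on-shutdown option would make unnecessary.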

apls777 commented 5 years ago

@vadimkantorov A Spot Instance will be terminated (they cannot be stopped), but an On-Demand Instance will be stopped. As you said, you're not paying for a stopped instance, but it preserves its root volume (with the OS), so the difference is that you're paying for an extra volume (13GB by default).

I don't want to implement this behaviour by default because, in theory, for an On-Demand instance it would be possible to implement logic where you stop the instance, start it again later, and Spotty just resumes the stopped container on it. This would be much faster than killing the whole stack and recreating it, and it would also preserve the container state.

If you're using Spot Instances for your jobs, then there is nothing to worry about: the instances will be terminated and the root volumes will be deleted. If you need this for On-Demand instances, I would suggest using the AWS API to change the shutdown behavior before running the shutdown command (see modify-instance-attribute). You might also need to add a managed policy to the managedPolicyArns parameter to allow this API call.
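A rough sketch of that suggestion, not something Spotty provides. It assumes the AWS CLI and curl are available on the instance, that a default region is configured, and that the instance role is allowed to call ec2:ModifyInstanceAttribute. The function names are hypothetical; the metadata endpoints and the modify-instance-attribute options are the standard AWS ones.

```shell
#!/bin/sh
# Look up this instance's ID via the instance metadata service (IMDSv2):
# first request a short-lived session token, then query the instance ID.
current_instance_id() {
  token="$(curl -sS -X PUT 'http://169.254.169.254/latest/api/token' \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60')"
  curl -sS -H "X-aws-ec2-metadata-token: $token" \
    'http://169.254.169.254/latest/meta-data/instance-id'
}

# Switch the given instance from stop-on-shutdown to terminate-on-shutdown.
set_terminate_on_shutdown() {
  aws ec2 modify-instance-attribute --instance-id "$1" \
    --attribute instanceInitiatedShutdownBehavior --value terminate
}

# Hypothetical helper: change the attribute on this instance, then power off.
# The shutdown only runs if the attribute change succeeded.
terminate_self() {
  set_terminate_on_shutdown "$(current_instance_id)" && sudo shutdown -h now
}
```

Calling terminate_self at the end of the job (or replacing "now" with an offset such as +120) should make the On-Demand instance terminate rather than stop, so its root volume is not left behind.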

apls777 commented 5 years ago

@vadimkantorov I'll close this issue. Feel free to reopen it if you have any other questions or suggestions.

vadimkantorov commented 5 years ago

Yes, in the end this is mostly for consistency (so that On-Demand instances terminate instead of stopping when a shutdown is requested).