spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License
492 stars 43 forks source link

Restart instance automatically #67

Open alexbussan opened 4 years ago

alexbussan commented 4 years ago

Super basic question - I intend to use this to start spot instances and if they are terminated, automatically restart them.

It seems like that's one thing this tool can help with, but it doesn't explicitly say in the docs, so that's why I asking here. Can this tool do this?

apls777 commented 4 years ago

@alexbussan Unfortunately, it's not supported at the moment.

It would a quite sophisticated functionality:

  1. It would require deploying some kind of a lambda function that keeps track of a running instance and restarts it when necessary.
  2. It should automatically resume all the previously running commands like model training, otherwise just restarting the instance is pointless.

Or Spotty could somehow reuse the SageMaker functionality: as far as I know, they support spot instances for their training jobs.