spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

How to persist storage / resume a terminated instance #63

Closed: Tarang closed this issue 4 years ago

Tarang commented 4 years ago

Is it possible to persist storage, so that if a Spotty-created instance is terminated it's easy to resume with the same /workspace and instance state it had just before termination?

I recently had some data loss as a result of not syncing data back and forth, and I'm hoping there's a way to easily prevent it, or to resume a Spotty instance that was terminated by AWS.

apls777 commented 4 years ago

By default, Spotty creates snapshots for all your volumes and restores them when you start your instance again. Your volumes would only have been deleted if you set the deletionPolicy parameter to delete. Please read the docs here: Volumes and Deletion Policies.
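
For reference, the deletion policy is set per volume in spotty.yaml. Below is a minimal sketch of the relevant section, assuming a Spotty 1.2-style config; the instance name, volume name, size, region, and instance type are placeholders, not values from this issue:

    instances:
      - name: instance-1
        provider: aws
        parameters:
          region: us-west-2
          instanceType: p2.xlarge
          volumes:
            - name: workspace
              parameters:
                size: 50
                # one of: create_snapshot (default), update_snapshot, retain, delete
                deletionPolicy: create_snapshot

Roughly: the snapshot policies turn the volume into a snapshot on spotty stop and restore it on the next spotty start, retain keeps the volume itself, and delete removes it without taking a snapshot.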

Tarang commented 4 years ago

Would this work with instances that weren't stopped with spotty stop, i.e. the ones Amazon terminates because the spot price goes above the on-demand price?

That's really amazing, I didn't know it did that.

apls777 commented 4 years ago

If an instance is terminated, the EBS volumes are just detached and won't be deleted. But you can run the spotty stop command even after your instance was terminated; it will apply the deletion policies to the volumes.

Tarang commented 4 years ago

If I had used delete instead of retain or an equivalent, and I ran spotty start after AWS terminated the instance, shouldn't it also re-attach the same EBS volume in that case?

apls777 commented 4 years ago

If you had the delete policy in the configuration file but you didn't use the spotty stop command, the volume won't be deleted. So if an instance was terminated by AWS and you're just starting it again with the spotty start command, the volume will be re-attached (as it wasn't deleted in the first place).

apls777 commented 4 years ago

Closing the issue as it's resolved. Feel free to reopen if you have more questions.

Tarang commented 4 years ago

Sorry it took a while to get back to this, I needed to wait until the instance got terminated!

The first time I had data loss I thought it was a mistake, but it happened again. The instance was terminated and I ran spotty start:

I see this output, suggesting the old volume will be attached:

Syncing the project with S3 bucket...
upload: .git/ORIG_HEAD to s3://spotty-test/project/.git/ORIG_HEAD
upload: .git/FETCH_HEAD to s3://spotty-test/project/.git/FETCH_HEAD
Preparing CloudFormation template...
  - volume "spotty-test" (vol-0ea5adf9b8f520380) will be attached
  - availability zone: us-west-2b
  - maximum Spot Instance price: on-demand

But in the end I was left with the same data I had in the folder on my local machine (so I lost what was on the remote machine). I had an update_snapshot deletion policy.

apls777 commented 4 years ago

Are you sure you're saving your data to the attached EBS volume and not to the root one? Can you provide your full spotty.yaml config?
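
For context on the root-volume question: only data written under the mount point of the attached EBS volume survives termination; anything written to other paths in the container lives on the instance's root volume and disappears with it. Below is a minimal sketch of a container section that keeps the project directory on the attached volume, again assuming a Spotty 1.2-style config; the image, names, and paths are placeholders:

    container:
      projectDir: /workspace/project    # the synced project lives on the mounted volume
      image: nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
      volumeMounts:
        - name: workspace               # must match the volume name under "instances"
          mountPath: /workspace         # everything under /workspace persists via the EBS volume

In a setup like this, checkpoints and outputs need to be written somewhere under /workspace; writing them to, say, /tmp or the home directory would put them on the root volume and they would be lost when the instance is terminated.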