spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License
492 stars 43 forks

AWS Batch #41

Open quazzuk opened 5 years ago

quazzuk commented 5 years ago

Hi

Great project!

How about adding support for training with AWS Batch? Basically, I'm looking for a setup where I can develop and test locally, then deploy with AWS Batch on Spot instances. Do you think this functionality would be a good fit for Spotty, or would I be better off starting something from scratch?

Thanks, Andrew

apls777 commented 5 years ago

Hi Andrew,

Thanks!

I think it's a great idea to add AWS Batch support. The idea behind Spotty is to have an abstraction over any "provider". A provider can be an instance from any cloud provider, a service like AWS Batch or AWS ParallelCluster, or just any machine accessible through SSH.
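To make the "provider" idea concrete, here is a rough sketch of what such an abstraction could look like. It is not Spotty's actual code: `Provider`, `RecordingProvider`, and `train` are hypothetical names made up for illustration, and a real provider would talk to a cloud API or an SSH host instead of recording calls.

```python
from typing import List, Protocol


class Provider(Protocol):
    """Hypothetical interface: anything that can start, run a command, and stop."""

    def start(self) -> None: ...

    def run(self, command: List[str]) -> str: ...

    def stop(self) -> None: ...


class RecordingProvider:
    """Toy provider that only records the calls it receives (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def start(self) -> None:
        self.log.append("start")

    def run(self, command: List[str]) -> str:
        self.log.append("run " + " ".join(command))
        return f"{self.name}: finished"

    def stop(self) -> None:
        self.log.append("stop")


def train(provider: Provider, command: List[str]) -> str:
    """Run a training command on any provider, always stopping it afterwards."""
    provider.start()
    try:
        return provider.run(command)
    finally:
        provider.stop()
```

With an interface like this, an AWS Batch provider would be one more class implementing the same three methods, and the calling code (`train` here) would not need to change.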

My next goal is to add support for Google Cloud, so right now I don't have time to work on this functionality, but if you want to make a contribution, it would be great :). If you're interested, we could discuss implementation details further.

Best regards, Oleg

tekumara commented 4 years ago

Interesting idea! What would AWS Batch give you over the current AWS capability?

apls777 commented 4 years ago

@tekumara With AWS Batch, you can submit a job and forget about it: the instance is terminated automatically once the job is done. The current functionality assumes that you start and stop an instance manually. So when you just want to train a model, it might be more convenient to do it with a single command, without worrying that you'll forget to stop the instance afterwards. Also, I think AWS Batch can queue jobs for Spot instances and wait for capacity if none is available at the moment.
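As a minimal sketch of the "run and forget" flow, a job can be submitted with boto3's `submit_job` call on the Batch client. This is not part of Spotty. It assumes a job queue and a job definition already exist and that the queue is backed by a Spot compute environment; the queue, definition, and job names in the usage note are placeholders, and `build_training_job` and `submit` are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional


def build_training_job(
    name: str,
    queue: str,
    definition: str,
    command: List[str],
    attempts: int = 3,
    timeout_seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the keyword arguments for the AWS Batch SubmitJob call."""
    if not 1 <= attempts <= 10:
        raise ValueError("AWS Batch allows between 1 and 10 attempts")
    request: Dict[str, Any] = {
        "jobName": name,
        "jobQueue": queue,            # assumed to use a Spot compute environment
        "jobDefinition": definition,  # assumed to be registered already
        "containerOverrides": {"command": command},
        # Retry the job if an attempt fails, e.g. after a Spot interruption.
        "retryStrategy": {"attempts": attempts},
    }
    if timeout_seconds is not None:
        request["timeout"] = {"attemptDurationSeconds": timeout_seconds}
    return request


def submit(request: Dict[str, Any], client: Any = None) -> str:
    """Submit the job and return its ID; creates a boto3 Batch client if none is given."""
    if client is None:
        import boto3  # imported lazily so the builder works without AWS access

        client = boto3.client("batch")
    return client.submit_job(**request)["jobId"]
```

Usage would be something like `submit(build_training_job("train-model", "spot-queue", "trainer", ["python", "train.py"]))`. After that call there is nothing left to stop by hand: with a managed compute environment, Batch starts instances for the queued job and scales them back down when the queue is empty.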