spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

Wrong key error while using onDemandInstance parameter #36

Closed akmnko closed 5 years ago

akmnko commented 5 years ago

Hi there,

I'm having a problem setting the onDemandInstance parameter to true in spotty.yaml. I get the following error while trying to build an AMI with spotty aws create-ami:

Wrong key 'onDemandInstance' in {'region': 'us-east-2', 'onDemandInstance': 'true', 'amiName': 'my-project', 'instanceType': 'p2.xlarge', 'volumes': [{'name': 'Tacotron2', 'directory': '/workspace', 'size': 50}], 'docker': {'file': 'Dockerfile', 'workingDir': '/workspace/project', 'dataRoot': '/workspace/docker'}, 'ports': [6006, 8888]}

Spotty Version Installed: 1.2.0

Can you please advise? Thanks in advance.

apls777 commented 5 years ago

Hi @akmnko,

The configuration file format changed in v1.2.0, but the old format should still work. Unfortunately, I missed the "onDemandInstance" parameter when converting an old config to the new format, so it's not working for you.

I recommend rewriting your "spotty.yaml" file using the new format. You can find an example and the specification here: https://apls777.github.io/spotty/docs/configuration/.

akmnko commented 5 years ago

Thanks @apls777! Everything worked fine with the new format.

One more small thing: after you run spotty start with onDemandInstance: True, it throws an error: 'InstanceLifecycle' and doesn't exit properly. It doesn't affect instance creation in any way, but I figured it's worth flagging.

apls777 commented 5 years ago

Oh, I see. The AWS API does not return the "InstanceLifecycle" key for on-demand instances. The "spotty status" command will not work either. But, as you said, it doesn't affect any other command or the instance itself.
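For readers hitting the same error, here is a minimal sketch of the failure mode and a defensive lookup. It assumes an instance description shaped like an entry from EC2's describe-instances response, where "InstanceLifecycle" is only present for spot or scheduled instances. The helper name `instance_lifecycle` and the sample dictionaries are hypothetical, and this is not necessarily how Spotty fixed it.

```python
def instance_lifecycle(instance: dict) -> str:
    """Return the lifecycle of an instance description (hypothetical helper)."""
    # Indexing instance['InstanceLifecycle'] raises KeyError for on-demand
    # instances because the key is absent; .get() supplies a default instead.
    return instance.get('InstanceLifecycle', 'on-demand')


# Sample descriptions (made-up IDs, trimmed to the relevant keys).
spot = {'InstanceId': 'i-0123', 'InstanceLifecycle': 'spot'}
on_demand = {'InstanceId': 'i-0456'}

print(instance_lifecycle(spot))
print(instance_lifecycle(on_demand))
```

The bare 'InstanceLifecycle' message in the report is consistent with a Python KeyError, which prints only the missing key.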

Thank you for the report! I will fix both bugs with the next release.

apls777 commented 5 years ago

Hi @akmnko,

I released version 1.2.1 with some bug fixes, including the fix for on-demand instances. I recommend updating because there was also a bug with the "update_snapshot" and "create_snapshot" deletion policies: if an error or the 10-minute timeout occurred during snapshot creation, the EBS volume would be deleted anyway.
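The safe ordering for such a deletion policy can be sketched as follows: delete the volume only once the snapshot has actually completed. This is an illustration under assumptions, not Spotty's code. The function `snapshot_then_delete` and its three callables are hypothetical stand-ins for the real AWS calls, and the state names follow the EC2 snapshot states "pending", "completed", and "error".

```python
import time
from typing import Callable


def snapshot_then_delete(
    create_snapshot: Callable[[], str],   # hypothetical: starts a snapshot, returns its ID
    get_state: Callable[[str], str],      # hypothetical: returns the snapshot state
    delete_volume: Callable[[], None],    # hypothetical: deletes the EBS volume
    timeout_s: float = 600.0,             # the 10-minute timeout mentioned above
    poll_s: float = 5.0,
) -> bool:
    """Delete the volume only after its snapshot completes.

    Returns True if the volume was deleted, False if it was kept.
    """
    try:
        snapshot_id = create_snapshot()
        deadline = time.monotonic() + timeout_s
        while True:
            state = get_state(snapshot_id)
            if state == 'completed':
                break
            if state == 'error' or time.monotonic() >= deadline:
                return False  # keep the volume: no usable snapshot exists
            time.sleep(poll_s)
    except Exception:
        return False  # a failure while snapshotting also keeps the volume
    delete_volume()
    return True
```

With this ordering, the only path that reaches `delete_volume()` is the one where the snapshot state was observed as "completed".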