Closed: ZohaibAhmed closed this issue 5 years ago
Hi @ZohaibAhmed,
Could you please go to your AWS account -> CloudFormation service, find the AMI stack (it is named "spotty-nvidia-docker-ami-xxxxxxxx"), open it, and check the error message in the "Events" tab?
@apls777 It shows that there are no stacks (failed, deleted, or otherwise)? Seems like spotty didn't get around to creating the stack at all?
@ZohaibAhmed make sure you are looking at stacks in the right region; it should be there.
@apls777 Got it, was looking at the default region, not the one in the spotty config. Here's the error:
`CREATE_FAILED | AWS::Lambda::Function | SetLogsRetentionFunction | The runtime parameter of nodejs4.3 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs8.10) while creating or updating functions. (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: bb4a9a9b-3ab9-11e9-b03f-151ba91a4517)`
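For anyone who would rather check this from a script than from the console, here is a minimal sketch. It assumes boto3 is installed and AWS credentials are configured; the helper names `failed_events` and `fetch_stack_events` are made up for illustration and are not part of spotty. The important part is passing the region from the spotty config explicitly instead of relying on the default region.

```python
def failed_events(events):
    """Return (resource type, logical id, reason) for every CREATE_FAILED event."""
    return [
        (e.get("ResourceType"), e.get("LogicalResourceId"), e.get("ResourceStatusReason"))
        for e in events
        if e.get("ResourceStatus") == "CREATE_FAILED"
    ]


def fetch_stack_events(stack_name, region):
    """Fetch events for one stack in the given region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so failed_events() can be used without boto3

    client = boto3.client("cloudformation", region_name=region)
    return client.describe_stack_events(StackName=stack_name)["StackEvents"]


# Hypothetical usage; substitute the real stack name and the region from the spotty config:
# for row in failed_events(fetch_stack_events("spotty-nvidia-docker-ami-xxxxxxxx", "us-west-2")):
#     print(" | ".join(str(col) for col in row))
```

If the stack has already been deleted, CloudFormation expects the unique stack ID rather than the stack name in `StackName`.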
@ZohaibAhmed it seems you are using an old version of the tool. Use the `pip install -U spotty` command to update it.
Thanks. It also seems I needed to request an increase for the particular instance type I was allocating.
@ZohaibAhmed What do you mean by "request an increase"?
I had a limit on the particular type of instance I wanted to use (by default it was 0). Had to open a case with Amazon to allocate more (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html)
@apls777 Don't want to open another issue on this repo for this, but all of a sudden I get this error when I run create-ami using spotty:
ract_config.py:46: YAMLLoadWarning:
*** Calling yaml.load() without Loader=... is deprecated.
*** The default Loader is unsafe.
*** Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Waiting for the AMI to be created...
- creating IAM role for the instance...
- launching the instance...
- installing NVIDIA Docker...
Error:
------
Stack "spotty-nvidia-docker-ami-hc4o3367" was not created.
See CloudFormation and CloudWatch logs for details.
Any ideas?
EDIT: More logs I dug up:
[ 212.678177] cloud-init[2664]: Error occurred during build: Command run_init failed
[ 212.690100] cloud-init[2664]: + INIT_EXIT_CODE=1
[ 212.690362] cloud-init[2664]: + /usr/local/bin/cfn-signal -e 1 --stack spotty-nvidia-docker-ami-sdrkb2nt --region us-west-2 --resource InstanceReadyWaitCondition
[ 212.928861] cloud-init[2664]: + [[ 1 -ne 0 ]]
[ 212.929111] cloud-init[2664]: + exit 1
[ 212.933173] cloud-init[2664]: Cloud-init v. 18.4-0ubuntu1~16.04.2 running 'modules:final' at Fri, 01 Mar 2019 05:23:51 +0000. Up 19.41 seconds.
[ 212.933355] cloud-init[2664]: 2019-03-01 05:27:04,492 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [1]
[ 212.946474] cloud-init[2664]: 2019-03-01 05:27:04,505 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
[ 212.957059] cloud-init[2664]: 2019-03-01 05:27:04,516 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
ci-info: no authorized ssh keys fingerprints found for user ubuntu.
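When the console output is long, a small filter can help pull out the lines that report a failure. The following is only a sketch: the function name `failure_lines` is hypothetical, and the prefix pattern is an assumption based on the `[ 212.678177] cloud-init[2664]:` format of the lines quoted above.

```python
import re

# Matches the "[ 212.678177] cloud-init[2664]: " prefix seen in the console output.
PREFIX = re.compile(r"^\[\s*\d+\.\d+\]\s+cloud-init\[\d+\]:\s*")
# Phrases that mark a failure in the log lines quoted above.
FAILURE = re.compile(r"Error occurred|Failed|failed")


def failure_lines(log_text):
    """Return the message part of every cloud-init line that mentions a failure."""
    hits = []
    for line in log_text.splitlines():
        match = PREFIX.match(line)
        if not match:
            continue  # skip lines that are not cloud-init output
        message = line[match.end():]
        if FAILURE.search(message):
            hits.append(message)
    return hits
```

Run against the excerpt above, the first hit is the `Command run_init failed` line; the `cfn-signal -e 1` and `exit 1` lines after it are consequences of that failure, so the output of the init command itself is the place to look next.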
Hi,
Coming back to this project after a while. I'm following the tutorial here: https://towardsdatascience.com/how-to-train-deep-learning-models-on-aws-spot-instances-using-spotty-8d9e0543d365
I get an error while it tries to create the AMI.