puppetlabs / puppetlabs-pecdm

Puppet Bolt driven fusion of puppetlabs/peadm and Terraform.
Apache License 2.0
14 stars 19 forks source link

Deploying on aws yields error in resolve_reference #48

Closed jessereynolds closed 2 years ago

jessereynolds commented 3 years ago

Describe the Bug

Running an initial build on aws (with role switching) yields the following error:

{"target":"localhost","action":"task","object":"terraform::resolve_reference","status":"failure","value":{"_error":{"kind":"JSON::ParserError","msg":"783: unexpected token at ''","details":{"backtrace":["/opt/puppetlabs/bolt/lib/ruby/2.7.0/json/common.rb:156:in `parse'","/opt/puppetlabs/bolt/lib/ruby/2.7.0/json/common.rb:156:in `parse'","/tmp/e3ae0b25-9a2d-46b6-a275-da2c8f1acf41/terraform/tasks/resolve_reference.rb:43:in `load_statefile'","/tmp/e3ae0b25-9a2d-46b6-a275-da2c8f1acf41/terraform/tasks/resolve_reference.rb:19:in `resolve_reference'","/tmp/e3ae0b25-9a2d-46b6-a275-da2c8f1acf41/terraform/tasks/resolve_reference.rb:130:in `task'","/private/tmp/e3ae0b25-9a2d-46b6-a275-da2c8f1acf41/ruby_task_helper/files/task_helper.rb:58:in `run'","/tmp/e3ae0b25-9a2d-46b6-a275-da2c8f1acf41/terraform/tasks/resolve_reference.rb:135:in `<main>'"]}}}}
Error executing plugin terraform from resolve_reference in terraform: 783: unexpected token at ''

I was able to get past this by removing the inventory.yaml file and trying again.

Expected Behavior

The environment is built on aws.

Happy to provide more info if needed. Note that I am using assume-role to switch from the bastion role to the required role.

ody commented 3 years ago

@jessereynolds I was able to reproduce, thank you for the bug submission.

The scenario which reproduced it for me was running autope post run failure or user initiated kill, specifically while Bolt is running Terraform. When you get into this situation, run ps and check to see if there are any terraform processes running, if yes then kill them. After that, from the autope project directory cd .terraform/aws_pe_arch && terraform destroy. It might say that there is nothing to destroy but will clean up some lock files that were previously left around.

After Terraform finishes the destroy then return to the Bolt project's root and attempt to run autope again. This solved the issue for me.

If you use autope/terraform enough on AWS then you might notice that it "hangs" occasionally. I've found this to be related to the short lifespan of the STS token that is generated when going through the bastion assume-role workflow so I got in the habit of refreshing my credentials before doing anything with Terraform, e.g. re-assuming my role. Overall, we've found more quirks using Terraform with AWS then with GCP.

This is might be a bug more appropriate for puppetlabs/puppetlabs-terraform since the resolve_reference is happening before the autope plan runs so we have a limited ability to work around it in this module.

nigelkersten commented 2 years ago

We'll document that you may see this and other undesirable behaviours when your token expires. Thanks!