Closed dgarstang closed 6 years ago
Additional info with debug enabled.
[terragrunt] 2017/10/12 09:10:20 runtime.errorString runtime error: invalid memory address or nil pointer dereference
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/errors/errors.go:72 (0x620a1)
/usr/local/go/src/runtime/asm_amd64.s:479 (0x55d7c)
/usr/local/go/src/runtime/panic.go:458 (0x29473)
/usr/local/go/src/runtime/panic.go:62 (0x27fcd)
/usr/local/go/src/runtime/sigpanic_unix.go:24 (0x3e4c4)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:94 (0xc14da)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:79 (0xc134e)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:441 (0x5d71c)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:319 (0x5c78f)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:291 (0x5c629)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:230 (0x5c024)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:213 (0x5beee)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:190 (0x5be00)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:171 (0x5baec)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:485 (0x99ea4)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:259 (0x97cbf)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/main.go:20 (0x2104)
/usr/local/go/src/runtime/proc.go:183 (0x2b144)
/usr/local/go/src/runtime/asm_amd64.s:2086 (0x588a1)
[terragrunt] 2017/10/12 09:10:20 Unable to determine underlying exit code, so Terragrunt will exit with error code 1
Seems to work when I supply --terragrunt-source-update
Oof, that's no fun. Thank you for the stack trace. It seems to point to this line in remote_state.go. The remoteState variable can't be the culprit, as there is a check for nil earlier. That means it must be from the existingState, which is parsed using this method, which can return nil. Looks like we need to add a nil check in the NeedsInit or differsFrom method!
PRs very welcome.
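To make the suggestion concrete, here is a minimal sketch of what such a nil check could look like. The type and field names below are simplified stand-ins, not the actual Terragrunt definitions, and treating a missing backend as "differs" is an assumption rather than a settled design.

```go
package main

import "fmt"

// Backend is a hypothetical stand-in for the backend section of a parsed state file.
type Backend struct {
	Type string
}

// RemoteState is a hypothetical stand-in for the remote_state settings in the Terragrunt config.
type RemoteState struct {
	Backend string
}

// differsFrom reports whether the configured backend differs from the one
// recorded in the existing state. A nil existing backend is reported as
// different instead of being dereferenced (assumption: a missing backend
// means initialization is still needed).
func (r *RemoteState) differsFrom(existing *Backend) bool {
	if existing == nil {
		return true
	}
	return existing.Type != r.Backend
}

func main() {
	cfg := &RemoteState{Backend: "s3"}
	fmt.Println(cfg.differsFrom(nil))
	fmt.Println(cfg.differsFrom(&Backend{Type: "s3"}))
	fmt.Println(cfg.differsFrom(&Backend{Type: "local"}))
}
```

Whether a nil backend should count as "needs init" or "nothing to compare" is a judgment call; the point is only that the pointer gets checked before its Type field is read.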
We are also running into this issue. I've also been trying to trace exactly what is happening, but I feel like I'm a bit confused on what is going on.
Some background on what is happening before discussing the problem at hand:
terragrunt apply on my module using the following remote_state configuration in my terraform.tfvars file:
terragrunt = {
  remote_state {
    backend = "local"
    config {
      path = "${get_tfvars_dir()}/terraform.tfstate"
    }
  }
  ...
}
terragrunt apply and it failed to run because of an IAM permission issue. This is fine and expected; I didn't have the correct permissions to begin with.
terragrunt apply
Digging into the source code, I see the various points in the code where the state file is being parsed. I see what appears to be the local state file being parsed here. Looking at the parser there, it just loads the file and unmarshals it using a JSON decoder.
After that there is a check on the state variable for nil. So already, the state is not nil when it gets to the differsFrom call. From here, it passes the state.Backend attribute to it.
Eventually, the whole thing bombs [here](https://github.com/gruntwork-io/terragrunt/blob/master/remote/remote_state.go#L94).
Now, I'm trying to figure out exactly what is messed up.
We have two variables being compared: existingBackend.Type and remoteState.Backend. From tracing the code, we already know that existingBackend == state.Backend, where state is not nil from the previous function, and we at least know that remoteState is also non-nil, as the function would not have been called otherwise (I don't know Go, but I assume that this is the case). Also, I assume that remoteState == my local state file, but I'm unclear how this one is parsed.
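One note on the assumption above about the function not being called: in Go, a method with a pointer receiver can be invoked on a nil pointer, and the panic only happens when a field is actually read through that pointer. A small sketch, using a made-up remoteState type that is not the Terragrunt one:

```go
package main

import "fmt"

// remoteState is a hypothetical type used only to illustrate nil receivers.
type remoteState struct {
	Backend string
}

// describe has a pointer receiver and checks for nil before touching fields.
// Without the check, reading r.Backend on a nil receiver would panic with
// "invalid memory address or nil pointer dereference".
func (r *remoteState) describe() string {
	if r == nil {
		return "nil receiver"
	}
	return r.Backend
}

func main() {
	var r *remoteState // nil pointer
	fmt.Println(r.describe())
}
```

So a method being entered does not by itself prove the receiver or its arguments are non-nil; each pointer that gets dereferenced needs its own check.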
Anyways, if state is just an unmarshaled JSON file, I assume then that state.Backend is getting the key from the root of the JSON file. Inspecting the terraform.tfstate file, the root keys that I see are:
{
  "version": 3,
  "terraform_version": "0.10.3",
  "serial": 1,
  "lineage": ...,
  "modules": [
    ...
  ]
}
It appears that Backend isn't a key in the root document. I'm not sure if the reason this is erroring is because the initial terragrunt apply failed or something else. I also inspected other state files that we have lying around, and Backend never appears as a key at the root level of those JSON tfstate files either.
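The effect of the missing key can be reproduced in isolation. The sketch below assumes the state is decoded into a struct whose backend field is a pointer; the struct names and JSON tags are illustrative guesses, not the actual Terragrunt types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backendSection and stateFile are hypothetical shapes for the parsed state.
type backendSection struct {
	Type string `json:"type"`
}

type stateFile struct {
	Version int             `json:"version"`
	Backend *backendSection `json:"backend"`
}

// parseState decodes a state document. Keys that are absent from the JSON
// leave the corresponding struct fields at their zero value, which is nil
// for a pointer field.
func parseState(data []byte) (*stateFile, error) {
	var s stateFile
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	// Same root keys as the local state file above, with the elided values left out.
	local := []byte(`{"version": 3, "terraform_version": "0.10.3", "serial": 1, "modules": []}`)
	s, err := parseState(local)
	if err != nil {
		panic(err)
	}
	// The state itself is non-nil, but its Backend pointer is nil.
	fmt.Println(s != nil, s.Backend == nil)
}
```

Under that assumption, a nil check on the state passes while any later read of s.Backend.Type would still panic, which matches the behavior described in this thread.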
Ah, so interesting, we recently upgraded terragrunt from v1.30.0 -> v1.30.9. It looks like in the older version we were running, v1.30.0, there is a check for the state being a remote state. However, this got removed in v1.30.6.
I'm not sure if the newer terraform versions include the new Backend key, but the check in IsRemote() accounted for this. However, nothing appears to call it anymore.
@bitbier Good find! It's a bit odd that Terraform state doesn't include a backend key even when you explicitly configure the backend to be local. Does any of the state file config get stored for a local backend (e.g., path)? If not, then there is no reason to compare them, and the IsRemote check should be restored.
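A rough sketch of what restoring that guard might look like, assuming the check means "a backend is recorded and it is not local". All names here are hypothetical, and the fallback behavior when nothing is recorded is an assumption, not the original implementation.

```go
package main

import "fmt"

// backend and terraformState are hypothetical stand-ins for the parsed state.
type backend struct {
	Type string
}

type terraformState struct {
	Backend *backend
}

// isRemote is true only when the state records a non-local backend.
// It is safe to call on a nil state or a state with no backend section.
func (s *terraformState) isRemote() bool {
	return s != nil && s.Backend != nil && s.Backend.Type != "local"
}

// needsInit skips the backend comparison when the existing state records no
// remote backend. In that case init is assumed to be needed only if the
// configuration asks for something other than a local backend.
func needsInit(configuredBackend string, existing *terraformState) bool {
	if !existing.isRemote() {
		return configuredBackend != "local"
	}
	return existing.Backend.Type != configuredBackend
}

func main() {
	fmt.Println(needsInit("local", &terraformState{}))
	fmt.Println(needsInit("s3", &terraformState{}))
	fmt.Println(needsInit("s3", &terraformState{Backend: &backend{Type: "s3"}}))
}
```

With the guard first, the comparison only runs when there is a recorded backend to compare against, so the nil dereference path is never reached.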
BTW, why are you using a local backend?
Looking at a few terraform state files, it looks like there isn't anything that indicates the backend. Again, this is for terraform 0.10.3 at least. Not sure if terraform has any information on what their state file is supposed to look like. I couldn't find any schema on their site, but maybe their github repo has more information about that.
As for why we are using a local backend, it mainly comes down to ease of use and compatibility with other providers. At our company, we try not to use vendor-specific sources unless it's absolutely necessary, so that we can move to another cloud provider without much hassle. We are a small enough team that there are only 3 or so people running terraform, with plans to build out an automation tool for it so that it can be run in one place. We also like having history locally without needing to write an S3 tool for pulling back different versions of the state files if we need them. We also don't keep any sensitive data in our state files (no RDS instances or services that require generated passwords to be saved). Our git repo is already backed up nightly to S3 (which could easily be GCP storage), so we already have the backup/persistence handled in case something goes wrong, but if our git server goes down we have more problems than just our state file being gone.
Eventually we may use the S3 backends, but for now the use case for us is just not there. Although we may try the consul backend or vault backend once that is supported, since we would be running those services no matter the provider.
> Looking at a few terraform state files, it looks like there isn't anything that indicates the backend. Again, this is for terraform 0.10.3 at least. Not sure if terraform has any information on what their state file is supposed to look like. I couldn't find any schema on their site, but maybe their github repo has more information about that.
Terraform state is technically an internal API, so I can't fault them too much about not publishing a spec. They make backwards incompatible changes often, and that's totally understandable for this sort of thing.
My guess is we should restore the IsRemote check. Are you interested in submitting a quick PR for that?
> As for why we are using a local backend [...]
The main reasons to NOT use a local backend are:
1. You'll forget to git pull before making a change or git push after making one. That means someone will inevitably have stale state and end up blowing away something important.
2. Terraform state may contain secrets. It sounds like you don't have this issue for now, but it's remarkably simple to add something that has secrets and not think about it.
3. Locking. Right now, two of your devs could run apply at the same time on the same state, and you'll get a race condition, with some state being clobbered.
IMO, the local backend has so many potential problems for teams, and S3 is such a minor dependency that's easy to replace, that it's a total no-brainer to use it.
I haven't forgotten about this. I would like to fix it, but currently I'm swamped with work. I may have some time in the coming weekends to get a PR with a fix. I also haven't done Go development, so I would need to get up to speed, which is why I haven't participated yet.
I just hit this bug when I was following the readme on terragrunt, for setting up the remote state in s3, but the bucket says there is nothing there and the dynamo table is empty. Is there a remote state I can manually nuke to get past this bug? There would be nothing in it that needs saving.
Stack trace is nearly identical to my eyes, but I'm not a go-dev, so I'll post it just in case:
[terragrunt] 2017/11/14 10:54:03 runtime.errorString runtime error: invalid memory address or nil pointer dereference
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/errors/errors.go:72 (0x10ed113)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:509 (0x1056dab)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/panic.go:491 (0x102b2c3)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/panic.go:63 (0x102a1ce)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/signal_unix.go:367 (0x1041c8c)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:93 (0x14489a2)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:79 (0x144881e)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:473 (0x14f85fc)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:283 (0x14f7228)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:256 (0x14f710f)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:402 (0x14f7b03)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/download_source.go:363 (0x14fb000)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/download_source.go:92 (0x14f95a2)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/download_source.go:54 (0x14f924f)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:212 (0x14f6c31)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:196 (0x14f6ad0)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:177 (0x14f679e)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:502 (0x1497b52)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:268 (0x1495a23)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/main.go:20 (0x14fc2d4)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/proc.go:195 (0x102d0e6)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:2337 (0x10595f1)
By ditching the remote state configuration and falling back to a local backend, I was able to get things up and running for now. Not ideal, but it's at least allowing me to keep working with terragrunt. Currently I'm just prototyping replacing our aws stack with terragrunt so remote state management isn't crucial. Hopefully whatever was wrong with my configuration will become easier to diagnose in the future :)
We're hitting this same error with an S3 state backend configured, not local.
What is happening here? First of all, my S3 state has not changed. Then, it tries to import it and crashes! :(
terragrunt version v0.13.7 Terraform v0.10.7