gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
8.08k stars 981 forks source link

Crash: "runtime error: invalid memory address or nil pointer dereference" #316

Closed dgarstang closed 6 years ago

dgarstang commented 7 years ago

What is happening here? First of all, my S3 state has not changed. Then, it tries to import it and crashes! :(

terragrunt version v0.13.7 Terraform v0.10.7

doug@ADMINs-MacBook-Pro-2 widgetcorp (master) [widgetcorp_eu] $ tg apply --var-file ~/tfvars/widgetcorp.tfvars
[terragrunt] [/Users/doug/git/terraform/terraform-live/terragrunt/eu-central-1/euprod/widgetcorp] 2017/10/12 08:32:09 Running command: terraform --version
[terragrunt] 2017/10/12 08:32:09 Reading Terragrunt config file at /Users/doug/git/terraform/terraform-live/terragrunt/eu-central-1/euprod/widgetcorp/terraform.tfvars
[terragrunt] 2017/10/12 08:32:09 WARNING: no double-slash (//) found in source URL /Users/doug/git/terraform/terraform-modules/widgetcorp. Relative paths in downloaded Terraform code may not work.
[terragrunt] 2017/10/12 08:32:09 Cleaning up existing *.tf files in /var/folders/jr/b3nl0vcj7hgclvbfbt2mdrhw0000gp/T/terragrunt/FUYc4vvbbeeAusv9CM_VpOpxY0M/TcDLwgb-0LmdbdvcWfM7FNWB6UM
[terragrunt] 2017/10/12 08:32:09 Downloading Terraform configurations from file:///Users/doug/git/terraform/terraform-modules/widgetcorp into /var/folders/jr/b3nl0vcj7hgclvbfbt2mdrhw0000gp/T/terragrunt/FUYc4vvbbeeAusv9CM_VpOpxY0M/TcDLwgb-0LmdbdvcWfM7FNWB6UM using terraform init
[terragrunt] [/Users/doug/git/terraform/terraform-live/terragrunt/eu-central-1/euprod/widgetcorp] 2017/10/12 08:32:09 Backend s3 has not changed.
[terragrunt] [/Users/doug/git/terraform/terraform-live/terragrunt/eu-central-1/euprod/widgetcorp] 2017/10/12 08:32:09 Running command: terraform init -backend-config=bucket=sws-tfstate -backend-config=key=widgetcorp/terraform.tfstate -backend-config=region=us-west-1 -backend-config=encrypt=true -from-module=file:///Users/douglasgarstang/git/terraform/terraform-modules/widgetcorp /var/folders/jr/b3nl0vcj7hgclvbfbt2mdrhw0000gp/T/terragrunt/FUYc4vvbbeeAusv9CM_VpOpxY0M/TcDLwgb-0LmdbdvcWfM7FNWB6UM
Copying configuration from "file:///Users/doug/git/terraform/terraform-modules/widgetcorp"...
Downloading modules...
Do you want to copy the state from "s3"?
  Terraform has detected you're unconfiguring your previously set backend.
  Would you like to copy the state from "s3" to local state? Please answer
  "yes" or "no". If you answer "no", you will start with a blank local state.

  Enter a value: yes

Do you want to copy state from "s3" to "local"?
  Pre-existing state was found in "s3" while migrating to "local". No existing
  state was found in "local". Do you want to copy the state from "s3" to
  "local"? Enter "yes" to copy and "no" to start with an empty state.

  Enter a value: yes

Successfully unset the backend "s3". Terraform will now operate locally.

Initializing provider plugins...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.template: version = "~> 1.0"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
[terragrunt] 2017/10/12 08:32:56 Copying files from /Users/doug/git/terraform/terraform-live/terragrunt/eu-central-1/euprod/widgetcorp into /var/folders/jr/b3nl0vcj7hgclvbfbt2mdrhw0000gp/T/terragrunt/FUYc4vvbbeeAusv9CM_VpOpxY0M/TcDLwgb-0LmdbdvcWfM7FNWB6UM
[terragrunt] 2017/10/12 08:32:56 Setting working directory to /var/folders/jr/b3nl0vcj7hgclvbfbt2mdrhw0000gp/T/terragrunt/FUYc4vvbbeeAusv9CM_VpOpxY0M/TcDLwgb-0LmdbdvcWfM7FNWB6UM
[terragrunt] 2017/10/12 08:32:56 runtime error: invalid memory address or nil pointer dereference
[terragrunt] 2017/10/12 08:32:56 Unable to determine underlying exit code, so Terragrunt will exit with error code 1
dgarstang commented 7 years ago

Additional info with debug enabled.

[terragrunt] 2017/10/12 09:10:20 runtime.errorString runtime error: invalid memory address or nil pointer dereference
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/errors/errors.go:72 (0x620a1)
/usr/local/go/src/runtime/asm_amd64.s:479 (0x55d7c)
/usr/local/go/src/runtime/panic.go:458 (0x29473)
/usr/local/go/src/runtime/panic.go:62 (0x27fcd)
/usr/local/go/src/runtime/sigpanic_unix.go:24 (0x3e4c4)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:94 (0xc14da)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:79 (0xc134e)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:441 (0x5d71c)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:319 (0x5c78f)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:291 (0x5c629)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:230 (0x5c024)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:213 (0x5beee)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:190 (0x5be00)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:171 (0x5baec)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:485 (0x99ea4)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:259 (0x97cbf)
/home/ubuntu/.go_workspace/src/github.com/gruntwork-io/terragrunt/main.go:20 (0x2104)
/usr/local/go/src/runtime/proc.go:183 (0x2b144)
/usr/local/go/src/runtime/asm_amd64.s:2086 (0x588a1)

[terragrunt] 2017/10/12 09:10:20 Unable to determine underlying exit code, so Terragrunt will exit with error code 1

Seems to work when I supply --terragrunt-source-update

brikis98 commented 7 years ago

Oof, that's no fun. Thank you for the stack trace. It seems to point to this line in remote_state.go. The remoteState variable can't be the culprit, as there is a check for nil earlier. That means it must be from the existingState, which is parsed using this method, which can return nil. Looks like we need to add a nil check in the NeedsInit or differsFrom method!

PRs very welcome.

bitbier commented 7 years ago

We are also running into this issue. I've also been trying to trace exactly what is happening, but I feel like I'm a bit confused on what is going on.

Some background on what is happening before discussing the problem at hand:

  1. We are using a local backend state instead of a remote state.
  2. I ran my terragrunt apply on my module using the following remote_state configuration in my terraform.tfvars file:
terragrunt = {
  remote_state {
    backend = "local"
    config {
      path = "${get_tfvars_dir()}/terraform.tfstate"
    }
  }
  ...
}
  1. I ran terragrunt apply and it failed to run because of a IAM permission issue. This is fine and expected, I didn't have the correct permissions to being with.
  2. I updated the permissions on my AWS account to allow me to move forward
  3. I retried terragrunt apply
  4. I received the same stacktrace as @dgarstang.

Digging into the source code, I see the various points in the code where the state file is being parsed. I see what appears to be the local state file being parsed here. Looking at the parser there, it simply just loads in the file and unmarshals it using a JSON decoder.

After that there is a check on the state variable for nil. So already, the state is not nil when it gets to the differsFrom call. From here, it sends over state.Backend attribute over to it.

Eventually, the whole thing bombs (here)[https://github.com/gruntwork-io/terragrunt/blob/master/remote/remote_state.go#L94].

Now, I'm trying to figure out exactly what is messed up.

We have two variables being compared existingBackend.Type and remoteState.Backend. From tracing the code, we already know that existingBackend == state.Backend, from which state is not nil from the previous function and we at least know that remoteState is also non-nil as the function would not have been called (I don't know Go, but I assume that this is the case). Also, I assume that remoteState == my local state file, but I'm unclear how this one is parsed.

Anyways, if state is just a unmarshaled JSON file, I assume then that state.Backend is getting the key from the root of the JSON file. Inspecting the terraform.tfstate file, the root keys that I see are:

{
    "version": 3,
    "terraform_version": "0.10.3",
    "serial": 1,
    "lineage": ...,
    "modules": [
        ...
    ]
}

It appears that Backend isn't a key in the root document. I'm not sure if the reason this is erroring is because of the initial terragrunt apply failed or something else. I also inspected other state files that we have lying around and Backend never appears to be a key in the root level of the JSON tfstate file as well.

bitbier commented 7 years ago

Ah, so interesting, we recently upgraded terragrunt from v1.30.0 -> v1.30.9. It looks like in the older version we were running v1.30.0, there is a check for the state being a remote state. However, this got removed in v1.30.6.

I'm not sure if the newer terraform versions include the new Backend key, but the check in IsRemote() accounted for this. But, it doesn't appear to be running this anywhere.

brikis98 commented 7 years ago

@bitbier Good find! It's a bit odd that Terraform state doesn't include a backend key even when you explicitly configure the backend to be local. Does any of the state file config get stored for a local backend (e.g., path)? If not, then there is no reason to compare them, and the IsRemote check should be restored.

BTW, why are you using a local backend?

bitbier commented 7 years ago

Looking at a few terraform state files, it looks like there isn't anything that indicates the backend. Again, this is for terraform 0.10.3 at least. Not sure if terraform has any information on what their state file is suppose to look like. I couldn't find any schematics on their site, but maybe their github repo has more information about that.

As for why we are using a local backend, it mainly comes down to ease of use and compatibility with other providers. At our company, we try not to use vendor specific sources unless its absolutely necessary so that we can move to another cloud provider with out much hassle. We are a small enough team that there is only 3 or so people running terraform with plans to build out an automation tool for it so that it can be run in one place. We also like having history local without needing to write S3 tool for pulling back different versions of the state files if we need it. We also don't keep any sensitive data in our state files (no RDS instances or services that require generated passwords to be saved). Our git repo is already backed up nightly to S3 (which could easy be GCP storage) so we already have the backup/persistence handled in case something goes wrong, but if our git server goes down we have more problems than just our state file being gone.

Eventually we may use the S3 backends, but for now the use-case for us is just not there. Although we may try the consul backend or vault backend once that is supported since we would be running those services no matter the provider.

brikis98 commented 7 years ago

Looking at a few terraform state files, it looks like there isn't anything that indicates the backend. Again, this is for terraform 0.10.3 at least. Not sure if terraform has any information on what their state file is suppose to look like. I couldn't find any schematics on their site, but maybe their github repo has more information about that.

Terraform state is technically an internal API, so I can't fault them too much about not publishing a spec. They make backwards incompatible changes often, and that's totally understandable for this sort of thing.

My guess is we should restore the IsRemote check. Are you interested in submitting a quick PR for that?

As for why we are using a local backend [...]

The main reasons to NOT use a local backend are:

  1. You'll forget to git pull before making a change or git push after making one. That means someone will inevitably have stale state and end up blowing away something important.

  2. Terraform state may contain secrets. It sounds like you don't have this issue for now, but it's remarkably simple to add something that has secrets and not think about it.

  3. Locking. Right now, two of your devs could run apply at the same time on the same state, and you'll get a race condition, with some state being clobbered.

IMO, the local backend has so many potential problems for teams and S3 is such a minor dependency that's easy to replace, that it's a total no brainer to use it.

bitbier commented 7 years ago

I haven't forgot about this. It would like to fix it, but currently I'm swamped with work. I may have some time in the coming weekends to get a PR with a fix. I also haven't done Go development, so I would need to get up to speed which is why I haven't participated yet.

robacarp commented 7 years ago

I just hit this bug when I was following the readme on terragrunt, for setting up the remote state in s3, but the bucket says there is nothing there and the dynamo table is empty. Is there a remote state I can manually nuke to get past this bug? There would be nothing in it that needs saving.

Stack trace is nearly identical to my eyes, but I'm not a go-dev, so I'll post it just in case:

[terragrunt] 2017/11/14 10:54:03 runtime.errorString runtime error: invalid memory address or nil pointer dereference
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/errors/errors.go:72 (0x10ed113)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:509 (0x1056dab)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/panic.go:491 (0x102b2c3)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/panic.go:63 (0x102a1ce)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/signal_unix.go:367 (0x1041c8c)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:93 (0x14489a2)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/remote/remote_state.go:79 (0x144881e)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:473 (0x14f85fc)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:283 (0x14f7228)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:256 (0x14f710f)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:402 (0x14f7b03)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/download_source.go:363 (0x14fb000)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/download_source.go:92 (0x14f95a2)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/download_source.go:54 (0x14f924f)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:212 (0x14f6c31)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:196 (0x14f6ad0)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/cli/cli_app.go:177 (0x14f679e)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:502 (0x1497b52)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/src/github.com/gruntwork-io/terragrunt/vendor/github.com/urfave/cli/app.go:268 (0x1495a23)
/private/tmp/terragrunt-20171113-67001-1wqnhvt/terragrunt-0.13.20/main.go:20 (0x14fc2d4)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/proc.go:195 (0x102d0e6)
/usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:2337 (0x10595f1)
robacarp commented 6 years ago

By ditching the remote state configuration and falling back to a local backend, I was able to get things up and running for now. Not ideal, but it's at least allowing me to keep working with terragrunt. Currently I'm just prototyping replacing our aws stack with terragrunt so remote state management isn't crucial. Hopefully whatever was wrong with my configuration will become easier to diagnose in the future :)

TimJones commented 6 years ago

We're hitting this same error with an S3 state backend configured, not local.