gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/

Inconsistent dependency lock file #2646

Closed: rromic closed this issue 1 year ago

rromic commented 1 year ago

Describe the bug
We are experiencing an error when trying to run apply with a provided plan file. On resources with multiple dependencies, we sporadically see this error message:

╷
│ Error: Inconsistent dependency lock file
│
│ The given plan file was created with a different set of external dependency
│ selections than the current configuration. A saved plan can be applied only
│ to the same configuration it was created from.
│
│ Create a new plan from the updated configuration.
╵

Also worth noting: we are using a shared cache folder on the local filesystem, without a provider lock file in the code repository.

In addition, we use the following command for init: `terragrunt run-all init -input=false --terragrunt-non-interactive --terragrunt-include-external-dependencies --terragrunt-parallelism 1`

Just before the apply command we also run a plan command, `terragrunt plan`, preceded by the same init command as above. However, the output of that plan command is not fed into the apply command; instead, we pull the plan from the Artifactory store, where it was created while the PR was open on GitHub.

It is hard to understand what is actually happening here, so we would like to know whether there is a way to overcome this situation, or at least what is causing the issue.
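To make the flow concrete, here is roughly what our stages do (the Artifactory URL, credentials, and variable names below are placeholders, not our real setup):

```sh
# Plan stage (runs while the PR is open)
terragrunt run-all init -input=false --terragrunt-non-interactive \
  --terragrunt-include-external-dependencies --terragrunt-parallelism 1
terragrunt plan -out=plan.out
# upload the plan to the artifact store (placeholder URL)
curl -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" -T plan.out \
  "https://artifactory.example.com/tf-plans/$PR_NUMBER/plan.out"

# Apply stage (may run on a different runner)
terragrunt run-all init -input=false --terragrunt-non-interactive \
  --terragrunt-include-external-dependencies --terragrunt-parallelism 1
terragrunt plan              # output not used; run for side effects only
curl -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" -o plan.out \
  "https://artifactory.example.com/tf-plans/$PR_NUMBER/plan.out"
terragrunt apply plan.out    # sporadically fails with the error above
```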

Versions

Thank you very much in advance!

Rwwn commented 1 year ago

Same here, except for the Artifactory part. I'm running a plan in our CI/CD to generate a plan file, then passing the whole Terragrunt repo as an artifact (including the .lock file) to the apply stage, which occasionally fails with Error: Inconsistent dependency lock file. Rerunning the plan and apply usually gets it through, but I'm not sure what the root cause is. We've recently started using a shared plugin cache; I imagine it's something to do with that. Terragrunt version 0.45.12 and Terraform 1.3.1 for me.
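For what it's worth, the shared plugin cache is just the standard Terraform cache directory, set up roughly like this on our runners (the path is illustrative):

```sh
# point every terraform/terragrunt invocation at one shared provider cache
export TF_PLUGIN_CACHE_DIR="/opt/terraform-plugin-cache"
mkdir -p "$TF_PLUGIN_CACHE_DIR"
```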

denis256 commented 1 year ago

Hi, I suspect it may be because of different provider versions for different platforms

Reference: https://github.com/gruntwork-io/terragrunt/issues/2584#issuecomment-1569022915
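If that is the cause, recording hashes for every platform in the lock file should rule it out; a sketch (the platform list depends on your runners):

```sh
# pre-populate .terraform.lock.hcl with hashes for all platforms in use,
# so the same lock file validates on any of them
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64
```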

rromic commented 1 year ago

> Hi, I suspect it may be because of different provider versions for different platforms
>
> Reference: #2584 (comment)

I don't really think that is the problem here. All our CI/CD runners use the same OS (Amazon Linux 2, basically the same AMI image ID). The error is raised on this line:

https://github.com/hashicorp/terraform/blob/ca85d3bf8560762105202ada1894c75c10a40cbe/internal/backend/local/backend_local.go#L273

but it's hard to fully understand why it only happens from time to time. As @Rwwn mentioned, the same workaround applies for us: we basically bump the same file and the CI/CD then passes fine.

Also, the same set of providers is being used (we pin strict provider versions, on the same OS), and we are not keeping the lock file in git.

denis256 commented 1 year ago

Can you share an example repository/commands that lead to this issue? I was experimenting with https://github.com/denis256/terragrunt-tests/tree/master/plan-multiple-modules but still can't reproduce the same error message.

rromic commented 1 year ago

> Can you share an example repository/commands that lead to this issue? I was experimenting with https://github.com/denis256/terragrunt-tests/tree/master/plan-multiple-modules but still can't reproduce the same error message.

The repo structure is roughly like this:

base/
├── root-module-1/
├── root-module-2/
├── root-module-3/
└── root-module-4/

and, for example, root-module-4 has a dependency (dependency block) on all 3 other root modules.

We run run-all init with parallelism 1, so every root module is initialized sequentially, one by one. This makes things safer by not downloading providers into the cache folder concurrently, which could lead to a wrong hash sum for a binary (in case all 4 root modules were initialized at the same time while the cache was not yet populated).
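Conceptually it's equivalent to something like this (module names taken from the example layout above):

```sh
# what run-all init with parallelism 1 amounts to: one module at a time,
# so the shared cache is never written to by two inits concurrently
for m in root-module-1 root-module-2 root-module-3 root-module-4; do
  (cd "base/$m" && terragrunt init -input=false --terragrunt-non-interactive)
done
```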

Then we run terragrunt plan only for root-module-4 (that is the folder with a change), and then terragrunt apply with the plan.out provided from the plan step, which errors with the message above. Also, as mentioned at the start, we run a terragrunt plan just before the terragrunt apply, but we don't use the plan.out from that plan command; instead we download the stored plan from the pull request.

For clarification, we use run-all only for the init phase; for plan and apply we use plain terragrunt plan and terragrunt apply.

We need the second terragrunt plan only because we have Lambda functions and use the archive data source, which generates the zips on plan but not on apply, so the zips are present on the filesystem before apply. (This is not tied to the issue itself, just to give an idea of our setup.)

rromic commented 1 year ago

It looks like this happens when the cache folder holds the provider version used for plan, and later, on apply, another runner is used whose cache folder holds a different provider version (we don't have strict patch versions set). That results in the error above, because the provider version used on apply differs from the one used earlier on plan.

So in our case we will store the lock files in the Artifactory store during the first plan and pull them later in the apply phase, to actually use the same set of provider versions.
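Concretely, something like this per root module (placeholder Artifactory URL again):

```sh
# after plan: save the lock file that terraform generated during init
curl -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" -T .terraform.lock.hcl \
  "https://artifactory.example.com/tf-locks/$PR_NUMBER/.terraform.lock.hcl"

# before apply: restore it so init resolves the exact same provider versions
curl -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" -o .terraform.lock.hcl \
  "https://artifactory.example.com/tf-locks/$PR_NUMBER/.terraform.lock.hcl"
terragrunt init -input=false --terragrunt-non-interactive
terragrunt apply plan.out
```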

I will let you know if we hit the same issue with this approach, or whether it turns out to be the solution for this error.

rromic commented 1 year ago

I can confirm that we got this to work as described above. We have run a lot of pipelines with the saved and restored lock file and we haven't faced the same issue since!

So it seems the crucial part when using a shared cache folder is to either use fully strict provider versions (pinning even the patch version), keep the lock file in git, or store the lock file in an artifact store after plan and download it on apply (so the same set of providers is used).
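For the git option, that is simply committing the file the first init generates:

```sh
# commit the lock file once; every runner then resolves the exact
# provider versions recorded in it
git add .terraform.lock.hcl
git commit -m "Pin provider selections"
```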

I will close this issue since we found the reason and a solution for this case.

Hopefully it will help others on their journey with this issue :)