gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/

[Feature Request] Propagating plan-all that works with dependencies #1127

Open vikas027 opened 4 years ago

vikas027 commented 4 years ago

Environment

```
➜  ~ terraform --version
Terraform v0.12.24
➜  ~ terragrunt --version
terragrunt version v0.23.7
```

Problem

My apologies in advance if this has been answered before. I am a long-time user of Terraform just starting with Terragrunt. So far I have been able to use it fairly well; the only problem I am facing is referencing resources from a module's dependencies. I have read the docs, but it looks like I am missing something here.

I have two modules, vpc and sg-ec2 (a security group for EC2), neither of which has been applied yet. I can see that terragrunt runs plan on the vpc module first and then on the sg-ec2 module, which is good, but the sg-ec2 module fails with the error below.

```
Error: InvalidVpcID.NotFound: The vpc ID 'temporary-dummy-id' does not exist
        status code: 400, request id: 27566d9d-ed8f-4f62-9799-5afe2de78f38

  on main.tf line 48, in data "aws_vpc" "selected":
  48: data "aws_vpc" "selected" {
```
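
The data source in question is roughly this (simplified; var.vpc_id stands in for however the ID actually arrives):

```hcl
# sg-ec2 module, main.tf: look up the VPC by an ID passed in as an
# input. On a fresh plan the vpc module has no real outputs yet, so
# the mock value 'temporary-dummy-id' is what reaches AWS, and the
# lookup fails.
data "aws_vpc" "selected" {
  id = var.vpc_id
}
```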

I have forked terragrunt-infrastructure-live-example, stripped it down to a very basic structure, and pushed it to my branch.

I am assuming this is a very common use case in terraform. What is the recommended terragrunt way to tackle this?

yorinasub17 commented 4 years ago

Assuming you are focusing on plan, we unfortunately don't have a solution for this. This is a known issue and pain point with plan-all that has been around in this repo for a while, but unfortunately it is not an easy problem to solve. Terraform does not give us much tooling for implementing cross-state-file plans, which makes it almost impossible to build a propagating plan-all using only terraform calls, which is what we want here.

As a workaround, you either need to:

- apply the dependencies first, so that their outputs exist in state before the downstream modules are planned, or
- provide mock_outputs on the dependency blocks, so that the downstream plan can run against placeholder values (with the caveat that the resulting plan is based on mocks).

Internally at Gruntwork we don't use the xxx-all commands on a daily basis; we only use apply-all to stand up a well-known architecture template from scratch (something that has been run many, many times, such that we have good confidence in the plan without seeing it). In fact, the xxx-all variants of the commands in terragrunt were added for this use case. For day-to-day usage we rely on running plan and apply only on the modules that changed, and when there are dependencies, we work through them manually. This is typically not an issue for us because most changes only span a few dependencies in the tree, so the manual apply steps, while tedious, are not onerous to the point of wanting to avoid them.
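
To make that concrete, the manual flow looks roughly like this (a sketch; the paths are placeholders borrowed from the example above):

```sh
# Plan and apply only the module that changed...
cd live/vpc
terragrunt plan
terragrunt apply

# ...then walk its dependents manually, in dependency order.
cd ../sg-ec2
terragrunt plan
terragrunt apply
```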


I renamed the issue to track this feature, since we don't really have a global one and this issue's description highlights the problem well. This issue should be used to track the propagating plan-all problem; discussions of solutions and workarounds should go here.

vikas027 commented 4 years ago

Thanks for the detailed explanation @yorinasub17. I was under the impression that I was missing something.

Cheers.

yorinasub17 commented 4 years ago

Keeping this open, since this is now the tracking ticket for the propagating plan-all feature request.

kromol commented 4 years ago

I also faced this issue; it happens when there are changes in two related modules. Initially I was under the impression that the mock_outputs part of the dependency block should handle this, but it seems terragrunt only uses the mocked values when the output key is missing from the real outputs.

@yorinasub17 could using the mocked values regardless of whether the key exists in the real output be a quick fix for this?

yorinasub17 commented 4 years ago

See https://github.com/gruntwork-io/terragrunt/issues/940#issuecomment-610108712 for my previous thoughts on that suggestion.

dudicoco commented 4 years ago

@vikas027 @kromol why are you using a data resource with terragrunt? The correct way would be to have an input variable such as vpc_id and use the terragrunt dependency block to supply its value, as sketched below. This still doesn't solve the issue of a propagating plan-all, but it does make sure the plan-all execution cannot break.
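
A minimal sketch of that pattern (the paths and the mock value are assumptions, not from the thread):

```hcl
# sg-ec2/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder outputs used while the vpc module has no real
  # outputs yet (e.g. on an initial plan-all).
  mock_outputs = {
    vpc_id = "vpc-00000000"
  }
}

inputs = {
  # Feed the dependency output straight into the module instead of
  # looking the VPC up with a data source at plan time.
  vpc_id = dependency.vpc.outputs.vpc_id
}
```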

dudicoco commented 4 years ago

@yorinasub17 I wonder if Terraform 0.13 solves this issue? According to the changelog at https://github.com/hashicorp/terraform/releases/tag/v0.13.0: "The terraform plan and terraform apply commands will now detect and report changes to root module outputs as needing to be applied even if there are no resource changes in the plan. This is an improvement in behavior for most users, since it will now be possible to change output blocks and use terraform apply to apply those changes."

I haven't tried Terraform 0.13 yet; has anyone tested the new behaviour with Terragrunt's plan-all?

yorinasub17 commented 4 years ago

AFAIK, TF 0.13 won't help here, because the outputs are crossing state boundaries and terraform is still a tool that operates within a single state file. That change specifically addresses the issue where a module block within terraform didn't reflect output changes until you ran apply; it doesn't address the issue where a planned output change doesn't show up in the current state and plan, so terragrunt can't feed the future view forward.

dudicoco commented 3 years ago

@yorinasub17 what do you think about making terragrunt pull the outputs during plan-all from the dependency's planned outputs when the plan shows changes to them, and otherwise pull them from the state file as usual?
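
For reference, a plan's pending output changes can be read from the machine-readable plan format, something like this (a sketch; the jq filter is just for illustration):

```sh
# In the dependency's directory: write a plan file, then extract the
# planned output changes from its JSON representation.
terraform plan -out=tfplan
terraform show -json tfplan | jq '.output_changes'
```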

dudicoco commented 3 years ago

@yorinasub17 @brikis98 what are your thoughts on this suggestion?

yorinasub17 commented 3 years ago

Hi, sorry I missed that comment.

Looking at your suggestion, I don't think that will work, because the planned outputs do not include the actual expected values. Moreover, there isn't a way to contaminate the plan through input vars in terraform. That is, if there is a (known after apply) output change, there is no way to carry that metadata forward to contaminate the full downstream plan. The most we can know from the output changes is that there will be some changes downstream; there is no way for us to know what those changes would be. Unfortunately, Terraform just doesn't give us features that help with multi-state-file plans like that.

dudicoco commented 3 years ago

@yorinasub17 can't we just use the value from mock_outputs whenever a change is detected on the dependency block's outputs?

dudicoco commented 3 years ago

Also, in some cases the plan's new outputs will show the true values and not just (known after apply), for example: vpc_cidr_block = "10.10.0.0/16" -> "10.11.0.0/16"

yorinasub17 commented 3 years ago

We could use the mock_outputs when a change is detected, but I think that will be more cumbersome to work with as a user, as you are now overloading mock_outputs for two purposes: forcing a plan when something upstream is changing, and providing mocks for commands like validate that you want to run without dependency outputs.
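
(For reference, mocks can already be scoped to specific commands for that second purpose; a sketch, with placeholder paths and values:)

```hcl
dependency "vpc" {
  config_path = "../vpc"

  mock_outputs = {
    vpc_id = "vpc-00000000"
  }

  # Only fall back to the mocks when running validate; plan and
  # apply still require real dependency outputs.
  mock_outputs_allowed_terraform_commands = ["validate"]
}
```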

We could add a changed_plan_outputs attribute or something like that to separate the two, but that bloats the terragrunt.hcl dependency block further. FWIW, mock_outputs by itself is already a cumbersome thing to specify, and we are hoping to get rid of it at some point with intelligent data generation.

> Also, in some cases the plan's new outputs will show the true values and not just (known after apply), for example: vpc_cidr_block = "10.10.0.0/16" -> "10.11.0.0/16"

Yes, I am aware of this, but the fact that terraform can't consistently provide these values makes it hard to build a feature around them. We can't just rely on the values that are provided and ignore the cases where you get (known after apply), because then the feature will surprise the user whenever (known after apply) does come up. Just one case where the promise is broken is enough to instill distrust in the tool, leading users to avoid the feature altogether because it is "broken". So whatever we decide to do here has to solve the problem fully and provide a consistent experience across the use cases.

dudicoco commented 3 years ago

@yorinasub17 since mock_outputs is already used in plan-all (and not just in validate), I think it provides an excellent solution to the problem. Basically we would end up with consistent behaviour from the plan-all command whenever an output either doesn't exist yet or is about to change.

At least until mock_outputs is replaced with intelligent data generation :)

yorinasub17 commented 3 years ago

FWIW, we (Gruntwork) don't use mock_outputs with plan-all (or plan-all at all, for that matter) because it doesn't work in practice for complicated modules, especially when you are doing things like looking up related values with data sources. Choosing the right mocks to get valid plans is, in general, a huge pain. Obviously, YMMV.

The other reason for my hesitation is that this has a potentially dangerous implication: folks will think they can use plan files with it (we already have tons of examples of users combining plan-all with plan files). That is, users will run plan-all with plan file generation, review the plan, and assume it used the updated values for the outputs. But this doesn't hold if you are using mocks, because the plan file is generated from the mock outputs, so you can damage your infrastructure by applying a plan that is based on mocks. That element of surprise doesn't exist today, because plan-all currently emits plan files in which the downstream modules won't be touched, which is safer than applying broken plans.

dudicoco commented 3 years ago

@yorinasub17 how about putting this feature behind a flag, then? That would make sure the person is aware of the new functionality.

We use plan-all all the time in our CI, with multiple include-dir flags to cover only the dirs that changed in the commit (see the sketch below). For mock_outputs we use values of "(known after apply-all)", which mirrors terraform's "(known after apply)" and gives a good indication of what's going to change. We don't use data sources to look up existing resources, so as not to break the plan-all. In some cases a stricter value is needed in mock_outputs because of provider validation (a CIDR block, for example), but in most cases this works. A propagating plan-all is the missing piece of the puzzle and would be a great addition to Terragrunt! :)
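
For illustration, that CI invocation looks roughly like this (a sketch; the paths are placeholders and the changed-dir detection is left out):

```sh
# Plan only the units touched by the commit; each changed directory
# gets its own --terragrunt-include-dir flag.
terragrunt plan-all \
  --terragrunt-include-dir "live/prod/vpc" \
  --terragrunt-include-dir "live/prod/sg-ec2"
```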

yorinasub17 commented 3 years ago

I am open to the idea, but whatever we do here needs to address the concerns I brought up above. I know this is a much-requested feature, but the implementation needs to work; otherwise we end up with lots of surprises and gotchas, like the current plan-all implementation.

I think what we need here is an RFC (similar to the imports RFC) that walks through all the implications and use cases of if and when values are available for this idea, so that we have a single document covering the bases. That will make it easier to discuss the pros and cons of such an approach.

geekofalltrades commented 1 year ago

It seems like a bite-sized piece of this could be a flag for run-all that only runs a given dependency group.

Given these groups:

```
INFO[0012] The stack at /my/path will be processed in the following order for command apply:
Group 1
- Module /my/path/module1
- Module /my/path/module2

Group 2
- Module /my/path/module3
- Module /my/path/module4
- Module /my/path/module5

Group 3
- Module /my/path/module6
```
A flag like --terragrunt-group 1 would make the run-all operate only on the modules in the first group. This way you could run-all plan and then run-all apply each group and get reliable output, then increment the value passed to the flag to work through each group until you're done, as sketched below.
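
For illustration, the proposed loop would look roughly like this (note that --terragrunt-group is the hypothetical flag being requested here, not an existing option):

```sh
# Hypothetical: plan and apply the dependency groups one at a time.
for group in 1 2 3; do
  terragrunt run-all plan  --terragrunt-group "$group"
  terragrunt run-all apply --terragrunt-group "$group"
done
```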

I would guess that this is exactly what issues like #2016 and #2522 are trying to achieve themselves, and it seems like a pretty easy first step. But you would potentially want/need the output those issues request, so that you know ahead of time how many iterations you're going to run in an automation context.

(And it goes without saying that I have this use-case, as well.)

Quixotical commented 2 months ago

I have a situation where I ran a plan/apply with a value that had an invalid character in it. Having seen the mistake in the apply, I updated the value, but now the plan fails because it references the invalid value I used previously.

Would the recommended way to resolve this be to first run a plan/apply on the specific terragrunt module ("module 1") that had the invalid character, so that its state file is updated, and then run a run-all plan, at which point the modules that depend on "module 1" would read the corrected state file data and use the updated value? Something like the sketch below.
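
(A sketch of that sequence; the paths are placeholders:)

```sh
# Re-apply the module whose output was wrong, fixing its state...
cd live/module1
terragrunt apply

# ...then plan the whole stack; dependents now read the updated state.
cd ..
terragrunt run-all plan
```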

lmayorga1980 commented 1 week ago

In the case of unit-testing a terragrunt deployment (terratest-driven) that has dependencies with controlled test data, would it be possible to support a terragrunt run-all apply --only-dependencies, so that I can then perform a successful plan on the targeted terragrunt.hcl?
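
For what it's worth, something close to this can be approximated today by excluding the unit under test (a sketch; the path is a placeholder, and --only-dependencies itself does not exist yet):

```sh
# Apply everything except the unit under test...
terragrunt run-all apply --terragrunt-exclude-dir "live/unit-under-test"

# ...then plan just the target against real dependency outputs.
cd live/unit-under-test
terragrunt plan
```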