hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io
Other
43.16k stars 9.58k forks source link

Support moved addresses based on non-static values #31335

Open skeggse opened 2 years ago

skeggse commented 2 years ago

Current Terraform Version

Terraform v1.2.3
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v4.8.0
+ provider registry.terraform.io/hashicorp/random v3.3.2

Use-cases

Let's say I want to restructure my Terraform configuration, such that my aws_subnet resources are keyed by CIDR block. In other words, I might desire a structure that looks like:

module.network.aws_subnet.subnet["10.3.0.0/21"]
module.network.aws_subnet.subnet["10.3.8.0/21"]
module.network.aws_subnet.subnet["10.3.16.0/21"]
module.network.aws_subnet.subnet["10.3.24.0/21"]
module.network.aws_subnet.subnet["10.3.32.0/21"]

This makes it fairly convenient to review plans, and to address the right resources when targeting applies.

If I have hundreds of workspaces using the same structure, and that share the same underlying code, I might vary some of this based on the workspace. For example, 10.X.Y.0/21 where X depends on which workspace is being planned. This might be sourced via Terraform Cloud from var.cidr_offset or something:

locals {
  cidr = cidrsubnet("10.0.0.0/8", 8, var.cidr_offset)
}

resource "aws_subnet" "subnet" {
  for_each = toset(
    cidrsubnets(local.cidr, [for i in range(5) : 5]...)
  )

  # ...

  cidr_block = each.value
}

However, my prior state might not be using this format. Instead, it might be doing something like us-east-1a-1 instead of 10.3.0.0/21. It'd be nice to be able to move the resources in these hundreds of workspaces to their appropriate destinations by writing a single set of moved blocks.

Attempted Solutions

Try to use a dynamic (variable-derived) value for the resource address:

moved {
  from = module.network.aws_subnet.subnet["us-east-1f-1"]
  to   = module.network.aws_subnet.subnet["10.${var.cidr_offset}.32.0/21"]
}

Manually update hundreds of workspaces.

Proposal

Support dynamic values derived from runtime state (e.g. just allow var.xxyy, but maybe not resource.abc expressions).

skeggse commented 2 years ago

~Oops, sorry - didn't finish filling out the template. Updating.~

apparentlymart commented 2 years ago

Hi @skeggse! Thanks for sharing this use-case.

The compromise of allowing only static move rules arose from having to work within some other constraints, so I think in order to make this more dynamic we'd need to either find a new design that meets the constraints or find a way to change the constraints. With that in mind, I'm going to record here some context about how moved blocks work today and why they work that way, and then I suppose this issue will serve as a place for us to think about and discuss ways we can either work within or bend the constraints to meet the use-case.

The current (as of Terraform v1.1, but there weren't any relevant changes in v1.2) sequences of events for handling moved blocks is the following:

  1. Parse and decode the configuration, as always.
  2. Build a graph of the moved blocks in the configuration, which takes into account situations where e.g. the to of one block refers to the from of another, which we call "chained moves", and where the from of one block is contained within the to of another.
  3. Walk the move graph and rewrite the addresses in the in-memory state to follow what the moved blocks describe, where necessary and where possible. (If there isn't any object at the from address, or if there's already an object at the to address, a moved block has no effect.)

    At the end of this step, the in-memory state is now exclusively using the new addresses, and has no record of the old addresses. In other words, it's as if Terraform secretly ran terraform state mv a number of times but is keeping the result only in memory and hasn't committed it anywhere yet.

  4. Build a normal planning graph, which is essentially the same algorithm Terraform has been using since before config-driven move existed. Terraform generates different graph objects for an instance that exists in both the state and the configuration than it does for when the instance is only in the configuration or only in the state, so this relies on the fact that we already rewrote the in-memory state in order to produce a correct graph.
  5. Generate a plan by walking the planning graph. This is the first point at which Terraform can evaluate expressions, because expressions must be evaluated in a correct order based on the dependency graph in order to produce correct results.
  6. Finally, once we have generated a plan we revisit all of the moved blocks in the configuration and confirm that they are valid. Leaving validation to the end like this is a little weird, but I'll say more below about why it works this way.

There are some important ordering constraints that informed the above ordering:

I think that's full coverage of all of the constraints that led to the current design, though I'm happy to answer questions if something isn't clear.


To support dynamic expressions in moved block from or to would therefore require somehow avoiding or bending the rules above, so that either we handle moved blocks during the planning graph walk or we can somehow safely evaluate expressions in the moved block preprocessing phase.

When thinking about that it's important to realize that the only kind of reference that can never potentially depend on a resource action in Terraform today is a root module input variable. A child module's input variables can be derived from any data the caller has access to, and so evaluation of those variables is subject to graph-based ordering just like any other expression in the configuration.

When we were designing config-driven move we discussed a slightly different use-case to the one you presented here which runs into the same constraints: if a particular shared module has a systematic rule for deriving for_each keys from an input variable, it cannot possibly predict and hard-code those instance keys in moved blocks, because the actual keys are chosen (indirectly) by the module's caller. It would be ideal if the shared module could similarly define a systematic rule for how the instance keys are derived so that a moved block could work regardless of the input variable values, but it isn't clear how to achieve that within the above constraints or how to bend the constraints to support it.

I must admit I had not directly considered the situation you are describing, but on reflection I suppose it is a variation of the same idea: creating multiple workspaces that all share the same configuration is more or less the same as multiple different calls to the same shared module for the purposes of this discussion, with the main difference being that it's Terraform Cloud's control plane that is deciding the input variable values, instead of a dynamic expression in a calling module.

One notable difference though is that we are guaranteed today that root module input variables are always dependency leaves: they will never depend on anything else in the planning graph. That does therefore give us some room for rules-bending, but it would come at the expense of making a special exception that applies only to root module input variables in particular, and that would be unfortunate since we've been aiming to reduce rather than increase the differences in capabilities between root vs. shared modules. :thinking:

Another possible way to think about the problem is to imagine Terraform Cloud's control plane as being the topmost "calling module" and pretend as if it somehow had moved blocks declared inside it that reach into the root module from the outside, in a similar sense as how moved blocks can specify source and/or addresses in child modules. That would require adding some new capabilities to Terraform Cloud so it can itself have a record of historical moves within a workspace, and require us to give Terraform Cloud a way to tell Terraform Core about the moves it knows about when it creates a plan. While that does seem like a relatively heavy lift given that it crosses an architectural boundary, one interesting benefit of that is the possibility that it could in future become a building block for moving objects between workspaces, by exploiting Terraform Cloud's wider view of the world as compared to Terraform Core.


For the moment I think this remains in a "more design required" state, and so isn't ready for any particular action yet. I hope the above information is a useful starting point for more design work.

skeggse commented 2 years ago

Thanks for the context, this is roughly what I expected.

While that does seem like a relatively heavy lift given that it crosses an architectural boundary, one interesting benefit of that is the possibility that it could in future become a building block for moving objects between workspaces, by exploiting Terraform Cloud's wider view of the world as compared to Terraform Core.

Cross-workspace resource moves are currently extremely painful, and having this implemented in Terraform Cloud would solve a lot of pain for us. In the interim, it's valuable enough to us that we'll likely implement custom move tooling, though it won't be as as nice as the moved blocks.

korinne commented 2 years ago

@skeggse Hi! Would you be open to chatting more about this issue, and maybe discuss some pains with moving objects between Terraform Cloud workspaces? If so, you can book time directly at my Calendly here, but no pressure whatsoever.

tanasegabriel commented 2 years ago

This would be a great addition. For context, we use workspaces to deploy multi-region (or multi-account) infrastructure using a single TF configuration (multi-statefile monolith). Our TF deployment pipeline will execute plan|apply against multiple workspaces, but backed by the very same configuration. We use "dynamic" locals for region-specific customization and we heavily rely on data sources. As a result, we cannot predict exactly the name of our resources in the statefile (especially under a for_each loop). Because moved blocks are static, we cannot use them in such a way that will cater for multiple workspaces (under the same config). Having support for non-static values will definitely be a plus for us (and other users of this approach).

@korinne I am also open for chatting about our use-case and other possible improvements of the moved statement (e.g. cross-statefile support, remove support, imports?) if the offer is extendible.

korinne commented 2 years ago

@tanasegabriel Yes, the offer is certainly extended! Feel free to grab any time you see on my Calendly here. Thank you in advance, and excited to chat.

rifelpet commented 2 years ago

I have a use-case where i'd like to change the value used in a for_each key without destroying and recreating any resources. In reality the for_each value is sourced dynamically

locals {
  foo = [
    {
      a = 1
      b = 2
    }
  ]
}

# Before
resource "null_resource" "foo" {
  triggers = {
    baz = each.value
  }
  for_each = {
    for bar in local.foo
    : "${bar.a}" => bar
  }
}

# After
resource "null_resource" "foo" {
  triggers = {
    baz = each.value
  }
  for_each = {
    for bar in local.foo
    : "${bar.b}" => bar
  }
}

It'd be great to be able to express this renaming in a moved block, something like:

moved {
  from = null_resource.foo[each.value.a]
  to = null_resource.foo[each.value.b]

  for_each = {
    for bar in local.foo
    : "${bar.a}${bar.b}" => bar
  }
}
apparentlymart commented 2 years ago

Hi @rifelpet! Thanks for sharing this example.

In your real system, when you say "sourced dynamically", can you say a little more about what exactly that means? For example, is it coming from a data source, and if so can you say which one? This is just to help us more fully understand the specific situation in case there's some other way to get it done that might be easier to achieve.

rifelpet commented 2 years ago

Sure thing, the real use-case is a reusable module with variables containing sets of AWS availability zones and Virtual Private Gateway IDs. The module creates one aws_route_table per zone and one aws_vpn_gateway_route_propagation per zone per VGW ID.

Originally this module used count and when refactoring to for_each I had used an attribute of a dependent resource in the for_each key (the route table ID) rather than the key of that dependent resource's for_each (availability zone ID). This lead to the "for_each map includes keys derived from resource attributes that cannot be determined until apply" limitation when applying new invocations of this module (during my initial refactoring, I migrated existing invocations of the module with terraform state mv).

I want to change the aws_vpn_gateway_route_propagation's for_each key from referencing the route table ID to referencing the availability zone ID without destroying + recreating any existing resources. I was hoping moved would support such a case.

variable "zones" {
  default = toset(["..."])
}
variable "vgw_ids" {
  default = toset(["..."])
}

resource "aws_route_table" "foo" {
  for_each = var.zones
}

locals {
  az_vgws = {
    for rt_vgw in [for product in setproduct(var.vgw_ids, var.zones):
      {
        vgw : product[0],
        rt : aws_route_table.foo[product[1]].id,
        zone : product[1],
      }
    ]
    : "${rt_vgw.rt}-${rt_vgw.vgw}" => rt_vgw # I want to change to "${rt_vgw.zone}-${rt_vgw.vgw}"
  }
}

resource "aws_vpn_gateway_route_propagation" "foo" {
  for_each = local.az_vgws
}

moved {
  from = aws_vpn_gateway_route_propagation.foo["${each.value.rt}-${each.value.vgw}"]
  to   = aws_vpn_gateway_route_propagation.foo["${each.value.zone}-${each.value.vgw}"]

  for_each = local.az_vgws
}
Nuru commented 1 year ago

I just ran into something similar. We were creating resources with for_each and found a situation where the keys could be duplicated, so we changed the keys by adding a second value, basically format("%s-%s", local.config.old_key, local.config.disambiguator). So I want something like

moved {
  for_each = local.config_map
  from = aws_lb.default[each.value.old_key]
  to = aws_lb.default[format("%s-%s", each.value.old_key,  each.value.disambiguator)]
}
geekofalltrades commented 1 year ago

Here's a use-case that I have:

Before:

resource "aws_route" "private_a" {
  route_table_id            = var.aws_route_table_private_a_id
  destination_cidr_block    = data.aws_vpc_peering_connection.connection.cidr_block
  vpc_peering_connection_id = data.aws_vpc_peering_connection.connection.id
}

resource "aws_route" "private_b" {
  route_table_id            = var.aws_route_table_private_b_id
  destination_cidr_block    = data.aws_vpc_peering_connection.connection.cidr_block
  vpc_peering_connection_id = data.aws_vpc_peering_connection.connection.id
 }

After:

resource "aws_route" "route" {
  for_each = var.route_tables

  route_table_id            = each.key
  destination_cidr_block    = data.aws_vpc_peering_connection.connection.cidr_block
  vpc_peering_connection_id = data.aws_vpc_peering_connection.connection.id
}

moved {
  from = aws_route.private_a
  to   = aws_route.route[aws_route.private_a.route_table_id]
}

moved {
  from = aws_route.private_b
  to   = aws_route.route[aws_route.private_b.route_table_id]
}

Trying to run this gives me this error:

There are some problems with the configuration, described below.                                                                                          

The Terraform configuration must be valid before initialization so that                                                                                   
Terraform can determine which modules and providers need to be installed.    
╷                                                                            
│ Error: Invalid expression                                                  
│                                                                            
│   on peer.tf line 23, in moved:                                            
│   23:   to   = aws_route.route[aws_route.private_a.route_table_id]         
│                                                                                                                                                         
│ A single static variable reference is required: only attribute access and                                                                               
│ indexing with constant keys. No calculations, function calls, template                                                                                  
│ expressions, etc are allowed here.                                                                                                                      
╵                                                                            

╷                                                                                                                                                         
│ Error: Invalid expression                                                  
│                                                                            
│   on peer.tf line 28, in moved:                                            
│   28:   to   = aws_route.route[aws_route.private_b.route_table_id]                                                                                      
│                                                                                                                                                         
│ A single static variable reference is required: only attribute access and                                                                               
│ indexing with constant keys. No calculations, function calls, template                                                                                  
│ expressions, etc are allowed here.                                         
╵   

Note how unhelpful/misleading this error is. After all, I am doing only attribute access. (Although upon looking at it further I guess I'm not indexing with constant keys.)

jbilliau-rcd commented 1 year ago

We have the exact same problem. I am trying to convert my aws_eks_node_group resource, of which I create 3 of using a count, to for_each, named off the subnet_id. So I'm trying to do this:

moved {
  from = aws_eks_node_group.spot_node_group[0]
  to   = aws_eks_node_group.spot_node_group[one(aws_eks_node_group.spot_node_group[0].subnet_ids)]
}
moved {
  from = aws_eks_node_group.spot_node_group[1]
  to   = aws_eks_node_group.spot_node_group[one(aws_eks_node_group.spot_node_group[1].subnet_ids)]
}
moved {
  from = aws_eks_node_group.spot_node_group[2]
  to   = aws_eks_node_group.spot_node_group[one(aws_eks_node_group.spot_node_group[2].subnet_ids)]
}

Doesn't work, same error. So now I have no recourse but to manually run 3 terraform mv state commands to try to migrate 200+ clusters :(

krupakar1329 commented 2 months ago

So , this is also not supported for import and removed blocks as well, because it didnt work for me. Correct it if i am wrong