hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.11k stars 9.47k forks source link

dynamic moved blocks #33236

Open archoversight opened 1 year ago

archoversight commented 1 year ago

Terraform Version

1.4.6

Use Cases

Moving large arrays of items is difficult and error prone

Attempted Solutions

dynamic "moved" {
    for_each = var.somevar
    content {
      from = old.resource[each.key]
      to = new.resource[each.key]
    }
  }

Proposal

I am moving a large amount of resources with computed names into a new submodule and due to lack of having a dynamic way of calculating the moved blocks or just being able to use the top resource name and having terraform figure out that its an array, I have to manually parse the state file and create the moved blocks which is error prone, and doesn't work well when multiple people are working on the same codebase and new array items may be added between me generating the static moved blocks and the actual move of the resources to a submodule.

References

No response

apparentlymart commented 1 year ago

Hi @archoversight! Thanks for sharing this use-case.

Although you are right that there isn't a generalized mechanism for dynamically transforming instance keys using arbitrary expressions, you mentioned as part of your writeup "just being able to use the top resource name and having terraform figure out that its an array" and I'm not sure I'm understanding what you meant by that but note that the following is valid today:

resource "example" "example" {
  count = 3
}

moved {
  from = example.old
  to   = example.example
}

If neither of the addresses in a moved block have an instance key specified then Terraform will interpret it as declaring that the whole resource has moved, with all of the same instance keys. For example, the above would declare that example.old[0] moved to example.example[0], and so on for all of the instance keys that are declared for example.example.

Is that what you meant by that part of your writeup, or did I misunderstand?

Thanks again!

archoversight commented 1 year ago

@apparentlymart

I have a for_each that is moving from the top-level to inside a module, and when I tried to just use the module directly:

resource "example" "old" {
for_each = toset(var.something)
}

To:

resource "example" "new" {
   source = "./newmodule"
   somevar = var.something
}

moved {
    from = example.old
    to = example.new.resource
}

This did not work, terraform complained that it has to be a singular object, so I ended up writing:

moved {
   from = example.old["key_name"]
   to = example.new.resource["key_name"]
}

For each of the resources. Granted, this was on an older version of terraform, not 1.4.6 (1.3 series), I just didn't get around to writing up the issue. I will try again in the newer versions.


Edit:

I will note that we have a lot of keys that are computed on the fly using locals/function transforms. That may also be causing issues for the moving of the resources wholesale.

apparentlymart commented 1 year ago

I think the fact that we're using contrived "example" names is making this a bit harder to follow and so we're talking past each other a little.

The example moved block you shared has invalid syntax, so if you tried something equivalent to that then I suspect Terraform may just have been confused about what your intent was: example.new.resource cannot be valid because resource addresses always come in two parts, like the example.old you showed. If this were a path to a resource in a child module instance then it would start with a module.something. prefix to describe that module.

I think the situation you described here is the one explored in Splitting One Module into Multiple, which shows the syntax for declaring that a resource has moved wholesale (with all of its instances unchanged) into a child module. Here's a reduced version of that example:

module "x" {
  source = "../modules/x"

  # ...
}

moved {
  from = aws_instance.a
  to   = module.x.aws_instance.a
}

Because neither the from or to address includes an instance key, Terraform understands this as having moved all of the instances of aws_instance.a together, so all of the individual instance keys would be preserved.

What this cannot model is any situation where the instance keys changed in some systematic way at the same time as moving into a child module. For example, if you previously had aws_instance.example["foo_a"] and aws_instance.example["foo_b"] and part of the refactoring is to remove those foo_ prefixes to produce module.x.aws_instance.example["a"] and module.x.aws_instance.example["b"] then in today's Terraform there is no alternative but to write out a separate moved block for each instance, explicitly describing how each source instance key maps to each destination instance key rather than just writing a general rule about removing the foo_ prefix.

elasticdotventures commented 11 months ago

+1 for dynamic moved blocks. 🤦

KrisShannon commented 11 months ago

I don't want to put words into @archoversight's mouth, but I think the problem they are trying to solve is moving a for_each from a resource in the parent module across to a child module call. E.g.:

# Parent Module

resource "example" "old" {
    for_each = toset(var.something)

    arg = each.key
}

module "new_module" {
    source = "..."

    for_each = toset(var.something)
    something = each.key
}
# New Child Module

resource "example" "new" {
    arg = var.something
}

and needing to move example.old[*] to module.new_module[*].example.new

I don't believe there is currently a way of expressing this with the current moved block without explicitly iterating over toset(var.something):

moved {
    from = example.old["something_a"]
    to = module.new_module["something_a"].example.new
}

moved {
    from = example.old["something_b"]
    to = module.new_module["something_b"].example.new
}

moved {
    from = example.old["something_c"]
    to = module.new_module["something_c"].example.new
}

# ...

If you could use the meta-argumentscount and for_each with moved blocks then it would be something like:

moved {
    for_each = toset(var.something)
    from = example.old[each.key]
    to = module.new_module[each.key].example.new
}
apparentlymart commented 11 months ago

Thanks for the additional context!

I think at this point we have a good understanding of what use-cases are missing today, and the remaining problem is that we have no known technical design with which to meet those use-cases.

Internally a moved block is essentially an adjustment to the "previous run state", which is what we call the state that was written at the end of the most recent run. Terraform essentially rewrites the addresses of objects in the prior state to pretend that they were originally created at the new addresses, and only then does it begin the normal planning process, with the previous run state already having been updated to reflect the new addresses.

In practice that means that move resolution cannot evaluate arbitrary expressions, because expression evaluation happens during the plan phase and relies on the "planned state", which is what we call the internal-only state snapshot that Terraform gradually builds during planning -- the diff we show is comparing the planned state against the previous state.

In practice I think that means that adding more flexibility here would require one of the following:

  1. Use a more constrained mechanism for describing the "rule" for a move -- not arbitrary expression evaluation, but something else with more restrictive capabilities -- that Terraform Core can safely evaluate prior to the planning phase, as an extension of the current behavior.
  2. Totally rework the approach so that resolving moves happens during the planning phase rather than before it, which will then make the possibility of objects changing address during evaluation a cross-cutting concern, since the whole of Terraform's runtime will need to be made aware that the address of a resource instance can change dynamically at the same time as other work is going on.

When we originally worked on this feature we could not find a sufficient design for option 2 that would not amount to essentially a rewrite of the entire runtime with a different design approach. There might still be a more feasible design out there waiting to be discovered, but my hunch is that path 1 is the more likely to succeed. We don't currently have a candidate design for either path though, so I think both are still on the table for someone to research.

gleb-boushev-effem commented 11 months ago

yet another barely useful feature by terraform. how do you think am I supposed to move thousands of resources from one resource to another as an author of internal module used by hundreds of configurations?

crw commented 11 months ago

@gleb-boushev-effem Thanks for the feedback. I acknowledge that it can be frustrating when a particular feature implementation does not contain the features you need to implement a solution. Just a reminder to please follow the Community Guidelines when posting. Thanks again!

Quanalogy commented 10 months ago

yet another barely useful feature by terraform. how do you think am I supposed to move thousands of resources from one resource to another as an author of internal module used by hundreds of configurations?

I also came here hoping for a better solution but what I'm currently doing is:

  1. make the change which makes X things to create and X things to destroy (the thing you're moving)
  2. make a script that is able to identify removed and new
  3. make a state-change script where you map each old value to the new and doing terraform state mv OLD_STATE NEW_STATE

This isn't fast, ideal or anything but well it works.

jubr commented 6 months ago

A bit late to the party, but perhaps for other people stumbling onto this discussion: given the technical architecture limitations, why not approach this from another angle?

The moved { } refactoring is by definition temporary, so this can also be fixed a little less elegantly. Perhaps a small script written in a language of choice that generates the terraform moved block permutations into a temporary refactor-reason.tf file? After the apply this file will be removed from SCM anyways.

If this module has a lot of users across other internal repos, one could consider a tool like mani to run this script as part of a mani task in many local repositories at once to checkout/commit/push/create-mr the same branch name with a proposed fix to all.

deitChi commented 4 months ago

I have a slightly different dynamic requirement from the moved block, which I believe is much simpler to implement, but absolutely follows the "Move all keys from resource A to resource B" ethos.

Merge 2 separate resources doing the same thing into one. (Don't ask... Predecessor)

moved {
  from = aws_subnet.tn_subnet
  to   = aws_subnet.main
}

moved {
  from = aws_subnet.customer_subnet
  to   = aws_subnet.main
}

I get this error:

â•·
│ Error: Ambiguous move statements
│ 
│   on ../../../../networks-aws-modules/modules/vpc_v4/subnets-moved.tf line 6:
│    6: moved {
│ 
│ A statement at ../../../../networks-aws-modules/modules/vpc_v4/subnets-moved.tf:1,1 declared that module.xyz.aws_subnet.tn_subnet moved to module.xyz.aws_subnet.main, but this statement instead declares that module.xyz.aws_subnet.customer_subnet moved there.
│ 
│ Each resource can have moved from only one source resource.

I know for a fact that the keys for each resources aws_subnet.tn_subnet & aws_subnet.customer_subnet are unique. But Terraform doesn't appear to evaluate this.

Perhaps it would be reasonable to read resource keys, and permit this behaviour, and for other, more complex state manipulation (where overlapping keys exist) a 2 stage move would work?

Alternatively, if moved blocks have an order, step or depends_on input, tf can evaluate which step failed for a useful error.

Edit: Based on @apparentlymart 's write up - Op1 works here, assuming tf has access to state, prior to evaluating vars / locals. Apologies for the mention, not sure if allowed

apparentlymart commented 4 months ago

Hi @deitChi,

For that example I suggest thinking of it as two moves along a chain of addresses, rather than as two independent moves to the same final location.

moved {
  from = aws_subnet.tn_subnet
  to   = aws_subnet.customer_subnet
}

moved {
  from = aws_subnet.customer_subnet
  to   = aws_subnet.main
}

Terraform will understand this as "first we renamed tn_subnet to customer_subnet, and then we renamed that to main", and so Terraform will first shuffle all of the instances of "tn_subnet" into "customer_subnet", and then move everything from there into "main".

I understand that this chain of moves is not actually an accurate description of how this module historically evolved, but I think from Terraform's perspective this fake history is close enough to what you are really trying to describe that it would get the same effect.

(Terraform has this requirement against ambiguity because it needs to be able to decide what to do if two objects try to move to the same address. The above example is using the "chained moves" rule which means that if there were conflicting instance keys between the first two resources then the one on "customer_subnet" would get preserved while the other one would be deleted. But as long as the instance keys are unique across both resources that detail won't matter, since every object can move to a unique new address.)

joshuacollins-deloitte commented 3 months ago

Thanks for the additional context!

I think at this point we have a good understanding of what use-cases are missing today, and the remaining problem is that we have no known technical design with which to meet those use-cases.

Internally a moved block is essentially an adjustment to the "previous run state", which is what we call the state that was written at the end of the most recent run. Terraform essentially rewrites the addresses of objects in the prior state to pretend that they were originally created at the new addresses, and only then does it begin the normal planning process, with the previous run state already having been updated to reflect the new addresses.

In practice that means that move resolution cannot evaluate arbitrary expressions, because expression evaluation happens during the plan phase and relies on the "planned state", which is what we call the internal-only state snapshot that Terraform gradually builds during planning -- the diff we show is comparing the planned state against the previous state.

In practice I think that means that adding more flexibility here would require one of the following:

  1. Use a more constrained mechanism for describing the "rule" for a move -- not arbitrary expression evaluation, but something else with more restrictive capabilities -- that Terraform Core can safely evaluate prior to the planning phase, as an extension of the current behavior.
  2. Totally rework the approach so that resolving moves happens during the planning phase rather than before it, which will then make the possibility of objects changing address during evaluation a cross-cutting concern, since the whole of Terraform's runtime will need to be made aware that the address of a resource instance can change dynamically at the same time as other work is going on.

When we originally worked on this feature we could not find a sufficient design for option 2 that would not amount to essentially a rewrite of the entire runtime with a different design approach. There might still be a more feasible design out there waiting to be discovered, but my hunch is that path 1 is the more likely to succeed. We don't currently have a candidate design for either path though, so I think both are still on the table for someone to research.

It would be great to see something along the lines of Option 1, which would at least support basic conditional statements on variables known prior to plan (i.e. declare a moved block which is only required in some environments but not others)

Because for_each/count can be used as filters for different behaviour between environments, when the environments become inconsistent and a filter is added, it would be nice to be able to use moved to avoid resource churn.

deitChi commented 2 months ago

I've done some more testing, and annoyingly, chaining as described above doesn't seem to work 1.7.2

# rt-moved.tf
moved {
  from = aws_route_table_association.tn
  to   = aws_route_table_association.cust
}

moved {
  from = aws_route_table_association.cust
  to   = aws_route_table_association.private
}

Plan:

# module.foo.aws_route_table_association.private["awsendpt.eu-west-1a"] will be created
+   resource "aws_route_table_association" "private" {
+       id             = (known after apply)
+       route_table_id = "rtb-0602c1e998ac721f7"
+       subnet_id      = (known after apply)
    }

  # module.foo.aws_route_table_association.private["awsendpt.eu-west-1b"] will be created
+   resource "aws_route_table_association" "private" {
+       id             = (known after apply)
+       route_table_id = "rtb-0602c1e998ac721f7"
+       subnet_id      = (known after apply)
    }

  # module.foo.aws_route_table_association.tn["awsendpt.eu-west-1a"] will be destroyed
  # (because aws_route_table_association.tn is not in configuration)
-   resource "aws_route_table_association" "tn" {
-       id             = "rtbassoc-02500663e6c425929" -> null
-       route_table_id = "rtb-0602c1e998ac721f7" -> null
-       subnet_id      = "subnet-0d9dd0c50f9bfbec2" -> null
        # (1 unchanged attribute hidden)
    }

  # module.foo.aws_route_table_association.tn["awsendpt.eu-west-1b"] will be destroyed
  # (because aws_route_table_association.tn is not in configuration)
-   resource "aws_route_table_association" "tn" {
-       id             = "rtbassoc-045555a54fe2895a6" -> null
-       route_table_id = "rtb-0602c1e998ac721f7" -> null
-       subnet_id      = "subnet-02ff831cd91bb4ce7" -> null
        # (1 unchanged attribute hidden)
    }

But all the 'cust' moved as expected. I do know the names explicitly, and it would be possible to manually transfer them - but it's not exactly my idea of a good time.

apparentlymart commented 2 months ago

Hi @deitChi,

I'm sorry that didn't work out. Unfortunately I can't speculate as to why it didn't work without some more context.

Since others have reported success using it I think we should treat this as a troubleshooting exercise rather than as a bug report for now -- though we can bring it back here as a bug report if we reveal one -- so I'd like to move discussion about that problem into a topic in the community forum where it's easier to get into the details of one person's situation without making this GitHub issue thread very long and creating notification noise for the folks already subscribed to it.

If you do choose to open a community forum topic about this, it would help to also include information about where you've placed each of those declarations; my first theory here is that the moved block addresses are for some reason not matching the addresses of the objects already tracked in state, but I can't prove or disprove that with the information given.

Thanks!

deitChi commented 2 months ago

For time travellers, I'll include the discussion topic here, just in case it proves useful https://discuss.hashicorp.com/t/terraform-moved-block-chaining-issue/68302