hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.31k stars 9.49k forks source link

Execution order within `for_each` loop #30841

Open djfinnoy opened 2 years ago

djfinnoy commented 2 years ago

Current Terraform Version

v1.1.7

Use-cases

for_each is great for reducing the number of resource blocks within your modules. But sometimes one is forced to split a for_each resource into multiple resources because you need depends_on.

Assume you have ten helm charts to install, each chart depends on the previous chart. In an ideal world, you would simply do this:

resource "helm_release" "my_charts" {
  for_each = var.my_charts

  name    = each.value.name
  version = each.value.version
  ...
}

As far as I know, this won't work when there are dependencies between the objects in var.my_charts, so in the case where there are 10 helm charts that need to be applied in order, you would need ten separate resource blocks.

Proposal

It would be nice if one could configure the order in which a for_each loop should be applied.

An idea that comes to mind is that resoucres/modules that accept for_each, also accept an option list of keys that determines execution order, eg:

resource "helm_release" "my_charts" {
  for_each = var.my_charts
  for_each_order = [
    "chart1",
   " chart2",
   ...
   "chart10"
  ]

  name    = each.value.name
  version = each.value.version
  ...
}

A more flexible approach that could cover situations where the keys are unknown, is to filter based on some key within each objects values:

resource "helm_release" "my_charts" {
  for_each = var.my_charts
  for_each_priority = each.value.sync_order

  name    = each.value.name
  version = each.value.version
  ...
}
kmoe commented 2 years ago

Thanks for the well-written feature request. As you have noted, Terraform does not currently have a way to ensure these resources are created serially and in order - it will create them in parallel. It is also not possible to use count to create dependencies between resource instances, because Terraform's dependency graph operates on resources, not resource instances.

andrecorreaneto commented 1 year ago

Since maps are lexically sorted by keys, someone could think a for_each block would be processed in that order, but that's not the case indeed. I see this is an extremely powerful feature for building resource trees with the ability to move nodes around the structure without causing destruction and recreation (since the resource address would remain intact).

sherifkayad commented 1 year ago

I am facing the same exact issue and for priorities 1 - 5 (orders) I added 5 blocks, however, that doesn't look right. I really believe that a priority for_each would be a great feature

sherifkayad commented 1 year ago

@djfinnoy did you manage to work around this without duplicating the resources 10 times?

djfinnoy commented 1 year ago

@djfinnoy did you manage to work around this without duplicating the resources 10 times?

No, as far as I know this is not possible right now.

edelmannle commented 1 year ago

I have a somewhat similiar issue. I am trying to create several Bigquery view using a for_each block. The views themselves are interdependent e.g. view1 has view2 in its FROM clause, therefore view2 must be created before view1 The instance parameters are loaded from a json, as well as the supposed dependencies. So every view has a list of other views, which should be created before it. This setup fails because of 2 reasons as far as I can tell:

  1. I cannot reference an instance from the same resource in the depends_on parameter
  2. Terraform does not allow to use a dynmically generated key as a map index

Is there a place where I could upvote this feature request?

example of my setup:

resource "google_bigquery_table" "bigquery-view" {
  for_each = { for idx, record in local.bigquery_views_list : record["view"] => record }

  depends_on              = [for x in each.value.depends_on : google_bigquery_table.bigquery-view[x]]
  dataset_id                 = each.value.dataset
  table_id                    = each.value.view

  view {
    ...
  }
}
crw commented 1 year ago

@edelmannle

Is there a place where I could upvote this feature request?

Please use the 👍 on the original issue description to upvote; that is how we determine top issues. You might also try asking this question on the community forum where there are more people ready to help. The GitHub issues here are monitored only by a few core maintainers. Thanks!

boxrick commented 1 year ago

I am having an issue with the aws_servicecatalog_provisioning_artifact resource where it is technically only able to destroy or create one at a time due to AWS API limitations, so the Parallelism in Terraform basically breaks the usage of this the moment the resource is wrapped in for_each or count.

If I could make each resource dependent on the last ( ordering is largely irrelevant ) then I could fix it.

I was thinking of perhaps a 'depends_on' and using the index function to make it depend on the last. The main issue being that the first one would have a depends_on for no resource and it would break.

Would be interesting to hear if anyone has some workarounds...

My only way to fix this for now is to literally force Terraform to do parallelism 1 globally for all resources, but this is a crappy fix. Its unfortunate you cannot define -parallelism 1 per resource

sadminriley commented 1 year ago

@boxrick did you ever figure out a better way to do that? Out of curiosity

boxrick commented 1 year ago

@boxrick did you ever figure out a better way to do that? Out of curiosity

Sadly not, I worked around it by changing parallelism to 1. But only for the apply, then disabled a refresh during apply and used the outputted plan file.

This then allowed me to use normal parallelism during the much shower plan stage.

francisferreira commented 11 months ago

Not only we cannot control parallelism, but we cannot control the order in which resources will be created either... Sigh... And that is true not only for for_each loops, but count loops as well.

We have a business requirement that IPs from IP prefixes need to be created with their names reflecting their own IP addresses. For instance, 'pip-101.102.103.104' would have address 101.102.103.104. The way Azure API handles requests support that, for the 1st IP in a prefix will surely be the first available address in it, the 2nd will be the second, and so on. But Terraform falls short and sadly does not offer a way to achieve that allocation pattern.

Okay. We can't control parallelism per resource, but we do have the flag '-parallelism=1' to force the resources to be created sequentially. Hurray, let's kill our performance to adhere to our stupid rule! At least that would work, right? Well, think again... Yes, we can control parallelism with that flag. But NO, WE CANNOT ENSURE ORDERING! For some reason known only by Terraform devs (and the Devil himself), a count-loop is NOT executed in order when we use the '-parellelism=1' flag. Example of one such apply:

azurerm_public_ip.smtp_pfix_02[2]: Creating...
azurerm_public_ip.smtp_pfix_02[2]: Creation complete after 7s [id=/...]
azurerm_public_ip.smtp_pfix_02[6]: Creating...
azurerm_public_ip.smtp_pfix_02[6]: Creation complete after 5s [id=/...]
azurerm_public_ip.smtp_pfix_02[3]: Creating...
azurerm_public_ip.smtp_pfix_02[3]: Creation complete after 4s [id=/...]
azurerm_public_ip.smtp_pfix_02[4]: Creating...
azurerm_public_ip.smtp_pfix_02[4]: Creation complete after 7s [id=/...]
azurerm_public_ip.smtp_pfix_02[11]: Creating...
azurerm_public_ip.smtp_pfix_02[11]: Creation complete after 7s [id=/...]
azurerm_public_ip.smtp_pfix_02[8]: Creating...
azurerm_public_ip.smtp_pfix_02[8]: Creation complete after 7s [id=/...]
azurerm_public_ip.smtp_pfix_02[13]: Creating...
azurerm_public_ip.smtp_pfix_02[13]: Creation complete after 7s [id=/...]
azurerm_public_ip.smtp_pfix_02[10]: Creating...
azurerm_public_ip.smtp_pfix_02[10]: Creation complete after 7s [id=/...]

Why? WHY? Why wouldn't a count-loop be executed in order when parallelism is set to 1? I'm honestly crying inside for having to work with Terraform right now... And yeah, some will come with that old excuse: "TF operates on resources, not resource instances, so we cannot use count or for_each loops to create dependencies between resource instances, and bla, bla, bla..." First, we are not talking about physical dependencies (like that of a VM and a Vnet/Subnet), but simply a logical relationship. Second, why would any iteration loop not execute in order if no parallelism is at play? No respectful language would randomly iterate through a repetition loop.

mkielar commented 1 month ago

Similar scenario here:

The only need I have for this is to update these 20VMs one-by-one, to keep the impact on the cluster minimal. It doesn't even have to be in specific order (although it would be a good addition), just one at a time.

Oh, and the instance names must be set upfront, according to a naming convention, because of Organisational Policy (please, don't ask, it's a hell by itself), so even if features like AWS Autoscaling Groups would exist in my case, I wouldn't be able to use it.

I've originally posted in https://discuss.hashicorp.com/t/for-each-support-sequential-operation/34680/3?u=mkielar, but I can see that @djfinnoy has almost an identical idea, so I vote for that.

dga-nagra commented 1 month ago

Hi,

We noticed that the loop order is alphabetically and we are currently using this for preserving the order. The issue is that "10-..." < "9-..:" because "1" < "9".

The way to solve this is to ensure they all have the same size by left padding it with "0", but the easiest way to do it is to add a power of 10 to the index.

locals {
   data = ["foo", "bar", bat", "baz"]

    # This offset should be enough, but you can add 0 if needed
    index_offset = 10000000
}

resource "myresource" "this" {
    dynamic "myblock" {
        for_each = { for idx, d in local.data: "${index + local.index_offset}" => d}

        ...
    }
}

We confirmed this behavior on multiple projects.

Now, this looping order is NOT official and might be changed even in a minor version. It would be great if this becomes part of the specification. That will be a quick fix but with a big commitment.


Also, I rejoin the proposal but would make a small amendment to it: Instead of passing a list in order, I would like to be able to pass a tuple as the key:

locals {
   data = ["foo", "bar", bat", "baz"]
}

resource "myresource" "this" {
    dynamic "myblock" {
        for_each = { for idx, d in local.data: (idx, d.key) => d}

        ...
    }
}

This would be a lot easier to maintain in my opinion. Also, javascript and python sorting works with tuple


That being said, making the looping order explicit is not necessarily a good thing:

  1. The order of the blocks in your terraform is not necessarily the order in which the provider will receive them (this is not part of the specification). The order of the blocks could be completely lost because of internal operations. In the same manner, for any proposed solution here, the specification must state that the order of blocks is preserved when sent to the provider, but how do you manage this:
    resource "myresource" "this" {
        blockA { ... }
        blockB { ... }
        blockA { ... }
    }

    ?

  2. If the order of the loop matters for blocks, then the provider is at fault by designing it badly. It should be done by either:

    1. Ask for a list

      resource "myresource" "this" {
          blocks = [
              {...},
              {...},
          ]
      
      }
    2. Have a "index"/"priority" field (as AWS modules does)
      resource "myresource" "this" {
          blockA { priority = 1 ... }
          blockB { priority = 1 ... }
          blockA { priority = 2 ... }
      }

      NOTE: This is not exactly a solution. As said, this is used in modules where each blocks become an individual resource (like a listener rule) that must hook itself to another resource at a specific position. So I wouldn't recommend it either except in modules.

If the provider's interface is done correctly, we wouldn't need to ensure an order in the blocks, but once again, this particularity is never made explicit in the documentation.

I personally think that enforcing correct implementation on providers will be hard and it will induce a massive breaking change for both the providers and users. The current block structure is nicer than enforcing lists. So I think we should at least make official the loop order being alphabetically and the order the blocks are passed to the provider.

dannysauer commented 6 days ago

I don't so much care about explicit order; I'd like Terraform to just implicitly order instances of resources on its own. So really I just need this to be no longer the case:

...because Terraform's dependency graph operates on resources, not resource instances.

This example should work just fine (assuming the data structure is correct), as top-level OUs can get created in any order, and child OUs just need the parent OU to be created first.

resource "aws_organizations_organizational_unit" "ou" {
  for_each = local.aws_ou
  name     = each.key
  parent_id = (lookup(each.value, "parent", false)
    ? aws_organizations_organizational_unit.ou[each.value.parent].id
    : aws_organizations_organization.org.roots[0].id)
}

I think this is related to this feature request, but I'll happily open a new request if need be.