hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

feature request: parallelism parameter for resources with count #14258

Open dkiser opened 7 years ago

dkiser commented 7 years ago

As a Terraform user, I should be able to specify a parallelism count on resource instances containing a count parameter, so that I can create API resources via providers at a saner rate and/or cope with mediocre API backends.

Example Terraform Configuration File

resource "providera_resourcea" "many" {
   count = 1000
   parallelism = 10

   attributeA = "a"
   attributeB = "b"
   attributeC = "c"
}

Expected Behavior

GIVEN terraform apply -parallelism=X where X < ${providera_resourcea.many.*.parallelism} WHEN terraform creates/deletes/refreshes resources THEN I expect only X concurrent goroutines creating this resource type.

GIVEN terraform apply -parallelism=X where X >= ${providera_resourcea.many.*.parallelism} WHEN terraform creates/deletes/refreshes resources THEN I expect only ${providera_resourcea.many.*.parallelism} concurrent goroutines creating this resource type.

dkiser commented 7 years ago

Possibly related to #7388

kwach commented 7 years ago

+1

Stono commented 6 years ago

+1

pbusquemdf commented 5 years ago

I just tried to create 3 vsphere_virtual_machine resources. Because all 3 virtual machines are created at the same time, they take disproportionately longer to create, causing the apply operation to time out.

Creating 1 machine takes 5 minutes, so all 3 machines would take 15 minutes with a single thread. Creating all 3 concurrently times out after 10 minutes, because each machine now takes longer than 5 minutes: disk creation, disk balancing, and reconfiguration bottleneck on the vSphere server.

But other resources are working fine. So, I should be able to limit the number of simultaneously job running into either the resource, or the provider (or both)

invidian commented 5 years ago

Another use case I would see for that is rolling update of immutable infrastructure, where you just roll the update one server/resource at a time.

mkjmdski commented 4 years ago

I found this issue when trying to run apply against the "pass" provider, which needs to communicate with a git repository and so must run one operation at a time. My infrastructure covers many other resource types (from different providers), so I'd like to run with high parallelism overall but limit it to 1 for resources of a certain type (or provider, as @invidian mentioned).

davidquarles commented 4 years ago

I'm hitting this today. There is still no known workaround, I take it?

Anecdotally, what I'm trying to do is create many managed instance groups in GCP that are all backends for the same load-balancer (using count) but can't be collapsed into one because of upstream constraints and how we're partitioning outbound traffic. Doing so forfeits rolling update semantics, of course.

I started to hack at having each instance group depend on the prior one after the head of the list until I realized that depends_on is static and the entire resource group is actually a single node in the DAG. Any ideas? As it stands, my only real strategy is to move this stuff out into a dedicated repo run with -parallelism=1 and use the remote state provider to loosely couple back to our primary repository :(

brendan-sherrin commented 4 years ago

I'm getting an error: Deleting CloudWatch Events permission failed: ConcurrentModificationException: EventBus default was modified concurrently

I believe this suggestion would let me work around the issue by applying a parallelism limit to permissions affecting the default event bus on the account.

i.e. adding a parallelism attribute to this resource: resource "aws_cloudwatch_event_permission" "PerAccountAccess" { for_each = local.accountslist

mrsimonemms commented 4 years ago

I've found a similar issue with multiple Google SQL databases on a private IP where this would be incredibly useful (detailed on SO).

delwaterman commented 4 years ago

👍 Have this issue with a custom redshift provider. Need to limit the number of concurrent requests being made.

hege-aliz commented 3 years ago

Same here with AWS task definitions within the same family.

antanof commented 3 years ago

Similar issue with Azure DNS and Public IP: I want to create several A records for the same public IP

resource "azurerm_dns_a_record" "new" {
  count               = length(var.subdomains)
  name                = coalesce(var.subdomains[count.index])
  zone_name           = var.zone
  resource_group_name = var.dns_rgname
  ttl                 = 60
  target_resource_id  = azurerm_public_ip.public_ip.id

  depends_on          = [azurerm_public_ip.public_ip]
}

I have an issue with terraform apply:

dns.RecordSetsClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="Conflict" 

No issue with terraform apply --parallelism=1

clarsonneur commented 3 years ago

Like many others, I'm facing the same kind of issue for some resources with Google Cloud (network peering, Firestore indexes, ...).

tshawker commented 3 years ago

I've also run into this when making changes to load balancers and target groups. Certain changes destroy everything before recreating anything. I'd like to be able to group the changes so that only some of them are done at a time. Alternatively, changes to the lifecycle sequencing would be just as useful.

In this case, we aren't using count but for_each. I don't think that should make a difference for limiting parallelism.

cbus-guy commented 3 years ago

We are experiencing issues when attempting to bootstrap Chef using a null resource, or when using the Chef provisioner to build servers, with either VMware or Azure. Vault permissions are not always assigned properly to the node in Chef server. This succeeds when we set parallelism to 1, but fails intermittently (though fairly consistently) when not set to 1. It would be nice to set parallelism to 1 only for the bootstrap null resource, and allow everything else to run in parallel.

zhujinhe commented 3 years ago

+1. # 2021-07-16

iyinoluwaayoola commented 2 years ago

I'm surprised there is no news on this. I have only one resource that requires parallelism to be 1, but the only native solution is to disable parallelism for the entire infrastructure (of many, many objects) using terraform apply --parallelism=1. I'd love to see this feature for resources with for_each or count.

mhaddon commented 2 years ago

+1 for #currentyear.

even being able to set parallelism on a module level would be great

surajsbharadwaj commented 2 years ago

Need this very much. For me the count parameter is creating subnets in the same VLAN. I wish I could control count parallelism; I need them created one after the other.

guidooliveira commented 2 years ago

+1 here. I can easily accomplish this using batchSize on any copy loop in ARM templates; surprised this isn't ready after almost 5 years.

mukundjalan commented 2 years ago

Very much needed feature. I need to run a module using for_each, but the system runs out of space in some situations. If I could limit the parallelism, I could easily manage this.

AresiusXP commented 2 years ago

I have the same need. My main issue is when managing subnets in Azure within the same VNET: Azure doesn't allow modifying multiple subnets at the same time. My only workaround is using a null_resource with az cli commands, which is a very clunky way of working. If I had a way of setting the parallelism for this, I could reduce it to 1 and have everything managed by Terraform code.

viniciuscastro-hotmart commented 2 years ago

+1

zhujinhe commented 2 years ago

First of all, I like HashiCorp products a lot; they have really helped me. ❤️ @armon @mitchellh

And I know that as a non-paying user I don't have any right to ask you to do anything, but feature requests with the core tag have been open for over 5 years.

I love Terraform and Nomad, but they occasionally give me a one-last-mile-to-production-ready feeling. Other requests like https://github.com/hashicorp/nomad/issues/1635 have been open for over 6 years.

I really hope the developers at HashiCorp have the time and willingness to plan the old core feature requests alongside developing new features.

sherifkayad commented 1 year ago

Hi folks,

I too am struggling with this missing feature! I have some resources and modules that can't handle parallelism, while others work perfectly fine with it.

Setting the parallelism flag to 1 for the whole apply is excruciatingly slow!! Can we have this somehow prioritized for the sake of all those having similar issues?!

danjamesmay commented 1 year ago

Same issue when having hundreds of monitored projects in google: https://github.com/hashicorp/terraform-provider-google/issues/12883

TF also doesn't seem to handle 429s very well, and completely screws up plan/apply steps when heavily rate limited.

abij commented 1 year ago

We are having the same issue in Azure with multiple PrivateEndpoints in a Subnet. PE creation also fails while performing VNet peering. @AresiusXP can you explain the null_resource in detail: are you validating that the subnet is ready, or controlling parallelism? Comparable with issues in terraform-provider-azurerm: #21293 #16182

AresiusXP commented 1 year ago

We are having the same issue in Azure with multiple PrivateEndpoints in a Subnet. PE creation also fails while performing VNet peering. @AresiusXP can you explain the null_resource in detail: are you validating that the subnet is ready, or controlling parallelism? Comparable with issues in terraform-provider-azurerm: #21293 #16182

We have 2 null_resource that have a depends_on the vnet with subnets resource. Once it's done, it's running an az cli command on each subnet to add private endpoints, and to associate to a route_table.


resource "null_resource" "endpoints" {
  triggers = {
    subnets = join(" ", azurerm_virtual_network.vnet.subnet.*.name)
  }

  provisioner "local-exec" {
    command     = "az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID; az account set --subscription ${var.global_settings.subscription_id}; for subnet in ${self.triggers.subnets}; do az network vnet subnet update -g ${var.rg_name} -n $subnet --vnet-name ${var.vnet_name} --service-endpoints ${join(" ", local.service_endpoints)}; done"
    interpreter = ["/bin/bash", "-c"]
  }

  depends_on = [
    azurerm_virtual_network.vnet
  ]
}

resource "null_resource" "rt" {
  count = var.rt_enabled ? 1 : 0

  triggers = {
    subnets = join(" ", azurerm_virtual_network.vnet.subnet.*.name)
  }

  provisioner "local-exec" {
    command     = "az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID; az account set --subscription ${var.global_settings.subscription_id}; for subnet in ${self.triggers.subnets}; do az network vnet subnet update -g ${var.rg_name} -n $subnet --vnet-name ${var.vnet_name} --route-table ${var.route_table_id}; done"
    interpreter = ["/bin/bash", "-c"]
  }

  depends_on = [
    azurerm_virtual_network.vnet,
    null_resource.endpoints
  ]
}

pspot2 commented 1 year ago

My use-case for this would be building multiple Docker images (using the registry.terraform.io/providers/kreuzwerker/docker provider) from a single (parametrized) Dockerfile in a for_each loop. I'd like to reduce the parallelism of the docker_image resource (and only of that resource) to 1 because my Dockerfile installs Python packages, and running N versions of this installation concurrently can cause sporadic issues, because pip presently does not have any synchronization mechanisms around its cache directory (which is shared between the Docker build processes).

M0NsTeRRR commented 1 year ago

Having same issue on powerdns with SQLITE backend because of concurrent access to database.

cbus-guy commented 11 months ago

+1. We are using the buggy DNS provider, which will only work consistently if we set parallelism to 1. It's ridiculous that the build process for many servers has to slow down just because a single provider is buggy. I should be able to set parallelism on a per-resource basis.

bschaatsbergen commented 9 months ago

I would be happy to look into this issue and see if I can come up with something 👍🏼

Note: in the issue description, the second Given/When/Then seems redundant, as it describes the same behaviour as the first.

bschaatsbergen commented 9 months ago

@jbardin, a couple years ago you replied on a similar issue: https://github.com/hashicorp/terraform/issues/24433#issuecomment-603236550

Has your view changed on this, and if so, would this be something I could take a stab at? It seems like quite a few people are experiencing issues with this not being possible.

jbardin commented 9 months ago

@bschaatsbergen, no, there have been no architectural changes around this that would alter the situation. If a resource type (or even the individual account controlling those resources) has limitations on concurrency, that is something which is in the provider's domain to control. In fact many providers already frequently use internal concurrency limits, along with API rate limiting and retries for similar reasons.

A CLI flag is not appropriate for this type of per-resource configuration, which is one of the reasons the other issue was closed outright. So while this issue remains possible to implement, it's a bit more invasive than I would like to see for something the provider should handle directly. Since it is not a required feature for operation, it would also first have to be approved through product management before implementation could begin.

Thanks!

cbus-guy commented 9 months ago

I still think this would be an extremely useful update. Some resources don't always behave as expected. If you have code creating 20-30 resources, it slows things down to the extreme to set parallelism to 1 as a whole, versus setting it to 1 for the specific resource that is giving you trouble.

mng1dev commented 2 months ago

I am not sure why 5 years later this request is still open and ignored.

If I need to create many resources of the same type using count/for_each it would be quite helpful if I could set how many resources I am creating concurrently, especially when these resources share a lock and throw an error when they cannot acquire it. This is as easy as implementing a for loop.

While I agree that this must be implemented at provider level, it would be nice to have some high-level option to orchestrate resource creation in case the provider's logic has no way to implement this mechanism and/or is not maintained.

Moreover, I don't agree with fully passing the buck to the maintainers of providers, because this could also be a strict requirement of my very own deployment, for whatever reasons, so I would expect my IaC tool of choice, and not the individual provider, to offer this degree of flexibility.

There is the -parallelism option, but if I am creating 2000 resources and only 10 of them need to be created sequentially, I don't see why I should create all of them sequentially.

kuteninja commented 2 months ago

I'm having this exact issue with aws_appautoscaling_scheduled_action, since these cannot be modified concurrently.

I have a list of actions, and I need them to be executed one at a time, but they try to run simultaneously resulting in "ConcurrentUpdateException: You already have a pending update to an Auto Scaling resource"

I also have issues related to this with Postgres server roles, since trying to remove and add roles at the same time, causes a tuple exception.

manitgupta commented 1 month ago

Facing same issue with google_datastream_stream resource. This feature will be very beneficial!

UDtorrey commented 3 weeks ago

I also have issues related to this with Postgres server roles; multiple applies are needed to complete when count > 1.