hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_batch_compute_environment: launch_template version known after apply does not ForceNew #37440

Open nomeelnoj opened 6 months ago

nomeelnoj commented 6 months ago

Terraform Core Version

1.8.2

AWS Provider Version

5.49.0

Affected Resource(s)

aws_batch_compute_environment, aws_launch_template

Expected Behavior

When the compute environment's launch_template block points at the launch template's default_version, and the launch template has update_default_version enabled, updating the launch template should recreate the compute environment and no error should occur.
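
The coupling in play, distilled from the full configuration below (the resource names and var.user_data are illustrative):

resource "aws_launch_template" "example" {
  name                   = "example"
  update_default_version = true # each new template version becomes the default
  user_data              = base64encode(var.user_data)
}

resource "aws_batch_compute_environment" "example" {
  compute_environment_name_prefix = "example"
  # ...

  compute_resources {
    # ...
    launch_template {
      launch_template_id = aws_launch_template.example.id
      # version is immutable on this resource, so any change to the
      # template's default_version should force a replacement.
      version = aws_launch_template.example.default_version
    }
  }
}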

Actual Behavior

The compute environment is NOT recreated, and Terraform throws "Provider produced inconsistent final plan".

Relevant Error/Panic Output Snippet

╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" changed the planned action from
│ Update to DeleteThenCreate.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .ecs_cluster_arn: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .status_reason: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_environment_name: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .id: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .status: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .arn: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].bid_percentage: was cty.NumberIntVal(0), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].placement_group: was cty.StringVal(""), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].spot_iam_fleet_role: was cty.StringVal(""), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].desired_vcpus: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].ec2_key_pair: was cty.StringVal(""), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].image_id: was cty.StringVal(""), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.batch.aws_batch_compute_environment.default["default"] to include new
│ values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value
│ for .compute_resources[0].launch_template[0].launch_template_name: was cty.StringVal(""), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵

Terraform Configuration Files

# modules/batch/main.tf
resource "aws_batch_compute_environment" "default" {
  for_each                        = var.compute_environments
  compute_environment_name_prefix = each.value["name"]
  service_role                    = module.service_role[each.key].arn
  type                            = lookup(each.value, "type", var.type)
  state                           = lookup(each.value, "state", var.state)

  compute_resources {
    allocation_strategy = lookup(each.value, "allocation_strategy", var.allocation_strategy)
    bid_percentage      = lookup(each.value, "bid_percentage", var.bid_percentage)
    min_vcpus           = contains(["FARGATE", "FARGATE_SPOT"], lookup(each.value, "compute_type", var.compute_type)) ? null : lookup(each.value, "min_vcpus", var.min_vcpus)
    max_vcpus           = lookup(each.value, "max_vcpus", var.max_vcpus)

    dynamic "launch_template" {
      for_each = contains(["EC2", "SPOT"], lookup(each.value, "compute_type", var.compute_type)) && length(lookup(each.value, "launch_template", var.launch_template)) > 0 ? [aws_launch_template.default[each.key]] : []
      content {
        launch_template_id = launch_template.value.id
        version            = launch_template.value.default_version
      }
    }
    # removed for brevity
  }
}

resource "aws_launch_template" "default" {
  for_each = { for k, v in var.compute_environments : k => merge(lookup(v, "launch_template", var.launch_template), { name = v["name"] }) if length(lookup(v, "launch_template", var.launch_template)) > 0 }
  name                   = lookup(each.value, "name")
  description            = lookup(each.value, "description", "Managed by Terraform")
  default_version        = lookup(each.value, "default_version", null)
  update_default_version = lookup(each.value, "default_version", null) == null ? lookup(each.value, "update_default_version", true) : null
  # removed for brevity
}

resource "aws_batch_job_queue" "default" {
  for_each             = var.job_queues
  name                 = each.value["name"]
  state                = lookup(each.value, "queue_state", var.queue_state)
  priority             = lookup(each.value, "queue_priority", var.queue_priority)
  compute_environments = local.compute_environments[each.key]
  tags = merge(
    var.tags,
    {
      Name = each.value["name"]
    }
  )
}
# root module (calling state)
module "batch" {
  source = "../../../../../../../providers/aws/modules/batch"
  compute_environments = {
    default = {
      name = "default"
      launch_template = {
        user_data = data.cloudinit_config.base.rendered
      }
      instance_types = [
        "m5a",
        "m6a",
      ]
    }
  }

  job_queues = {
    default = {
      name                 = "default"
      compute_environments = ["default"]
    }
  }
  # removed for brevity
}

Steps to Reproduce

  1. Run apply to create the resources.
  2. Change the user_data so that the launch template plans a new version.
  3. Run apply -> ERROR.
  4. Run apply again -> SUCCESS.

The first plan shows:

  # module.batch.aws_batch_compute_environment.default["default"] will be updated in place

          ~ launch_template {
              ~ version              = "2" -> (known after apply)
                # (2 unchanged attributes hidden)
            }
        }
    }

As you can see above, no replacement is planned, only an in-place update of a field (version) that is actually immutable.

After the error, if you run another plan, the launch template has already been updated, so the value is known:

  # module.batch.aws_batch_compute_environment.default["default"] must be replaced

          ~ launch_template {
              ~ version              = "2" -> "3" # forces replacement
                # (2 unchanged attributes hidden)
            }
        }
    }

As you can see above, this time the value is known ("3"), it correctly triggers a replacement, and there is no error.

Solution

To resolve this issue, we can add a replace_triggered_by lifecycle block to the compute environment, for example (a sketch; the lifecycle block is the only addition to the resource above):
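
resource "aws_batch_compute_environment" "default" {
  # ... as above ...

  lifecycle {
    replace_triggered_by = [
      # Replace the compute environment whenever its launch template changes.
      aws_launch_template.default[each.key],
    ]
  }
}

This produces the following plan: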

  # module.batch.aws_batch_compute_environment.default["default"] will be replaced due to changes in replace_triggered_by

          ~ launch_template {
              ~ version              = "2" -> (known after apply)
                # (2 unchanged attributes hidden)
            }
        }
    }

However, this should be handled by the provider, rather than each user having to implement their own workaround.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No


nomeelnoj commented 5 months ago

A quick update: it turns out replace_triggered_by does not actually work here. Our batch module creates the launch template conditionally, only when the module instantiation requires it, since a Batch compute environment can be created without custom user data, in which case no launch template is needed.

With replace_triggered_by set to aws_launch_template.default[each.key], we hit the following error whenever the module is called from a state that has no launch template configuration:

╷
│ Error: no change found for aws_launch_template.default["foobar"] in module.batch
│
│
╵

We are still trying to find a workaround that supports both of our use cases, with and without launch templates, but this seemed worth posting as additional evidence that this fix is needed.
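
One indirection that might work (an untested sketch: terraform_data requires Terraform >= 1.4, and the try() fallback is an assumption) is to point replace_triggered_by at a terraform_data resource that always exists:

# Exists for every compute environment, so replace_triggered_by can
# reference it safely even when there is no launch template.
resource "terraform_data" "launch_template_version" {
  for_each = var.compute_environments

  # Changes only when this key has a launch template and that template
  # gets a new latest_version; stays null when there is no template.
  input = try(aws_launch_template.default[each.key].latest_version, null)
}

resource "aws_batch_compute_environment" "default" {
  # ... as above ...

  lifecycle {
    replace_triggered_by = [
      terraform_data.launch_template_version[each.key],
    ]
  }
}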

LucaIcaro commented 1 week ago

I opened a similar issue, #39470. Not sure if that's the same bug.