hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.87k stars 9.21k forks source link

[Bug]: Control Tower Landing Zone reports changes when no material changes present #35763

Open jack-parsons-bjss opened 9 months ago

jack-parsons-bjss commented 9 months ago

Terraform Core Version

1.7.0

AWS Provider Version

5.36.0

Affected Resource(s)

Expected Behavior

No changes on planning against an unchanged manifest

Actual Behavior

With the following configuration:

resource "aws_controltower_landing_zone" "main" {
  manifest_json = jsonencode({
    governedRegions = concat([local.identifiers.region], local.identifiers.additional_regions)
    organizationStructure = {
      security = {
        name = var.control_tower.organizational_units.security
      }
      sandbox = {
        name = var.control_tower.organizational_units.sandbox
      }
    }
    centralizedLogging = {
      accountId = var.account_ids.log_archive
      configurations = {
        loggingBucket = {
          retentionDays = var.log_retention_days
        }
        accessLoggingBucket = {
          retentionDays = var.log_retention_days
        }
        kmsKeyArn = module.kms_control_tower.key_arn
      }
      enabled = true
    }
    securityRoles = {
      accountId = var.account_ids.audit
    }
    accessManagement = {
      enabled = false
    }
  })

  version = var.control_tower.version

  timeouts {
    create = "${var.control_tower.deploy_timeout_mins}m"
    update = "${var.control_tower.deploy_timeout_mins}m"
    delete = "${var.control_tower.deploy_timeout_mins}m"
  }

  depends_on = [
    aws_iam_role_policy_attachment.control_tower_admin,
    aws_iam_role_policy_attachment.control_tower_admin_awscontroltowerservicerolepolicy,
    aws_iam_role_policy_attachment.control_tower_cfg_agg_role_for_orgs,
    aws_iam_role_policy_attachment.control_tower_cloudtrail_role,
    aws_iam_role_policy_attachment.control_tower_stackset_role,
  ]
}

we got the following diff:

  # module.control_tower.aws_controltower_landing_zone.main will be updated in-place
  ~ resource "aws_controltower_landing_zone" "main" {
        id                       = "LRHQ3EK6J5N2DRT3"
      ~ manifest_json            = jsonencode(
          ~ {
              ~ centralizedLogging    = {
                  ~ configurations = {
                      ~ accessLoggingBucket = {
                          ~ retentionDays = "365" -> 365
                        }
                      ~ loggingBucket       = {
                          ~ retentionDays = "365" -> 365
                        }
                        # (1 unchanged attribute hidden)
                    }
                    # (2 unchanged attributes hidden)
                }
              ~ governedRegions       = [
                  - "us-east-1",
                    "eu-west-2",
                  + "us-east-1",
                ]
                # (3 unchanged attributes hidden)
            }
        )
        tags                     = {}
        # (5 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_controltower_landing_zone" "main" {
  manifest_json = jsonencode({
    governedRegions = concat([local.identifiers.region], local.identifiers.additional_regions)
    organizationStructure = {
      security = {
        name = var.control_tower.organizational_units.security
      }
      sandbox = {
        name = var.control_tower.organizational_units.sandbox
      }
    }
    centralizedLogging = {
      accountId = var.account_ids.log_archive
      configurations = {
        loggingBucket = {
          retentionDays = var.log_retention_days
        }
        accessLoggingBucket = {
          retentionDays = var.log_retention_days
        }
        kmsKeyArn = module.kms_control_tower.key_arn
      }
      enabled = true
    }
    securityRoles = {
      accountId = var.account_ids.audit
    }
    accessManagement = {
      enabled = false
    }
  })

  version = var.control_tower.version

  timeouts {
    create = "${var.control_tower.deploy_timeout_mins}m"
    update = "${var.control_tower.deploy_timeout_mins}m"
    delete = "${var.control_tower.deploy_timeout_mins}m"
  }

  depends_on = [
    aws_iam_role_policy_attachment.control_tower_admin,
    aws_iam_role_policy_attachment.control_tower_admin_awscontroltowerservicerolepolicy,
    aws_iam_role_policy_attachment.control_tower_cfg_agg_role_for_orgs,
    aws_iam_role_policy_attachment.control_tower_cloudtrail_role,
    aws_iam_role_policy_attachment.control_tower_stackset_role,
  ]
}

In the snippet above, important to note that log retentions are number types.

Steps to Reproduce

Create a landing zone with more than one region enabled through Terraform while using jsonencode to write your configuration in HCL, and run a plan.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 9 months ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue

agnasillo commented 9 months ago

I think the variables are misconfigured. Try doing "${var.log_retention_days}" instead of var.log_retention_days:

     configurations = {
        loggingBucket = {
          retentionDays = "${var.log_retention_days}"
        }
        accessLoggingBucket = {
          retentionDays = "${var.log_retention_days}"
        }
        kmsKeyArn = module.kms_control_tower.key_arn
jack-parsons-bjss commented 9 months ago

@anasillo According to the docs, these are numbers, and the API returns them as numbers: https://docs.aws.amazon.com/controltower/latest/userguide/lz-api-launch.html Nonetheless, we set them to strings and the region ordering is still an issue

justinretzolk commented 9 months ago

Hey @jack-parsons-bjss 👋 Thanks for taking the time to raise this, and for trying out the other suggestion with the quotes. I don't have an easy way to test this, but converting the list of regions to a set (by wrapping it in toset()) might help, since it discards the ordering.

jack-parsons-bjss commented 9 months ago

@justinretzolk We are more than happy to help test fixes for this 🙂 I have already tried using the toset function, however I believe that as we are simply using jsonencode, toset becomes a regular old list. I suspect that the json-encoded HCL has the list in alphabetical order, and an example response describing the Control Tower deployment shows that the region list returned by AWS is:

...
            "governedRegions": [
                "us-east-1",
                "eu-west-2"
            ],
...

I'm not sure if the provider parses the response after the fact though!

justinretzolk commented 9 months ago

Thanks for giving that a shot @jack-parsons-bjss! I've marked this as a bug -- unfortunately I don't have any additional follow ups at the moment.

grumpper commented 9 months ago

timeouts { create = "${var.control_tower.deploy_timeout_mins}m" update = "${var.control_tower.deploy_timeout_mins}m" delete = "${var.control_tower.deploy_timeout_mins}m" }

If you set them as strings you get the following error:

│ Error: creating ControlTower Landing Zone: operation error ControlTower: CreateLandingZone, https response error StatusCode: 400, RequestID: c386d5ed-615c-4be8-a4d4-8af3c6b6af6b, ValidationException: The LandingZoneManifest that you provided is not compliant with the LandingZoneManifest schema. For information about formatting, see https://docs.aws.amazon.com/controltower/latest/userguide/lz-api-launch.html.

because (as correctly it is pointed out by @jack-parsons-bjss) number type should be provided.

@jack-parsons-bjss did you manage to pass them as strings without getting the above error?

If I manage to successfully deploy I can then change it to be string just to avoid the drift. BUT if nothing is provisioned and I am yet to run apply it will fail with the above error.

jack-parsons-bjss commented 9 months ago

@grumpper We didn't get the above error, but I will admit we didn't apply the changes, only observed the plan. As we couldn't fix the plan completely we haven't changed the type of the log retention variables to string either. We didn't use this resource to create the landing zone, having previously been using a CloudFormation stack resource to deploy and manage changes to Control Tower.

agnasillo commented 9 months ago

@jack-parsons-bjss This won't necessarily fix the issue permanently, but as a workaround, have you tried using thesort() function for the regions?

jack-parsons-bjss commented 9 months ago

@anasillo We tried this as well; I think the issue is that the ordering is compared to whatever is returned by AWS, which is in a different order to what is returned by sort()?

haniffm commented 2 months ago

We managed to bypass the changes by using tostring function. Like this:

loggingBucket = {
  retentionDays = tostring(var.log_retention_days)
}
accessLoggingBucket = {
  retentionDays = tostring(var.log_retention_days)
}

However, if there are other changes that needs to be applied, the tostrings needs to be removed, otherwise you get a schema error. :man_facepalming:

sbkg0002 commented 3 days ago

We managed to bypass the changes by using tostring function. Like this:

loggingBucket = {
  retentionDays = tostring(var.log_retention_days)
}
accessLoggingBucket = {
  retentionDays = tostring(var.log_retention_days)
}

However, if there are other changes that needs to be applied, the tostrings needs to be removed, otherwise you get a schema error. 🤦‍♂️

We are in the same situation. Since an update takes ~45mins, this is very annoying if you forget to change it back.

haniffm commented 3 days ago

We managed to bypass the changes by using tostring function. Like this:

loggingBucket = {
  retentionDays = tostring(var.log_retention_days)
}
accessLoggingBucket = {
  retentionDays = tostring(var.log_retention_days)
}

However, if there are other changes that needs to be applied, the tostrings needs to be removed, otherwise you get a schema error. 🤦‍♂️

We are in the same situation. Since an update takes ~45mins, this is very annoying if you forget to change it back.

Yes, you have to change it forth and back. But this goes kind of the line with other stuff in Control Tower. It doesn't seem its well thought through. :cry: :man_facepalming: