terraform-google-modules / terraform-google-vpc-service-controls

Handles opinionated VPC Service Controls and Access Context Manager configuration and deployments
https://registry.terraform.io/modules/terraform-google-modules/vpc-service-controls/google
Apache License 2.0

breaking behavior between 3.1 and 3.2 #72

Closed · michaeldeman closed this issue 2 years ago

michaeldeman commented 2 years ago

TL;DR

We noticed a resource name change that appears to cause breaking behavior between 3.1 and 3.2. Rolling back to 3.0.1, then to 3.1, a regression-test terraform apply results in no changes.

Updating to 3.2, Terraform wants to create a new perimeter resource with a slightly different Terraform resource name that is in fact a duplicate of the same Google ID/name.

Expected behavior

Updating from 3.1.x to 3.2.x should not have resulted in breaking behavior. Rather, the upgrade path through 3.0.x, 3.1.x, 3.2.x, etc. should consist of non-breaking feature improvements, with breaking behavior reserved for 4.x?

Observed behavior

Org ID, project number, and project ID are redacted below.

A terraform apply on completely unchanged code, other than bumping the version from 3.1.0 to 3.2.0, suddenly wants to create this new resource.

module.dedicated_svc_perimeter[0].google_access_context_manager_service_perimeter_resource.service_perimeter_resource["projects/<project_number>"] will be created
  + resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource" {
      + id             = (known after apply)
      + perimeter_name = "accessPolicies/<org_id>/servicePerimeters/<project_id>"
      + resource       = "projects/<project_number>"
    }

The above perimeter already exists, having been created with version 3.1 or lower, so the above would fail with a duplicate perimeter.

Terraform state shows the following resource that was already built. Interestingly, it does not want to delete the existing one below and replace it with the one above; it simply wants to create the new (duplicate) one.

module.dedicated_svc_perimeter[0].google_access_context_manager_service_perimeter.regular_service_perimeter:
resource "google_access_context_manager_service_perimeter" "regular_service_perimeter" {
    description = "Created/managed by terraform; in project creation code."
    id          = "accessPolicies/<same_org_id>/servicePerimeters/<same_project_id>"
    name        = "accessPolicies/<same_org_id>/servicePerimeters/<same_project_id>"

Terraform Configuration

I do not have a simple enough snippet of code I can provide.
We have proven the behavior in three projects on our side.
We have project factory wrapped in a template that optionally builds the service perimeter as part of project creation/updates.

Simply changing the version constraint from "~> 3.1.0" to "~> 3.0" will pick up 3.2 and exhibit the problem.
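
For reference, the practical difference between the two constraints is just this (a minimal sketch; the module source is the one from the registry above):

module "dedicated_svc_perimeter" {
  source = "terraform-google-modules/vpc-service-controls/google//modules/regular_service_perimeter"

  # version = "~> 3.1.0"  # allows only 3.1.x (>= 3.1.0, < 3.2.0)
  version = "~> 3.0"      # allows any 3.x (>= 3.0, < 4.0), so terraform init --upgrade resolves 3.2.x

  # ... remaining inputs unchanged ...
}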

Terraform Version

terraform version
Terraform v0.13.6
+ provider registry.terraform.io/datadog/datadog v3.4.0
+ provider registry.terraform.io/hashicorp/google v3.90.1
+ provider registry.terraform.io/hashicorp/google-beta v3.90.1
+ provider registry.terraform.io/hashicorp/null v2.1.2
+ provider registry.terraform.io/hashicorp/random v2.3.1

Additional information

No response

morgante commented 2 years ago

Have you tried applying the change? In #61, we moved project attachment to a separate resource. But this new resource should be fine to apply since it's just re-attaching the project that is already attached.

Note: we did not delete or modify the perimeter. google_access_context_manager_service_perimeter_resource attaches a resource to the perimeter. It does not create the perimeter.
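
For clarity, a simplified sketch of the shape before and after that change (attributes abbreviated; not the module's exact code):

# 3.1 and earlier: projects listed inline on the perimeter (simplified)
resource "google_access_context_manager_service_perimeter" "regular_service_perimeter" {
  parent = "accessPolicies/<policy_id>"
  name   = "accessPolicies/<policy_id>/servicePerimeters/<perimeter_name>"
  title  = "<perimeter_name>"
  status {
    resources = ["projects/<project_number>"]
  }
}

# 3.2 (#61): the attachment moved to its own resource; the perimeter itself is untouched
resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource" {
  perimeter_name = google_access_context_manager_service_perimeter.regular_service_perimeter.name
  resource       = "projects/<project_number>"
}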

michaeldeman commented 2 years ago

A terraform apply with 3.2.0 results in:


Error: Unable to create ServicePerimeterResource, existing object already found: map[resource:projects/<project_number>]

  on .terraform/modules/dedicated_svc_perimeter/modules/regular_service_perimeter/main.tf line 187, in resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource":
 187: resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource" {
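
If the goal is to adopt the already-attached project into the new resource address rather than re-create it, a terraform import along these lines might work (the import ID format for google_access_context_manager_service_perimeter_resource is my assumption from the provider docs and should be double-checked):

# Hypothetical import of the existing attachment into the new address; the ID
# format (perimeter_name/resource) is an assumption, verify against the provider docs.
terraform import \
  'module.dedicated_svc_perimeter[0].google_access_context_manager_service_perimeter_resource.service_perimeter_resource["projects/<project_number>"]' \
  'accessPolicies/<org_id>/servicePerimeters/<project_id>/projects/<project_number>'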
michaeldeman commented 2 years ago

It looks like #61 was merged to master and 3.2.0 was based off of that? A local clone from master, with the module source set to that, results in the same error.

Also, we are a bit off the well-trodden path with how we manage perimeters, in as much as we optionally (frequently) create a dedicated perimeter per project on the fly as part of project creation, rather than having pre-existing perimeters that we attach projects to. A somewhat sanitized snippet of how we call this is below, if that is helpful.

Overall I would say that unless others notice regression problems over the next week or two, this could probably be written off as a side effect of the anti-pattern we are using for project creation/updates? :) See the comments later on about bridges.

It still might be interesting to provision something simple with 3.1.0 and then update to 3.2.0 and see what happens. Not sure I will have time to do that though.

module "dedicated_svc_perimeter" {
  count               = var.vpc_service_control_perimeter_attach_existing_name == null ? 1 : 0
  source              = "terraform-google-modules/vpc-service-controls/google//modules/regular_service_perimeter"
  version             = "~> 3.2.0"
  perimeter_name      = "proj_${replace(module.company_project.project_id, "-", "_")}"
  description         = "Created/managed by terraform; in project creation code."
  resources           = var.vpc_service_control_perimeter_dry_run ? [] : [module.company_project.project_number]
  resources_dry_run   = var.vpc_service_control_perimeter_dry_run ? [module.company_project.project_number] : []
  restricted_services = local.project_restricted_services
  policy = data.terraform_remote_state.vpc_svc_controls.outputs.access_policy-access_policy_COMPANY-policy_id
  access_levels = concat(
    var.vpc_service_control_access_levels_preexisting
    # local.vpcsc_access_levels_created_here
  )
}
michaeldeman commented 2 years ago

Update on the above: I hit this regression as well on some projects built about a year ago that do not rely on the template and code snippet above. For these, the service perimeter is still created within the project Terraform directory, but very explicitly and with manually set variables, etc.

Also, if any regular perimeters already exist within bridges, a replace is going to fail on a terraform apply?

Something still does not feel right about this change: if only the Terraform resource name changed, shouldn't Terraform at least want to destroy the old resource (perimeter) and create the new one? And it does not recognize that the old one was already provisioned.

michaeldeman commented 2 years ago

A little more info.

Over in a separate Terraform area, where we manage only VPC-SC bridges, one project, and specific access lists, and have none of our quirky 'create a dedicated perimeter on the fly for this project' pattern, a terraform init --upgrade and terraform apply resulted in:

Plan: 57 to add, 14 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: no

Apply cancelled.

Updating version constraints to pin all references to "~> 3.1.0", including updating some from "~> 2.1", resulted in the expected 'nothing to be done'.

LuizSDCit commented 2 years ago

We are also affected by this change from 3.1 to 3.2, for a different reason:

Error: Invalid for_each argument

  on .terraform/modules/secured_data_warehouse.vpc_sc_bridge_data_ingestion_governance/modules/bridge_service_perimeter/main.tf line 31, in resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource":
  31:   for_each       = toset(formatlist("projects/%s", var.resources))

The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

This error is related to the limitations on values used in for_each.
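
For illustration, a minimal stand-alone sketch of the limitation (the google_project resource here is hypothetical; the for_each expression is the one quoted above):

# Hypothetical project; its number is only known after apply.
resource "google_project" "example" {
  name       = "example"
  project_id = "example-project-id"
  org_id     = "<org_id>"
}

# Mirrors the module's for_each: the keys depend on an apply-time attribute,
# so Terraform cannot determine the instance count at plan time.
resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource" {
  for_each       = toset(formatlist("projects/%s", [google_project.example.number]))
  perimeter_name = "accessPolicies/<policy_id>/servicePerimeters/<perimeter_name>"
  resource       = each.value
}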

Maybe this change could be handled in the same way that the prevent_destroy flag is handled in the kms module, by adding a lifecycle block?

  lifecycle {
    prevent_destroy = true
  }
morgante commented 2 years ago

I'm going to back out the 3.2 release. We will need to address this better in v4.0, likely by providing an (optional) list of deterministic keys.
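
Roughly, the idea would be something along these lines (a hypothetical resource_keys input, shown here as a map; not a committed interface):

variable "resource_keys" {
  description = "Static keys known at plan time, mapped to project numbers that may only be known after apply."
  type        = map(string)
}

resource "google_access_context_manager_service_perimeter_resource" "service_perimeter_resource" {
  # for_each iterates over the user-supplied static keys, so the plan no longer
  # depends on apply-time project numbers.
  for_each       = var.resource_keys
  perimeter_name = google_access_context_manager_service_perimeter.regular_service_perimeter.name
  resource       = "projects/${each.value}"
}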

morgante commented 2 years ago

I have deleted the v3.2 release and will hold v4.0 until we have a fix for #76.