hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

access_context_manager_service_perimeter egress/ingress rule resource exists in statefile but terraform wants to create again #19203

Open roman2025 opened 1 month ago

roman2025 commented 1 month ago

Community Note

Terraform Version & Provider Version(s)

Terraform v1.3.10 on Alpine Linux; state is stored in a GCP Storage Bucket backend.

Affected Resource(s)

google_access_context_manager_service_perimeter_egress_policy
google_access_context_manager_service_perimeter_ingress_policy

I would guess this also affects the new dry-run versions of these resources as well, but we haven't started using those yet.

We have several fairly large GCP VPC Service Control Perimeter deployments. We use Terraform to manage the perimeters and all of the rules.

A major issue that has cropped up recently is that terraform wants to (re)create ingress/egress rules that were already created via our deployment.

Example deployment timeline:

Day 1 - Perimeters created, rules added
Day 2 - Rules added, some rules updated (multiple applies throughout the day)
Day 3 through Day X - everything continues working just fine

Then - randomly - we will run a plan action and terraform will want to create rules like they don't already exist. We have analyzed the state file, and the rules (policies) are in there. There doesn't seem to be much rhyme or reason to when it happens but when it does, it may impact 1 rule or multiple.

I have tried to recreate this in our lower environments but thus far haven't been able to pin down a cause.

I have checked the state file directly, and I can also see in the plan output that terraform is clearly refreshing the object from state (i.e. the named resource is in the state file). Yet terraform wants to create the exact same named resource again with an identical configuration.

If we go ahead and do an apply, this will result in an API error from GCP stating that the rule already exists.

We typically have to "fix" the issue by going into the console, manually making a minor change to the existing rules (that were already created by terraform), and then running the apply, which then goes through because GCP allows the rules to be created once they differ. After that point, we can go in and delete the old rules that were created by an earlier deployment. This isn't tenable, however, as we have a lot of rules and it can be hard to step through everything and safely modify and remove rules manually (hence the reason for using TF in the first place).
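For anyone hitting the same thing, the state can at least be inspected from the CLI before resorting to console edits. The resource address below is illustrative; substitute your own resource name and for_each key:

```shell
# List every egress policy instance terraform is tracking in state
terraform state list | grep service_perimeter_egress_policy

# Show the recorded attributes for one instance (address is an example)
terraform state show 'google_access_context_manager_service_perimeter_egress_policy.status_this["my-rule-title"]'
```

In our case the instance shows up in both commands with the expected attributes, which is what makes the subsequent "will be created" plan so confusing.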

We were using NESTED policies inside of the google_access_context_manager_service_perimeter resource before we switched to using the separate linked policy resources. This issue never occurred when using nested policies; however, we really want to use independent policy resources (for a litany of reasons).

Anyhow, that is why I think this is a bug related to the two affected resources mentioned above.

Terraform Configuration

Here is an example of an egress_policy block. We are using for_each with a map, and the map keys are used to title the resources. The keys are static, so the resource names do not change in state. We are doing something similar with ingress policies.

resource "google_access_context_manager_service_perimeter_egress_policy" "status_this" {
    # Keyed by static rule title so resource addresses stay stable in state.
    for_each = local.enforced ? {for p in local.egress_policies : p["title"] => p} : {}
    perimeter = google_access_context_manager_service_perimeter.this.name
    depends_on = [
        google_access_context_manager_service_perimeter_resource.status_projects
    ]
    lifecycle {
        create_before_destroy = true
    }
    egress_from {
        identity_type = lookup(each.value["from"], "identity_type", null)
        identities    = lookup(each.value["from"], "identities", null)
    }
    egress_to {
        # Accept "*", a full "projects/..." ID, or a friendly name resolved via local.gcp_projects_map.
        resources = [for project in lookup(each.value["to"], "resources", ["*"]) : startswith(project, "projects/") || project == "*" ? project : format("%s/%s", "projects", lookup(local.gcp_projects_map, project, "NAME_NOT_FOUND"))]
        dynamic "operations" {
            # Map of service name => { methods = [...], permissions = [...] }
            for_each = lookup(each.value["to"], "operations", [])
            content {
                service_name = operations.key
                dynamic "method_selectors" {
                    # Merge methods and permissions into one map tagged by kind;
                    # a service of "*" means all services, so no selectors apply.
                    for_each = operations.key != "*" ? merge(
                        { for v in lookup(operations.value, "methods", []) : v => "method" },
                        { for v in lookup(operations.value, "permissions", []) : v => "permission" }
                    ) : {}
                    content {
                        method     = method_selectors.value == "method" ? method_selectors.key : null
                        permission = method_selectors.value == "permission" ? method_selectors.key : null
                    }
                }
            }
        }
    }
}
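For reference, a minimal sketch of the shape `local.egress_policies` takes in this setup. The titles, services, and methods below are illustrative placeholders, not our real config:

```hcl
locals {
  enforced = true

  # Each entry's static "title" becomes the for_each key (and thus the
  # instance address in state, e.g. ...status_this["allow-bq-export"]).
  egress_policies = [
    {
      title = "allow-bq-export" # illustrative
      from = {
        identity_type = "ANY_IDENTITY"
      }
      to = {
        resources = ["*"]
        operations = {
          "bigquery.googleapis.com" = {
            methods = ["*"]
          }
        }
      }
    },
  ]
}
```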

Debug Output

Unfortunately, I am unable to recreate the issue (despite much effort), and our terraform runs via a devops pipeline that has discarded the builds that had the output. If the issue crops up again, I will do my best to get debug logs from the plan action.

Expected Behavior

Terraform Plan should see the resources in the state file and NOT want to recreate them.

Actual Behavior

Terraform Plan wants to create resources that already exist both in state and in our environment, which causes an apply failure because GCP will not allow duplicate VPC Service Control perimeter policies (nor would we want them).

I should clarify that this is a brand new resource creation action, not a replace or update-in-place. It's as if TF has no knowledge of the existing resource even though it is in state.

Steps to reproduce

I am not sure what triggers this behavior. It only happens every so often (once every 12-20 applies, perhaps), and there doesn't seem to be any consistent change or event that precedes it.

In short: set up a VPC Service Control perimeter with terraform and have it protect some 20-odd projects. Add 30-40 ingress and egress rules via separate linked resources in the terraform config, and keep making changes to the rules, planning, and applying until the issue crops up.

Important Factoids

No response

References

This description on the thread below seems almost identical to what we are experiencing; however, the issue it was marked as a duplicate of is not the same.

https://github.com/hashicorp/terraform/issues/3498

That other issue is not the same because, in that case, the created resources were never making it INTO the statefile. In the issue above and in our case, the resources ARE in the statefile.

b/362264399

ggtisc commented 1 month ago

Confirmed permadiff issue