hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_access_context_manager_service_perimeter_resource doesn't work inside a VPC-SC perimeter #5738

Open husunal opened 4 years ago

husunal commented 4 years ago

Description

If we run Terraform inside a VPC-SC perimeter to create a project and its resources, terraform apply fails during resource creation with "Error 403: Request is prohibited by organization's policy."

This happens because adding a new project to a perimeter with google_access_context_manager_service_perimeter_resource takes a few minutes to propagate and take effect on GCP. Until it does, no changes can be made to the new project through protected Google APIs, because the new project is still outside the perimeter.

Affected Resource(s)

Terraform Configuration

resource "google_project" "my_project" {
  name       = "My Project"
  project_id = "project-12345"
  org_id     = "620189913244"
}

resource "google_access_context_manager_service_perimeter_resource" "service-perimeter-resource" {
  perimeter_name = "accessPolicies/650557138361/servicePerimeters/perimeter_1"
  resource       = "projects/${google_project.my_project.number}"
  depends_on     = [google_project.my_project]
}

resource "google_project_iam_member" "project" {
  project    = google_project.my_project.project_id
  role       = "roles/editor"
  member     = "user:a@b.com"
  depends_on = [google_access_context_manager_service_perimeter_resource.service-perimeter-resource]
}

Steps to Reproduce

  1. Create a VPC-SC perimeter that protects all APIs or only the Google Cloud Resource Manager API.
  2. Inside the perimeter, run the Terraform code above.
  3. Terraform fails at the google_project_iam_member resource:
google_project.my_project: Creating...
google_project.my_project: Creation complete after 9s [id=projects/project-12345]
google_access_context_manager_service_perimeter_resource.service-perimeter-resource: Creating...
google_access_context_manager_service_perimeter_resource.service-perimeter-resource: Creation complete after 4s [id=accessPolicies/650557138361/servicePerimeters/perimeter_1/projects/645541782529]
google_project_iam_member.project: Creating...
Error setting IAM policy for project "project-12345": googleapi: Error 403: Request is prohibited by organization's policy.
  4. If you wait a couple of minutes and rerun Terraform, the google_project_iam_member resource is created successfully, because by then adding the new project to the perimeter has taken effect.
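
The wait described in the last step can be approximated inside the configuration itself. The following is a minimal sketch of a workaround, not a fix: it assumes the time_sleep resource from the hashicorp/time provider, the resource name wait_for_perimeter is hypothetical, and the 120s duration is a guess based on the "couple of minutes" observed here, not a documented propagation time.

```hcl
terraform {
  required_providers {
    # Assumption: the hashicorp/time provider is available; it supplies time_sleep.
    time = {
      source = "hashicorp/time"
    }
  }
}

resource "google_project" "my_project" {
  name       = "My Project"
  project_id = "project-12345"
  org_id     = "620189913244"
}

resource "google_access_context_manager_service_perimeter_resource" "service-perimeter-resource" {
  perimeter_name = "accessPolicies/650557138361/servicePerimeters/perimeter_1"
  resource       = "projects/${google_project.my_project.number}"
}

# Hypothetical pause after the project is added to the perimeter.
# 120s is an assumed value; propagation may take more or less time.
resource "time_sleep" "wait_for_perimeter" {
  create_duration = "120s"
  depends_on      = [google_access_context_manager_service_perimeter_resource.service-perimeter-resource]
}

# The IAM binding waits on the sleep instead of on the perimeter resource directly.
resource "google_project_iam_member" "project" {
  project    = google_project.my_project.project_id
  role       = "roles/editor"
  member     = "user:a@b.com"
  depends_on = [time_sleep.wait_for_perimeter]
}
```

Every resource that calls a protected API in the new project would need to depend on the sleep, and a fixed delay can still turn out to be too short.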

References

b/299442505

edwardmedia commented 4 years ago

@husunal I can't repro this issue. Could you please post your plan and full debug log so I can try to hit the error? Please also include the steps you followed for "Create a VPC-SC perimeter that protects all APIs or only Google Cloud Resource Manager API". Thanks

husunal commented 4 years ago

@edwardmedia I have updated the reproduction steps below. I will add the plan and debug logs later today.

Steps to Reproduce

  1. Use the google_access_context_manager_service_perimeter resource to create a perimeter. Make sure it protects the Cloud Resource Manager API and the project (projects/12345678) inside which you will run Terraform. We protect cloudresourcemanager.googleapis.com because the google_project_iam_member resource uses that API.

    resource "google_access_context_manager_service_perimeter" "service-perimeter" {
      parent = "accessPolicies/${google_access_context_manager_access_policy.access-policy.name}"
      name   = "accessPolicies/${google_access_context_manager_access_policy.access-policy.name}/servicePerimeters/restrict_all"
      title  = "restrict_all"
      status {
        restricted_services = ["cloudresourcemanager.googleapis.com"]
        resources           = ["projects/12345678"]
      }
    }
  2. In projects/12345678, spin up a test GCE VM to run the Terraform code. This step is important because Terraform must run inside the perimeter.

  3. Run the following Terraform code inside the protected project so that it calls the protected Cloud Resource Manager API. The google_project_iam_member resource fails on the first run; if you run Terraform again, it is created successfully.

    
    resource "google_project" "my_project" {
      name       = "My Project"
      project_id = "project-12345"
      org_id     = "620189913244"
    }

    resource "google_access_context_manager_service_perimeter_resource" "service-perimeter-resource" {
      perimeter_name = "accessPolicies/650557138361/servicePerimeters/perimeter_1"
      resource       = "projects/${google_project.my_project.number}"
      depends_on     = [google_project.my_project]
    }

    resource "google_project_iam_member" "project" {
      project    = google_project.my_project.project_id
      role       = "roles/editor"
      member     = "user:a@b.com"
      depends_on = [google_access_context_manager_service_perimeter_resource.service-perimeter-resource]
    }

emilymye commented 4 years ago

this looks like an issue with eventual consistency with access context manager/IAM - we might just have to add some retries for one of these resources.

edwardmedia commented 4 years ago

@husunal have you had a chance to add plan and logs?

husunal commented 4 years ago

@edwardmedia please find them below.

terraform.log

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_access_context_manager_service_perimeter_resource.service-perimeter-resource will be created
  + resource "google_access_context_manager_service_perimeter_resource" "service-perimeter-resource" {
      + id             = (known after apply)
      + perimeter_name = "accessPolicies/650557139361/servicePerimeters/perimeter_1"
      + resource       = (known after apply)
    }

  # google_project.my_project will be created
  + resource "google_project" "my_project" {
      + auto_create_network = true
      + folder_id           = (known after apply)
      + id                  = (known after apply)
      + name                = "My Project"
      + number              = (known after apply)
      + org_id              = "620189913244"
      + project_id          = "project-452433"
      + skip_delete         = (known after apply)
    }

  # google_project_iam_member.project will be created
  + resource "google_project_iam_member" "project" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + member  = "user:h@domain.com"
      + project = "project-452433"
      + role    = "roles/editor"
    }

Plan: 3 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

emilymye commented 4 years ago

I'm not sure whether we want to retry specifically on the 403 (since the org might actually deny the request, and we'd be hitting the IAM API for several minutes until it finally fails). Seems like a good candidate for #6251

husunal commented 4 years ago

Thanks for the update.

If you add a retry, I think it should be added to all resources that use VPC-SC protected APIs, not only the IAM API. Also, IAM is not supported (protected) by VPC-SC. The google_project_iam_member resource in the example uses the Cloud Resource Manager API, which in this example is protected by VPC-SC.

I think a possible fix should be added to google_access_context_manager_service_perimeter_resource: when Terraform creates this resource, it should also check and confirm that the operation adding the new project to the perimeter has propagated. Terraform can then continue to create other "protected" resources successfully.

emilymye commented 4 years ago

Marking this as a persistent-bug because it requires determining which resources/errors should be waited upon, or whether we need to add a sleep to a resource. I'd be hesitant to actually wait on the 403, since there are real 403s that shouldn't eat quota.

c2thorn commented 4 years ago

Reached out to the ACM team for assistance as modifying the long-running operation from the API would be ideal here.

Charlesleonius commented 2 months ago

The LRO API for ACM cannot be used to determine with certainty when changes to a perimeter take effect, but it does give an approximation. Implementation details (caching layers, etc.) prevent this from being feasible. Waiting on the LRO plus a small sleep may reduce how often customers see this issue, but it doesn't provide a real fix. If you are making changes that depend on adding a new protected resource, the protected resource should be added first, and further changes can be made once the new project is fully enforced by VPC-SC.
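
One way to follow this ordering is to split the work into two Terraform configurations that are applied separately, with time for enforcement in between. The sketch below is only an illustration: the two-directory layout, the file names, and the project_id output and variable are assumptions, and nothing in this thread prescribes them.

```hcl
# --- stage1/main.tf (hypothetical path): create the project and add it to the perimeter ---

resource "google_project" "my_project" {
  name       = "My Project"
  project_id = "project-12345"
  org_id     = "620189913244"
}

resource "google_access_context_manager_service_perimeter_resource" "service-perimeter-resource" {
  perimeter_name = "accessPolicies/650557138361/servicePerimeters/perimeter_1"
  resource       = "projects/${google_project.my_project.number}"
}

# Exposed so the second stage can be pointed at the new project.
output "project_id" {
  value = google_project.my_project.project_id
}

# --- stage2/main.tf (hypothetical path): applied later, once the perimeter change is enforced ---

variable "project_id" {
  type        = string
  description = "ID of the project created in stage 1"
}

resource "google_project_iam_member" "project" {
  project = var.project_id
  role    = "roles/editor"
  member  = "user:a@b.com"
}
```

The second stage would be applied only after the first has completed and the perimeter change has had time to take effect, for example with terraform apply -var="project_id=project-12345". How long to wait between the two applies is not something the API reports with certainty.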