hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Terraform google_project_iam_binding deletes GCP compute engine default service account from IAM principals #10903

Open oonisim opened 2 years ago

oonisim commented 2 years ago

Update

See Usability improvements for _iam_policy and _iam_binding resources #8354

As described in "Terraform google_project_iam_binding deletes GCP compute engine default service account from IAM principals", the google_project_iam_binding resource is authoritative, which means that, for the given role, it removes any member that is NOT explicitly specified in the Terraform configuration.

Authoritative for a given role. Updates the IAM policy to grant a role to a list of members. Other roles within the IAM policy for the project are preserved.

It is hard to get a clear idea from this description of what Terraform does with google_project_iam_binding, but as GCP has identified, google_project_iam_binding removed the "roles/editor" role from every account that was not listed in the members attribute.

Still, I believe this is a terraform defect.

As per the Google APIs Service Agent documentation, these are essential service accounts that GCP manages internally. Terraform should not remove any such GCP-managed internal service accounts, as doing so brings the GCP project down. I cannot think of a use case where this needs to happen.

Please, instead of asserting "works as designed", do not delete the GCP-managed internal service accounts, as they are essential to make the GCP project work.


Original issue raised

Terraform google_project_iam_binding deletes GCP compute engine default service account from IAM principals has the detailed step-by-step reproduction steps and snapshots.

Community Note

Terraform Version

$ terraform version
Terraform v1.0.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v4.6.0

Affected Resource(s)

GCP IAM Compute Engine default service account. Terraform removes it from the IAM principals, so it can no longer manage Compute Engine, and hence GKE nodes as well.

  1. google_project_iam_binding

Terraform Configuration Files

After further investigation, "roles/editor" alone is sufficient to reproduce the issue.

variable "PROJECT_ID" {
  type        = string
  description = "GCP Project ID"
  default     = "test-tf-sa"
}

variable "REGION" {
  type        = string
  description = "GCP Region"
  default     = "us-central1"
}

variable "roles_to_grant_to_service_account" {
  description = "IAM roles to grant to the service account"
  type        = list(string)
  default = [
    "roles/editor",    # <------------------------------ Including only roles/editor will reproduce the issue
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin"
  ]
}

provider "google" {
  project = var.PROJECT_ID
  region  = var.REGION
}

resource "google_service_account" "terraform" {
  account_id   = "terraform"
  display_name = "terraform service account"
}

resource "google_project_iam_binding" "terraform" {
  project = var.PROJECT_ID

  #--------------------------------------------------------------------------------
  # Grant the service account to have the roles
  #--------------------------------------------------------------------------------
  members = [
    "serviceAccount:${google_service_account.terraform.email}"
  ]
  for_each = toset(var.roles_to_grant_to_service_account)
  role     = each.value
}

$ terraform apply --auto-approve

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_project_iam_binding.terraform["roles/editor"] will be created
  + resource "google_project_iam_binding" "terraform" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + members = (known after apply)
      + project = "test-tf-sa"
      + role    = "roles/editor"
    }

  # google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] will be created
  + resource "google_project_iam_binding" "terraform" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + members = (known after apply)
      + project = "test-tf-sa"
      + role    = "roles/iam.serviceAccountAdmin"
    }

  # google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] will be created
  + resource "google_project_iam_binding" "terraform" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + members = (known after apply)
      + project = "test-tf-sa"
      + role    = "roles/resourcemanager.projectIamAdmin"
    }

  # google_service_account.terraform will be created
  + resource "google_service_account" "terraform" {
      + account_id   = "terraform"
      + disabled     = false
      + display_name = "terraform service account"
      + email        = (known after apply)
      + id           = (known after apply)
      + name         = (known after apply)
      + project      = (known after apply)
      + unique_id    = (known after apply)
    }

Plan: 4 to add, 0 to change, 0 to destroy.
google_service_account.terraform: Creating...
google_service_account.terraform: Creation complete after 2s [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Creating...
google_project_iam_binding.terraform["roles/editor"]: Creating...
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Creating...
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Creation complete after 9s [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/editor"]: Creation complete after 9s [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Still creating... [10s elapsed]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Creation complete after 10s [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Debug Output

Panic Output

Expected Behavior

Terraform will not remove the GCP Compute Engine Default service account from the IAM principals.

Actual Behavior

Before running the script, the Compute Engine default account exists in the IAM principals (with Compute Engine API enabled).

After running the Terraform script, the GCP Compute Engine default service account is removed from the IAM principals by the script.

gcloud projects get-iam-policy command does not show the Compute Engine default service account 1079157603081-compute@developer.gserviceaccount.com, either.

$ GCP_PROJECT_ID=test-tf-sa
$ gcloud projects get-iam-policy $GCP_PROJECT_ID
bindings:
- members:
  - serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
  role: roles/compute.admin
- members:
  - serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
  role: roles/compute.instanceAdmin
- members:
  - serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
  role: roles/compute.serviceAgent
- members:
  - serviceAccount:service-1079157603081@container-engine-robot.iam.gserviceaccount.com
  role: roles/container.serviceAgent
- members:
  - serviceAccount:service-1079157603081@containerregistry.iam.gserviceaccount.com
  role: roles/containerregistry.ServiceAgent
- members:
  - serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
  role: roles/editor
- members:
  - user:****@gmail.com
  role: roles/owner
- members:
  - serviceAccount:service-1079157603081@gcp-sa-pubsub.iam.gserviceaccount.com
  role: roles/pubsub.serviceAgent
etag: BwXVf2S5fCQ=
version: 1

Because of this, GKE clusters can be neither deleted nor created, because the Compute Engine permissions are gone.

$ gcloud container clusters delete cluster-1 --zone=us-central1-c
The following clusters will be deleted.
 - [cluster-1] in [us-central1-c]

Do you want to continue (Y/n)?  Y

Deleting cluster cluster-1...done.                                                                                                                                  
ERROR: (gcloud.container.clusters.delete) Some requests did not succeed:
 - args: ['Operation [<Operation\n clusterConditions: [<StatusCondition\n canonicalCode: CanonicalCodeValueValuesEnum(PERMISSION_DENIED, 7)\n message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">]\n detail: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n endTime: \'2022-01-14T00:20:54.190004708Z\'\n error: <Status\n code: 7\n details: []\n message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">\n name: \'operation-1642119632548-20038ec5\'\n nodepoolConditions: []\n operationType: OperationTypeValueValuesEnum(DELETE_CLUSTER, 2)\n selfLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/operations/operation-1642119632548-20038ec5\'\n startTime: \'2022-01-14T00:20:32.548792723Z\'\n status: StatusValueValuesEnum(DONE, 3)\n statusMessage: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n targetLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/clusters/cluster-1\'\n zone: \'us-central1-c\'>] finished with error: Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.']
   exit_code: 1

Google Compute Engine: Not all instances running in IGM after 18.798524988s. Expected 3, running 0, transitioning 3. Current errors: [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.instances.create' permission for 'projects/1079157603081/zones/us-central1-c/instances/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.disks.create' permission for 'projects/1079157603081/zones/us-central1-c/disks/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.disks.setLabels' permission for 'projects/1079157603081/zones/us-central1-c/disks/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.subnetworks.use' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.subnetworks.useExternalIp' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default' (when acting as '1079157603081@cloudservices.gserviceaccount.com') (truncated).

Steps to Reproduce

Please see "Terraform google_project_iam_binding deletes GCP compute engine default service account from IAM principals".

  1. Enable the Compute Engine API in the GCP project.
  2. Verify the GCP Compute Engine Default service account exists in the IAM console view.
  3. terraform apply
  4. Verify the GCP Compute Engine default service account is gone from the IAM principals menu, although it still remains in the IAM Service Accounts menu.

Now the GCP Compute Engine default service account is broken and cannot manage Compute Engine instances or GKE nodes.

Important Factoids

No

References

Impact

GKE clusters can no longer be created after the GCP Compute Engine default service account disappeared from the IAM console. Another project has to be created to be able to create GKE clusters.


Cause

GCP identified that Terraform had deleted the Google APIs Service Agent, which is a Google-managed service account.

Some Google Cloud services need access to your resources so that they can act on your behalf. For example, when you use Cloud Run to run a container, the service needs access to any Pub/Sub topics that can trigger the container.

To meet this need, Google creates and manages service accounts for many Google Cloud services. These service accounts are known as Google-managed service accounts. You might see Google-managed service accounts in your project's IAM policy, in audit logs, or on the IAM page in the Cloud Console.

Google-managed service accounts are not listed in the Service accounts page in the Cloud Console.

Google APIs Service Agent. Your project is likely to contain a service account named the Google APIs Service Agent, with an email address that uses the following format: project-number@cloudservices.gserviceaccount.com

This service account runs internal Google processes on your behalf. It is automatically granted the Editor role (roles/editor) on the project.

Terraform should not delete any such GCP-managed internal service account that is essential to run GCP services, hence I regard this as a Terraform bug.

Fix

According to GCP:

To fix this issue you can add the service agent in the IAM page using the Add option at the top. The principal will be "${PROJECT_ID}@cloudservices.gserviceaccount.com" and add the editor role.

As per the error message, add '1079157603081@cloudservices.gserviceaccount.com' in IAM.

'compute.subnetworks.useExternalIp' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default' (when acting as '1079157603081@cloudservices.gserviceaccount.com') (truncated).

The Google APIs Service Agent is restored in the view.
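
The same grant could probably also be expressed in Terraform rather than through the console. The following is only a sketch, not something verified in this issue: it assumes var.PROJECT_ID from the configuration above, looks up the project number with the google_project data source, and uses the non-authoritative google_project_iam_member resource. The resource and data source names are made up for illustration.

```hcl
# Sketch only: re-grant roles/editor to the Google APIs Service Agent.
# Assumes var.PROJECT_ID is defined as in the configuration above.
data "google_project" "current" {
  project_id = var.PROJECT_ID
}

# Non-authoritative: adds this one member to roles/editor and leaves
# the other members of the role alone.
resource "google_project_iam_member" "google_apis_service_agent" {
  project = var.PROJECT_ID
  role    = "roles/editor"
  member  = "serviceAccount:${data.google_project.current.number}@cloudservices.gserviceaccount.com"
}
```

Note that if an authoritative google_project_iam_binding for roles/editor stays in the same configuration, the two resources would be expected to fight over the role on every apply.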

Create GKE.

b/304725229

aledsdavies commented 2 years ago

Just to add, as a workaround using Terraform, I have been using the following after creating a project.

data "google_iam_policy" "editor" {
  binding {
    members = [
      "serviceAccount:${google_project.project.number}@cloudservices.gserviceaccount.com",
#      "serviceAccount:${google_project.project.number}-compute@developer.gserviceaccount.com",
    ]
    role = "roles/editor"
  }
}

resource "google_project_iam_policy" "add" {
  policy_data = data.google_iam_policy.editor.policy_data
  project     = google_project.project.project_id
}

This will fix the issue preventing the GKE cluster from being removed.

This will still remove any permissions not tracked by Terraform, including those of the user who created the project, who would generally have the owner role.

However, the issue then is that if you later add any additional policies that aren't all tracked in one big policy, the new one will override the one which was previously created. The same is true for IAM bindings: you can get in a loop where one overrides the other, and each terraform apply deletes already-applied values.

The only way you can get it so that it won't override is if all changes to policies are applied using one google_project_iam_policy or google_project_iam_binding per project.
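
As a rough sketch of what "one policy per project" might look like, assuming the same google_project.project resource as above: every role that should survive is listed as its own binding block inside a single google_iam_policy. The owner email is a placeholder, and a real project would likely need further bindings (for example the service agent roles shown in the gcloud output earlier in this issue).

```hcl
# Sketch only: one authoritative policy that lists every binding to keep.
data "google_iam_policy" "project" {
  binding {
    role = "roles/owner"
    members = [
      "user:owner@example.com", # placeholder, replace with the real owner
    ]
  }

  binding {
    role = "roles/editor"
    members = [
      "serviceAccount:${google_project.project.number}@cloudservices.gserviceaccount.com",
    ]
  }
}

# google_project_iam_policy replaces the whole project policy, so anything
# not listed in the data source above is removed on apply.
resource "google_project_iam_policy" "project" {
  project     = google_project.project.project_id
  policy_data = data.google_iam_policy.project.policy_data
}
```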

BartDuwez commented 2 years ago

Ran into this myself, seems like we have to use "google_project_iam_member" instead. This will add a role to a member, without removing the other members from the role you are assigning.
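
For reference, a minimal sketch of what that change might look like against the configuration in the original report. It reuses var.PROJECT_ID, var.roles_to_grant_to_service_account and google_service_account.terraform from there; nothing else is assumed.

```hcl
# Sketch only: non-authoritative grants for the Terraform service account.
# Each instance adds one (role, member) pair and does not touch other
# members that already hold the role.
resource "google_project_iam_member" "terraform" {
  for_each = toset(var.roles_to_grant_to_service_account)

  project = var.PROJECT_ID
  role    = each.value
  member  = "serviceAccount:${google_service_account.terraform.email}"
}
```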

michalswi commented 2 years ago

Tested with:

Terraform v1.2.9
terraform/provider hashicorp/google:4.38.0

problem persists.

jeremiahbowen commented 2 years ago

This is a feature of the provider and the project_iam_binding module. In authoritative mode, it removes any additions to the specified role. You just have to add the "Google-managed service accounts" to your terraform specification for those roles.

  bindings = {
    # We need it because of this.
    # https://cloud.google.com/iam/docs/service-accounts#google-managed
    "roles/editor" = [
      "serviceAccount:${module.project-factory.project_number}-compute@developer.gserviceaccount.com",
      "serviceAccount:${module.project-factory.project_number}@cloudservices.gserviceaccount.com"
    ]
    # ...
  }
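
Without a project factory module, roughly the same idea could be written with the plain resource from the original report. This is a sketch under assumptions: it reuses var.PROJECT_ID and google_service_account.terraform from the issue, and the data source and resource names are invented for illustration.

```hcl
# Sketch only: keep google_project_iam_binding authoritative for
# roles/editor, but list the Google-managed accounts explicitly.
data "google_project" "target" {
  project_id = var.PROJECT_ID
}

resource "google_project_iam_binding" "editor" {
  project = var.PROJECT_ID
  role    = "roles/editor"

  members = [
    "serviceAccount:${google_service_account.terraform.email}",
    "serviceAccount:${data.google_project.target.number}-compute@developer.gserviceaccount.com",
    "serviceAccount:${data.google_project.target.number}@cloudservices.gserviceaccount.com",
  ]
}
```

Any other principal that holds roles/editor and is not in this list would still be removed on apply.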

roaks3 commented 1 year ago

Removing iam-serviceaccount team because I don't think this is an issue with the google_service_account resource.

roaks3 commented 8 months ago

Updated the description to prevent our bot from continuing to add the label

cooervo commented 4 months ago

Ran into this myself, seems like we have to use "google_project_iam_member" instead. This will add a role to a member, without removing the other members from the role you are assigning.

  • google_project_iam_binding Authoritative for a given role. Updates the IAM policy to grant a role to a list of members. Other roles within the IAM policy for the project are preserved. -> Applies the role to the list of members, retaining the other roles for those members, removing the roles from all the members not in that list. (And for some reason also removing the iam user (member), maybe if they only had one role, that is now removed?)
  • google_project_iam_member Non-authoritative. Updates the IAM policy to grant a role to a new member. Other members for the role for the project are preserved. -> applies the role to a member, retaining the roles of that member, retaining all members having that role.

If anyone is using Pulumi: my service account roles were deleted, I believe due to using new gcp.projects.IAMBinding. I changed it to:

      new gcp.projects.IAMMember(`foo`, {
        project: gcpServiceAccountProject || gcpProject,
        role,
        member: pulumi.interpolate`serviceAccount:${gcpServiceAccount.email}`,
      })

Also had to delete the original service accounts and then recreate them.