terraform-google-modules / terraform-google-project-factory

Creates an opinionated Google Cloud project by using Shared VPC, IAM, and Google Cloud APIs
https://registry.terraform.io/modules/terraform-google-modules/project-factory/google
Apache License 2.0

GKE - Missing edit permissions on account #277

Closed cilindrox closed 4 years ago

cilindrox commented 4 years ago

After creating a project and a GKE cluster with the gke module, I was pointed to this troubleshooting doc from the GKE dashboard. I ended up un-deleting the default Compute Engine service account for the project.

Opening this because it seems GKE might need a disclaimer/troubleshooting entry, like the one for App Engine Flex, about running with the Compute Engine SA deprivileged. If that's the case, I can open a PR with the updated docs.

morgante commented 4 years ago

Thanks for taking a look at this.

To clarify, there are two service accounts involved in creating a GKE cluster:

  1. The GKE service account
  2. The default compute engine service account

Project Factory doesn't do anything to the GKE service account but does delete the default compute engine account.

However, the compute engine account is not required to run GKE modules. Instead, we recommend you create a dedicated Service Account for each cluster (which the GKE module will do by default, see the create_service_account param). This is much more secure than relying on the default service account.

I'd be happy to look at a PR which clarifies this relationship.

Also, if you were using the latest version of the GKE module and having issues with a deleted default account that sounds like a bug we'd like to investigate further.
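
As a rough illustration of the dedicated-service-account recommendation above, here is a minimal sketch of a GKE module call. Only create_service_account comes from this thread; the other input names are the ones I believe the terraform-google-modules/kubernetes-engine module expects, and every value is a placeholder, so check them against the module version you pin.

```hcl
# Sketch only: all values are placeholders, not taken from this thread.
module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google"

  project_id        = "foo-bar-123456"
  name              = "example-cluster"
  region            = "us-central1"
  network           = "example-network"
  subnetwork        = "example-subnet"
  ip_range_pods     = "example-pods-range"
  ip_range_services = "example-services-range"

  # Have the module create a dedicated service account for the nodes
  # instead of relying on PROJECT_NUMBER-compute@developer.gserviceaccount.com.
  create_service_account = true
}
```

With this set, the nodes should not depend on the default Compute Engine service account that Project Factory deletes.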

cilindrox commented 4 years ago

thanks for the quick reply and apologies for not being clear - I'm using a GKE cluster with its own service account per node, and experiencing no issues there.

I'm referring to the PROJECT_NUMBER-compute@developer.gserviceaccount.com SA here, which it seems GKE also requires for operating the dashboard or some other parts of the stack.

It wasn't until I restored this account that I was able to continue using the GKE for some reason. After investigating further, I'm assuming this was a transient error.

morgante commented 4 years ago

Interesting, that definitely shouldn't cause issues: in theory, everything the GKE service does should be using the GKE robot account.

If you encounter further issues, please let us know.

davi5e commented 4 years ago

> I'm referring to the PROJECT_NUMBER-compute@developer.gserviceaccount.com SA here, which it seems GKE also requires for operating the dashboard or some other parts of the stack.

Same issue here... Nothing works over GKE and the culprit seems to be the lack of the default service account.

> Project Factory doesn't do anything to the GKE service account but does delete the default compute engine account.

Is there a way to avoid this and keep the default service account?

cilindrox commented 4 years ago

@davi5e You can set default_service_account to depriviledge (sic) or keep.

morgante commented 4 years ago

> Same issue here... Nothing works over GKE and the culprit seems to be the lack of the default service account.

Can you share your Terraform config details and the info on why you think this is the culprit?

You can certainly use that argument to override Project Factory behavior, but we really strive to make the GKE module not require the default Service Account, so I'd like to know if there's a bug we haven't seen.

davi5e commented 4 years ago

Found a way to make it work using default_service_account = "keep".

I just don't understand why disable this behavior. Just to avoid drama, it's an honest question: what are the benefits to this approach? Security ones?

I noticed a lot of modules I use have an option to try another service account in case the default one they use doesn't work, and I assume the reason lies in the answer to my previous question.
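
For anyone landing here with the same problem, the workaround mentioned above can be sketched roughly as follows. The default_service_account = "keep" setting is the one from this thread; the registry source address and the placeholder values are assumptions for illustration.

```hcl
# Sketch only: name, org_id and billing_account are placeholders.
module "project-factory" {
  source = "terraform-google-modules/project-factory/google"

  name            = "example-project"
  org_id          = "MYORGID"
  billing_account = "SOME-BILLING-ACCOUNT"

  # Keep the default Compute Engine service account instead of deleting it.
  default_service_account = "keep"
}
```

Note that keeping the default account trades away the stricter behavior Project Factory applies otherwise.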

morgante commented 4 years ago

> I just don't understand why disable this behavior. Just to avoid drama, it's an honest question: what are the benefits to this approach? Security ones?

Security and explicit control. Instead of depending on a nebulous Service Account that isn't managed by Terraform, it's preferable to have a separate Service Account (managed via TF) for each service/cluster.

One of the implications of using the default service account is that any workload running in your nodes automatically has full editor access on your project. This isn't a secure default.
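
To make the contrast with full editor access concrete, here is a minimal sketch of a Terraform-managed node service account with a narrow set of roles. The resource names, account_id and project value are hypothetical, and the three roles are the logging and monitoring roles commonly suggested as a baseline for GKE nodes, not something stated in this thread; adjust them to what your workloads actually need.

```hcl
# Sketch only: names and project ID are placeholders.
resource "google_service_account" "gke_nodes" {
  project      = "foo-bar-123456"
  account_id   = "gke-nodes-example"
  display_name = "GKE node service account (example)"
}

# Grant only logging and monitoring roles instead of roles/editor.
resource "google_project_iam_member" "gke_nodes" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
  ])

  project = "foo-bar-123456"
  role    = each.value
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}
```

The account's email could then be passed to whatever creates the node pools, so workloads on the nodes inherit only these permissions.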

cilindrox commented 4 years ago

@morgante dunno if it helps, but it's a pretty standard use case:

module "project-factory" {
  source = "github.com/terraform-google-modules/terraform-google-project-factory?ref=268a666"

  project_id      = "foo-bar-123456"
  name            = local.project
  org_id          = "MYORGID"
  billing_account = "SOME-BILLING-ACCOUNT"
  apis_authority  = true
  activate_apis = [
    "bigquery-json.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudbilling.googleapis.com",
    "cloudbuild.googleapis.com",
    "cloudkms.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "compute.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "iam.googleapis.com",
    "iamcredentials.googleapis.com",
    "logging.googleapis.com",
    "monitoring.googleapis.com",
    "oslogin.googleapis.com",
    "pubsub.googleapis.com",
    "serviceusage.googleapis.com",
    "storage-api.googleapis.com",
  ]

  shared_vpc = local.shared_host
  shared_vpc_subnets = [
    data.google_compute_subnetwork.subnet.self_link,
  ]

  labels = {
    "env" = "test"
  }
}