hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Run Terraform on workstation with gcs backend fails with "Error: Failed to get existing workspaces (403)" when using GCP service account impersonation #31344

Open aibazhang opened 2 years ago

aibazhang commented 2 years ago

Hi there, I want to use service accounts without downloading their keys (Google even recommends this as a best practice). This works well without the "gcs" backend. However, when I set the Terraform backend to gcs (on my workstation), terraform init always fails with "Error: Failed to get existing workspaces (403)". It also works, even with the "gcs" backend, when I download service_account_key.json and use it directly instead of service account impersonation.

Terraform Version

Terraform v1.1.2
on darwin_arm64
+ provider registry.terraform.io/hashicorp/google v4.27.0

Terraform Configuration Files

terraform {
  required_version = "1.1.2"

  backend "gcs" {
    bucket = "my-bucket"
    prefix = "my-prefix"
  }

  required_providers {
    google = "4.15.0"
  }
}

Debug Output

Initializing the backend...
2022-06-30T10:45:15.173+0900 [TRACE] Meta.Backend: built configuration for "gcs" backend with hash value 2134967571
2022-06-30T10:45:15.173+0900 [TRACE] Preserving existing state lineage "xxxxxxx-xxxx-xxxx-xxxx"
2022-06-30T10:45:15.173+0900 [TRACE] Preserving existing state lineage "xxxxxxx-xxxx-xxxx-xxxx"
2022-06-30T10:45:15.173+0900 [INFO]  state modified during read or write. incrementing serial number
2022-06-30T10:45:15.173+0900 [TRACE] Meta.Backend: working directory was previously initialized for "gcs" backend
2022-06-30T10:45:15.173+0900 [TRACE] Meta.Backend: moving from default local state only to "gcs" backend
2022-06-30T10:45:15.174+0900 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-06-30T10:45:15.174+0900 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/google v4.15.0 for darwin_arm64 at .terraform/providers/registry.terraform.io/hashicorp/google/4.15.0/darwin_arm64
2022-06-30T10:45:15.174+0900 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/google/4.15.0/darwin_arm64 as a candidate package for registry.terraform.io/hashicorp/google 4.15.0
2022-06-30T10:45:15.208+0900 [DEBUG] checking for provisioner in "."
2022-06-30T10:45:15.208+0900 [DEBUG] checking for provisioner in "/opt/homebrew/Cellar/tfenv/2.2.3/versions/1.1.2"
2022-06-30T10:45:15.208+0900 [TRACE] backend/local: state manager for workspace "default" will:
 - read initial snapshot from terraform.tfstate
 - write new snapshots to terraform.tfstate
 - create any backup at terraform.tfstate.backup
2022-06-30T10:45:15.208+0900 [TRACE] statemgr.Filesystem: reading initial snapshot from terraform.tfstate
2022-06-30T10:45:15.208+0900 [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2022-06-30T10:45:15.208+0900 [TRACE] statemgr.Filesystem: read nil snapshot
2022-06-30T10:45:15.208+0900 [TRACE] Meta.Backend: ignoring local "default" workspace because its state is empty
╷
│ Error: Failed to get existing workspaces: querying Cloud Storage failed: Get "https://storage.googleapis.com/storage/v1/b/.../.../: impersonate: status code 403: {
│   "error": {
│     "code": 403,
│     "message": "The caller does not have permission",
│     "status": "PERMISSION_DENIED"
│   }
│ }
│ 
│ 
│ 

Expected Behavior

Terraform has been successfully initialized!

Actual Behavior

impersonate: status code 403: {
│   "error": {
│     "code": 403,
│     "message": "The caller does not have permission",
│     "status": "PERMISSION_DENIED"
│   }
│ }

Steps to Reproduce

  1. gcloud config set auth/impersonate_service_account my-service-account@my-org.iam.gserviceaccount.com
  2. gcloud auth application-default login
  3. export GOOGLE_IMPERSONATE_SERVICE_ACCOUNT=my-service-account@my-org.iam.gserviceaccount.com
  4. terraform init
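
The `impersonate: status code 403` response suggests that the calling credentials were not allowed to mint a token for the service account named in step 3. One thing worth ruling out is whether the calling identity holds a role on that service account that grants `iam.serviceAccounts.getAccessToken`, such as `roles/iam.serviceAccountTokenCreator`. A minimal sketch, assuming `gcloud` is installed and the active account may read the service account's IAM policy; `sa_role_members` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: list the members that hold a given role on a service account.
# Arguments: <service-account-email> <role>
# Note: if auth/impersonate_service_account from step 1 is still set, gcloud will
# impersonate for this call too; run
#   gcloud config unset auth/impersonate_service_account
# first if you want to query the policy as your own user.
sa_role_members() {
  local sa_email=$1 role=$2
  gcloud iam service-accounts get-iam-policy "$sa_email" \
    --flatten="bindings[].members" \
    --filter="bindings.role:${role}" \
    --format="value(bindings.members)"
}

# Example (commented out so that sourcing this file has no side effects):
# sa_role_members my-service-account@my-org.iam.gserviceaccount.com \
#   roles/iam.serviceAccountTokenCreator
```

If the identity used for `gcloud auth application-default login` is missing from the output, that would be consistent with the 403, though it is not the only possible cause.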

crw commented 2 years ago

Thanks for this report, I'll forward it to the GCP team.

tofke commented 1 year ago

I get a similar error when I try to impersonate a service account:

│ Error: error loading state: Failed to open state file at gs://terraform-state-projectname.../...path.../default.tfstate: Get "https://storage.googleapis.com/terraform-state-projectname.../...path.../default.tfstate": impersonate: status code 403: {
│   "error": {
│     "code": 403,
│     "message": "The caller does not have permission",
│     "status": "PERMISSION_DENIED"
│   }
│ }

There is no error when I use the SA's key with GOOGLE_APPLICATION_CREDENTIALS. I wonder if this has to do with API quotas? (See the warnings about GOOGLE_AUTH_SUPPRESS_CREDENTIALS_WARNINGS.)

leewoobin789 commented 1 year ago

Any update on this? I am facing the same issue using a workload identity provider bound to GitHub OIDC.

fadiorg  Initializing the backend...
(REDACTED)
fadiorg  ╷
(REDACTED)
         │ Error: Failed to get existing workspaces: querying Cloud Storage failed: Get "https://storage.googleapis.com/storage/v1/b/{BUCKET_NAME}t/o?alt=json&delimiter=%2F&pageToken=&prefix=kcd-state%2F&prettyPrint=false&projection=full&versions=false": oauth2/google: status code 403

brettcurtis commented 1 year ago

Has anyone figured this out yet? I'm seeing the same thing with GitHub OIDC. I get one of two errors back, the one you guys describe, as well as:

Error: Failed to get existing workspaces: querying Cloud Storage failed: Get "https://storage.googleapis.com/storage/v1/b/<clip>terraform%2F&prettyPrint=false&projection=full&versions=false": oauth2/google: status code 403: {
│   "error": {
│     "code": 403,
│     "message": "Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).",
│     "status": "PERMISSION_DENIED",
│     "details": [
│       {
│         "@type": "type.googleapis.com/google.rpc.ErrorInfo",
│         "reason": "IAM_PERMISSION_DENIED",
│         "domain": "iam.googleapis.com",
│         "metadata": {
│           "permission": "iam.serviceAccounts.getAccessToken"
│         }
│       }
│     ]
│   }
│ }

The service account I'm using does in fact have the predefined role of Workload Identity User, which does have that permission.

I have this working in a different org, so I really feel like I just have some wires crossed somewhere :(

EDIT: In my case, I wasted hours debugging because the code generating my principalSet member had the repository name typed incorrectly. 💩

dinigo commented 1 year ago

@brettcurtis No. It's been half a year and we don't have any information from the TF team :/ seems like this isn't actually an option

brettcurtis commented 1 year ago

@dinigo - In my case it was a code issue; we have since fixed it and are successfully using OIDC from GitHub with our Terraform GCS backend. It sounds like the context of our problems may be different.

fancybear-dev commented 1 year ago

We've had this exact same issue.

Colleague found out what it was.

The underlying issue was that the gcloud CLI had a project set that no longer existed. This happened because we use the same gcloud CLI for manual interaction as well as for TF. When we deleted a project via TF, the issue appeared, because the deleted project was still configured in the gcloud CLI.

You can verify the configured project in ~/.config/gcloud/application_default_credentials.json

You set a project manually when you run a command like this: gcloud config set project {project}

instead of just adding it to every command as a parameter: --project {project}

Hope this helps other people facing the same type of issue. As far as I can tell from others' reports as well, this almost always seems to be a configuration issue rather than an actual bug. That said, the error message does not point anywhere useful in this case.
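
A quick way to look for a stale project is sketched below. It assumes the stale value lives either in gcloud's active configuration or in the `quota_project_id` field of the application default credentials (ADC) file, and that the file is pretty-printed JSON as gcloud writes it; `adc_quota_project` is a hypothetical helper name:

```shell
# Hypothetical helper: print the quota_project_id recorded in an ADC file, if any.
# Argument: optional path to the ADC file (defaults to gcloud's usual location).
adc_quota_project() {
  local adc_file="${1:-$HOME/.config/gcloud/application_default_credentials.json}"
  sed -n 's/.*"quota_project_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$adc_file"
}

# Example (commented out so that sourcing this file has no side effects):
# gcloud config get-value project     # project configured for the gcloud CLI
# adc_quota_project                   # project recorded in the ADC file
# gcloud projects describe "$(adc_quota_project)" --format='value(lifecycleState)'
```

If either value points at a deleted project, `gcloud config set project {project}` and `gcloud auth application-default set-quota-project {project}` are the usual ways to replace it.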

OmkarG1986 commented 1 year ago

I ran into the exact same problem. I am using Google GKE Workload Identity. The GKE/K8s SA is properly mapped to the IAM SA. An IAM policy binding exists for workloadIdentityUser. The pod is annotated with the correct SA name. Not sure what's wrong. Has anybody found a fix for this problem? Thanks.

$ terraform init
Initializing modules...

brettcurtis commented 1 year ago

@OmkarG1986 - can you run this test from your pod and get the correct SA back? Your use case adds a layer of complexity compared to mine, since you're running on k8s. I'm also not sure whether you're trying to run this from a compute node or a pod; the pod is the thing that can use workload identity, not the compute node.

dermatologist commented 1 year ago

> I ran into exact same problem. I am using Google GKE Workload Identity. GKE/K8s SA is properly mapped to IAM SA. IAM policy binding exist for workloadIdentityUser. pod is annotated with correct SA name. Not sure whats wrong ? Anybody found fix to this problem? Thanks.

I am having the same issue!

> can you run this test from your POD and get the correct SA back?

I get the correct SA back.

brettcurtis commented 1 year ago

@dermatologist and just to confirm: you're getting back the IAM service account that has the iam.workloadIdentityUser role with the k8s service account as its member?

brettcurtis commented 1 year ago

This code works for us:

# Google Service Account Data Source
# https://registry.terraform.io/providers/hashicorp/google/latest/docs/data-sources/service_account

data "google_service_account" "this" {
  account_id = var.service_account_id
}

# Google Service Account IAM Binding Resource
# https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/google_service_account_iam#google_service_account_iam_member

resource "google_service_account_iam_member" "workload_identity" {
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${kubernetes_namespace_v1.this.metadata.0.name}/${kubernetes_service_account_v1.this.metadata.0.name}]"
  role               = "roles/iam.workloadIdentityUser"
  service_account_id = var.service_account_id
}

# Kubernetes Namespace Resource
# https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace_v1

resource "kubernetes_namespace_v1" "this" {
  metadata {
    name = var.namespace
  }
}

# Kubernetes Service Account Resource
# https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service_account_v1

resource "kubernetes_service_account_v1" "this" {
  metadata {

    annotations = {
      "iam.gke.io/gcp-service-account" = data.google_service_account.this.email
    }

    name      = var.namespace
    namespace = kubernetes_namespace_v1.this.metadata.0.name
  }
}

jameswjr commented 1 year ago

I also had this issue, and found it happened on a very newly-created backend bucket. I worked around it by adding a retry with a two-minute delay to make it succeed on the second try.
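
A retry like the one described can be sketched as a small shell wrapper. This assumes the failure is transient (for example, presumably, permissions on a new bucket that have not taken effect yet); `retry` is a hypothetical helper name, and the attempt count and delay are illustrative:

```shell
# Hypothetical helper: run a command up to <max_attempts> times,
# sleeping <delay_seconds> between failed attempts.
# Arguments: <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1 delay_seconds=$2 attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Example (commented out so that sourcing this file has no side effects):
# two attempts with a two-minute pause, as in the workaround above.
# retry 2 120 terraform init -input=false
```

A retry only hides the delay; if the second attempt also fails, the cause is more likely one of the configuration problems discussed elsewhere in this thread.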

fallard84 commented 9 months ago

I am having the exact same problem in GKE with workload identity binding and GCP service account impersonation.

During my troubleshooting, I discovered that terraform is getting the service account access token with the scope devstorage.read_write, so I reproduced the same within the pod running terraform:

curl -sSL -H 'Metadata-Flavor: Google' 'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.read_write'

I used the token to then make a call to generate an access token for the impersonated account:

curl -sSL -H 'Authorization: Bearer <TOKEN>' \
-H 'Content-Type: application/json' \
-X POST 'https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<ACCOUNT>:generateAccessToken' \
-d '{"scope": ["https://www.googleapis.com/auth/devstorage.read_write"]}'

And I get the same error:

{
    "error": {
        "code": 403,
        "message": "Request had insufficient authentication scopes.",
        "status": "PERMISSION_DENIED",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.ErrorInfo",
                "reason": "ACCESS_TOKEN_SCOPE_INSUFFICIENT",
                "domain": "googleapis.com",
                "metadata": {
                    "method": "google.iam.credentials.v1.IAMCredentials.GenerateAccessToken",
                    "service": "iamcredentials.googleapis.com"
                }
            }
        ]
    }
}

If I remove the scope parameter from the request to get the service account access token, I can generate the impersonated account token without any error.

fallard84 commented 9 months ago

I was able to fix my problem. In my case the problem was a combination of being on GKE 1.25 and terraform 0.14. Upgrading terraform to 0.15 fixed my issue, mainly because of this rework on the impersonation implementation.

I can't reproduce the issue with terraform 0.14 on GKE 1.24.

My hypothesis is that the new version of the gke-metadata-server on GKE 1.25 behaves differently when it is passed the scopes parameter. I sniffed the traffic while running terraform 0.15 and could see that it correctly passes the scope https://www.googleapis.com/auth/cloud-platform when getting the token from the metadata server, instead of https://www.googleapis.com/auth/devstorage.read_write. The cloud-platform scope allows access to the API that generates access tokens.
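
To check which scopes a token from the metadata server actually carries, something like the following may help. It is a sketch that assumes `curl` is available and that Google's OAuth2 tokeninfo endpoint returns a space-separated `scope` field for the token; `token_has_scope` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the access token carries the wanted OAuth scope.
# Arguments: <access-token> <scope-url>
token_has_scope() {
  local token=$1 wanted=$2 scopes
  # Ask the tokeninfo endpoint about the token and pull out the "scope" field.
  scopes=$(curl -sS "https://oauth2.googleapis.com/tokeninfo?access_token=${token}" |
    sed -n 's/.*"scope"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  # Scopes are space-separated; pad with spaces so only whole entries match.
  case " ${scopes} " in
    *" ${wanted} "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (commented out so that sourcing this file has no side effects):
# token_has_scope "<TOKEN>" https://www.googleapis.com/auth/cloud-platform \
#   && echo "token can call generateAccessToken" \
#   || echo "token lacks the cloud-platform scope"
```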

SelimAcerbas commented 6 months ago

I fixed the following issue with the following changes:

---ERROR---

Run terraform init

Initializing the backend...
╷
│ Error: Failed to get existing workspaces: querying Cloud Storage failed: Get "https://storage.googleapis.com/storage/v1/b/a_bucket/o?alt=json&delimiter=%2F&endOffset=&includeTrailingDelimiter=false&pageToken=&prefix=terraform%2Fstate%2F&prettyPrint=false&projection=full&startOffset=&versions=false": oauth2/google: status code 403: {
│   "error": {
│     "code": 403,
│     "message": "Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).",
│     "status": "PERMISSION_DENIED",
│     "details": [
│       {
│         "@type": "type.googleapis.com/google.rpc.ErrorInfo",
│         "reason": "IAM_PERMISSION_DENIED",
│         "domain": "iam.googleapis.com",
│         "metadata": {
│           "permission": "iam.serviceAccounts.getAccessToken"
│         }
│       }
│     ]
│   }
│ }

---HOW I SOLVED---

In the examples at https://github.com/google-github-actions/auth/blob/main/docs/EXAMPLES.md, the principalSet caught my attention.

I was using this principalSet for the service account used by Workload Identity Federation:

principalSet://iam.googleapis.com/projects/PROJECT_ID(FROM WIF)/locations/global/workloadIdentityPools/POOL_ID/attribute.repository/GITHUB_REPO_OWNER_ID/GITHUB_REPO_ID

I changed it to this: principalSet://iam.googleapis.com/projects/PROJECT_ID(FROM WIF)/locations/global/workloadIdentityPools/POOL_ID/*

After that, I re-ran the failed job and it succeeded.

PS: You can use this gcloud command to set this IAM binding:

gcloud iam service-accounts add-iam-policy-binding "my-service-account@${PROJECT_ID}.iam.gserviceaccount.com" \
  --project="${PROJECT_ID}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/1234567890/locations/global/workloadIdentityPools/my-pool/*"

I hope it helps!
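
Note that a `/*` member lets every identity in the pool impersonate the service account. A repository-scoped member is narrower. The sketch below builds one, assuming the pool's provider maps `attribute.repository` to the GitHub `repository` claim (`owner/name`) and that the path takes the numeric project number, as in the gcloud example above; `wif_repo_principal_set` is a hypothetical helper name:

```shell
# Hypothetical helper: build a repository-scoped principalSet member string.
# Arguments: <project-number> <pool-id> <owner/name>
wif_repo_principal_set() {
  local project_number=$1 pool_id=$2 repo=$3
  # Guard against passing the project ID where the numeric project number belongs.
  case "$project_number" in
    '' | *[!0-9]*)
      echo "project number must be numeric (not the project ID): '$project_number'" >&2
      return 1
      ;;
  esac
  # Guard against a bare repository name without its owner.
  case "$repo" in
    */*) ;;
    *)
      echo "repository must look like owner/name: '$repo'" >&2
      return 1
      ;;
  esac
  printf 'principalSet://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/attribute.repository/%s\n' \
    "$project_number" "$pool_id" "$repo"
}

# Example (commented out so that sourcing this file has no side effects):
# gcloud iam service-accounts add-iam-policy-binding "my-service-account@${PROJECT_ID}.iam.gserviceaccount.com" \
#   --project="${PROJECT_ID}" \
#   --role="roles/iam.workloadIdentityUser" \
#   --member="$(wif_repo_principal_set 1234567890 my-pool my-org/my-repo)"
```

Generating the member string in one place also avoids the kind of typo in the repository name mentioned earlier in this thread.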