hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

v3.36.0 forbidden: User "system:anonymous" cannot get resource #18852

Open odg0318 opened 3 years ago

odg0318 commented 3 years ago

Community Note

Terraform CLI and Terraform AWS Provider Version

$ terraform version
Terraform v0.14.6
+ provider registry.terraform.io/hashicorp/aws v3.36.0
+ provider registry.terraform.io/hashicorp/kubernetes v1.11.4
+ provider registry.terraform.io/hashicorp/random v3.1.0

Affected Resource(s)

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key: https://keybase.io/hashicorp

Debug Output

N/A

Panic Output

N/A

Expected Behavior

provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
}

data "aws_eks_cluster_auth" "cluster" {
  name = aws_eks_cluster.cluster.name
}

The Kubernetes provider should be able to access the EKS cluster using data.aws_eks_cluster_auth.cluster.token. I confirmed that this works fine with v3.35.0, but it does not work with v3.36.0.

I think data.aws_eks_cluster_auth.cluster.token contains invalid information. It looks like the token is accepted but the user is treated as anonymous, which causes the error below.

An emergency patch is required.

Actual Behavior

Error: roles.rbac.authorization.k8s.io "cluster-autoscaler" is forbidden: User "system:anonymous" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "kube-system"

Error: clusterroles.rbac.authorization.k8s.io "cluster-autoscaler" is forbidden: User "system:anonymous" cannot get resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope

Steps to Reproduce

  1. terraform apply

Important Factoids

References

mattjamesaus commented 3 years ago

Seeing this also with a new cluster we provisioned.

YakDriver commented 3 years ago

@odg0318 Sorry you are seeing this issue! Thank you for reporting it. The data source in question had a minor linting fix about 2 months ago but has otherwise remained basically unchanged for a while. That makes me wonder if something upstream has changed. The token code has also changed very little. Let us know if you find out anything more.

mattjamesaus commented 3 years ago

So after doing some more digging, it appears this may be tied to us running this inside a module (at least in my case).

We have a parent module (VPC etc.) and then the Kubernetes and EKS resources together inside a single child module.

I've found that if I pass the token returned from aws_eks_cluster_auth up from the child module to the parent as an output (by selectively targeting so the plan doesn't fail), I can then hardcode that token back into the child module instead of using the data source, and the whole plan works; a sketch of this is below.

So somehow the token value that ends up being passed into the kubernetes provider isn't correct when it is resolved inside a module. I'm not sure if it's a race condition where the token is being set to null, or some other strange side effect.
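
A rough sketch of that workaround, assuming the EKS resources live in a child module; all module, variable, and output names here are hypothetical, not taken from the original comment:

# modules/cluster/outputs.tf - surface the token so it can be read from the parent
output "eks_auth_token" {
  value     = data.aws_eks_cluster_auth.cluster.token
  sensitive = true
}

# modules/cluster/variables.tf - optional override so a captured token can be fed back in
variable "eks_auth_token_override" {
  type      = string
  default   = ""
  sensitive = true
}

# modules/cluster/main.tf - use the override when provided, otherwise fall back to the data source
provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = var.eks_auth_token_override != "" ? var.eks_auth_token_override : data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
}

The idea is to run one targeted apply to capture the token from the output, then feed it back in via eks_auth_token_override so the kubernetes provider no longer reads the data source from inside the module.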

decnorton commented 3 years ago

I'm also baffled by this. I've tried @mattjamesaus's suggestion with no luck.

The bulk of our configuration is in modules. The main configuration simply calls two separate modules; one for creating and setting up the cluster, and another for configuring individual tenants.

We use data.aws_eks_cluster_auth.cluster.token a total of 3 times, all from modules. I've tried getting the token once and then passing it out of the module and into another but it remains unauthorised.

mattjamesaus commented 3 years ago

@decnorton we came across this again the other day and I believe terraform refresh solved it - there's something definitely awry.

The fella that spotted it is off for the next couple of days but I'll get him to throw a comment in.

Perhaps we can get some simplified examples of how we're calling this, to see if there's a pattern for the repo maintainers to fix. I have a suspicion it's either the multiple calls or some data/state caching that's doing it.

Super annoying tho so feel ya pain.

mjrlee commented 3 years ago

I'm also seeing this - it worked up until midday yesterday (in EU-WEST-2) and then stopped working with a similar error.

A refresh didn't solve it here.

Edit to add - tried again with parallelism=1 - didn't fix it, not sure if that narrows down the search.

mjrlee commented 3 years ago

Here's something I've found: the OIDC provider thumbprint changed sometime in the last few days. I wonder if there was a certificate change inside AWS that is causing this, perhaps with something persisted that fails when the thumbprint changes?

federico-cuello-solarisbank commented 3 years ago

We are facing the same problem. We created 4 different clusters at the same time in 4 different AWS accounts; all of them are now failing at the same time, and they were fine yesterday. Could it be that the token being used has expired?

federico-cuello-solarisbank commented 3 years ago

Ok, more info:

In our aws_eks_cluster resource:

+  enabled_cluster_log_types = [
+    "api", "audit", "authenticator", "controllerManager", "scheduler"
+  ]

Adding these log types is what caused the "system:anonymous" error.

Doing a targeted apply of the cluster (AWS provider) first, and then applying the Kubernetes provider resources, works; a sketch of the two-step apply is below.
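
A minimal sketch of that two-step apply, using the aws_eks_cluster.cluster address from the original report (the resource address is illustrative and will differ per configuration):

# Step 1: create/update only the EKS cluster and its dependencies
$ terraform apply -target=aws_eks_cluster.cluster

# Step 2: apply everything else, including the Kubernetes provider resources
$ terraform apply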

dcopestake commented 3 years ago

I just encountered this same issue. I'm using:

It happened whilst I was updating an aws_eks_cluster resource, specifically its vpc_config.public_access_cidrs attribute.

Looking at the Terraform debug logs, I couldn't see anything obvious other than the fact that all the Kubernetes API requests were being made without an Authorization header or bearer token (hence the 403 responses), almost as if aws_eks_cluster_auth wasn't being used at all.

The workaround from @federico-cuello-solarisbank to target the cluster to apply the change first worked for me.

verschmelzen commented 2 years ago

I was able to fix the issue by referencing the cluster directly via a literal string (without a reference to the aws_eks_cluster resource); a sketch is below. My other observation was that adding depends_on to aws_eks_cluster_auth caused the error to appear again.

It looks like the dependency on the cluster resource causes aws_eks_cluster_auth to not be evaluated.

I also noticed with export TF_LOG=DEBUG that, when I add depends_on, the following disappears from the logs compared to the case where there is no dependency between aws_eks_cluster_auth and aws_eks_cluster:

2021-09-30T18:17:53.467+0500 [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected new value for data.aws_eks_cluster_auth.eks_cluster.
      - .token: inconsistent values for sensitive attribute
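
For illustration, a minimal sketch of that literal-name approach. The cluster name "my-cluster" is hypothetical and the resource addresses follow the example from the original report; the point is that aws_eks_cluster_auth carries no reference to (or depends_on for) the aws_eks_cluster resource:

data "aws_eks_cluster_auth" "eks_cluster" {
  # Literal cluster name instead of aws_eks_cluster.cluster.name,
  # so no dependency edge to the cluster resource is created.
  name = "my-cluster"
}

provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.eks_cluster.token
}
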
brunosimsenhor commented 2 years ago

I'm seeing something very similar when I try to add some tags to resources used by the AWS EKS cluster (the KMS key and the CloudWatch log group, in this case); I get the same unauthorized error.

I also tried tagging these resources manually through the AWS console, and the error occurs even without a change to the Terraform code.

After some time, I tried changing the Kubernetes provider to authenticate via exec instead of a token, and it worked! :nerd_face:

My terraform version:

Terraform v1.0.10
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.66.0
+ provider registry.terraform.io/hashicorp/helm v2.4.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.6.1
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/tls v2.2.0

Your version of Terraform is out of date! The latest version
is 1.0.11. You can update by downloading from https://www.terraform.io/downloads.html

Then:

provider "kubernetes" {
  host                   = aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(one(aws_eks_cluster.main.certificate_authority[*].data))
  token                  = data.aws_eks_cluster_auth.control_plane.token
}

Now:

provider "kubernetes" {
  host                   = aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(one(aws_eks_cluster.main.certificate_authority[*].data))

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.main.name]
    command     = "aws"
  }
}

andrleite commented 1 year ago

I have a similar issue when I add a tags block to the aws_eks_cluster resource. I'm using AWS provider version 4.64.0.

User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

akefirad commented 6 months ago

Experiencing the same issue in my stack using CDKTF. I can't say for sure why or how, but playing with an explicit dependency on the cluster and/or default node group throws the error at random (not consistently):

    const auth = new DataAwsEksClusterAuth(this, "main-cluster-auth", {
      name: this.cluster.name,
      dependsOn: [this.cluster, this.defaultNodeGroup],
    });

Update: Now it started to fail again 😞

chaitanya0619 commented 4 months ago

Any update on this issue?

Terraform v1.4.7 on darwin_amd64

znedw commented 1 month ago

I've just had the same issue; it was fixed using @brunosimsenhor's method, but with api_version = "client.authentication.k8s.io/v1beta1" (sketched below).

The thumbprint_list for my aws_iam_openid_connect_provider had changed as well, so it seems like it's related...
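
For reference, a minimal sketch of that fix, reusing the resource names from @brunosimsenhor's example above (only the api_version differs):

provider "kubernetes" {
  host                   = aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(one(aws_eks_cluster.main.certificate_authority[*].data))

  exec {
    # client.authentication.k8s.io/v1alpha1 has been deprecated in newer Kubernetes
    # clients; v1beta1 is the version that worked here.
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.main.name]
  }
}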