terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

`Error: The configmap "aws-auth" does not exist` when deploying an EKS cluster with `manage_aws_auth_configmap = true` #2009

Closed · ezzatron closed this issue 2 years ago

ezzatron commented 2 years ago

Description

When deploying an EKS cluster using manage_aws_auth_configmap = true, the deploy fails with the error:

Error: The configmap "aws-auth" does not exist

Versions

Reproduction Code [Required]

data "aws_availability_zones" "available" {}

module "vpc_example" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.14.0"

  name = "example"
  cidr = "10.10.0.0/16"

  azs                = slice(data.aws_availability_zones.available.names, 0, 3)
  enable_nat_gateway = true

  private_subnets  = ["10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]
  public_subnets   = ["10.10.11.0/24", "10.10.12.0/24", "10.10.13.0/24"]
  database_subnets = ["10.10.21.0/24", "10.10.22.0/24", "10.10.23.0/24"]

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

data "aws_iam_roles" "sso_breakglass" {
  name_regex  = "AWSReservedSSO_BreakGlass_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/"
}
data "aws_iam_roles" "sso_readall" {
  name_regex  = "AWSReservedSSO_ReadAll_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/"
}

module "eks_main" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20.1"

  cluster_name                    = "main"
  cluster_enabled_log_types       = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc_example.vpc_id
  subnet_ids = module.vpc_example.private_subnets

  eks_managed_node_groups = {
    spot = {
      create_launch_template = false
      launch_template_name   = ""

      capacity_type = "SPOT"
      instance_types = [
        "m4.large",
        "m5.large",
        "m5a.large",
        "m6i.large",
        "t2.large",
        "t3.large",
        "t3a.large",
      ],
    }
  }

  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_breakglass.names)}"
      username = "sso-breakglass:{{SessionName}}"
      groups   = ["sso-breakglass"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_readall.names)}"
      username = "sso-readall:{{SessionName}}"
      groups   = ["sso-readall"]
    },
  ]
}

Steps to reproduce the behavior:

In our case, we're using Terraform Cloud, but I'm unsure if that actually affects anything here. We were simply trying to create a new EKS cluster, and noticed this new setting. In the past we've had to use complex hacks to manage the aws-auth ConfigMap, so this seemed like a better approach, but it doesn't seem to work.

It's worth noting that running another apply doesn't fix the issue, so I don't think it's a timing issue.

Expected behavior

The cluster is created without error, and the aws-auth ConfigMap contains the expected content.

Actual behavior

The above error.

Additional context

[Screenshot of the error output]
bryantbiggs commented 2 years ago

that's odd that it's stating the aws-auth configmap doesn't exist when you are using an EKS managed node group - EKS managed node groups automatically create/update the configmap with the role used by the node group (same with Fargate profiles). However, self-managed node groups do not create the configmap, so we do have a variable to handle this:

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/69a815c7dfe3c44feddfaa129c36953656d5f123/examples/self_managed_node_group/main.tf#L61-L63

create_aws_auth_configmap = true
manage_aws_auth_configmap = true
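
For context, a minimal sketch of where those flags sit in the module block (cluster details elided; this mirrors the linked self-managed node group example rather than anything specific to this issue):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20"

  # ... cluster, VPC and node group configuration ...

  # Create the aws-auth configmap (EKS managed node groups and Fargate profiles
  # normally create it for you; self-managed node groups do not)
  create_aws_auth_configmap = true
  # Let the module manage the configmap contents (aws_auth_roles, aws_auth_users, ...)
  manage_aws_auth_configmap = true
}
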
ezzatron commented 2 years ago

Yeah, I assumed based on the docs that we shouldn't be setting create_aws_auth_configmap = true because we're only using managed node groups. Should we enable that setting as well?

bryantbiggs commented 2 years ago

yes, it's safe to enable this.

question: if the configmap doesn't exist, are your nodes connected to the control plane?

ezzatron commented 2 years ago

As far as I can tell, the ConfigMap genuinely doesn't exist. In the console, the node group shows up under the cluster, which (I think) means that the nodes are connected to the control plane:

[Screenshot: the node group listed under the cluster in the AWS console]

But since the config map doesn't exist, I can't see any of the resources inside the cluster of course:

[Screenshot: the AWS console unable to display cluster resources]
bryantbiggs commented 2 years ago

hmm, something's off. If the configmap doesn't exist, then the nodes won't register because they lack authorization to do so

bryantbiggs commented 2 years ago

let me try your repro and see

ezzatron commented 2 years ago

Thanks for taking a look; let me know if I can help out with any other info. In the meantime I'm going to try enabling create_aws_auth_configmap = true, and also see if the spot instance config has something to do with it (that's another thing we changed at the same time from a working cluster config).

ezzatron commented 2 years ago

FWIW, adding create_aws_auth_configmap = true did change the error we get, but it didn't help us understand what's going on:

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused
bryantbiggs commented 2 years ago

hmm, it is there but it's not recognizing or patching it. I'll have to file an issue with the upstream Kubernetes provider in the morning to have them take a look

ezzatron commented 2 years ago

No worries, thanks for your help πŸ™

lrstanley commented 2 years ago

Experiencing this as well, using 18.20.1 against Kubernetes 1.21. It fixed itself after another plan and apply. I wonder if this is another quirky EKS "feature" πŸ€¦πŸ»β€β™‚οΈ where it says the cluster is ready but it actually isn't yet, and some restart/propagation still needs to happen after the cluster is created before aws-auth is populated?

β•·
β”‚ Error: The configmap "aws-auth" does not exist
β”‚ 
β”‚   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  428: resource "kubernetes_config_map_v1_data" "aws_auth" {
β”‚ 
β•΅
+ provider registry.terraform.io/hashicorp/aws v3.74.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
+ provider registry.terraform.io/hashicorp/local v2.2.2
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/random v3.1.2
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/tls v3.3.0

on Terraform 1.1.6.

lrstanley commented 2 years ago

~Actually, nevermind. It succeeded, however it didn't actually apply it.~

EDIT: it's because I didn't have the manage field set. With that enabled, I now get:

β•·
β”‚ Error: configmaps "aws-auth" already exists
β”‚ 
β”‚   with module.eks.kubernetes_config_map.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 411, in resource "kubernetes_config_map" "aws_auth":
β”‚  411: resource "kubernetes_config_map" "aws_auth" {
β”‚ 
β•΅
MadsRC commented 2 years ago

I had the same issue: both the 'configmap "aws-auth" does not exist' error when managing the config map, and the 'connection refused 127.0.0.1' error when attempting to create it.

I'm using managed node groups as well.

The way I solved it was to add a kubernetes provider. This should be enough:

/*
The following 2 data resources are used to get around the fact that we have to
wait for the EKS cluster to be initialised before we can attempt to authenticate.
*/

data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

It's also a great way to authenticate to the EKS cluster, instead of the example in the repo that forces the use of the awscli.

bryantbiggs commented 2 years ago

oy, good spot on not providing the provider and creds. I'll file a ticket upstream for better error reporting on that.

Regarding the token vs exec - exec is what is recommended by the provider itself https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins

bryantbiggs commented 2 years ago

yep, was able to confirm that was the issue and this now works as expected @ezzatron - we just forgot to add the provider auth:

provider "aws" {
  region = "us-east-1"
}

provider "kubernetes" {
  host                   = module.eks_main.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_main.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks_main.cluster_id]
  }
}

data "aws_availability_zones" "available" {}

data "aws_caller_identity" "current" {}

module "vpc_example" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.14.0"

  name = "example"
  cidr = "10.10.0.0/16"

  azs                = slice(data.aws_availability_zones.available.names, 0, 3)
  enable_nat_gateway = true

  private_subnets  = ["10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]
  public_subnets   = ["10.10.11.0/24", "10.10.12.0/24", "10.10.13.0/24"]
  database_subnets = ["10.10.21.0/24", "10.10.22.0/24", "10.10.23.0/24"]

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

module "eks_main" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20.1"

  cluster_name                    = "main"
  cluster_enabled_log_types       = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc_example.vpc_id
  subnet_ids = module.vpc_example.private_subnets

  eks_managed_node_groups = {
    spot = {
      create_launch_template = false
      launch_template_name   = ""

      capacity_type = "SPOT"
      instance_types = [
        "m4.large",
        "m5.large",
        "m5a.large",
        "m6i.large",
        "t2.large",
        "t3.large",
        "t3a.large",
      ],
    }
  }

  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/foo"
      username = "sso-breakglass:{{SessionName}}"
      groups   = ["sso-breakglass"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/bar"
      username = "sso-readall:{{SessionName}}"
      groups   = ["sso-readall"]
    },
  ]
}
bryantbiggs commented 2 years ago

@ezzatron are we able to close this with the solution posted above?

narenaryan commented 2 years ago

:heavy_check_mark: For us, the following setup worked while migrating the EKS module from 17.22.0 to 18.20.2:

data "aws_eks_cluster" "default" {
  name = local.cluster_name
}

data "aws_eks_cluster_auth" "default" {
  name = local.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

For the 'aws_eks_cluster' and 'aws_eks_cluster_auth' data sources, name = module.eks.cluster_id didn't work for some reason and was throwing connection errors like the one in the title of this ticket.

We got the provider block from HashiCorp terraform provider git: https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/_examples/eks/kubernetes-config/main.tf

Versions:

| Terraform | 1.1.8 |
| EKS Module | 18.20.2 |
| Kubernetes Provider | 2.10.0 |

bryantbiggs commented 2 years ago

@narenaryan just be mindful that with that route the token can expire. The provider recommends the exec route if you can use it (requires the awscli to be available where Terraform is executed): https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins
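
For example, a sketch of the exec variant keyed to the same static cluster name (assuming local.cluster_name is defined and the awscli is available where Terraform runs):

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # A fresh token is fetched on demand, so it cannot expire mid-apply
    args = ["eks", "get-token", "--cluster-name", local.cluster_name]
  }
}
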

magnusseptim commented 2 years ago

There still seems to be some issue.

I tried both @bryantbiggs' and @narenaryan's suggestions, and yes, it sometimes works as intended.

Unfortunately, from time to time I get:

Error: Get "https://***.***.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp <ip address here>:443: i/o timeout

at the terraform plan stage.

It may be that this is not an EKS module / Kubernetes provider issue but rather some issue with my deployment machine, but this is what I have for now.

I was unable to find the underlying reason yet.

bryantbiggs commented 2 years ago

@magnusseptim please feel free to investigate, add a πŸ‘πŸ½ , or post a new issue upstream as there are well known issues with the Kubernetes provider https://github.com/hashicorp/terraform-provider-kubernetes/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+localhost

ncjones commented 2 years ago

When configuring the kubernetes provider using exec and no region, as per the advice above, I got the following error:

module.eks.kubernetes_config_map.aws_auth[0]: Creating...
β”‚ Error: Post "https://9266dd6a08gr7.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps": getting credentials: exec: executable aws failed with exit code 255
β”‚ 
β”‚   with module.eks.kubernetes_config_map.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
β”‚  414: resource "kubernetes_config_map" "aws_auth" {
β”‚ 
Error: Process completed with exit code 1.

I think it's because it assumes the AWS CLI is configured with a region which is not true in my environment. If the region is not set then the AWS CLI will attempt to contact the instance metadata service (IMDS) to detect the region. The IMDS call also fails in my CI/CD environment.

Using the aws_eks_cluster_auth data source solves this issue.

data "aws_eks_cluster_auth" "default" {
  name = "my-eks-cluster-name"
}

provider "kubernetes" {
  host = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token = data.aws_eks_cluster_auth.default.token
}
bryantbiggs commented 2 years ago

The exec command is sent to the awscli so you can set the region:

provider "kubernetes" {
  host                   = module.eks_main.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_main.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks_main.cluster_id, "--region", "us-east-1"]
  }
}
Hokwang commented 2 years ago

in my case, I have several profiles, so I need to add the "--profile" option.
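
For illustration, the exec block with a profile added might look like this (the profile name is a placeholder):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # "my-profile" stands in for whichever named AWS CLI profile should be used
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", "my-profile"]
  }
}
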

magnusseptim commented 2 years ago

So, eventually I was able to resolve my issue.

Result:

In my case, a combination of cluster_endpoint_private_access = true, -parallelism=1 (in the form of an env variable on my DevOps agent), and coredns = { resolve_conflicts = "OVERWRITE" } (the last one only because of planned Fargate usage) worked. A sketch of this combination follows below.

How I was able to figure that out:

Overall, I think -parallelism=1 was crucial. When I tried to dig out what the issue was, I attempted to 'reproduce' the aws-auth apply procedure on my own with an older version of the module (18.19.0) and the kubernetes provider (1.13 / 1.19).

I did something similar to https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1901#issuecomment-1060428753 and https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1744#issuecomment-1046810281, which ended with localhost-type issues like https://github.com/hashicorp/terraform-provider-kubernetes/issues/1028.

That led me to -parallelism=1.

During testing I also noticed that if both cluster_endpoint_private_access=true and cluster_endpoint_public_access=false are set, updating the cluster is 'unstable' (for lack of a better word).

Setting those flags like that ends with random issues for both the token and the exec kubernetes provider approaches, mostly failing as 30+ second timeouts from the config map endpoint api/v1/namespaces/kube-system/configmaps.

Not sure about the underlying issue, but I considered it worth mentioning here.

UPDATE:

In the end I was able to have both cluster_endpoint_private_access=true and cluster_endpoint_public_access=false by using the exec kubernetes provider approach, without timeouts.

| Terraform | 1.1.8 |
| EKS Module | 18.20.2 |
| Kubernetes Provider | 2.10.0 |
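
For reference, a minimal sketch of the combination described above (only the relevant inputs are shown; values are illustrative, not the full configuration):

# Applied with: terraform apply -parallelism=1
# (the flag can also be passed via the TF_CLI_ARGS_apply environment variable on the agent)

module "eks" {
  # ... other settings unchanged ...

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  cluster_addons = {
    coredns = {
      # Only relevant here because Fargate usage was planned
      resolve_conflicts = "OVERWRITE"
    }
  }
}
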

ezzatron commented 2 years ago

@ezzatron are we able to close this with the solution posted above?

Thanks for the help - I'm pretty sure I wasn't using that type of auth so it seems like it would have fixed the original issue. I ended up using https://github.com/aws-ia/terraform-aws-eks-blueprints in the meantime, which means I'm no longer manually adding to aws-auth.

I'll close the issue in any case πŸ‘

talalashraf commented 2 years ago

For any Terragrunt users, this block solves it.

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents = <<EOF
provider "kubernetes" {
  host                   = aws_eks_cluster.this[0].endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.this[0].certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
}
EOF
}
kaykhancheckpoint commented 2 years ago

I ran into this issue as well today, and it was because I had multiple profiles on my local machine; setting the profile worked for me.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", locals.aws_profile]
  }
}
guitmz commented 2 years ago

EDIT: I was able to get it working, somehow I was getting the wrong profile name in the terragrunt (cache perhaps?)

I'm still getting an error (terragrunt) with module 18.20.5

tculp commented 2 years ago

@bryantbiggs

So the terraform docs (https://www.terraform.io/language/providers/configuration) say that

You can use [expressions](https://www.terraform.io/language/expressions) in the 
values of these configuration arguments, but can only reference values that are 
known before the configuration is applied. This means you can safely reference 
input variables, but not attributes exported by resources (with an exception for 
resource arguments that are specified directly in the configuration).

However, the workaround of configuring the kubernetes provider from the output of the module runs counter to this. Are the terraform docs just being overly paranoid, or can there sometimes be issues with the workaround?

If there are potential issues with this method, then I think that the aws_auth_configmap_yaml output should be un-deprecated

FernandoMiguel commented 2 years ago

My experience with this shows that the kube provider does not know the cluster endpoint or a valid token. It's weird because I was able to use those details in a local template. I guess the provider is initialised early in the graph and lacks the correct values.

flowinh2o commented 2 years ago

For any Terragrunt users, this block solves it.

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents = <<EOF
provider "kubernetes" {
  host                   = aws_eks_cluster.this[0].endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.this[0].certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
}
EOF
}

Thank you @kaykhancheckpoint, this worked for me as well.

dm-sumup commented 2 years ago

Looks like this issue is closed, but none of the fixes provided here work.

Here is my tf file:

data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--profile", var.customer-var.environment]
    command     = "aws"
  }
  # token                  = data.aws_eks_cluster_auth.default.token
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts = "OVERWRITE"
    }
  }

  vpc_id  = var.vpc_id
  subnet_ids = var.subnet_ids

  cluster_encryption_config = [{
    provider_key_arn = var.kms_key_id
    resources        = ["secrets"]
  }]

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    disk_size              = 50
    instance_types         = ["c5.large"]
  }

  eks_managed_node_groups = {
    "${var.ng1_name}" = {
      min_size     = var.ng1_min_size
      max_size     = var.ng1_max_size
      desired_size = var.ng1_desired_size

      instance_types = var.ng1_instance_types
      capacity_type  = "ON_DEMAND"

      update_config = {
        max_unavailable_percentage = 50
      }

      tags = var.tags
    }
  }

  node_security_group_additional_rules = var.ng1_additional_sg_rules

  # aws-auth configmap
  manage_aws_auth_configmap = true

  tags = var.tags
}

I still get

β”‚ Error: The configmap "aws-auth" does not exist
β”‚ 
β”‚   with module.eks-cluster.module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks-cluster.eks/main.tf line 431, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  431: resource "kubernetes_config_map_v1_data" "aws_auth" {

Any ideas?

Using: EKS module ~> 18.0, tls v3.4.0, cloudinit v2.2.0, kubernetes v2.11.0, aws v3.75.1

dempti commented 2 years ago

@bryantbiggs I am still facing the same issue: when setting manage_aws_auth_configmap = true it fails for the managed node group, but if set to false it passes


Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # kubernetes_config_map_v1_data.aws_auth[0] will be created
  + resource "kubernetes_config_map_v1_data" "aws_auth" {
      + data  = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::{{aws_id}}}:role/eks-managed-node-group"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:masters"
                  "rolearn": "admin_role"
                  "username": "admin"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + force = true
      + id    = (known after apply)

      + metadata {
          + name      = "aws-auth"
          + namespace = "kube-system"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
kubernetes_config_map_v1_data.aws_auth[0]: Creating...
β•·
β”‚ Error: The configmap "aws-auth" does not exist
β”‚
β”‚   with kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on main.tf line 431, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  431: resource "kubernetes_config_map_v1_data" "aws_auth" {
β”‚
β•΅
Releasing state lock. This may take a few moments...
ERRO[0079] Module /LEARN/modules/eks has finished with an error: 1 error occurred:
    * exit status 1
  prefix=[/LEARN/modules/eks]
ERRO[0079] 1 error occurred:
    * exit status 1

Environment Information

eks -> 1.22
eks-module -> 18.21.0
bryantbiggs commented 2 years ago

@dempti without a reproduction, I can't really help

dempti commented 2 years ago

@dempti without a reproduction, I can't really help

@bryantbiggs thank you for your help and being active even during the weekends. :bow:

AndreKR commented 2 years ago

I'm getting this error as well. Here's my tf file:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "4.15.1"
    }
  }
}

provider "aws" {
  region = "eu-central-1"
}

data "aws_availability_zones" "available" {}

module "vpc" {
  source  = "registry.terraform.io/terraform-aws-modules/vpc/aws"
  version = "3.2.0"

  name            = "my-vpc"
  cidr            = "10.0.0.0/16"
  azs             = data.aws_availability_zones.available.names
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

module "eks" {
  source  = "registry.terraform.io/terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name = "my-cluster"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    main = {
      instance_types = ["t2.small"]
      capacity_type  = "SPOT"
    }
  }

  manage_aws_auth_configmap = true

  aws_auth_users = [
    {
      userarn  = "arn:aws:iam::9999999999999:user/foo"
      username = "foo"
      groups   = ["system:masters"]
    }
  ]
}

You need to change userarn and set the AWS_... environment variables.

bryantbiggs commented 2 years ago

@AndreKR you need to add an authenticated Kubernetes provider such as:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}
AndreKR commented 2 years ago

That works, thanks.

That exec awscli thing felt like a massive crutch though, so I did a quick search and I found this, which seems to work as well:

data "aws_eks_cluster_auth" "eks_auth" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}
bryantbiggs commented 2 years ago

That works as well, but the exec method is what is recommended by the Kubernetes provider. If you exceed the token's lifetime, you will have to run a terraform refresh, whereas the exec method handles this for you.

orkramer commented 2 years ago

Suddenly I had the exact same error; it was not there before. For me it was because the SSO token had expired.

I saw it by running: aws eks get-token --cluster-name CLUSTER

"The SSO session associated with this profile has expired."

It's good now after I refreshed the token.

ptwohig commented 2 years ago

I'm getting this with module version ~> 18.23 and EKS version 1.22. Maybe I'm just missing something, but what's the solution to this?

I can create the cluster just fine, but I can't modify it later because I ran into another issue where addons are stuck in the "CREATING" state.

bsakweson commented 2 years ago

This is really strange; I have tried all suggestions on this page but the issue persists. Is there a better fix for this? I upgraded my version to v18.24.1, to no avail.

mconigliaro commented 2 years ago

I think the configmap stuff belongs in a totally separate module. The kubernetes provider documentation even warns about this:

When using interpolation to pass credentials to the Kubernetes provider from other resources, these resources SHOULD NOT be created in the same Terraform module where Kubernetes provider resources are also used.

hirokistring commented 2 years ago

FYI: I got the same error message as this issue: Error: The configmap "aws-auth" does not exist

But I found the cause was that I did NOT have access to the private API endpoint of the EKS cluster. I fixed the problem by adding cluster_security_group_additional_rules to allow ingress from the place I run Terraform to the private API endpoint.

All the discussions on this issue were really helpful. Thanks a lot, everyone.
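
For illustration, such a rule might look roughly like this; it is a sketch rather than the exact rule used above, and the rule key and CIDR are placeholders for wherever Terraform runs:

module "eks" {
  # ... other settings unchanged ...

  cluster_security_group_additional_rules = {
    ingress_terraform_https = {
      description = "Allow the machine running Terraform to reach the private API endpoint"
      protocol    = "tcp"
      from_port   = 443
      to_port     = 443
      type        = "ingress"
      cidr_blocks = ["10.0.0.0/8"] # placeholder: CIDR of the network Terraform runs from
    }
  }
}
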

ctroyp commented 2 years ago

@hirokistring would you mind posting the code segments that you used for the ingress rule (masking your IP address, of course)? I'm having the same issue.

flomsk commented 2 years ago

confirming it's working with token and not with exec under the provider configuration

provider "kubernetes" {
  alias                  = "eu-west-1"
  host                   = data.aws_eks_cluster.cluster_eu.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster_eu.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster_eu.token
  # exec {
  #   api_version = "client.authentication.k8s.io/v1beta1"
  #   args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster_eu.name]
  #   command     = "aws"
  # }
}
duclm2609 commented 2 years ago
data "aws_eks_cluster_auth" "eks_auth" {
  name = module.eks.cluster_id
}

Me too. Using exec does not work.

ricardo6142dh commented 2 years ago

Still not working here :-(


`module "eks" {

https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest

source = "terraform-aws-modules/eks/aws" version = "18.26.2"

cluster_name = local.cluster_name cluster_version = "1.21"

vpc_id = module.vpc.vpc_id subnet_ids = module.vpc.public_subnets

enable_irsa = true

create_cluster_security_group = false create_node_security_group = false

manage_aws_auth_configmap = true

aws_auth_users = [ { userarn = "arn:aws:iam::my_aws_account:user/my_user" username = "lopes_becker" groups = ["system:masters"] } ]`

zeevmoney commented 2 years ago

in my case, I have several profiles, so I need to add "--profile" option.

Same thing happened to me, make sure you use the right profile.

jeunii commented 2 years ago

I'm not sure what I'm doing wrong here, but my config looks like:

provider "aws" {
  assume_role {
    role_arn = "arn:aws:iam::${var.aws_account_id}:role/pe-gitlab-assume_role"
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

data "terraform_remote_state" "vpc" {
  backend = "http"
  config = {
    address = "https://gitlab.com/api/v4/projects/37469473/terraform/state/core-net"
   }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"

  cluster_name    = "build"
  cluster_version = "1.22"

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  cluster_addons = {
    coredns = {}
    kube-proxy = {}
    vpc-cni = {}
  }

  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  control_plane_subnet_ids = data.terraform_remote_state.vpc.outputs.dev_cp_subnet_ids

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    disk_size      = 50
    instance_types = ["t3.medium"]
    subnet_ids = data.terraform_remote_state.vpc.outputs.dev_ng_subnet_ids
  }

  eks_managed_node_groups = {
    core = {
      min_size     = 2
      max_size     = 10
      desired_size = 2

      instance_types = ["t3.2xlarge"]
      capacity_type  = "SPOT"
    }
  }

  tags = {
    ManagedBy = "Terraform"
    Infra = "eks"
  }

  # aws-auth configmap
  create_aws_auth_configmap = true

}

But when I run this, I get

β”‚ Error: Unauthorized
β”‚ 
β”‚   with module.eks.kubernetes_config_map.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 453, in resource "kubernetes_config_map" "aws_auth":
β”‚  453: resource "kubernetes_config_map" "aws_auth" {
β”‚ 
β•΅
Cleaning up project directory and file based variables

If I use just manage_aws_auth_configmap, I get

β”‚ Error: The configmap "aws-auth" does not exist
β”‚ 
β”‚   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 470, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  470: resource "kubernetes_config_map_v1_data" "aws_auth" {