terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

╷ │ Error: Post "https://xxxxxx/api/v1/namespaces/kube-system/configmaps": dial tcp 192.168.20.250:443: i/o timeout #2369

Closed — stratocloud closed this 1 year ago

stratocloud commented 1 year ago

Description

While I am trying to provision a net new cluster, I am getting:

╷
│ Error: Post "https://xxxxxx/api/v1/namespaces/kube-system/configmaps": dial tcp 192.168.20.250:443: i/o timeout
│
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
│  414: resource "kubernetes_config_map" "aws_auth" {
│
╵


⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is hopefully the best practice you are following): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

Reproduction Code [Required]

data "aws_eks_cluster" "eks" { name = module.eks.cluster_id }

data "aws_eks_cluster_auth" "eks" { name = module.eks.cluster_id }

locals { cluster_name = var.cluster_name[var.env] cluster_version = var.cluster_version tags = var.tags }

################################################################################

EKS Module

################################################################################ module "eks" { source = "terraform-aws-modules/eks/aws" version = "18.20.2" cluster_name = local.cluster_name cluster_version = local.cluster_version vpc_id = module.eks_vpc.vpc_id subnet_ids = module.eks_vpc.intra_subnets cluster_endpoint_private_access = true cluster_endpoint_public_access = false

cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"] cluster_addons = { coredns = { resolve_conflicts = "OVERWRITE" }

aws-ebs-csi-driver = {}

kube-proxy = {
  resolve_conflicts        = "OVERWRITE"
  service_account_role_arn = module.eks_vpc_cni_irsa.iam_role_arn
}

}

cluster_encryption_config = [ { provider_key_arn = aws_kms_key.eks.arn resources = ["secrets"] } ]

cluster_tags = merge({ Name = local.cluster_name }, local.tags )

aws-auth configmap

create_aws_auth_configmap = true manage_aws_auth_configmap = true aws_auth_roles = [ { rolearn = var.cluster_admin_arn, username = "adminsso" groups = ["system:masters"] } ] aws_auth_users = [ { rolearn = var.cluster_admin_tf, username = "terraform" groups = ["system:masters"] }, ]

Extend cluster security group rules

cluster_security_group_additional_rules = { egress_nodes_ephemeral_ports_tcp = { description = "To node 1025-65535" protocol = "tcp" from_port = 1025 to_port = 65535 type = "egress" source_node_security_group = true } }

Extend node-to-node security group rules

node_security_group_additional_rules = { ingress_self_all = { description = "Node to node all ports/protocols" protocol = "-1" from_port = 0 to_port = 0 type = "ingress" self = true } egress_all = { description = "Node all egress" protocol = "-1" from_port = 0 to_port = 0 type = "egress" cidr_blocks = ["0.0.0.0/0"] ipv6_cidr_blocks = ["::/0"] } }

eks_managed_node_group_defaults = { ami_type = "AL2_x86_64" disk_size = 50 instance_types = ["t3.medium", "t3a.medium", "t3.large"] iam_role_attach_cni_policy = true }

eks_managed_node_groups = { for node_group in var.eks_node_groups : node_group["name"] => { name = node_group["name"] use_name_prefix = true subnet_ids = module.eks_vpc.private_subnets min_size = node_group["min_size"] max_size = node_group["max_size"] desired_size = node_group["desired_size"] ami_id = "ami-0c84934009677b6d5" enable_bootstrap_user_data = true bootstrap_extra_args = "--container-runtime containerd --kubelet-extra-args '--max-pods=20'"

  pre_bootstrap_user_data = <<-EOT
                export CONTAINER_RUNTIME="containerd"
                export USE_MAX_PODS=false
                EOT

  capacity_type   = node_group["capacity_type"]
  disk_size       = node_group["disk_size"]
  instance_types  = node_group["instance_types"]
  labels          = node_group["labels"]
  taints          = node_group["taints"]
  create_iam_role = true
  iam_role_name   = "sc-eks-managed-node-group-role"
}

} tags = local.tags }

Steps to reproduce the behavior:

Expected behavior

Actual behavior

[Screenshot: 2022-12-25 at 2:52:46 PM]

Terminal Output Screenshot(s)

╷
│ Error: Post "https://xxxxxx/api/v1/namespaces/kube-system/configmaps": dial tcp 192.168.20.250:443: i/o timeout
│
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
│  414: resource "kubernetes_config_map" "aws_auth" {
│
╵

Additional context

stratocloud commented 1 year ago

After commenting this out:

create_aws_auth_configmap = true

now I am getting:

╷
│ Error: The configmap "aws-auth" does not exist
│
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 431, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  431: resource "kubernetes_config_map_v1_data" "aws_auth" {
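For context, manage_aws_auth_configmap = true on its own only patches an existing aws-auth ConfigMap (through the kubernetes_config_map_v1_data resource referenced in the error), so on a brand-new cluster something still has to create the ConfigMap first. A minimal sketch of the pairing the module expects for a fresh cluster, reusing the variables from the reproduction code above:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.20.2"

  # Create the aws-auth ConfigMap if it does not exist yet,
  # then keep managing its contents from Terraform.
  create_aws_auth_configmap = true
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = var.cluster_admin_arn # illustrative, as in the reproduction code
      username = "adminsso"
      groups   = ["system:masters"]
    }
  ]

  # ... remaining cluster configuration ...
}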

stratocloud commented 1 year ago

https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009

I have tried everything from all the related discussions; none of it works, even using the exec plugin.

bryantbiggs commented 1 year ago

is that your complete configuration? where is the kubernetes provider that you are using?

stratocloud commented 1 year ago

@bryantbiggs

provider "kubernetes" { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

exec { api_version = "client.authentication.k8s.io/v1alpha1" command = "aws"

This requires the awscli to be installed locally where Terraform is executed

args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]

} }
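Worth noting with this provider block: current AWS CLI releases return client.authentication.k8s.io/v1beta1 ExecCredential objects rather than v1alpha1, so the exec block is usually written as in this sketch (same module outputs as above; this does not by itself resolve the network-level i/o timeout):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    # v1alpha1 has been removed from recent Kubernetes clients;
    # recent aws eks get-token versions emit v1beta1.
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}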

juanmatias commented 1 year ago

Hello, everyone.

It seems the error The configmap "aws-auth" does not exist is still a thing here... I'm getting it and cannot figure out how to fix it consistently.

The error is not consistent, but usually the first run fails:

module.cluster.module.eks_managed_node_group["spot"].aws_eks_node_group.this[0]: Still creating... [5m0s elapsed]
module.cluster.module.eks_managed_node_group["spot"].aws_eks_node_group.this[0]: Creation complete after 5m1s [id=bb-apps-devstg-eks-demoapps:spot-2023020318211575120000000f]
module.cluster.kubernetes_config_map_v1_data.aws_auth[0]: Creating...
╷
│ Error: The configmap "aws-auth" does not exist
│ 
│   with module.cluster.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/cluster/main.tf line 474, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  474: resource "kubernetes_config_map_v1_data" "aws_auth" {
│ 
╵
Error: Process completed with exit code 1.

And on the second run the aws-auth ConfigMap is created:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.cluster.module.eks_managed_node_group["spot"].aws_eks_node_group.this[0] has changed
  ~ resource "aws_eks_node_group" "this" {
        id                     = "bb-apps-devstg-eks-demoapps:spot-2023020318211575120000000f"
      + labels                 = {}
        tags                   = {
            "Environment"                                           = "apps-devstg"
            "Name"                                                  = "spot"
            "Project"                                               = "bb"
            "Terraform"                                             = "true"
            "k8s.io/cluster-autoscaler/bb-apps-devstg-eks-demoapps" = "owned"
            "k8s.io/cluster-autoscaler/enabled"                     = "TRUE"
        }
        # (15 unchanged attributes hidden)

        # (4 unchanged blocks hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.cluster.kubernetes_config_map_v1_data.aws_auth[0] will be created
  + resource "kubernetes_config_map_v1_data" "aws_auth" {
      + data  = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::***:role/spot-eks-node-group-20230203180914825900000005"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:masters"
                  "rolearn": "arn:aws:iam::***:role/DevOps"
                  "username": "DevOps"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + force = true
      + id    = (known after apply)

      + metadata {
          + name      = "aws-auth"
          + namespace = "kube-system"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
module.cluster.kubernetes_config_map_v1_data.aws_auth[0]: Creating...
module.cluster.kubernetes_config_map_v1_data.aws_auth[0]: Creation complete after 0s [id=kube-system/aws-auth]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

I suspect it is a matter of some resource raising a done flag without actually being ready (the spot node group?), but I couldn't debug it.

Could you please give me some clues on how to debug it? Thanks!


Regarding the providers, I have:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}
data "aws_eks_cluster" "cluster" {
  name = module.cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.cluster.cluster_id
}
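For comparison, the exec-based provider configuration shown later in this thread fetches a fresh token at apply time instead of reusing one captured by the data source during plan. A sketch wired to the same module.cluster references used above:

provider "kubernetes" {
  host                   = module.cluster.cluster_endpoint
  cluster_ca_certificate = base64decode(module.cluster.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # Fetch a token when the provider is actually used, rather than
    # relying on a data.aws_eks_cluster_auth token from plan time.
    args = ["eks", "get-token", "--cluster-name", module.cluster.cluster_id]
  }
}
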
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

garudabgh commented 1 year ago

Hello, everyone.

It seems the error The configmap "aws-auth" does not exist is still a thing here... I'm getting it and cannot figure out how to fix it consistently.

I'm having the same issue. Did you find any solution to this?

Zynlink commented 1 year ago

I ran into the same issue. I was able to fix it by modifying the cluster's security group and allowing inbound communication on port 443 from where I was running terraform.
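In module terms, that kind of rule can be expressed through cluster_security_group_additional_rules. A sketch, where the rule name and CIDR are placeholders for wherever Terraform runs:

module "eks" {
  # ... existing configuration ...

  cluster_security_group_additional_rules = {
    ingress_https_from_terraform = {
      description = "Allow the Terraform host to reach the cluster API"
      protocol    = "tcp"
      from_port   = 443
      to_port     = 443
      type        = "ingress"
      cidr_blocks = ["10.0.0.0/16"] # placeholder: the CIDR Terraform runs from
    }
  }
}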

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

flowinh2o commented 1 year ago

Just for anyone else out there who is using Terragrunt with this module standalone: I was able to fix the issue by adding the following block to terragrunt.hcl:

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "kubernetes" {
  host                   = aws_eks_cluster.this[0].endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.this[0].certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", aws_eks_cluster.this[0].id]
  }
}
EOF
}

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 1 year ago

This issue was automatically closed because of stale in 10 days

deasydoesit commented 1 year ago

I just upgraded terraform-aws-eks from 18.x.x to 19.15.3 and am now running into the i/o timeout issue when trying to run kubectl commands.

From my perspective, there was an undocumented breaking change made in 19.x.x by defaulting cluster_endpoint_public_access from true to false.
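If that default change is the cause, setting the endpoint inputs explicitly restores the pre-19.x behaviour. A sketch, assuming a public endpoint is acceptable in your environment:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.15.3"

  # v19.x changed the default of cluster_endpoint_public_access to false;
  # set it explicitly if the API server must stay reachable from outside the VPC.
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  # ... remaining cluster configuration ...
}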

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.