terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Not able to run pods after upgrading eks cluster with terraform version v18.26.6 #2310

Closed akash123-eng closed 1 year ago

akash123-eng commented 1 year ago

Yesterday I upgraded our Terraform EKS module from v17.20.0 to v18.26.6, followed the migration steps given at https://github.com/clowdhaus/eks-v17-v18-migrate, and then ran terraform apply. New nodes were created with the required IAM policies, and the kube-proxy add-on was enabled on the existing EKS cluster, which was our main requirement. After that I deleted the old nodes, which moved all workloads onto the new nodes. But now, when I try to create a new pod or deployment, it fails with the error: Error from server (Timeout): Timeout: request did not complete within requested timeout - context deadline exceeded. Initially I thought this might be due to a resource crunch, so I added more nodes, but even that did not solve the issue.

I am able to get details of any pod in the cluster and can edit the configuration of resources in the cluster, but new pods are not being created and fail with the same error: Error from server (Timeout): Timeout: request did not complete within requested timeout - context deadline exceeded.

Please give inputs on what could be wrong here. Our Kubernetes version is 1.22. Please find below my Terraform manifest files for EKS.

main.tf

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "18.26.6"
  cluster_name    = var.cluster_name
  cluster_version = var.eks_cluster_version
  subnet_ids      = var.private_subnets
  enable_irsa     = true

  vpc_id = var.vpc_id

  cluster_enabled_log_types = var.cluster_enabled_log_types

  prefix_separator                   = ""
  iam_role_name                      = var.cluster_name
  cluster_security_group_name        = var.cluster_name
  cluster_security_group_description = "EKS cluster security group."
  iam_role_additional_policies = [
    "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  ]

  cluster_addons = {
    kube-proxy = {
      addon_name        = "kube-proxy"
      addon_version     = "v1.21.2-eksbuild.2"
      resolve_conflicts = "OVERWRITE"
    }
  }

  cluster_encryption_config = [
    {
      provider_key_arn = var.kms_arn
      resources        = ["secrets"]
    }
  ]

  cluster_endpoint_public_access_cidrs = var.vpn_cidr
  cluster_endpoint_private_access      = true
  cluster_endpoint_public_access       = true
  cluster_security_group_additional_rules = {
    ingress = {
      description                = "To node 1025-65535"
      type                       = "ingress"
      from_port                  = 1025
      to_port                    = 65535
      protocol                   = "TCP"
      cidr_blocks                = ["0.0.0.0/0"]
      source_node_security_group = false
    }
  }

  manage_aws_auth_configmap = true
  aws_auth_users            = var.map_users
  aws_auth_roles            = var.map_roles
  aws_auth_accounts         = var.map_accounts
  tags                      = var.tags

  eks_managed_node_groups = {
    monitors = {
      desired_size   = var.eks_worker_asg_desired_capacity
      max_size       = var.eks_worker_asg_max_capacity
      min_size       = var.eks_worker_asg_min_capacity
      name           = "monitoring"
      ami_type       = var.monitors_group_ami_type
      instance_types = [var.eks_monitor_worker_instance_type]
      capacity_type  = "ON_DEMAND"
      labels = {
        GithubRepo = "terraform-aws-eks"
        GithubOrg  = "terraform-aws-modules"
      }
      tags = var.tags
      taints = [{
        key    = "taint"
        value  = "monitoring"
        effect = "NO_EXECUTE"
      }]
      version = var.eks_cluster_version
    },

    workers = {
      desired_size   = var.eks_worker_asg_desired_capacity
      max_size       = var.eks_worker_asg_max_capacity
      min_size       = var.eks_worker_asg_min_capacity
      name           = "worker-group"
      ami_type       = var.workers_group_ami_type
      instance_types = [var.eks_worker_instance_type]
      capacity_type  = "ON_DEMAND"
      labels = {
        GithubRepo = "terraform-aws-eks"
        GithubOrg  = "terraform-aws-modules"
        LaunchType = "ondemand"
      }
      tags    = var.tags
      version = var.eks_cluster_version
    },

  }

  eks_managed_node_group_defaults = {
    root_volume_type = var.eks_worker_volume_type
    enabled_metrics  = ["GroupMinSize", "GroupMaxSize", "GroupDesiredCapacity", "GroupTotalInstances", "GroupInServiceInstances", "GroupTerminatingInstances"]
  }

}

provider.tf


terraform {
  backend "s3" {}
  required_providers {
    helm = {
      version = "2.2.0"
    }
    kubernetes = {
      version = "2.10.0"
    }
    aws = {
      source = "hashicorp/aws"
    }
  }
}

data "aws_eks_cluster" "cluster" {
   name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
   name = module.eks.cluster_id
 }

data "aws_caller_identity" "current" {}

provider "aws" {
  region  = var.aws_region
#  profile = var.aws_profile

  ignore_tags {
    key_prefixes = ["kubernetes.io/"]
  }
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token

  }

}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token

  }
akash123-eng commented 1 year ago

This was fixed after adding an ingress rule for the ephemeral port range to the node/worker security group.
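For anyone hitting the same symptom, a minimal sketch of what such a rule could look like using the module's node_security_group_additional_rules input, assuming the v18.x rule schema; the rule key, description, and port range below are illustrative and should be adapted to your environment:

  node_security_group_additional_rules = {
    # Illustrative rule: allow the cluster/control-plane security group to reach
    # services (e.g. admission webhooks) listening on ephemeral ports on the nodes.
    ingress_cluster_to_node_ephemeral = {
      description                   = "Cluster API to node ephemeral ports"
      protocol                      = "tcp"
      from_port                     = 1025
      to_port                       = 65535
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }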

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.