rancher / eks-operator


Add support for AWS EKS custom vpc cni due to IP Exhaustion #256


bennysp commented 1 year ago

Is your feature request related to a problem? Please describe.

There is a well-known issue when running many (or large) clusters on AWS EKS as downstream clusters: every node claims many secondary routable IP addresses (for example, an m5.large can hold up to 30 IPv4 addresses across its 3 ENIs), and before you know it, you have exhausted your VPC's routable subnets.

AWS offers a few different options here, and one of them is "custom networking": https://aws.github.io/aws-eks-best-practices/networking/custom-networking/

This uses a set of subnets in the non-routable CGNAT space (i.e. 100.64.0.0/10). Those subnets are NOT applied to the EKS cluster itself; instead, the aws-node (VPC CNI) configuration has to be updated at provisioning time in order to avoid rotating the nodes after provisioning.
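
For context, a minimal sketch (assuming the terraform-aws-modules/vpc module; names and CIDRs here are illustrative only) of how the CGNAT space gets attached to the VPC as a secondary CIDR, with per-AZ subnets carved out of it for pod ENIs:

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name                  = "eks-custom-cni" # illustrative name
  cidr                  = "10.0.0.0/22"    # small routable primary CIDR
  secondary_cidr_blocks = ["100.64.0.0/16"] # non-routable CGNAT space

  azs = ["us-west-2a", "us-west-2b", "us-west-2c"]

  # The first three subnets are routable (used by the nodes); the last three
  # live in the CGNAT space and are consumed by the ENIConfigs shown below.
  private_subnets = [
    "10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24",
    "100.64.0.0/18", "100.64.64.0/18", "100.64.128.0/18",
  ]
}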

The request is to add support for the VPC Custom CNI.

Describe the solution you'd like

The request is to add support for creating EKS clusters through the Rancher provider using EKS cluster add-ons, as shown in this example from the AWS EKS Blueprints Terraform repository: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/vpc-cni-custom-networking/main.tf

cluster_addons = {
  coredns    = {}
  kube-proxy = {}
  vpc-cni = {
    # Deploy the VPC CNI add-on before compute so the add-on is configured
    # before any data plane compute resources are created.
    # See the blueprint README for further details.
    before_compute = true
    most_recent    = true # to ensure access to the latest settings provided
    configuration_values = jsonencode({
      env = {
        # Reference: https://aws.github.io/aws-eks-best-practices/reliability/docs/networkmanagement/#cni-custom-networking
        AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
        ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"

        # Reference: https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
        ENABLE_PREFIX_DELEGATION = "true"
        WARM_PREFIX_TARGET       = "1"
      }
    })
  }
}
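
For context, a sketch of where that cluster_addons map sits, assuming the terraform-aws-modules/eks module used by the blueprint (cluster name and version are illustrative):

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "eks-custom-cni"
  cluster_version = "1.29"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = slice(module.vpc.private_subnets, 0, 3) # routable subnets only

  cluster_addons = {
    # ... the map shown above ...
  }
}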
resource "kubectl_manifest" "eni_config" {
  for_each = zipmap(local.azs, slice(module.vpc.private_subnets, 3, 6))

  yaml_body = yamlencode({
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata = {
      name = each.key
    }
    spec = {
      securityGroups = [
        module.eks.cluster_primary_security_group_id,
        module.eks.node_security_group_id,
      ]
      subnet = each.value
    }
  })
}

As you will notice, the CGNAT subnets are NOT passed to the cluster at creation time in the AWS example above: the routable subnets go to the cluster, while the non-routable ones are sliced out of the VPC's subnet list and applied via ENIConfig objects as a post-cluster step (each ENIConfig is named after an availability zone so the zone label can select it). The existing Rancher eks_config_v2 therefore will not work as-is without some changes.
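
To make the ask concrete, here is a hypothetical sketch of what this could look like in eks_config_v2. The addons block and its fields do not exist in the rancher2 provider today; they are invented here purely to illustrate the shape of the request:

resource "rancher2_cluster" "eks" {
  name = "eks-custom-cni"

  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.aws.id
    region              = "us-west-2"
    subnets             = slice(module.vpc.private_subnets, 0, 3) # routable only

    # --- hypothetical from here down: add-on config applied before compute ---
    addons {
      name           = "vpc-cni"
      before_compute = true
      configuration_values = jsonencode({
        env = {
          AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"
        }
      })
    }
  }
}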

Describe alternatives you've considered

I have done a manual POC of the steps described in this Terraform, and I am currently considering implementing the approach outlined in this blog: https://medium.com/webstep/dont-let-your-eks-clusters-eat-up-all-your-ip-addresses-1519614e9daa

Since this works in a manual setup, I believe it will also work as a "post-processing" set of Terraform. However, it is really inefficient to have to "roll" every node at the end of provisioning for this to take effect (I confirmed at the end of my manual POC that this is needed).
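
A hedged sketch of that post-processing workaround, assuming the terraform-aws-modules/eks module and the AWS CLI on the Terraform runner (the trigger and output name are illustrative): after the ENIConfigs exist, each managed node group's ASG is refreshed so replacement nodes pick up the custom-networking settings.

resource "null_resource" "roll_nodes" {
  for_each = toset(module.eks.eks_managed_node_groups_autoscaling_group_names)

  # Re-run whenever any ENIConfig body changes
  triggers = {
    eni_configs = sha1(join(",", [for m in kubectl_manifest.eni_config : m.yaml_body]))
  }

  provisioner "local-exec" {
    command = "aws autoscaling start-instance-refresh --auto-scaling-group-name ${each.value}"
  }
}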

Additional context

I have met with AWS about this as well. They recommended this custom CNI approach as one of the easier solutions and provided the link to their Terraform example as a source.

kkaempf commented 8 months ago

This was SURE-6465. Dropping the milestone; there's currently no market request for this feature.