terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Windows Managed Node Group support #2350

Closed bryantbiggs closed 8 months ago

bryantbiggs commented 1 year ago

Is your request related to a new offering from AWS?

Is your request related to a problem? Please describe.

Describe the solution you'd like.

Describe alternatives you've considered.

Additional context

bryantbiggs commented 1 year ago

Requires the AWS SDK version used by the Terraform AWS provider to be updated: https://github.com/hashicorp/terraform-provider-aws/issues/28438

chandrasekharkolla commented 1 year ago

Any update on this?

bryantbiggs commented 1 year ago

There aren't any code changes required, so you can in theory use it today, but we will be adding an example and checking how it aligns with the rest of the Linux AL2 and Bottlerocket OS usage.

sebas-w commented 1 year ago

If you want to create a Windows managed node group using this module, I can confirm that on version 18.31.2 you can specify the following for a Windows EKS managed node group, as long as the following requirements are fulfilled.

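Requirements

  • Your AWS Terraform provider is at least version v4.48.0, which allows you to pass in the correct ami_type for Windows EKS managed node group instances.
  • You already have a Linux EKS node group and nodes on your cluster. I confirmed with AWS Support that you cannot run a Windows-only EKS cluster, so a Linux node must already be in place before you launch any Windows nodes via the managed node group option.
  • Your EKS node role has the policy AmazonEKSVPCResourceController, which it should if you use this module since it's here;
  • You have enabled Windows support by adding the ConfigMap:
apiVersion: v1
data:
  enable-windows-ipam: "true"
immutable: false
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system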

Example

eks_managed_node_groups = {
  windows = {
    min_size          = 1
    desired_size      = 1
    max_size          = 5
    platform          = "windows"
    ami_type          = "WINDOWS_CORE_2019_x86_64"
    capacity_type     = "SPOT"
    enable_monitoring = true
    disk_size         = "100"
    use_name_prefix   = true
    cluster_version   = var.aws_eks_cluster_version
    instance_types    = ["m5d.xlarge", "m5ad.xlarge"]
    taints = [
      {
        key    = "os"
        value  = "windows"
        effect = "NO_SCHEDULE"
      }
    ]
  },
},

bryantbiggs commented 1 year ago

thank you for sharing @sebas-w !

enver commented 1 year ago

@sebas-w Thank you for sharing an example! I was able to create a Windows managed node group as you described above and run a test pod on it. However, I'm unable to connect to any pod over the cluster's internal network. Access to other resources in the VPC or on the internet works without issue (apart from the obvious DNS resolution problems). Did you run into such problems?

aamoctz commented 1 year ago

@sebas-w This does indeed work unless you set var.manage_aws_auth_configmap = true. If that var is enabled then the module overwrites aws-auth configmap values set by EKS and in the process removes the eks:kube-proxy-windows line from the Windows node group in the aws-auth configmap.

local.node_iam_role_arns_windows currently does not look at module.eks_managed_node_groups to determine if platform == "windows". So the module assumes MNGs are Linux or Bottlerocket and that line in the config is removed.

When var.manage_aws_auth_configmap = false:

mapRoles: |
  - "groups":
    - "eks:kube-proxy-windows"
    - "system:bootstrappers"
    - "system:nodes"
    "rolearn": "<windows_mng_role_arn>"
    "username": "system:node:{{EC2PrivateDNSName}}"

When var.manage_aws_auth_configmap = true:

mapRoles: |
  - "groups":
    - "system:bootstrappers"
    - "system:nodes"
    "rolearn": "<windows_mng_role_arn>"
    "username": "system:node:{{EC2PrivateDNSName}}"
aamoctz commented 1 year ago

Has any work started related to this issue? I have some changes I can contribute to at least resolve the issue with manage_aws_auth_configmap removing eks:kube-proxy-windows, but if there's already work in progress I would rather not step on anyone's toes on this.

noamgreen commented 1 year ago

https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2477

see this PR if someone can help push it pls

trippinnik commented 1 year ago

If you want to create a Windows managed node group using this module, I can confirm that on version 18.31.2 you can specify the following for a Windows EKS managed node group, as long as the following requirements are fulfilled.

Requirements

  • Your AWS Terraform provider is at least version v4.48.0, which allows you to pass in the correct ami_type for Windows EKS managed node group instances.
  • You already have a Linux EKS node group and nodes on your cluster. I confirmed with AWS Support that you cannot run a Windows-only EKS cluster, so a Linux node must already be in place before you launch any Windows nodes via the managed node group option.
  • Your EKS node role has the policy AmazonEKSVPCResourceController, which it should if you use this module since it's here;
  • You have enabled Windows support by adding the ConfigMap:
apiVersion: v1
data:
  enable-windows-ipam: "true"
immutable: false
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
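
The provider version requirement in the first bullet can be written as a version constraint so that Terraform refuses to run with an older provider. This is a minimal sketch, assuming the standard hashicorp/aws provider source. Only the 4.48.0 floor comes from this thread.

```hcl
terraform {
  required_providers {
    aws = {
      # Assumption: the official provider source address.
      source = "hashicorp/aws"
      # 4.48.0 is the minimum version mentioned above for the Windows ami_type values.
      version = ">= 4.48.0"
    }
  }
}
```

Running terraform init -upgrade after adding the constraint should pull a provider version that satisfies it.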

Example

eks_managed_node_groups = {
  windows = {
    min_size          = 1
    desired_size      = 1
    max_size          = 5
    platform          = "windows"
    ami_type          = "WINDOWS_CORE_2019_x86_64"
    capacity_type     = "SPOT"
    enable_monitoring = true
    disk_size         = "100"
    use_name_prefix   = true
    cluster_version   = var.aws_eks_cluster_version
    instance_types    = ["m5d.xlarge", "m5ad.xlarge"]
    taints = [
      {
        key    = "os"
        value  = "windows"
        effect = "NO_SCHEDULE"
      }
    ]
  },
},

I'm following this example but the vpc-admission controller is not created. I see the AmazonEKSVPCResourceController role on the clusterrole that was created.

Am I missing something else?

robertobandini commented 1 year ago

Hi, I want to thank @sebas-w and @aamoctz; I was facing the same problems.

I started from version 18.31.2, already having Linux managed node groups, EKS 1.22, platform version eks.10. Then I set the AWS Terraform provider to version 4.48 and created the amazon-vpc-cni ConfigMap.

resource "kubernetes_config_map" "amazon_vpc_cni" {
  metadata {
    name      = "amazon-vpc-cni"
    namespace = "kube-system"
  }

  data = {
    enable-windows-ipam = "true"
  }
}

In the definition of the node group I just specified the platform and the AMI type:

myManagedNodeGroup = {
  name     = "my-managed-node-group"
  platform = "windows"
  ami_type = "WINDOWS_CORE_2019_x86_64"
  ...
}

The node group was created, then I made changes to the module that builds EKS so that it correctly updates the aws-auth ConfigMap. I later saw that @aamoctz had already proposed them here: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2477

In main.tf

 ...
 node_iam_role_arns_non_windows = distinct(
    compact(
      concat(
        [for group in module.eks_managed_node_group : group.iam_role_arn if group.platform != "windows"],
        [for group in module.self_managed_node_group : group.iam_role_arn if group.platform != "windows"],
        var.aws_auth_node_iam_role_arns_non_windows,
      )
    )
  )

  node_iam_role_arns_windows = distinct(
    compact(
      concat(
        [for group in module.eks_managed_node_group : group.iam_role_arn if group.platform == "windows"],
        [for group in module.self_managed_node_group : group.iam_role_arn if group.platform == "windows"],
        var.aws_auth_node_iam_role_arns_windows,
      )
    )
  )
  ...

In modules/eks-managed-node-group/outputs.tf

output "platform" {
  description = "Identifies if the OS platform is `bottlerocket`, `linux`, or `windows` based"
  value       = var.platform
}

In case it is useful: to avoid the "failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address" error when scheduling a pod, it is also important to set the appropriate nodeSelector:

nodeSelector:
  kubernetes.io/os: windows

I confirm that in this way I was able to correctly create a Windows node group, apply a test deployment and automatically scale the replicas and therefore the number of nodes.
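
For anyone managing workloads from Terraform as well, the nodeSelector above could be combined with a toleration for the taint from the earlier node group example. This is only a sketch under several assumptions: a configured hashicorp/kubernetes provider, the os=windows taint from that example, and a hypothetical deployment name and test image that are not from this thread.

```hcl
# Hypothetical test deployment; resource name, labels and image are illustrative only.
resource "kubernetes_deployment" "windows_test" {
  metadata {
    name   = "windows-test"
    labels = { app = "windows-test" }
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "windows-test" }
    }

    template {
      metadata {
        labels = { app = "windows-test" }
      }

      spec {
        # Schedule only onto Windows nodes, as described above.
        node_selector = {
          "kubernetes.io/os" = "windows"
        }

        # Assumption: the node group carries the os=windows taint from the earlier example.
        # The node group API spells the effect NO_SCHEDULE; Kubernetes spells it NoSchedule.
        toleration {
          key      = "os"
          operator = "Equal"
          value    = "windows"
          effect   = "NoSchedule"
        }

        container {
          name = "iis"
          # Assumption: a Windows Server 2019 based image, chosen to match the WINDOWS_CORE_2019 AMI type.
          image = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
        }
      }
    }
  }
}
```

Without the toleration, the pod would stay Pending on a tainted node group even with the correct nodeSelector.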

It will be very useful once the module supports the modifications mentioned above.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

davidedmondsMPG commented 1 year ago

Is there anything that can be done to help get the associated PR reviewed and merged? It looks like it should solve this issue, which is a reasonably big impediment to working with Windows nodes in EKS.

mlschindler commented 1 year ago

Bump for updates... Can we get this PR merged?


bryantbiggs commented 1 year ago

https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2477#issuecomment-1570706923

mlschindler commented 1 year ago

Now that #2477 is merged, is it possible to have the module provision EKS managed Windows node groups?

bryantbiggs commented 1 year ago

You can deploy Windows nodes with this module, but you will need to either use the default launch template provided by EKS or, when using a custom launch template, provide your own launch template or user data. As I stated here, #2477 only addresses one small part of this: maintaining the IAM role mapping in the aws-auth ConfigMap.

The Windows node support currently does not match that of AL2 and Bottlerocket in terms of native custom launch template and user data support
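
As a rough illustration of the first option (the default launch template provided by EKS), the node group can opt out of the module's custom launch template. This sketch assumes a 19.x release of the module, where the node group input use_custom_launch_template is available. The cluster name, variables, and instance types are hypothetical placeholders, not values from this thread.

```hcl
# Hypothetical inputs, declared only to keep the sketch self-contained.
variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # Assumption: a 19.x release of the module.
  version = "~> 19.15"

  cluster_name = "example" # hypothetical
  vpc_id       = var.vpc_id
  subnet_ids   = var.private_subnet_ids

  eks_managed_node_groups = {
    # A Linux node group must exist alongside the Windows one (see the requirements above).
    linux = {
      instance_types = ["m5.large"]
    }

    windows = {
      platform = "windows"
      ami_type = "WINDOWS_CORE_2019_x86_64"

      # Let EKS supply its default launch template, so no custom user data is needed.
      use_custom_launch_template = false

      instance_types = ["m5.xlarge"]
      min_size       = 1
      desired_size   = 1
      max_size       = 3
    }
  }
}
```

The trade-off is that settings the module normally applies through its own launch template are not available on that node group.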

antonbabenko commented 8 months ago

This issue has been resolved in version 20.0.0 :tada:

github-actions[bot] commented 7 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.