sighupio / furyctl

furyctl is the KFD (Kubernetes Fury Distribution) lifecycle manager
https://sighup.io
Apache License 2.0

Cannot schedule `coredns` pods #279

Open g-iannelli opened 1 year ago

g-iannelli commented 1 year ago
      - name: infra
        # This optional map defines a different AMI to use for the instances
        #ami:
        #  id: ami-0123456789abcdef0
        #  owner: "123456789012"
        # This map defines the max and min number of nodes in the nodepool autoscaling group
        size:
          min: 3
          max: 3
        # This map defines the characteristics of the instance that will be used in the node pool
        instance:
          # The instance type
          type: t3.xlarge
          spot: true
          # The instance disk size in GB
          volumeSize: 50
        # This optional array defines additional target groups to attach to the instances in the node pool
        #attachedTargetGroups:
        #  - arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example-external-nginx/0123456789abcdee
        #  - arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example-internal-nginx/0123456789abcdef
        # Kubernetes labels that will be added to the nodes
        labels:
          nodepool: infra
          node.kubernetes.io/role: infra
        # Kubernetes taints that will be added to the nodes
        taints:
          - node.kubernetes.io/role=infra:NoSchedule
        # AWS tags that will be added to the ASG and EC2 instances, the example shows the labels needed by cluster autoscaler
        tags:
          k8s.io/cluster-autoscaler/node-template/label/nodepool: "infra"
          k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/role: "infra"
      - name: ingress
        # This optional map defines a different AMI to use for the instances
        #ami:
        #  id: ami-0123456789abcdef0
        #  owner: "123456789012"
        # This map defines the max and min number of nodes in the nodepool autoscaling group
        size:
          min: 2
          max: 2
        # This map defines the characteristics of the instance that will be used in the node pool
        instance:
          # The instance type
          type: t3.micro
          # If the instance is a spot instance
          spot: true
          # The instance disk size in GB
          volumeSize: 20
        # This optional array defines additional target groups to attach to the instances in the node pool
        #attachedTargetGroups:
        #  - arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example-external-nginx/0123456789abcdee
        #  - arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example-internal-nginx/0123456789abcdef
        # Kubernetes labels that will be added to the nodes
        labels:
          nodepool: ingress
          node.kubernetes.io/role: ingress
        # Kubernetes taints that will be added to the nodes
        taints:
          - node.kubernetes.io/role=ingress:NoSchedule
        # AWS tags that will be added to the ASG and EC2 instances, the example shows the labels needed by cluster autoscaler
        tags:
          k8s.io/cluster-autoscaler/node-template/label/nodepool: "ingress"
          k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/role: "ingress"
      - name: default
        # This optional map defines a different AMI to use for the instances
        #ami:
        #  id: ami-0123456789abcdef0
        #  owner: "123456789012"
        # This map defines the max and min number of nodes in the nodepool autoscaling group
        size:
          min: 0
          max: 3
        # This map defines the characteristics of the instance that will be used in the node pool
        instance:
          # The instance type
          type: t3.xlarge
          spot: true
          # The instance disk size in GB
          volumeSize: 50
        # This optional array defines additional target groups to attach to the instances in the node pool
        #attachedTargetGroups:
        #  - arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example-external-nginx/0123456789abcdee
        #  - arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example-internal-nginx/0123456789abcdef
        # Kubernetes labels that will be added to the nodes
        labels:
          nodepool: default
          node.kubernetes.io/role: default
        # Kubernetes taints that will be added to the nodes
        taints:
          - node.kubernetes.io/role=default:NoSchedule
        # AWS tags that will be added to the ASG and EC2 instances, the example shows the labels needed by cluster autoscaler
        tags:
          k8s.io/cluster-autoscaler/node-template/label/nodepool: "default"
          k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/role: "default"

When creating a cluster where the `default` nodepool has min 0 nodes and the other nodepools are tainted as shown above, it is not possible to schedule `coredns` and the cluster installation fails.
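The taints blocking the pod can be confirmed directly on the nodes with plain kubectl; this is a generic check that assumes nothing beyond kubectl access to the cluster:

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'

The pod description and scheduler events below show the same situation from the `coredns` side.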

QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  44m                    default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  39m                    default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/role: ingress}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  14m (x5 over 34m)      default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/role: infra}, 2 Too many pods, 2 node(s) had untolerated taint {node.kubernetes.io/role: ingress}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  4m16s (x2 over 9m16s)  default-scheduler  0/5 nodes are available: 2 Too many pods, 2 node(s) had untolerated taint {node.kubernetes.io/role: ingress}, 3 node(s) had untolerated taint {node.kubernetes.io/role: infra}. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
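As shown above, the pod only tolerates `CriticalAddonsOnly` and the master/not-ready/unreachable taints, so none of the tainted nodepools are eligible. One possible workaround is to add a toleration for the custom role taint to the `coredns` Deployment. This is only a sketch: on EKS the `coredns` Deployment may be managed as an add-on, so a manual patch could be reverted on upgrade. Example for the `infra` pool's taint:

kubectl -n kube-system patch deployment coredns --type json \
  -p '[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"key":"node.kubernetes.io/role","operator":"Equal","value":"infra","effect":"NoSchedule"}}]'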
g-iannelli commented 1 year ago

After removing the taints from the `default` nodepool, `coredns` was scheduled on it (the resulting nodepool config is sketched after the output below).

for node in `kubectl get po -n kube-system -ojsonpath='{.items[*].spec.nodeName}' -l k8s-app=kube-dns`; do kubectl get node $node -L node.kubernetes.io/role;done
NAME                                         STATUS   ROLES    AGE   VERSION                ROLE
ip-10-10-21-105.eu-west-1.compute.internal   Ready    <none>   76m   v1.24.10-eks-48e63af   default
NAME                                         STATUS   ROLES    AGE   VERSION                ROLE
ip-10-10-21-105.eu-west-1.compute.internal   Ready    <none>   76m   v1.24.10-eks-48e63af   default
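
For reference, the workaround applied above amounts to dropping the taints block from the `default` nodepool in the configuration, roughly as follows. This is illustrative only; whether an empty `taints: []` list is also accepted by the furyctl schema is not verified here.

      - name: default
        size:
          min: 0
          max: 3
        instance:
          type: t3.xlarge
          spot: true
          volumeSize: 50
        labels:
          nodepool: default
          node.kubernetes.io/role: default
        # taints removed so coredns, which has no toleration for the custom
        # role taint, can be scheduled on this pool
        tags:
          k8s.io/cluster-autoscaler/node-template/label/nodepool: "default"
          k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/role: "default"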