kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Kops cluster upgrade from 1.28.7 to 1.29.2 - warmpool instances join cluster and remain in notReady state #16871

Open denihot opened 1 month ago

denihot commented 1 month ago

/kind bug

1. What kops version are you running? The command kops version will display this information.

1.29.2 (git-v1.29.2)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

v1.29.6

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

After editing the kops config with the new k8s version, I ran the following commands:

kops get assets --copy --state $KOPS_REMOTE_STATE
kops update cluster $CLUSTER_NAME --state $KOPS_REMOTE_STATE --allow-kops-downgrade
kops update cluster $CLUSTER_NAME --yes --state $KOPS_REMOTE_STATE
kops rolling-update cluster $CLUSTER_NAME --state $KOPS_REMOTE_STATE
kops rolling-update cluster $CLUSTER_NAME --yes --state $KOPS_REMOTE_STATE --post-drain-delay 75s --drain-timeout 30m

5. What happened after the commands executed?

The upgrade started smoothly and the master nodes were updated successfully. However, the rolling update got stuck on the warm pool autoscaling groups: warm pool instances were joining the cluster instead of simply warming up and then powering off.

The following error kept appearing in the kops update logs:

I1002 12:02:19.415658 31 instancegroups.go:565] Cluster did not pass validation, will retry in "30s": node "i-04b854ec78e845f96" of role "node" is not ready, system-node-critical pod "aws-node-4chll" is pending, system-node-critical pod "ebs-csi-node-wcz74" is pending, system-node-critical pod "efs-csi-node-7q2j8" is pending, system-node-critical pod "kube-proxy-i-04b854ec78e845f96" is pending, system-node-critical pod "node-local-dns-mdvq7" is pending.
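For anyone triaging the same validation failure, here is a minimal sketch that pulls the blocking node and pod names out of a validation line like the one above. The function name `validation_blockers` is hypothetical, and the pattern only assumes the message format shown in this report.

```shell
# Hypothetical helper: read kops validation log lines on stdin and print
# one "node <name>" or "pod <name>" entry per blocker mentioned in them.
validation_blockers() {
  # Match `node "<name>"` and `pod "<name>"`, then strip the quotes.
  grep -oE '(node|pod) "[^"]+"' | tr -d '"'
}

# Usage (the log file name is an assumption):
#   validation_blockers < rolling-update.log | sort | uniq -c
```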

'kubectl get nodes' showed those nodes as 'NotReady,SchedulingDisabled'. I waited for 10 minutes, but there was no progress, so I manually deleted the problematic nodes. That unblocked the rolling update, which then continued normally.

After completing the upgrade, I ran another test by manually removing warmed-up nodes from the AWS console. New warm pool instances were created and, again, joined the k8s cluster. They remained in the 'NotReady,SchedulingDisabled' state until I removed them manually.
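The manual cleanup described above can be narrowed down with a small filter. This is only a sketch: `stuck_nodes` is a hypothetical helper, and it assumes the default column layout of `kubectl get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS is exactly
# NotReady,SchedulingDisabled.
stuck_nodes() {
  awk '$2 == "NotReady,SchedulingDisabled" { print $1 }'
}

# Usage: review the list first, then delete each node by hand.
#   kubectl get nodes --no-headers | stuck_nodes
#   kubectl delete node <node-name>
```

Printing the list before deleting anything is deliberate, since a node that is legitimately being drained during the rolling update can briefly show the same status.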

Autoscaler logs for one of those nodes:

1002 13:02:34.149584 1 pre_filtering_processor.go:57] Node i-0cfcda3548f955e05 should not be processed by cluster autoscaler (no node group config)

And the relevant log line from kops-controller:

E1002 13:02:10.796429 1 controller.go:329] "msg"="Reconciler error" "error"="error identifying node \"i-0cfcda3548f955e05\": found instance \"i-0cfcda3548f955e05\", but state is \"stopped\"" "Node"={"name":"i-0cfcda3548f955e05"} "controller"="node" "controllerGroup"="" "controllerKind"="Node" "name"="i-0cfcda3548f955e05" "namespace"="" "reconcileID"="b532008b-db8f-4273-90ad-f0bf9d40858c"
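To confirm which instances kops-controller is complaining about, the instance ID and reported state can be extracted from error lines like the one above. A minimal sketch, assuming the exact message format shown here; `instance_state_from_log` is a hypothetical name.

```shell
# Hypothetical helper: read kops-controller log lines on stdin and print
# "<instance-id> <state>" for every "found instance ..., but state is ..."
# reconciler error. The quotes in the log are backslash-escaped, so the
# pattern matches a literal backslash before each quote.
instance_state_from_log() {
  sed -nE 's/.*found instance \\"(i-[0-9a-f]+)\\", but state is \\"([a-z-]+)\\".*/\1 \2/p'
}

# Usage (the log file name is an assumption); the second command
# cross-checks the state with EC2:
#   instance_state_from_log < kops-controller.log | sort -u
#   aws ec2 describe-instances --instance-ids <instance-id> \
#     --query 'Reservations[].Instances[].State.Name' --output text
```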

The kube-system pods scheduled on those nodes are also stuck and never start:

NAMESPACE     NAME                                        READY   STATUS              RESTARTS   AGE
kube-system   aws-node-2dflq                              0/2     Init:0/1            0          52m
kube-system   aws-node-58x6z                              0/2     Init:0/1            0          46m
kube-system   aws-node-cmdrr                              0/2     Init:0/1            0          54m
kube-system   aws-node-sw7dv                              0/2     Init:0/1            0          50m
kube-system   ebs-csi-node-fbg7j                          0/3     ContainerCreating   0          50m
kube-system   ebs-csi-node-k5nx5                          0/3     ContainerCreating   0          52m
kube-system   ebs-csi-node-l82xf                          0/3     ContainerCreating   0          48m
kube-system   ebs-csi-node-qfg4w                          0/3     ContainerCreating   0          54m
kube-system   ebs-csi-node-ws7j2                          0/3     ContainerCreating   0          46m
kube-system   efs-csi-node-dwk4s                          0/3     ContainerCreating   0          46m
kube-system   efs-csi-node-g5bq8                          0/3     ContainerCreating   0          52m
kube-system   efs-csi-node-qg5qb                          0/3     ContainerCreating   0          54m
kube-system   efs-csi-node-tgcxj                          0/3     ContainerCreating   0          50m
kube-system   kube-proxy-i-0480ae46ad3230afc              0/1     Terminating         0          52m
kube-system   kube-proxy-i-04bb59a89abc8b937              0/1     Terminating         0          50m
kube-system   kube-proxy-i-0742a7e208af5b1ac              0/1     Terminating         0          46m
kube-system   kube-proxy-i-0ae3c43b10efef605              0/1     Terminating         0          54m
kube-system   node-local-dns-77r8p                        0/1     ContainerCreating   0          52m
kube-system   node-local-dns-tlcwg                        0/1     ContainerCreating   0          54m
kube-system   node-local-dns-vc4z2                        0/1     ContainerCreating   0          50m

6. What did you expect to happen? I expected the warm pool nodes to be warmed up and then shut down without joining the cluster.
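One way to check this expectation is to list warm pool instances that are nevertheless registered as nodes. The sketch below assumes that node names equal EC2 instance IDs (as in the logs above) and that the autoscaling group is named after the instance group and cluster (for example workers-app.develop.company.com); both are assumptions to verify in your own setup, and `warm_pool_nodes_in_cluster` is a hypothetical helper.

```shell
# Hypothetical helper.
#   $1: file with "<instance-id> <lifecycle-state>" lines for the warm pool
#   $2: file with one Kubernetes node name per line
# Prints "<instance-id> <lifecycle-state>" for every node that is also a
# warm pool instance in a Warmed:* lifecycle state.
warm_pool_nodes_in_cluster() {
  awk 'NR == FNR { if ($2 ~ /^Warmed:/) warm[$1] = $2; next }
       ($1 in warm) { print $1, warm[$1] }' "$1" "$2"
}

# Usage (the autoscaling group name is an assumption):
#   aws autoscaling describe-warm-pool \
#     --auto-scaling-group-name workers-app.develop.company.com \
#     --query 'Instances[].[InstanceId,LifecycleState]' --output text > warm.txt
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > nodes.txt
#   warm_pool_nodes_in_cluster warm.txt nodes.txt
```

If the expectation in item 6 holds, the last command should print nothing.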

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 4
  name: develop.company.com
spec:
  api:
    loadBalancer:
      class: Network
      sslCertificate: arn:aws:acm:eu-west-1:1234:certificate/1111
      type: Internal
  assets:
    containerProxy: public.ecr.aws/12344
    fileRepository: https://bucket.s3.eu-west-1.amazonaws.com/
  authentication:
    aws: {}
  authorization:
    rbac: {}
  certManager:
    defaultIssuer: selfsigned
    enabled: true
  channel: stable
  cloudLabels:
    Prometheus: "true"
    aws-region: eu-west-1
  cloudProvider: aws
  configBase: s3://tf-remotestate-eu-west-1-123456/kops/develop.company.com
  dnsZone: ###
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: eu-west-1a
    - instanceGroup: master-eu-west-1b
      name: eu-west-1b
    - instanceGroup: master-eu-west-1c
      name: eu-west-1c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: basic
    memoryRequest: 100Mi
    name: main
    version: 3.4.13
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: eu-west-1a
    - instanceGroup: master-eu-west-1b
      name: eu-west-1b
    - instanceGroup: master-eu-west-1c
      name: eu-west-1c
    memoryRequest: 100Mi
    name: events
    version: 3.4.13
  externalPolicies:
    master:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    node:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    - arn:aws:iam::1234:policy/nodes-extra.develop.company.com
  fileAssets:
  - content: |
      # https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/audit/audit-policy.yaml
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
        - "RequestReceived"
      rules:
        # Log pod changes at RequestResponse level
        - level: RequestResponse
          resources:
          - group: ""
            # Resource "pods" doesn't match requests to any subresource of pods,
            # which is consistent with the RBAC policy.
            resources: ["pods"]
        # Log "pods/log", "pods/status" at Metadata level
        - level: Metadata
          resources:
          - group: ""
            resources: ["pods/log", "pods/status"]
        # Don't log requests to a configmap called "controller-leader"
        - level: None
          resources:
          - group: ""
            resources: ["configmaps"]
            resourceNames: ["controller-leader"]
        # Don't log watch requests by the "system:kube-proxy" on endpoints or services
        - level: None
          users: ["system:kube-proxy"]
          verbs: ["watch"]
          resources:
          - group: "" # core API group
            resources: ["endpoints", "services"]
        # Don't log authenticated requests to certain non-resource URL paths.
        - level: None
          userGroups: ["system:authenticated"]
          nonResourceURLs:
          - "/api*" # Wildcard matching.
          - "/version"
        # Log the request body of configmap changes in kube-system.
        - level: Request
          resources:
          - group: "" # core API group
            resources: ["configmaps"]
          # This rule only applies to resources in the "kube-system" namespace.
          # The empty string "" can be used to select non-namespaced resources.
          namespaces: ["kube-system"]
        # Log configmap and secret changes in all other namespaces at the Metadata level.
        - level: Metadata
          resources:
          - group: "" # core API group
            resources: ["secrets", "configmaps"]
        # Log all other resources in core and extensions at the Request level.
        - level: Request
          resources:
          - group: "" # core API group
          - group: "extensions" # Version of group should NOT be included.
        # A catch-all rule to log all other requests at the Metadata level.
        - level: Metadata
          # Long-running requests like watches that fall under this rule will not
          # generate an audit event in RequestReceived.
          omitStages:
            - "RequestReceived"
    name: kubernetes-audit.yaml
    path: /srv/kubernetes/assets/audit.yaml
    roles:
    - Master
  iam:
    allowContainerRegistry: true
    legacy: false
    serviceAccountExternalPermissions:
    - aws:
        policyARNs:
        - arn:aws:iam::1234:policy/dub-company-aws-efs-csi-driver
      name: efs-csi-controller-sa
      namespace: kube-system
    - aws:
        policyARNs:
        - arn:aws:iam::1234:policy/dub-company-aws-lb-controller
      name: aws-lb-controller-aws-load-balancer-controller
      namespace: kube-system
    - aws:
        policyARNs:
        - arn:aws:iam::1234:policy/dub-company-cluster-autoscaler
      name: cluster-autoscaler-aws-cluster-autoscaler
      namespace: kube-system
  kubeAPIServer:
    authenticationTokenWebhookConfigFile: /srv/kubernetes/aws-iam-authenticator/kubeconfig.yaml
    runtimeConfig:
      autoscaling/v2beta1: "true"
  kubeControllerManager:
    horizontalPodAutoscalerCpuInitializationPeriod: 20s
    horizontalPodAutoscalerDownscaleDelay: 5m0s
    horizontalPodAutoscalerDownscaleStabilization: 5m0s
    horizontalPodAutoscalerInitialReadinessDelay: 20s
    horizontalPodAutoscalerSyncPeriod: 5s
    horizontalPodAutoscalerTolerance: 100m
    horizontalPodAutoscalerUpscaleDelay: 3m0s
  kubeDNS:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kops.k8s.io/instancegroup
              operator: In
              values:
              - workers-misc
    externalCoreFile: |
      amazonaws.com:53 {
            errors
            log . {
                class denial error
            }
            health :8084
            prometheus :9153
            forward . 169.254.169.253 {
            }
            cache 30
        }
        .:53 {
            errors
            health :8080
            ready :8181
            autopath @kubernetes
            kubernetes cluster.local {
                pods verified
                fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            forward . 169.254.169.253
            cache 300
        }
    nodeLocalDNS:
      cpuRequest: 25m
      enabled: true
      memoryRequest: 5Mi
    provider: CoreDNS
    tolerations:
    - effect: NoSchedule
      operator: Exists
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    maxPods: 35
    resolvConf: /etc/resolv.conf
  kubernetesApiAccess:
  - 10.0.0.0/8
  kubernetesVersion: 1.29.6
  masterPublicName: api.develop.company.com
  networkCIDR: 10.0.128.0/20
  networkID: vpc-1234
  networking:
    amazonvpc:
      env:
      - name: WARM_IP_TARGET
        value: "5"
      - name: MINIMUM_IP_TARGET
        value: "8"
      - name: DISABLE_METRICS
        value: "true"
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  rollingUpdate:
    maxSurge: 100%
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://infra-eu-west-1-discovery
    enableAWSOIDCProvider: true
  sshAccess:
  - 10.0.0.0/8
  subnets:
  - cidr: 10.0.128.0/22
    id: subnet-123
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 10.0.132.0/22
    id: subnet-123
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 10.0.136.0/22
    id: subnet-132
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 10.0.140.0/24
    id: subnet-1123
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 10.0.141.0/24
    id: subnet-132
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 10.0.142.0/24
    id: subnet-123
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    dns:
      type: Public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-10-02T10:12:50Z"
  labels:
    kops.k8s.io/cluster: develop.company.com
  name: master-eu-west-1a
spec:
  additionalSecurityGroups:
  - sg-1234
  cloudLabels:
    k8s.io/cluster-autoscaler/develop.company.com: ""
    k8s.io/cluster-autoscaler/disabled: ""
    k8s.io/cluster-autoscaler/master-template/label: ""
  image: ami-09634b5569ee59efb
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: masters
    kops.k8s.io/spotinstance: "false"
    on-demand: "true"
  role: Master
  rootVolumeType: gp3
  subnets:
  - eu-west-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-10-02T10:12:50Z"
  labels:
    kops.k8s.io/cluster: develop.company.com
  name: master-eu-west-1b
spec:
  additionalSecurityGroups:
  - sg-123
  cloudLabels:
    k8s.io/cluster-autoscaler/develop.company.com: ""
    k8s.io/cluster-autoscaler/disabled: ""
    k8s.io/cluster-autoscaler/master-template/label: ""
  image: ami-09634b5569ee59efb
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: masters
    kops.k8s.io/spotinstance: "false"
    on-demand: "true"
  role: Master
  rootVolumeType: gp3
  subnets:
  - eu-west-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-10-02T10:12:51Z"
  labels:
    kops.k8s.io/cluster: develop.company.com
  name: master-eu-west-1c
spec:
  additionalSecurityGroups:
  - sg-123
  cloudLabels:
    k8s.io/cluster-autoscaler/develop.company.com: ""
    k8s.io/cluster-autoscaler/disabled: ""
    k8s.io/cluster-autoscaler/master-template/label: ""
  image: ami-09634b5569ee59efb
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: masters
    kops.k8s.io/spotinstance: "false"
    on-demand: "true"
  role: Master
  rootVolumeType: gp3
  subnets:
  - eu-west-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-10-02T10:12:51Z"
  generation: 2
  labels:
    kops.k8s.io/cluster: develop.company.com
  name: workers-app
spec:
  additionalSecurityGroups:
  - sg-132
  - sg-3322
  additionalUserData:
  - content: |
      #!/bin/bash
      echo "Starting additionalUserData"
      echo "This script will execute before nodeup.sh because cloud-init executes scripts in alphabetic order by name"
      export DEBIAN_FRONTEND=noninteractive
      apt-get update
      # Install some tools
      apt install -y nfs-common   # Required to make EFS volume mount
      apt install -y containerd   # Required for nerdctl to work, container not installed until nodeup runs
      echo $(containerd --version)
      wget https://github.com/containerd/nerdctl/releases/download/v1.7.2/nerdctl-1.7.2-linux-amd64.tar.gz -O /tmp/nerdctl.tar.gz
      tar -C /usr/local/bin/ -xzf /tmp/nerdctl.tar.gz
      echo $(nerdctl version)
      apt install -y awscli
      echo $(aws --version)
      # Get some temporary aws ecr credentials
      DOCKER_PASSWORD=$(aws ecr get-login-password --region eu-west-1)
      DOCKER_USER=AWS
      DOCKER_REGISTRY=1234.dkr.ecr.eu-west-1.amazonaws.com
      PASSWD=$(echo "$DOCKER_USER:$DOCKER_PASSWORD" | tr -d '\n' | base64 -i -w 0)
      CONFIG="\
        {\n
            \"auths\": {\n
                \"$DOCKER_REGISTRY\": {\n
                    \"auth\": \"$PASSWD\"\n
                }\n
            }\n
        }\n"
      mkdir -p ~/.docker
      printf "$CONFIG" > ~/.docker/config.json
      echo "Pulling images from ECR"
      nerdctl pull --namespace k8s.io 1234.dkr.ecr.eu-west-1.amazonaws.com/fluent-bit:2.2.2
      nerdctl pull --namespace k8s.io 1234.dkr.ecr.eu-west-1.amazonaws.com/nginx-prometheus-exporter:0.9.0
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/dns/k8s-dns-node-cache:1.23.0
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/amazon-k8s-cni-init:v1.18.1
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/amazon-k8s-cni:v1.18.1
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/kube-proxy:v1.28.11
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/ebs-csi-driver/aws-ebs-csi-driver:v1.30.0
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/eks-distro/kubernetes-csi/node-driver-registrar:v2.10.0-eks-1-29-5
      nerdctl pull --namespace k8s.io public.ecr.aws/1234545/kubernetes-csi/livenessprobe:v2.12.0-eks-1-29-5
      echo "Remove and unmask containerd so it can be reinstalled by nodeup and configured how it wants it."
      apt remove -y containerd
      systemctl unmask containerd
      echo "Finishing additionalUserData"
    name: all-images.sh
    type: text/x-shellscript
  cloudLabels:
    k8s.io/cluster-autoscaler/develop.company.com: ""
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
  image: ami-09634b5569ee59efb
  instanceMetadata:
    httpPutResponseHopLimit: 1
    httpTokens: required
  machineType: c5.18xlarge
  maxSize: 10
  minSize: 1
  nodeLabels:
    Environment: company-develop
    Group: company-develop-app
    Name: company-develop-infra-app
    Service: company
    kops.k8s.io/instancegroup: workers-app
    kops.k8s.io/spotinstance: "false"
    on-demand: "true"
  role: Node
  rootVolumeType: gp3
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c
  suspendProcesses:
  - AZRebalance
  warmPool:
    enableLifecycleHook: true
    maxSize: 10
    minSize: 5

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

denihot commented 1 month ago

Hi,

I attempted to troubleshoot the issue by performing the following steps:

aramhakobyan commented 1 month ago

We have the same issue!!!

aramhakobyan commented 2 weeks ago

Hi @hakman, @johngmyers

Sorry for the direct mention, but last time you helped us solve an issue quickly :).

We rely heavily on kOps (40+ clusters) and use warm pools. In the recent 1.29 releases, warm pool handling was changed by the following PRs, which introduced the issue described above.

We would appreciate it if you could take a look and fix this. If there is any way we can help make that happen quickly, please let us know.