rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

Cannot create new control node after deleting a control node. #3336

Closed (tfon23 closed this issue 11 months ago)

tfon23 commented 1 year ago

RKE version: v1.4.5 (RKE1)

Docker version: (docker version, docker info preferred) 20.10

Operating system and kernel: (cat /etc/os-release, uname -r preferred) Ubuntu 22 Canonical

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Azure Virtual Machine

cluster.yml file:

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    field.cattle.io/creatorId: user-x9xns
    field.cattle.io/overwriteAppAnswers: >-
      {"answers":{"operator-init.enabled":"true","exporter-node.enabled":"true","exporter-node.ports.metrics.port":"9796","exporter-kubelets.https":"true","exporter-node.resources.limits.cpu":"200m","exporter-node.resources.limits.memory":"200Mi","operator.resources.limits.memory":"500Mi","prometheus.retention":"12h","grafana.persistence.enabled":"true","prometheus.persistence.enabled":"false","prometheus.persistence.storageClass":"default","grafana.persistence.storageClass":"default","grafana.persistence.size":"10Gi","prometheus.persistence.size":"50Gi","prometheus.resources.core.requests.cpu":"750m","prometheus.resources.core.limits.cpu":"1000m","prometheus.resources.core.requests.memory":"750Mi","prometheus.resources.core.limits.memory":"1000Mi","prometheus.persistent.useReleaseName":"true"},"version":"0.2.0"}
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    networking.management.cattle.io/enable-network-policy: 'true'
    provisioner.cattle.io/encrypt-migrated: 'true'
    provisioner.cattle.io/ke-driver-update: updated
  creationTimestamp: '2023-06-20T18:38:51Z'
  finalizers:
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
    - wrangler.cattle.io/mgmt-cluster-remove
  generateName: c-
  generation: 14210
  labels:
    cattle.io/creator: norman
    fleet.cattle.io/cluster-name: c-m6grn
    provider.cattle.io: rke
    rancher.cattle.io/claimed-by-name: c-m6grn
    rancher.cattle.io/claimed-by-namespace: fleet-default
  managedFields:
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            f:fleet.cattle.io/cluster-name: {}
            f:rancher.cattle.io/claimed-by-name: {}
            f:rancher.cattle.io/claimed-by-namespace: {}
      manager: rancher-operator
      operation: Update
      time: '2021-12-14T15:09:04Z'
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:authz.management.cattle.io/creator-role-bindings: {}
            f:field.cattle.io/overwriteAppAnswers: {}
            f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
            f:lifecycle.cattle.io/create.cluster-provisioner-controller: {}
            f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
            f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
            f:networking.management.cattle.io/enable-network-policy: {}
            f:provisioner.cattle.io/encrypt-migrated: {}
            f:provisioner.cattle.io/ke-driver-update: {}
          f:finalizers:
            .: {}
            v:"wrangler.cattle.io/mgmt-cluster-remove": {}
          f:labels:
            f:provider.cattle.io: {}
        f:spec: {}
        f:status:
          .: {}
          f:agentImage: {}
          f:allocatable:
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:appliedAgentEnvVars: {}
          f:appliedSpec:
            f:clusterSecrets:
              f:aadClientSecret: {}
            f:clusterTemplateRevisionName: {}
            f:rancherKubernetesEngineConfig:
              f:cloudProvider:
                f:azureCloudProvider:
                  f:loadBalancerSku: {}
              f:enableCriDockerd: {}
              f:ingress:
                f:defaultIngressClass: {}
              f:kubernetesVersion: {}
              f:monitoring:
                f:provider: {}
                f:replicas: {}
              f:nodes: {}
              f:services:
                f:kubeApi:
                  f:serviceNodePortRange: {}
              f:systemImages:
                f:aciCniDeployContainer: {}
                f:aciControllerContainer: {}
                f:aciGbpServerContainer: {}
                f:aciHostContainer: {}
                f:aciMcastContainer: {}
                f:aciOpflexContainer: {}
                f:aciOpflexServerContainer: {}
                f:aciOvsContainer: {}
                f:alpine: {}
                f:calicoCni: {}
                f:calicoControllers: {}
                f:calicoCtl: {}
                f:calicoFlexVol: {}
                f:calicoNode: {}
                f:canalCni: {}
                f:canalControllers: {}
                f:canalFlannel: {}
                f:canalFlexVol: {}
                f:canalNode: {}
                f:certDownloader: {}
                f:coredns: {}
                f:corednsAutoscaler: {}
                f:dnsmasq: {}
                f:etcd: {}
                f:ingress: {}
                f:kubedns: {}
                f:kubednsAutoscaler: {}
                f:kubednsSidecar: {}
                f:kubernetes: {}
                f:kubernetesServicesSidecar: {}
                f:metricsServer: {}
                f:nginxProxy: {}
                f:nodelocal: {}
                f:podInfraContainer: {}
                f:windowsPodInfraContainer: {}
          f:capacity:
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:certificatesExpiration:
            f:kube-apiserver:
              f:expirationDate: {}
            f:kube-etcd-10-217-43-4:
              f:expirationDate: {}
            f:kube-etcd-10-217-43-5:
              f:expirationDate: {}
          f:conditions: {}
          f:failedSpec:
            .: {}
            f:agentImageOverride: {}
            f:answers: {}
            f:clusterSecrets:
              .: {}
              f:aadClientSecret: {}
            f:clusterTemplateName: {}
            f:clusterTemplateRevisionName: {}
            f:description: {}
            f:desiredAgentImage: {}
            f:desiredAuthImage: {}
            f:displayName: {}
            f:dockerRootDir: {}
            f:enableClusterAlerting: {}
            f:enableClusterMonitoring: {}
            f:enableNetworkPolicy: {}
            f:fleetWorkspaceName: {}
            f:internal: {}
            f:localClusterAuthEndpoint:
              .: {}
              f:enabled: {}
            f:rancherKubernetesEngineConfig:
              .: {}
              f:addonJobTimeout: {}
              f:authentication:
                .: {}
                f:strategy: {}
              f:authorization: {}
              f:bastionHost: {}
              f:cloudProvider:
                .: {}
                f:azureCloudProvider:
                  .: {}
                  f:aadClientCertPassword: {}
                  f:aadClientCertPath: {}
                  f:aadClientId: {}
                  f:aadClientSecret: {}
                  f:cloud: {}
                  f:cloudProviderBackoff: {}
                  f:cloudProviderBackoffDuration: {}
                  f:cloudProviderBackoffExponent: {}
                  f:cloudProviderBackoffJitter: {}
                  f:cloudProviderBackoffRetries: {}
                  f:cloudProviderRateLimit: {}
                  f:cloudProviderRateLimitBucket: {}
                  f:cloudProviderRateLimitQPS: {}
                  f:loadBalancerSku: {}
                  f:location: {}
                  f:maximumLoadBalancerRuleCount: {}
                  f:primaryAvailabilitySetName: {}
                  f:primaryScaleSetName: {}
                  f:resourceGroup: {}
                  f:routeTableName: {}
                  f:securityGroupName: {}
                  f:subnetName: {}
                  f:subscriptionId: {}
                  f:tenantId: {}
                  f:useInstanceMetadata: {}
                  f:useManagedIdentityExtension: {}
                  f:vmType: {}
                  f:vnetName: {}
                  f:vnetResourceGroup: {}
                f:name: {}
              f:enableCriDockerd: {}
              f:ignoreDockerVersion: {}
              f:ingress:
                .: {}
                f:defaultBackend: {}
                f:defaultIngressClass: {}
                f:provider: {}
              f:kubernetesVersion: {}
              f:monitoring:
                .: {}
                f:provider: {}
                f:replicas: {}
              f:network:
                .: {}
                f:plugin: {}
              f:nodes: {}
              f:restore: {}
              f:rotateEncryptionKey: {}
              f:services:
                .: {}
                f:etcd:
                  .: {}
                  f:backupConfig:
                    .: {}
                    f:enabled: {}
                    f:intervalHours: {}
                    f:retention: {}
                    f:s3BackupConfig: {}
                    f:timeout: {}
                  f:creation: {}
                  f:extraArgs:
                    .: {}
                    f:election-timeout: {}
                    f:heartbeat-interval: {}
                  f:retention: {}
                  f:snapshot: {}
                f:kubeApi:
                  .: {}
                  f:serviceNodePortRange: {}
                f:kubeController:
                  .: {}
                  f:extraArgs:
                    .: {}
                    f:cluster-name: {}
                f:kubelet: {}
                f:kubeproxy: {}
                f:scheduler: {}
              f:sshAgentAuth: {}
              f:systemImages:
                .: {}
                f:aciCniDeployContainer: {}
                f:aciControllerContainer: {}
                f:aciGbpServerContainer: {}
                f:aciHostContainer: {}
                f:aciMcastContainer: {}
                f:aciOpflexContainer: {}
                f:aciOpflexServerContainer: {}
                f:aciOvsContainer: {}
                f:alpine: {}
                f:calicoCni: {}
                f:calicoControllers: {}
                f:calicoCtl: {}
                f:calicoFlexVol: {}
                f:calicoNode: {}
                f:canalCni: {}
                f:canalControllers: {}
                f:canalFlannel: {}
                f:canalFlexVol: {}
                f:canalNode: {}
                f:certDownloader: {}
                f:coredns: {}
                f:corednsAutoscaler: {}
                f:dnsmasq: {}
                f:etcd: {}
                f:flannel: {}
                f:flannelCni: {}
                f:ingress: {}
                f:ingressBackend: {}
                f:ingressWebhook: {}
                f:kubedns: {}
                f:kubednsAutoscaler: {}
                f:kubednsSidecar: {}
                f:kubernetes: {}
                f:kubernetesServicesSidecar: {}
                f:metricsServer: {}
                f:nginxProxy: {}
                f:nodelocal: {}
                f:podInfraContainer: {}
                f:weaveCni: {}
                f:weaveNode: {}
                f:windowsPodInfraContainer: {}
              f:upgradeStrategy:
                .: {}
                f:drain: {}
                f:maxUnavailableControlplane: {}
                f:maxUnavailableWorker: {}
                f:nodeDrainInput:
                  .: {}
                  f:gracePeriod: {}
                  f:ignoreDaemonSets: {}
                  f:timeout: {}
            f:scheduledClusterScan: {}
            f:windowsPreferedCluster: {}
          f:limits:
            f:cpu: {}
            f:memory: {}
          f:linuxWorkerCount: {}
          f:nodeCount: {}
          f:nodeVersion: {}
          f:requested:
            f:cpu: {}
            f:memory: {}
            f:pods: {}
      manager: rancher
      operation: Update
      time: '2023-08-11T18:09:20Z'
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:field.cattle.io/creatorId: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:cattle.io/creator: {}
        f:spec:
          f:clusterTemplateRevisionName: {}
          f:rancherKubernetesEngineConfig:
            f:cloudProvider:
              f:azureCloudProvider:
                f:loadBalancerSku: {}
            f:enableCriDockerd: {}
            f:ingress:
              f:defaultIngressClass: {}
            f:kubernetesVersion: {}
            f:monitoring:
              f:provider: {}
              f:replicas: {}
            f:services:
              f:kubeApi:
                f:serviceNodePortRange: {}
      manager: Go-http-client
      operation: Update
      time: '2023-08-14T20:10:23Z'
  name: c-m6grn
  resourceVersion: '21675733'
  uid: 98a4bcad-3afa-431c-ab02-1b565a757292
spec:
  agentImageOverride: ''
  answers: {}
  clusterSecrets:
    aadClientSecret: cluster-secret-5t89g
  clusterTemplateName: cattle-global-data:ct-n7k8j
  clusterTemplateRevisionName: cattle-global-data:ctr-fvv78
  description: ''
  desiredAgentImage: ''
  desiredAuthImage: ''
  displayName: infra
  dockerRootDir: /var/lib/docker
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: true
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  rancherKubernetesEngineConfig:
    addonJobTimeout: 45
    authentication:
      strategy: x509
    authorization: {}
    bastionHost: {}
    cloudProvider:
      azureCloudProvider:
        aadClientCertPassword: ''
        aadClientCertPath: ''
        aadClientId: IDGOESHERE
        aadClientSecret: ''
        cloud: AzurePublicCloud
        cloudProviderBackoff: false
        cloudProviderBackoffDuration: 0
        cloudProviderBackoffExponent: 0
        cloudProviderBackoffJitter: 0
        cloudProviderBackoffRetries: 0
        cloudProviderRateLimit: false
        cloudProviderRateLimitBucket: 0
        cloudProviderRateLimitQPS: 0
        loadBalancerSku: standard
        location: eastus
        maximumLoadBalancerRuleCount: 0
        primaryAvailabilitySetName: ''
        primaryScaleSetName: ''
        resourceGroup: HRAO-CTR
        routeTableName: ''
        securityGroupName: ''
        subnetName: ''
        subscriptionId: SUBIDGOESHERE
        tenantId: TENANTIDGOESHERE
        useInstanceMetadata: true
        useManagedIdentityExtension: false
        vmType: ''
        vnetName: rmgmt-network
        vnetResourceGroup: HRAO-CTR
      name: azure
    enableCriDockerd: true
    ignoreDockerVersion: false
    ingress:
      defaultBackend: true
      defaultIngressClass: true
      provider: none
    kubernetesVersion: v1.24.13-rancher2-1
    monitoring:
      provider: metrics-server
      replicas: 1
    network:
      plugin: canal
    restore: {}
    rotateEncryptionKey: false
    services:
      etcd:
        backupConfig:
          enabled: false
          intervalHours: 12
          retention: 6
          s3BackupConfig: null
          timeout: 300
        creation: 12h
        extraArgs:
          election-timeout: '5000'
          heartbeat-interval: '500'
        retention: 72h
        snapshot: false
      kubeApi:
        serviceNodePortRange: 30000-32767
      kubeController:
        extraArgs:
          cluster-name: rke-infra
      kubelet: {}
      kubeproxy: {}
      scheduler: {}
    sshAgentAuth: false
    systemImages: {}
    upgradeStrategy:
      drain: false
      maxUnavailableControlplane: '1'
      maxUnavailableWorker: 10%
      nodeDrainInput:
        gracePeriod: -1
        ignoreDaemonSets: true
        timeout: 120
  scheduledClusterScan: {}
  windowsPreferedCluster: false
status:
  agentFeatures:
    embedded-cluster-api: false
    fleet: false
    monitoringv1: false
    multi-cluster-management: false
    multi-cluster-management-agent: true
    provisioningv2: false
    rke2: false
  agentImage: rancher/rancher-agent:v2.6.13
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: '72'
    memory: 492741384Ki
    pods: '990'
  apiEndpoint: https://10.217.43.4:6443
  appliedAgentEnvVars:
    - name: CATTLE_SERVER_VERSION
      value: v2.6.13
    - name: CATTLE_INSTALL_UUID
      value: 43a554f6-a81f-46ca-8b6f-140d434ce530
    - name: CATTLE_INGRESS_IP_DOMAIN
      value: sslip.io
  appliedEnableNetworkPolicy: true
  appliedPodSecurityPolicyTemplateId: ''
  appliedSpec:
    agentImageOverride: ''
    answers: {}
    clusterSecrets:
      aadClientSecret: cluster-secret-5t89g
    clusterTemplateName: cattle-global-data:ct-n7k8j
    clusterTemplateRevisionName: cattle-global-data:ctr-fvv78
    description: ''
    desiredAgentImage: ''
    desiredAuthImage: ''
    displayName: infra
    dockerRootDir: /var/lib/docker
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: true
    fleetWorkspaceName: fleet-default
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    rancherKubernetesEngineConfig:
      addonJobTimeout: 45
      authentication:
        strategy: x509
      authorization: {}
      bastionHost: {}
      cloudProvider:
        azureCloudProvider:
          aadClientCertPassword: ''
          aadClientCertPath: ''
          aadClientId: IDGOESHERE
          aadClientSecret: ''
          cloud: AzurePublicCloud
          cloudProviderBackoff: false
          cloudProviderBackoffDuration: 0
          cloudProviderBackoffExponent: 0
          cloudProviderBackoffJitter: 0
          cloudProviderBackoffRetries: 0
          cloudProviderRateLimit: false
          cloudProviderRateLimitBucket: 0
          cloudProviderRateLimitQPS: 0
          loadBalancerSku: standard
          location: eastus
          maximumLoadBalancerRuleCount: 0
          primaryAvailabilitySetName: ''
          primaryScaleSetName: ''
          resourceGroup: HRAO-CTR
          routeTableName: ''
          securityGroupName: ''
          subnetName: ''
          subscriptionId: SUBIDGOESHERE
          tenantId: TENANTIDGOESHERE
          useInstanceMetadata: true
          useManagedIdentityExtension: false
          vmType: ''
          vnetName: rmgmt-network
          vnetResourceGroup: HRAO-CTR
        name: azure
      enableCriDockerd: true
      ignoreDockerVersion: false
      ingress:
        defaultBackend: true
        defaultIngressClass: true
        provider: none
      kubernetesVersion: v1.24.13-rancher2-1
      monitoring:
        provider: metrics-server
        replicas: 1
      network:
        plugin: canal
      nodes:
        - address: 10.217.43.4
          hostnameOverride: rke-infra-control-3
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-5z9dq
          port: '22'
          role:
            - etcd
            - controlplane
          user: docker-user
        - address: 10.217.43.15
          hostnameOverride: rke-infra-worker-e8ads-6
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-6jk5d
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.6
          hostnameOverride: rke-infra-worker-default-1
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-8s9j2
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.7
          hostnameOverride: rke-infra-worker-default-3
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-97d7f
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.8
          hostnameOverride: rke-infra-worker-e8ads-1
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-c6gfc
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.5
          hostnameOverride: rke-infra-control-2
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-fvh7q
          port: '22'
          role:
            - etcd
            - controlplane
          user: docker-user
        - address: 10.217.43.12
          hostnameOverride: rke-infra-worker-default-4
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-jbh8p
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.10
          hostnameOverride: rke-infra-worker-e8ads-2
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-lghwf
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.13
          hostnameOverride: rke-infra-worker-e8ads-5
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-qkxcv
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.14
          hostnameOverride: rke-infra-worker-e8ads-3
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-wgcwl
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.19
          hostnameOverride: rke-infra-worker-e8ads-7
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-z9ftp
          port: '22'
          role:
            - worker
          user: docker-user
      restore: {}
      rotateEncryptionKey: false
      services:
        etcd:
          backupConfig:
            enabled: false
            intervalHours: 12
            retention: 6
            s3BackupConfig: null
            timeout: 300
          creation: 12h
          extraArgs:
            election-timeout: '5000'
            heartbeat-interval: '500'
          retention: 72h
          snapshot: false
        kubeApi:
          serviceNodePortRange: 30000-32767
        kubeController:
          extraArgs:
            cluster-name: rke-infra
        kubelet: {}
        kubeproxy: {}
        scheduler: {}
      sshAgentAuth: false
      systemImages:
        aciCniDeployContainer: noiro/cnideploy:5.2.3.6.1d150da
        aciControllerContainer: noiro/aci-containers-controller:5.2.3.6.1d150da
        aciGbpServerContainer: noiro/gbp-server:5.2.3.6.1d150da
        aciHostContainer: noiro/aci-containers-host:5.2.3.6.1d150da
        aciMcastContainer: noiro/opflex:5.2.3.6.1d150da
        aciOpflexContainer: noiro/opflex:5.2.3.6.1d150da
        aciOpflexServerContainer: noiro/opflex-server:5.2.3.6.1d150da
        aciOvsContainer: noiro/openvswitch:5.2.3.6.1d150da
        alpine: rancher/rke-tools:v0.1.88
        calicoCni: rancher/calico-cni:v3.22.5-rancher1
        calicoControllers: rancher/mirrored-calico-kube-controllers:v3.22.5
        calicoCtl: rancher/mirrored-calico-ctl:v3.22.5
        calicoFlexVol: rancher/mirrored-calico-pod2daemon-flexvol:v3.22.5
        calicoNode: rancher/mirrored-calico-node:v3.22.5
        canalCni: rancher/calico-cni:v3.22.5-rancher1
        canalControllers: rancher/mirrored-calico-kube-controllers:v3.22.5
        canalFlannel: rancher/mirrored-flannelcni-flannel:v0.17.0
        canalFlexVol: rancher/mirrored-calico-pod2daemon-flexvol:v3.22.5
        canalNode: rancher/mirrored-calico-node:v3.22.5
        certDownloader: rancher/rke-tools:v0.1.88
        coredns: rancher/mirrored-coredns-coredns:1.9.3
        corednsAutoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.5
        dnsmasq: rancher/mirrored-k8s-dns-dnsmasq-nanny:1.21.1
        etcd: rancher/mirrored-coreos-etcd:v3.5.4
        flannel: rancher/mirrored-coreos-flannel:v0.15.1
        flannelCni: rancher/flannel-cni:v0.3.0-rancher6
        ingress: rancher/nginx-ingress-controller:nginx-1.5.1-rancher2
        ingressBackend: rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1
        ingressWebhook: rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
        kubedns: rancher/mirrored-k8s-dns-kube-dns:1.21.1
        kubednsAutoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.5
        kubednsSidecar: rancher/mirrored-k8s-dns-sidecar:1.21.1
        kubernetes: rancher/hyperkube:v1.24.13-rancher2
        kubernetesServicesSidecar: rancher/rke-tools:v0.1.88
        metricsServer: rancher/mirrored-metrics-server:v0.6.2
        nginxProxy: rancher/rke-tools:v0.1.88
        nodelocal: rancher/mirrored-k8s-dns-node-cache:1.21.1
        podInfraContainer: rancher/mirrored-pause:3.7
        weaveCni: weaveworks/weave-npc:2.8.1
        weaveNode: weaveworks/weave-kube:2.8.1
        windowsPodInfraContainer: rancher/mirrored-pause:3.7
      upgradeStrategy:
        drain: false
        maxUnavailableControlplane: '1'
        maxUnavailableWorker: 10%
        nodeDrainInput:
          gracePeriod: -1
          ignoreDaemonSets: true
          timeout: 120
    scheduledClusterScan: {}
    windowsPreferedCluster: false
  authImage: ''
  caCert: >-
    CACERTGOESHERE
  capabilities:
    ingressCapabilities:
      - ingressProvider: none
    loadBalancerCapabilities:
      enabled: true
      healthCheckSupported: true
      protocolsSupported:
        - TCP
        - UDP
      provider: Azure L4 LB
    nodePoolScalingSupported: true
    nodePortRange: 30000-32767
    taintSupport: true
  capacity:
    cpu: '72'
    memory: 493662984Ki
    pods: '990'
  certificatesExpiration:
    kube-apiserver:
      expirationDate: '2033-08-11T17:27:57Z'
    kube-apiserver-proxy-client:
      expirationDate: '2031-03-23T23:16:01Z'
    kube-ca:
      expirationDate: '2031-03-23T23:15:59Z'
    kube-controller-manager:
      expirationDate: '2031-03-23T23:16:00Z'
    kube-etcd-10-217-43-4:
      expirationDate: '2033-08-11T17:27:57Z'
    kube-etcd-10-217-43-5:
      expirationDate: '2033-08-11T17:27:57Z'
    kube-node:
      expirationDate: '2031-03-23T23:16:00Z'
    kube-proxy:
      expirationDate: '2031-03-23T23:16:00Z'
    kube-scheduler:
      expirationDate: '2031-03-23T23:16:00Z'
  conditions:
    - lastUpdateTime: ''
      status: 'True'
      type: Pending
    - lastUpdateTime: '2021-03-25T23:20:49Z'
      status: 'True'
      type: Provisioned
    - lastUpdateTime: '2021-03-25T23:22:59Z'
      status: 'True'
      type: Waiting
    - lastUpdateTime: '2021-03-25T23:11:09Z'
      status: 'True'
      type: BackingNamespaceCreated
    - lastUpdateTime: '2021-03-25T23:11:09Z'
      status: 'True'
      type: DefaultProjectCreated
    - lastUpdateTime: '2021-03-25T23:11:09Z'
      status: 'True'
      type: SystemProjectCreated
    - lastUpdateTime: '2021-03-25T23:11:09Z'
      status: 'True'
      type: InitialRolesPopulated
    - lastUpdateTime: '2021-03-25T23:11:10Z'
      status: 'True'
      type: CreatorMadeOwner
    - lastUpdateTime: '2021-03-25T23:11:10Z'
      status: 'True'
      type: NoDiskPressure
    - lastUpdateTime: '2021-03-25T23:11:10Z'
      status: 'True'
      type: NoMemoryPressure
    - lastUpdateTime: '2021-03-25T23:20:50Z'
      status: 'False'
      type: AlertingEnabled
    - lastUpdateTime: '2021-03-25T23:20:55Z'
      status: 'True'
      type: SystemAccountCreated
    - lastUpdateTime: '2021-03-25T23:20:55Z'
      status: 'True'
      type: AgentDeployed
    - lastUpdateTime: '2021-04-29T19:06:23Z'
      status: 'False'
      type: PrometheusOperatorDeployed
    - lastUpdateTime: '2023-08-14T18:11:37Z'
      message: waiting for rke-infra-control-1 to finish provisioning
      reason: Provisioning
      status: Unknown
      type: Updated
    - lastUpdateTime: '2021-03-25T23:21:08Z'
      status: 'True'
      type: ServiceAccountMigrated
    - lastUpdateTime: '2021-03-25T23:21:13Z'
      status: 'True'
      type: GlobalAdminsSynced
    - lastUpdateTime: '2023-08-15T18:45:46Z'
      status: 'True'
      type: Ready
    - lastUpdateTime: '2021-04-29T19:06:22Z'
      status: 'False'
      type: MonitoringEnabled
    - lastUpdateTime: '2023-08-14T01:47:13Z'
      status: 'True'
      type: Connected
    - lastUpdateTime: '2023-08-01T23:15:37Z'
      status: 'True'
      type: Upgraded
    - lastUpdateTime: '2023-03-21T14:53:03Z'
      status: 'True'
      type: SecretsMigrated
    - lastUpdateTime: '2023-03-21T14:53:03Z'
      status: 'True'
      type: ServiceAccountSecretsMigrated
    - lastUpdateTime: '2023-03-21T14:53:04Z'
      status: 'True'
      type: RKESecretsMigrated
    - lastUpdateTime: '2023-07-12T14:00:23Z'
      status: 'True'
      type: ACISecretsMigrated
  driver: rancherKubernetesEngine
  eksStatus:
    managedLaunchTemplateID: ''
    managedLaunchTemplateVersions: null
    privateRequiresTunnel: null
    securityGroups: null
    subnets: null
    upstreamSpec: null
    virtualNetwork: ''
  failedSpec:
    agentImageOverride: ''
    answers: {}
    clusterSecrets:
      aadClientSecret: cluster-aadclientsecret-5t89g
    clusterTemplateName: cattle-global-data:ct-n7k8j
    clusterTemplateRevisionName: cattle-global-data:ctr-fvv78
    description: ''
    desiredAgentImage: ''
    desiredAuthImage: ''
    displayName: infra
    dockerRootDir: /var/lib/docker
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: true
    fleetWorkspaceName: fleet-default
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    rancherKubernetesEngineConfig:
      addonJobTimeout: 45
      authentication:
        strategy: x509
      authorization: {}
      bastionHost: {}
      cloudProvider:
        azureCloudProvider:
          aadClientCertPassword: ''
          aadClientCertPath: ''
          aadClientId: 74b8186b-3d9a-4f06-babf-bcf21800bb1e
          aadClientSecret: ''
          cloud: AzurePublicCloud
          cloudProviderBackoff: false
          cloudProviderBackoffDuration: 0
          cloudProviderBackoffExponent: 0
          cloudProviderBackoffJitter: 0
          cloudProviderBackoffRetries: 0
          cloudProviderRateLimit: false
          cloudProviderRateLimitBucket: 0
          cloudProviderRateLimitQPS: 0
          loadBalancerSku: standard
          location: eastus
          maximumLoadBalancerRuleCount: 0
          primaryAvailabilitySetName: ''
          primaryScaleSetName: ''
          resourceGroup: HRAO-CTR
          routeTableName: ''
          securityGroupName: ''
          subnetName: ''
          subscriptionId: a0105f2a-8613-4569-8524-50ea38325e8f
          tenantId: 9bc542c5-7e5c-4dba-9a1f-5b2f4a3f9d7c
          useInstanceMetadata: true
          useManagedIdentityExtension: false
          vmType: ''
          vnetName: rmgmt-network
          vnetResourceGroup: HRAO-CTR
        name: azure
      enableCriDockerd: true
      ignoreDockerVersion: false
      ingress:
        defaultBackend: true
        defaultIngressClass: true
        provider: none
      kubernetesVersion: v1.24.13-rancher2-1
      monitoring:
        provider: metrics-server
        replicas: 1
      network:
        plugin: canal
      nodes:
        - address: 10.217.43.4
          hostnameOverride: rke-infra-control-3
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-5z9dq
          port: '22'
          role:
            - etcd
            - controlplane
          user: docker-user
        - address: 10.217.43.15
          hostnameOverride: rke-infra-worker-e8ads-6
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-6jk5d
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.6
          hostnameOverride: rke-infra-worker-default-1
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-8s9j2
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.7
          hostnameOverride: rke-infra-worker-default-3
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-97d7f
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.8
          hostnameOverride: rke-infra-worker-e8ads-1
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-c6gfc
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.11
          hostnameOverride: rke-infra-control-4
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-fdjt5
          port: '22'
          role:
            - etcd
            - controlplane
          user: docker-user
        - address: 10.217.43.5
          hostnameOverride: rke-infra-control-2
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-fvh7q
          port: '22'
          role:
            - etcd
            - controlplane
          user: docker-user
        - address: 10.217.43.12
          hostnameOverride: rke-infra-worker-default-4
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-jbh8p
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.10
          hostnameOverride: rke-infra-worker-e8ads-2
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-lghwf
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.13
          hostnameOverride: rke-infra-worker-e8ads-5
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-qkxcv
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.14
          hostnameOverride: rke-infra-worker-e8ads-3
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-wgcwl
          port: '22'
          role:
            - worker
          user: docker-user
        - address: 10.217.43.19
          hostnameOverride: rke-infra-worker-e8ads-7
          labels:
            cattle.io/creator: norman
          nodeName: c-m6grn:m-z9ftp
          port: '22'
          role:
            - worker
          user: docker-user
      restore: {}
      rotateEncryptionKey: false
      services:
        etcd:
          backupConfig:
            enabled: false
            intervalHours: 12
            retention: 6
            s3BackupConfig: null
            timeout: 300
          creation: 12h
          extraArgs:
            election-timeout: '5000'
            heartbeat-interval: '500'
          retention: 72h
          snapshot: false
        kubeApi:
          serviceNodePortRange: 30000-32767
        kubeController:
          extraArgs:
            cluster-name: rke-infra
        kubelet: {}
        kubeproxy: {}
        scheduler: {}
      sshAgentAuth: false
      systemImages:
        aciCniDeployContainer: noiro/cnideploy:5.2.3.6.1d150da
        aciControllerContainer: noiro/aci-containers-controller:5.2.3.6.1d150da
        aciGbpServerContainer: noiro/gbp-server:5.2.3.6.1d150da
        aciHostContainer: noiro/aci-containers-host:5.2.3.6.1d150da
        aciMcastContainer: noiro/opflex:5.2.3.6.1d150da
        aciOpflexContainer: noiro/opflex:5.2.3.6.1d150da
        aciOpflexServerContainer: noiro/opflex-server:5.2.3.6.1d150da
        aciOvsContainer: noiro/openvswitch:5.2.3.6.1d150da
        alpine: rancher/rke-tools:v0.1.88
        calicoCni: rancher/calico-cni:v3.22.5-rancher1
        calicoControllers: rancher/mirrored-calico-kube-controllers:v3.22.5
        calicoCtl: rancher/mirrored-calico-ctl:v3.22.5
        calicoFlexVol: rancher/mirrored-calico-pod2daemon-flexvol:v3.22.5
        calicoNode: rancher/mirrored-calico-node:v3.22.5
        canalCni: rancher/calico-cni:v3.22.5-rancher1
        canalControllers: rancher/mirrored-calico-kube-controllers:v3.22.5
        canalFlannel: rancher/mirrored-flannelcni-flannel:v0.17.0
        canalFlexVol: rancher/mirrored-calico-pod2daemon-flexvol:v3.22.5
        canalNode: rancher/mirrored-calico-node:v3.22.5
        certDownloader: rancher/rke-tools:v0.1.88
        coredns: rancher/mirrored-coredns-coredns:1.9.3
        corednsAutoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.5
        dnsmasq: rancher/mirrored-k8s-dns-dnsmasq-nanny:1.21.1
        etcd: rancher/mirrored-coreos-etcd:v3.5.4
        flannel: rancher/mirrored-coreos-flannel:v0.15.1
        flannelCni: rancher/flannel-cni:v0.3.0-rancher6
        ingress: rancher/nginx-ingress-controller:nginx-1.5.1-rancher2
        ingressBackend: rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1
        ingressWebhook: rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
        kubedns: rancher/mirrored-k8s-dns-kube-dns:1.21.1
        kubednsAutoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.5
        kubednsSidecar: rancher/mirrored-k8s-dns-sidecar:1.21.1
        kubernetes: rancher/hyperkube:v1.24.13-rancher2
        kubernetesServicesSidecar: rancher/rke-tools:v0.1.88
        metricsServer: rancher/mirrored-metrics-server:v0.6.2
        nginxProxy: rancher/rke-tools:v0.1.88
        nodelocal: rancher/mirrored-k8s-dns-node-cache:1.21.1
        podInfraContainer: rancher/mirrored-pause:3.7
        weaveCni: weaveworks/weave-npc:2.8.1
        weaveNode: weaveworks/weave-kube:2.8.1
        windowsPodInfraContainer: rancher/mirrored-pause:3.7
      upgradeStrategy:
        drain: false
        maxUnavailableControlplane: '1'
        maxUnavailableWorker: 10%
        nodeDrainInput:
          gracePeriod: -1
          ignoreDaemonSets: true
          timeout: 120
    scheduledClusterScan: {}
    windowsPreferedCluster: false
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: 33600m
    memory: 299838Mi
    pods: '0'
  linuxWorkerCount: 9
  nodeCount: 12
  nodeVersion: 8
  provider: rke
  requested:
    cpu: 34600m
    memory: 260590Mi
    pods: '208'
  serviceAccountTokenSecret: cluster-serviceaccounttoken-j86j4
  version:
    buildDate: '2023-04-12T12:08:36Z'
    compiler: gc
    gitCommit: 49433308be5b958856b6949df02b716e0a7cf0a3
    gitTreeState: clean
    gitVersion: v1.24.13
    goVersion: go1.19.8
    major: '1'
    minor: '24'
    platform: linux/amd64

Steps to Reproduce: We deleted our kube-controller-manager leader node while upgrading the hosts from Ubuntu 18 to Ubuntu 22. When we spin up a new control node, it gets stuck after starting the first Docker container (rancher/rancher-agent:v2.6.13 "run.sh --server htt…"). The logs are as follows:

INFO: Arguments: --server https://rancher.nameofplatform.com/ --token REDACTED -r -n m-ccnpb
INFO: Environment: CATTLE_ADDRESS=10.217.43.18 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-ccnpb CATTLE_SERVER=https://rancher.nameofplatform.com/ CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: nameserver 127.0.0.53 options edns0 trust-ad search 3xjoa1i2lshejpt0ninj35opia.bx.internal.cloudapp.net
WARN: Loopback address found in /etc/resolv.conf, please refer to the documentation how to configure your cluster to resolve DNS properly
INFO: https://rancher.nameofplatform.com/ping is accessible
INFO: rancher.nameofplatform.com resolves to 10.217.42.5
time="2023-08-17T15:57:24Z" level=info msg="Listening on /tmp/log.sock"
time="2023-08-17T15:57:24Z" level=info msg="Rancher agent version v2.6.13 is starting"
time="2023-08-17T15:57:24Z" level=info msg="Option etcd=false"
time="2023-08-17T15:57:24Z" level=info msg="Option controlPlane=false"
time="2023-08-17T15:57:24Z" level=info msg="Option worker=false"
time="2023-08-17T15:57:24Z" level=info msg="Option requestedHostname=m-ccnpb"
time="2023-08-17T15:57:24Z" level=info msg="Option dockerInfo={FUN4:2SZJ:AXCV:IW6F:JUDO:MHCD:WIJT:QGAB:PAE4:2VWN:GZCI:SGDX 1 1 0 0 1 overlay2 [[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff false] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true false false true true true true true true true true false 32 false 40 2023-08-17T15:57:24.266091701Z json-file systemd 2 0 5.15.0-1041-azure Ubuntu 22.04.3 LTS 22.04 linux x86_64 https://index.docker.io/v1/ 0xc001296150 2 8324927488 [] /var/lib/docker    rke-infra-control-4 [provider=azure] false 20.10.24   map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc {  inactive false  [] 0 0 <nil> []} false  docker-init {8165feabfdfe38c65b599c4993d227328c231fca 8165feabfdfe38c65b599c4993d227328c231fca} {v1.1.8-0-g82f18fe v1.1.8-0-g82f18fe} {de40ad0 de40ad0} [name=apparmor name=seccomp,profile=default name=cgroupns]  [] []}"
time="2023-08-17T15:57:24Z" level=info msg="Option customConfig=map[address:10.217.43.18 internalAddress: label:map[] roles:[] taints:[]]"
time="2023-08-17T15:57:24Z" level=info msg="Connecting to wss://rancher.nameofplatform.com/v3/connect with token starting with 945w6gmwldmfpsadasda42knk6r"
time="2023-08-17T15:57:24Z" level=info msg="Connecting to proxy" url="wss://rancher.nameofplatform.com/v3/connect"
time="2023-08-17T15:57:24Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-08-17T15:57:24Z" level=info msg="Starting plan monitor, checking every 15 seconds"
time="2023-08-17T15:57:39Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-08-17T15:57:39Z" level=info msg="Plan monitor checking 120 seconds"
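
For anyone comparing agent logs across nodes, here is a minimal sketch that pulls the "Option key=value" lines out of a rancher-agent log like the one above and lists which roles are reported as true. The function names (parse_agent_options, assigned_roles) and the regular expression are my own assumptions for illustration and are not part of Rancher or RKE. The sketch only summarizes what the log says; it does not determine whether role options of false are expected for this kind of node.

```python
import re

# Assumption: option lines look like the logrus-style lines in the log above,
# e.g. time="..." level=info msg="Option etcd=false"
OPTION_RE = re.compile(r'msg="Option (\w+)=(.*)"\s*$')


def parse_agent_options(log_text):
    """Return a dict mapping option name to its raw string value."""
    options = {}
    for line in log_text.splitlines():
        match = OPTION_RE.search(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def assigned_roles(options):
    """Return the role options that the agent log reports as true."""
    return [role for role in ("etcd", "controlPlane", "worker")
            if options.get(role) == "true"]


# Sample lines copied from the log in this issue.
SAMPLE = '''time="2023-08-17T15:57:24Z" level=info msg="Rancher agent version v2.6.13 is starting"
time="2023-08-17T15:57:24Z" level=info msg="Option etcd=false"
time="2023-08-17T15:57:24Z" level=info msg="Option controlPlane=false"
time="2023-08-17T15:57:24Z" level=info msg="Option worker=false"
time="2023-08-17T15:57:24Z" level=info msg="Option requestedHostname=m-ccnpb"'''

if __name__ == "__main__":
    opts = parse_agent_options(SAMPLE)
    print("requested hostname:", opts.get("requestedHostname"))
    print("roles reported as true:", assigned_roles(opts))
```

Run against the sample lines, the role list comes back empty, which matches the three "false" option lines in the log above.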

Results: The cluster is stuck in an updating state, waiting for the new node to register. It is waiting for control-node-1 to provision, but control-node-1 no longer exists because it has been deleted. The cluster also still sees control-node-1 as our kube-controller-manager leader.

tfon23 commented 1 year ago

bump.

github-actions[bot] commented 11 months ago

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.