kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Cluster-autoscaler pods are in CrashLoopBackOff due to failure during GCE Manager creation #14693

Closed · fbozic closed this issue 1 year ago

fbozic commented 1 year ago

/kind bug

I'm trying to set up a new cluster on GCE with cluster-autoscaler enabled. I couldn't find GCE-specific cluster-autoscaler docs, so I just tried enabling it. Let me know if cluster-autoscaler is not supported on GCE and this is not actually a bug.

1. What kops version are you running? The command kops version will display this information. Client version: 1.25.3 (git-v1.25.3)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:29:58Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?

GCE

4. What commands did you run? What is the simplest way to reproduce this issue?

kops create -f kops.yaml
kops update cluster --name my-fake-name.k8s.local --yes
kops export kubecfg --admin
kops validate cluster --wait 10m

5. What happened after the commands executed? The cluster never becomes healthy because the cluster-autoscaler pods are in CrashLoopBackOff.

6. What did you expect to happen? The cluster becomes healthy and all pods deployed by kOps are running.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: my-fake-name.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudConfig: {}
  cloudProvider: gce
  configBase: gs://my-state-store/my-fake-name.k8s.local
  clusterAutoscaler:
    enabled: true
  etcdClusters:
    - cpuRequest: 200m
      etcdMembers:
        - instanceGroup: master-europe-west3-a
          name: a
        - instanceGroup: master-europe-west3-b
          name: b
        - instanceGroup: master-europe-west3-c
          name: c
      memoryRequest: 100Mi
      name: main
    - cpuRequest: 100m
      etcdMembers:
        - instanceGroup: master-europe-west3-a
          name: a
        - instanceGroup: master-europe-west3-b
          name: b
        - instanceGroup: master-europe-west3-c
          name: c
      memoryRequest: 100Mi
      name: events
    - cpuRequest: 100m
      etcdMembers:
        - instanceGroup: master-europe-west3-a
          name: a
        - instanceGroup: master-europe-west3-b
          name: b
        - instanceGroup: master-europe-west3-c
          name: c
      manager:
        env:
          - name: ETCD_AUTO_COMPACTION_MODE
            value: revision
          - name: ETCD_AUTO_COMPACTION_RETENTION
            value: "2500"
      memoryRequest: 100Mi
      name: cilium
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
    - 0.0.0.0/0
    - ::/0
  kubernetesVersion: 1.25.4
  masterPublicName: api.my-fake-name.k8s.local
  metricsServer:
    enabled: true
    insecure: true
  networkID: my-network
  networking:
    cilium:
      enableNodePort: true
      etcdManaged: true
  nonMasqueradeCIDR: 100.64.0.0/10
  project: my-project
  subnets:
    - cidr: 10.0.32.0/20
      name: europe-west3
      region: europe-west3
      type: Private
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: my-fake-name.k8s.local
  name: master-europe-west3-a
spec:
  image: ubuntu-os-cloud/ubuntu-2004-focal-v20221018
  machineType: n1-standard-8
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
    - europe-west3
  zones:
    - europe-west3-a

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: my-fake-name.k8s.local
  name: master-europe-west3-b
spec:
  image: ubuntu-os-cloud/ubuntu-2004-focal-v20221018
  machineType: n1-standard-8
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
    - europe-west3
  zones:
    - europe-west3-b

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: my-fake-name.k8s.local
  name: master-europe-west3-c
spec:
  image: ubuntu-os-cloud/ubuntu-2004-focal-v20221018
  machineType: n1-standard-8
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
    - europe-west3
  zones:
    - europe-west3-c

---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: my-fake-name.k8s.local
  name: nodes-europe-west3
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: "1"
    k8s.io/cluster-autoscaler/my-fake-name.k8s.local: "1"
    my-label: "common"
  image: ubuntu-os-cloud/ubuntu-2004-focal-v20221018
  machineType: n1-standard-8
  maxSize: 6
  minSize: 3
  role: Node
  subnets:
    - europe-west3
  zones:
    - europe-west3-a
    - europe-west3-b
    - europe-west3-c

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here. I've only attached info about the cluster-autoscaler pods, because cluster provisioning without cluster-autoscaler works. The pods are running on the master nodes. I've also noticed that the pods have the env var AWS_REGION set to europe-west3; the location is correct, but it is not an AWS region.
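
A quick way to confirm that env var from the CLI (the Deployment name is assumed from the ReplicaSet name in the pod YAML below):

# Deployment name assumed from the ReplicaSet "cluster-autoscaler-656574b474" seen below
kubectl -n kube-system get deployment cluster-autoscaler -o jsonpath='{.spec.template.spec.containers[0].env}'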

Pod status:

❯ kgp -A -l 'app=cluster-autoscaler' -owide
NAMESPACE     NAME                                  READY   STATUS             RESTARTS        AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
kube-system   cluster-autoscaler-656574b474-9cd9j   0/1     CrashLoopBackOff   9 (4m25s ago)   29m   100.96.1.77   master-europe-west3-b-h453   <none>           <none>
kube-system   cluster-autoscaler-656574b474-9s7nz   0/1     CrashLoopBackOff   9 (4m45s ago)   29m   100.96.0.54   master-europe-west3-c-c1zx   <none>           <none>

Pod logs:

❯ k logs cluster-autoscaler-656574b474-9cd9j
I1130 09:56:14.091297       1 flags.go:57] FLAG: --add-dir-header="false"
I1130 09:56:14.091371       1 flags.go:57] FLAG: --address=":8085"
I1130 09:56:14.091377       1 flags.go:57] FLAG: --alsologtostderr="false"
I1130 09:56:14.091381       1 flags.go:57] FLAG: --aws-use-static-instance-list="false"
I1130 09:56:14.091385       1 flags.go:57] FLAG: --balance-similar-node-groups="false"
I1130 09:56:14.091388       1 flags.go:57] FLAG: --balancing-ignore-label="[]"
I1130 09:56:14.091391       1 flags.go:57] FLAG: --balancing-label="[]"
I1130 09:56:14.091395       1 flags.go:57] FLAG: --cloud-config=""
I1130 09:56:14.091398       1 flags.go:57] FLAG: --cloud-provider="gce"
I1130 09:56:14.091402       1 flags.go:57] FLAG: --cloud-provider-gce-l7lb-src-cidrs="130.211.0.0/22,35.191.0.0/16"
I1130 09:56:14.091410       1 flags.go:57] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16"
I1130 09:56:14.091419       1 flags.go:57] FLAG: --cluster-name=""
I1130 09:56:14.091424       1 flags.go:57] FLAG: --clusterapi-cloud-config-authoritative="false"
I1130 09:56:14.091429       1 flags.go:57] FLAG: --cordon-node-before-terminating="true"
I1130 09:56:14.091435       1 flags.go:57] FLAG: --cores-total="0:320000"
I1130 09:56:14.091440       1 flags.go:57] FLAG: --daemonset-eviction-for-empty-nodes="false"
I1130 09:56:14.091446       1 flags.go:57] FLAG: --daemonset-eviction-for-occupied-nodes="true"
I1130 09:56:14.091451       1 flags.go:57] FLAG: --debugging-snapshot-enabled="false"
I1130 09:56:14.091458       1 flags.go:57] FLAG: --emit-per-nodegroup-metrics="false"
I1130 09:56:14.091463       1 flags.go:57] FLAG: --estimator="binpacking"
I1130 09:56:14.091467       1 flags.go:57] FLAG: --expander="random"
I1130 09:56:14.091471       1 flags.go:57] FLAG: --expendable-pods-priority-cutoff="-10"
I1130 09:56:14.091474       1 flags.go:57] FLAG: --feature-gates=""
I1130 09:56:14.091480       1 flags.go:57] FLAG: --gce-concurrent-refreshes="1"
I1130 09:56:14.091484       1 flags.go:57] FLAG: --gce-expander-ephemeral-storage-support="false"
I1130 09:56:14.091487       1 flags.go:57] FLAG: --gpu-total="[]"
I1130 09:56:14.091492       1 flags.go:57] FLAG: --grpc-expander-cert=""
I1130 09:56:14.091501       1 flags.go:57] FLAG: --grpc-expander-url=""
I1130 09:56:14.091505       1 flags.go:57] FLAG: --ignore-daemonsets-utilization="false"
I1130 09:56:14.091509       1 flags.go:57] FLAG: --ignore-mirror-pods-utilization="false"
I1130 09:56:14.091514       1 flags.go:57] FLAG: --ignore-taint="[]"
I1130 09:56:14.091519       1 flags.go:57] FLAG: --initial-node-group-backoff-duration="5m0s"
I1130 09:56:14.091525       1 flags.go:57] FLAG: --kubeconfig=""
I1130 09:56:14.091532       1 flags.go:57] FLAG: --kubernetes=""
I1130 09:56:14.091538       1 flags.go:57] FLAG: --leader-elect="true"
I1130 09:56:14.091546       1 flags.go:57] FLAG: --leader-elect-lease-duration="15s"
I1130 09:56:14.091553       1 flags.go:57] FLAG: --leader-elect-renew-deadline="10s"
I1130 09:56:14.091559       1 flags.go:57] FLAG: --leader-elect-resource-lock="leases"
I1130 09:56:14.091564       1 flags.go:57] FLAG: --leader-elect-resource-name="cluster-autoscaler"
I1130 09:56:14.091568       1 flags.go:57] FLAG: --leader-elect-resource-namespace=""
I1130 09:56:14.091572       1 flags.go:57] FLAG: --leader-elect-retry-period="2s"
I1130 09:56:14.091577       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I1130 09:56:14.091585       1 flags.go:57] FLAG: --log-dir=""
I1130 09:56:14.091589       1 flags.go:57] FLAG: --log-file=""
I1130 09:56:14.091592       1 flags.go:57] FLAG: --log-file-max-size="1800"
I1130 09:56:14.091596       1 flags.go:57] FLAG: --logtostderr="true"
I1130 09:56:14.091600       1 flags.go:57] FLAG: --max-autoprovisioned-node-group-count="15"
I1130 09:56:14.091603       1 flags.go:57] FLAG: --max-bulk-soft-taint-count="10"
I1130 09:56:14.091608       1 flags.go:57] FLAG: --max-bulk-soft-taint-time="3s"
I1130 09:56:14.091616       1 flags.go:57] FLAG: --max-drain-parallelism="1"
I1130 09:56:14.091639       1 flags.go:57] FLAG: --max-empty-bulk-delete="10"
I1130 09:56:14.091645       1 flags.go:57] FLAG: --max-failing-time="15m0s"
I1130 09:56:14.091650       1 flags.go:57] FLAG: --max-graceful-termination-sec="600"
I1130 09:56:14.091656       1 flags.go:57] FLAG: --max-inactivity="10m0s"
I1130 09:56:14.091661       1 flags.go:57] FLAG: --max-node-group-backoff-duration="30m0s"
I1130 09:56:14.091666       1 flags.go:57] FLAG: --max-node-provision-time="15m0s"
I1130 09:56:14.091672       1 flags.go:57] FLAG: --max-nodegroup-binpacking-duration="10s"
I1130 09:56:14.091680       1 flags.go:57] FLAG: --max-nodes-per-scaleup="1000"
I1130 09:56:14.091693       1 flags.go:57] FLAG: --max-nodes-total="0"
I1130 09:56:14.091699       1 flags.go:57] FLAG: --max-pod-eviction-time="2m0s"
I1130 09:56:14.091704       1 flags.go:57] FLAG: --max-scale-down-parallelism="10"
I1130 09:56:14.091711       1 flags.go:57] FLAG: --max-total-unready-percentage="45"
I1130 09:56:14.091717       1 flags.go:57] FLAG: --memory-total="0:6400000"
I1130 09:56:14.091722       1 flags.go:57] FLAG: --min-replica-count="0"
I1130 09:56:14.091726       1 flags.go:57] FLAG: --namespace="kube-system"
I1130 09:56:14.091731       1 flags.go:57] FLAG: --new-pod-scale-up-delay="0s"
I1130 09:56:14.091740       1 flags.go:57] FLAG: --node-autoprovisioning-enabled="false"
I1130 09:56:14.091752       1 flags.go:57] FLAG: --node-deletion-delay-timeout="2m0s"
I1130 09:56:14.091758       1 flags.go:57] FLAG: --node-group-auto-discovery="[]"
I1130 09:56:14.091763       1 flags.go:57] FLAG: --node-group-backoff-reset-timeout="3h0m0s"
I1130 09:56:14.091768       1 flags.go:57] FLAG: --node-info-cache-expire-time="87600h0m0s"
I1130 09:56:14.091785       1 flags.go:57] FLAG: --nodes="[3:6:nodes-europe-west3]"
I1130 09:56:14.091803       1 flags.go:57] FLAG: --ok-total-unready-count="3"
I1130 09:56:14.091808       1 flags.go:57] FLAG: --one-output="false"
I1130 09:56:14.091813       1 flags.go:57] FLAG: --profiling="false"
I1130 09:56:14.091818       1 flags.go:57] FLAG: --record-duplicated-events="false"
I1130 09:56:14.091823       1 flags.go:57] FLAG: --regional="false"
I1130 09:56:14.091828       1 flags.go:57] FLAG: --scale-down-candidates-pool-min-count="50"
I1130 09:56:14.091832       1 flags.go:57] FLAG: --scale-down-candidates-pool-ratio="0.1"
I1130 09:56:14.091839       1 flags.go:57] FLAG: --scale-down-delay-after-add="10m0s"
I1130 09:56:14.091846       1 flags.go:57] FLAG: --scale-down-delay-after-delete="0s"
I1130 09:56:14.091851       1 flags.go:57] FLAG: --scale-down-delay-after-failure="3m0s"
I1130 09:56:14.091857       1 flags.go:57] FLAG: --scale-down-enabled="true"
I1130 09:56:14.091863       1 flags.go:57] FLAG: --scale-down-gpu-utilization-threshold="0.5"
I1130 09:56:14.091869       1 flags.go:57] FLAG: --scale-down-non-empty-candidates-count="30"
I1130 09:56:14.091875       1 flags.go:57] FLAG: --scale-down-unneeded-time="10m0s"
I1130 09:56:14.091881       1 flags.go:57] FLAG: --scale-down-unready-time="20m0s"
I1130 09:56:14.091890       1 flags.go:57] FLAG: --scale-down-utilization-threshold="0.5"
I1130 09:56:14.091903       1 flags.go:57] FLAG: --scale-up-from-zero="true"
I1130 09:56:14.091909       1 flags.go:57] FLAG: --scan-interval="10s"
I1130 09:56:14.091915       1 flags.go:57] FLAG: --skip-headers="false"
I1130 09:56:14.091919       1 flags.go:57] FLAG: --skip-log-headers="false"
I1130 09:56:14.091923       1 flags.go:57] FLAG: --skip-nodes-with-local-storage="true"
I1130 09:56:14.091927       1 flags.go:57] FLAG: --skip-nodes-with-system-pods="true"
I1130 09:56:14.091932       1 flags.go:57] FLAG: --status-config-map-name="cluster-autoscaler-status"
I1130 09:56:14.091940       1 flags.go:57] FLAG: --stderrthreshold="0"
I1130 09:56:14.091947       1 flags.go:57] FLAG: --unremovable-node-recheck-timeout="5m0s"
I1130 09:56:14.091953       1 flags.go:57] FLAG: --user-agent="cluster-autoscaler"
I1130 09:56:14.091959       1 flags.go:57] FLAG: --v="4"
I1130 09:56:14.091964       1 flags.go:57] FLAG: --vmodule=""
I1130 09:56:14.091970       1 flags.go:57] FLAG: --write-status-configmap="true"
I1130 09:56:14.091978       1 main.go:446] Cluster Autoscaler 1.25.0
I1130 09:56:14.106376       1 leaderelection.go:248] attempting to acquire leader lease kube-system/cluster-autoscaler...
I1130 09:56:14.117458       1 leaderelection.go:258] successfully acquired lease kube-system/cluster-autoscaler
I1130 09:56:14.117640       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"0d3260d0-738e-4670-848a-97ce530042f3", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"4665", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' cluster-autoscaler-656574b474-9cd9j became leader
I1130 09:56:14.118801       1 reflector.go:221] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188
I1130 09:56:14.118825       1 reflector.go:257] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188
I1130 09:56:14.118864       1 reflector.go:221] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I1130 09:56:14.118900       1 reflector.go:257] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I1130 09:56:14.118941       1 reflector.go:221] Starting reflector *v1.PodDisruptionBudget (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
I1130 09:56:14.118972       1 reflector.go:257] Listing and watching *v1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
I1130 09:56:14.119029       1 reflector.go:221] Starting reflector *v1.Job (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:338
I1130 09:56:14.119050       1 reflector.go:257] Listing and watching *v1.Job from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:338
I1130 09:56:14.119027       1 reflector.go:221] Starting reflector *v1.ReplicationController (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:329
I1130 09:56:14.119072       1 reflector.go:257] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:329
I1130 09:56:14.119139       1 reflector.go:221] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1130 09:56:14.119150       1 reflector.go:257] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1130 09:56:14.119161       1 reflector.go:221] Starting reflector *v1.DaemonSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320
I1130 09:56:14.119179       1 reflector.go:257] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320
I1130 09:56:14.119197       1 reflector.go:221] Starting reflector *v1.ReplicaSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:347
I1130 09:56:14.119212       1 reflector.go:257] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:347
I1130 09:56:14.119264       1 reflector.go:221] Starting reflector *v1.StatefulSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356
I1130 09:56:14.119281       1 reflector.go:257] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356
I1130 09:56:14.119294       1 reflector.go:221] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1130 09:56:14.119304       1 reflector.go:257] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1130 09:56:14.134798       1 cloud_provider_builder.go:29] Building gce cloud provider.
I1130 09:56:14.134848       1 gce_manager.go:152] Using default TokenSource &oauth2.reuseTokenSource{new:google.computeSource{account:"", scopes:[]string(nil)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
I1130 09:56:14.135115       1 reflector.go:221] Starting reflector *v1.CSIDriver (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135146       1 reflector.go:257] Listing and watching *v1.CSIDriver from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135320       1 reflector.go:221] Starting reflector *v1.CSIStorageCapacity (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135345       1 reflector.go:257] Listing and watching *v1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135467       1 reflector.go:221] Starting reflector *v1.PodDisruptionBudget (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135490       1 reflector.go:257] Listing and watching *v1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135590       1 reflector.go:221] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135609       1 reflector.go:257] Listing and watching *v1.StatefulSet from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135617       1 reflector.go:221] Starting reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135630       1 reflector.go:257] Listing and watching *v1.Namespace from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135724       1 reflector.go:221] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135734       1 reflector.go:257] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135752       1 reflector.go:221] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135763       1 reflector.go:257] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135830       1 reflector.go:221] Starting reflector *v1.ReplicaSet (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135842       1 reflector.go:257] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135961       1 reflector.go:221] Starting reflector *v1.PersistentVolume (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135970       1 reflector.go:257] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.135998       1 reflector.go:221] Starting reflector *v1.ReplicationController (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136009       1 reflector.go:257] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136182       1 reflector.go:221] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136185       1 reflector.go:221] Starting reflector *v1.CSINode (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136198       1 reflector.go:257] Listing and watching *v1.CSINode from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136191       1 reflector.go:257] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136784       1 reflector.go:221] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136825       1 reflector.go:257] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136951       1 reflector.go:221] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.136965       1 reflector.go:257] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:134
I1130 09:56:14.139136       1 gce_manager.go:171] GCE projectId=my-project location=europe-west3-b
F1130 09:56:14.139206       1 gce_cloud_provider.go:368] Failed to create GCE Manager: failed to fetch MIGs: failed to parse mig url: nodes-europe-west3 got error: wrong url: expected format https://www.googleapis.com/compute/v1/projects/<project-id>/zones/<zone>/instanceGroups/<name>, got nodes-europe-west3

Pod YAML:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/port: "8085"
    prometheus.io/scrape: "true"
  creationTimestamp: "2022-11-30T09:42:34Z"
  generateName: cluster-autoscaler-656574b474-
  labels:
    app: cluster-autoscaler
    app.kubernetes.io/name: cluster-autoscaler
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
    kops.k8s.io/managed-by: kops
    pod-template-hash: 656574b474
  name: cluster-autoscaler-656574b474-9s7nz
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cluster-autoscaler-656574b474
    uid: 0f546e14-babd-4867-8eb8-3c1409ce5eea
  resourceVersion: "8811"
  uid: 62c1d867-bd78-43e9-b027-ba91a0a6824e
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists
  containers:
  - command:
    - ./cluster-autoscaler
    - --balance-similar-node-groups=false
    - --cloud-provider=gce
    - --expander=random
    - --nodes=3:6:nodes-europe-west3
    - --scale-down-utilization-threshold=0.5
    - --skip-nodes-with-local-storage=true
    - --skip-nodes-with-system-pods=true
    - --scale-down-delay-after-add=10m0s
    - --scale-down-unneeded-time=10m0s
    - --scale-down-unready-time=20m0s
    - --new-pod-scale-up-delay=0s
    - --max-node-provision-time=15m0s
    - --cordon-node-before-terminating=true
    - --logtostderr=true
    - --stderrthreshold=info
    - --v=4
    env:
    - name: AWS_REGION
      value: europe-west3
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0@sha256:f509ffab618dbd07d129b69ec56963aac7f61aaa792851206b54a2f0bbe046df
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health-check
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: cluster-autoscaler
    ports:
    - containerPort: 8085
      name: http
      protocol: TCP
    resources:
      requests:
        cpu: 100m
        memory: 300Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-bsd2n
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: master-europe-west3-c-c1zx
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cluster-autoscaler
  serviceAccountName: cluster-autoscaler
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
  - key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app: cluster-autoscaler
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  - labelSelector:
      matchLabels:
        app: cluster-autoscaler
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
  volumes:
  - name: kube-api-access-bsd2n
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-11-30T09:43:29Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-11-30T10:12:19Z"
    message: 'containers with unready status: [cluster-autoscaler]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-11-30T10:12:19Z"
    message: 'containers with unready status: [cluster-autoscaler]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-11-30T09:43:29Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://fdd3af3ad0e510b926875f76763e71a8b8d94eb40d92407077f62c39b4f67cf7
    image: sha256:1de4f9066791f663c57865927239ffbf5086c6516042bceb5ea4803ffabafdab
    imageID: registry.k8s.io/autoscaling/cluster-autoscaler@sha256:f509ffab618dbd07d129b69ec56963aac7f61aaa792851206b54a2f0bbe046df
    lastState:
      terminated:
        containerID: containerd://fdd3af3ad0e510b926875f76763e71a8b8d94eb40d92407077f62c39b4f67cf7
        exitCode: 255
        finishedAt: "2022-11-30T10:12:18Z"
        reason: Error
        startedAt: "2022-11-30T10:12:00Z"
    name: cluster-autoscaler
    ready: false
    restartCount: 10
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=cluster-autoscaler pod=cluster-autoscaler-656574b474-9s7nz_kube-system(62c1d867-bd78-43e9-b027-ba91a0a6824e)
        reason: CrashLoopBackOff
  hostIP: 10.0.32.4
  phase: Running
  podIP: 100.96.0.54
  podIPs:
  - ip: 100.96.0.54
  qosClass: Burstable
  startTime: "2022-11-30T09:43:29Z"

9. Anything else we need to know?

hakman commented 1 year ago

Thanks for the report. I verified it and prepared a fix in https://github.com/kubernetes/kops/pull/14700. It would help if you could test that change by manually editing the deployment, to confirm that there's nothing else missing.
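
A minimal sketch of that manual test, assuming the kops-managed cluster-autoscaler Deployment in kube-system from the report above: edit it, swap the single --nodes=3:6:nodes-europe-west3 flag for fully-qualified MIG URLs, and let it roll out a new pod.

# assumes the kops-managed Deployment shown in the report above
kubectl -n kube-system edit deployment cluster-autoscaler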

fbozic commented 1 year ago

Hi, thanks for the quick reply.
I've manually tested it by changing the CAS deployment: I opened the GCP console, found the MIGs that kOps created, and translated that into the CAS config.
I've noticed that kOps creates one MIG per zone even though I have defined one kOps IG spanning n zones. From the source code it looks like kOps doesn't support regional MIGs (yet), hence one MIG per zone. Let me know if I'm breaking the design here, and whether I should instead opt for one kOps IG per zone.
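
(For anyone else reproducing this, the same MIG lookup can be done from the CLI; this is just a sketch mirroring the naming seen in this cluster.)

# project and name filter mirror this cluster's naming; adjust for your own
gcloud compute instance-groups managed list --project my-project --filter="name~nodes-europe-west3"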

Here is the old configuration generated by kOps. There is only one --nodes argument, which is wrong because my InstanceGroup spans 3 zones, meaning kOps creates 3 zonal MIGs.

containers:
  - command:
    - ./cluster-autoscaler
    - --balance-similar-node-groups=false
    - --cloud-provider=gce
    - --expander=random
    - --nodes=3:6:nodes-europe-west3
    - --scale-down-utilization-threshold=0.5
    - --skip-nodes-with-local-storage=true
    - --skip-nodes-with-system-pods=true
    - --scale-down-delay-after-add=10m0s
    - --scale-down-unneeded-time=10m0s
    - --scale-down-unready-time=20m0s
    - --new-pod-scale-up-delay=0s
    - --max-node-provision-time=15m0s
    - --cordon-node-before-terminating=true
    - --logtostderr=true
    - --stderrthreshold=info
    - --v=4

Here is my manual configuration. Notice there are 3 --nodes arguments. Since I defined min=3,max=6 on the entire kOps IG, each zonal MIG gets min=1,max=2.

containers:
    - command:
      - ./cluster-autoscaler
      - --balance-similar-node-groups=false
      - --cloud-provider=gce
      - --expander=random
      - --nodes=1:2:https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-a/instanceGroups/a-nodes-europe-west3-my-fake-name-k8s-local
      - --nodes=1:2:https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-b/instanceGroups/b-nodes-europe-west3-my-fake-name-k8s-local
      - --nodes=1:2:https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-c/instanceGroups/c-nodes-europe-west3-my-fake-name-k8s-local
      - --scale-down-utilization-threshold=0.5
      - --skip-nodes-with-local-storage=true
      - --skip-nodes-with-system-pods=true
      - --scale-down-delay-after-add=10m0s
      - --scale-down-unneeded-time=10m0s
      - --scale-down-unready-time=20m0s
      - --new-pod-scale-up-delay=0s
      - --max-node-provision-time=15m0s
      - --cordon-node-before-terminating=true
      - --logtostderr=true
      - --stderrthreshold=info
      - --v=4
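
The full MIG URLs above are each group's selfLink, which can be looked up per zone; a sketch using the zone-a group from this example:

# group name and zone taken from the example above; repeat per zone
gcloud compute instance-groups managed describe a-nodes-europe-west3-my-fake-name-k8s-local \
  --zone europe-west3-a --project my-project --format='value(selfLink)'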

Events:

❯ k get events --sort-by='.lastTimestamp'
LAST SEEN   TYPE      REASON              OBJECT                              MESSAGE
19s         Warning   FailedScheduling    pod/test-fbozic-7c6cf6775c-278vn    0/6 nodes are available: 3 node(s) didn't match pod anti-affinity rules, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/6 nodes are available: 3 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
19s         Warning   FailedScheduling    pod/test-fbozic-7c6cf6775c-jfsfm    0/6 nodes are available: 3 node(s) didn't match pod anti-affinity rules, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/6 nodes are available: 3 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
19s         Warning   FailedScheduling    pod/test-fbozic-7c6cf6775c-wr8h8    0/6 nodes are available: 3 node(s) didn't match pod anti-affinity rules, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/6 nodes are available: 3 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
19s         Normal    SuccessfulCreate    replicaset/test-fbozic-7c6cf6775c   Created pod: test-fbozic-7c6cf6775c-jfsfm
19s         Normal    SuccessfulCreate    replicaset/test-fbozic-7c6cf6775c   Created pod: test-fbozic-7c6cf6775c-wr8h8
19s         Normal    SuccessfulCreate    replicaset/test-fbozic-7c6cf6775c   Created pod: test-fbozic-7c6cf6775c-278vn
19s         Normal    ScalingReplicaSet   deployment/test-fbozic              Scaled up replica set test-fbozic-7c6cf6775c to 6 from 3
5s          Normal    TriggeredScaleUp    pod/test-fbozic-7c6cf6775c-278vn    pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-a/instanceGroups/a-nodes-europe-west3-my-fake-name-k8s-local 1->2 (max: 2)}]
5s          Normal    TriggeredScaleUp    pod/test-fbozic-7c6cf6775c-jfsfm    pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-a/instanceGroups/a-nodes-europe-west3-my-fake-name-k8s-local 1->2 (max: 2)}]
5s          Normal    TriggeredScaleUp    pod/test-fbozic-7c6cf6775c-wr8h8    pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west3-a/instanceGroups/a-nodes-europe-west3-my-fake-name-k8s-local 1->2 (max: 2)}]
fbozic commented 1 year ago

Regarding regional vs. zonal MIGs from the previous comment:
Here is a link to the TODO comment in the source code about migrating to regional MIGs: https://github.com/kubernetes/kops/blob/29dbd14c74e0168be5707f170babadc94d923c4c/pkg/model/gcemodel/autoscalinggroup.go#L207
And here is the Terraform resource, which does support distribution across zones: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_region_instance_group_manager
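
For comparison, the same zone distribution can be requested from the CLI when creating a regional MIG; a rough sketch under the naming used in this thread:

# rough sketch only: the instance template name is hypothetical
gcloud compute instance-groups managed create nodes-europe-west3 \
  --project my-project \
  --zones europe-west3-a,europe-west3-b,europe-west3-c \
  --template example-nodes-template \
  --size 3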

hakman commented 1 year ago

Regarding one MIG per zone: that was more or less due to cluster-autoscaler limitations. To work well, cluster-autoscaler needs to control which MIG an instance is placed in; otherwise placement is random (based on cloud availability).