kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!

[Bug]: Cluster autoscaler not creating any nodes #1476

Closed · p4block closed this 1 week ago

p4block commented 1 week ago

Description

A nodepool with autoscaling is defined with min_nodes > 0, but the cluster autoscaler never creates any nodes.

I tried deploying a fresh cluster both with and without the image that fixes the CX11 deprecation problem.

Kube.tf file

  network_region = "eu-central" # change to `us-east` if location is ash

  control_plane_nodepools = [
    {
      name        = "control-plane-nbg1",
      server_type = "cax11",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
    },
  ]

  agent_nodepools = [
    {
      name        = "workers",
      server_type = "cax11",
      location    = "fsn1",
      labels = [
        "node.kubernetes.io/role=workers"
      ],
      taints = [
      ],
      floating_ip = false
      count = 1
    },
    {
      name        = "ingress",
      server_type = "cax11",
      location    = "fsn1",
      labels = [
        "node.kubernetes.io/role=ingress"
      ],
      taints = [
        "node.kubernetes.io/role=ingress:NoSchedule"
      ],
      floating_ip = false
      count = 1
    },
  ]
  load_balancer_type     = "lb11"

  load_balancer_location = "fsn1"

  autoscaler_nodepools = [
    {
      name        = "autoscaled-egress"
      server_type = "cax11"
      location    = "fsn1"
      min_nodes   = 3
      max_nodes   = 5
      labels = {
        "node.kubernetes.io/role" = "egress"
      }
      taints = [
        {
          key    = "node.kubernetes.io/role"
          value  = "egress"
          effect = "NoSchedule"
        }
      ]
    }
  ]

  cluster_autoscaler_image = "docker.io/hetznercloud/cluster-autoscaler"
  cluster_autoscaler_version = "v1.29.4-hcloud1"

Screenshots

I0909 18:32:01.723771       1 static_autoscaler.go:291] Starting main loop
I0909 18:32:01.723893       1 hetzner_node_group.go:599] Set node group k3s-autoscaled-egress size from 0 to 0, expected delta 0
I0909 18:32:01.724104       1 hetzner_node_group.go:380] Build node group label for k3s-autoscaled-egress
I0909 18:32:01.724129       1 hetzner_node_group.go:394] k3s-autoscaled-egress nodegroup labels: map[beta.kubernetes.io/instance-type:cax11 csi.hetzner.cloud/location:fsn1 hcloud/node-group:k3s-autoscaled-eg…
I0909 18:32:01.724365       1 filter_out_schedulable.go:63] Filtering out schedulables
I0909 18:32:01.724391       1 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I0909 18:32:01.724410       1 filter_out_schedulable.go:83] No schedulable pods
I0909 18:32:01.724418       1 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I0909 18:32:01.724426       1 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I0909 18:32:01.724465       1 static_autoscaler.go:548] No unschedulable pods
I0909 18:32:01.724506       1 static_autoscaler.go:571] Calculating unneeded nodes
I0909 18:32:01.724524       1 pre_filtering_processor.go:57] Node k3s-control-plane-nbg1-boo should not be processed by cluster autoscaler (no node group config)
I0909 18:32:01.724538       1 pre_filtering_processor.go:57] Node k3s-ingress-yhe should not be processed by cluster autoscaler (no node group config)
I0909 18:32:01.724551       1 pre_filtering_processor.go:57] Node k3s-workers-cqp should not be processed by cluster autoscaler (no node group config)
I0909 18:32:01.724612       1 static_autoscaler.go:614] Scale down status: lastScaleUpTime=2024-09-09 17:26:49.693658128 +0000 UTC m=-3597.837867675 lastScaleDownDeleteTime=2024-09-09 17:26:49.693658128 +000…
I0909 18:32:03.743981       1 reflector.go:800] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.Job total 47 items received
I0909 18:32:08.743847       1 reflector.go:800] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.Node total 26 items received
I0909 18:32:09.738820       1 reflector.go:800] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.CSIStorageCapacity total 6 items received
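
The telling line is `Set node group k3s-autoscaled-egress size from 0 to 0, expected delta 0`: with no unschedulable pods the autoscaler computes a delta of zero, so min_nodes alone never triggers a scale-up. As a quick check of whether min-size enforcement is active on a running cluster (a sketch; the deployment name and namespace are assumptions based on kube-hetzner's defaults):

  # Look for the enforcement flag in the deployed autoscaler manifest;
  # "cluster-autoscaler" in kube-system is the assumed default name.
  kubectl -n kube-system get deployment cluster-autoscaler -o yaml \
    | grep -i 'enforce-node-group-min-size' \
    || echo "min-size enforcement flag not set"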

Platform

Linux

p4block commented 1 week ago

~Just noticed that this may be caused by account limits...~

Removed half my resources, and it still isn't working :/

p4block commented 1 week ago
  cluster_autoscaler_extra_args = [
    "--enforce-node-group-min-size=true",
  ]

As seen in https://github.com/kubernetes/autoscaler/issues/6564, this is intended upstream behavior: without `--enforce-node-group-min-size=true`, the autoscaler only scales up in response to unschedulable pods and ignores min_nodes. The fix is even present, commented out, in the kube.tf example.
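
After adding the flag and re-applying, one way to confirm the pool actually scales up to min_nodes (a sketch; the deployment name and namespace assume the kube-hetzner defaults, and the node label comes from the autoscaler_nodepools block above):

  # Apply the new cluster_autoscaler_extra_args, then watch the autoscaler
  # bring the group up to its minimum size.
  terraform apply
  kubectl -n kube-system logs deployment/cluster-autoscaler -f | grep -i 'autoscaled-egress'
  kubectl get nodes -l node.kubernetes.io/role=egress -w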

IMO enforcing the minimum should be the default; ignoring min_nodes by default makes no sense, even if that is how upstream works.