kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

[Bug]: Autoscaled servers not being removed #1430

Closed: dihmeetree closed this issue 1 month ago

dihmeetree commented 1 month ago

Description

I started off with a configuration of 4 minimum autoscaled servers. Those 4 servers were deployed successfully when I ran terraform apply, but after I switched the minimum to 0, the servers are not being removed. Here are the cluster-autoscaler logs; I don't see any specific errors:

I0727 21:26:44.728567       1 hetzner_node_group.go:567] Set node group draining-node-pool size from 0 to 0, expected delta 0
I0727 21:26:44.728748       1 taints.go:406] Removing autoscaler soft taint when creating template from node
I0727 21:26:44.728820       1 hetzner_node_group.go:348] Build node group label for draining-node-pool
I0727 21:26:44.728830       1 hetzner_node_group.go:362] draining-node-pool nodegroup labels: map[beta.kubernetes.io/instance-type:cx11 csi.hetzner.cloud/location:fsn1 hcloud/node-group:draining-node-pool kubernetes.io/arch:amd64 topology.kubernetes.io/region:fsn1]
I0727 21:26:44.728962       1 filter_out_schedulable.go:63] Filtering out schedulables
I0727 21:26:44.728977       1 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I0727 21:26:44.728985       1 filter_out_schedulable.go:83] No schedulable pods
I0727 21:26:44.728990       1 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I0727 21:26:44.728994       1 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I0727 21:26:44.729009       1 static_autoscaler.go:565] No unschedulable pods
I0727 21:26:44.729026       1 static_autoscaler.go:588] Calculating unneeded nodes
I0727 21:26:44.729036       1 pre_filtering_processor.go:57] Node k3s-control-plane-qdm should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729041       1 pre_filtering_processor.go:57] Node k3s-control-plane-zqh should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729049       1 pre_filtering_processor.go:57] Node k3s-storage-bpz should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729053       1 pre_filtering_processor.go:57] Node k3s-agent-any should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729056       1 pre_filtering_processor.go:57] Node k3s-agent-pkz should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729060       1 pre_filtering_processor.go:57] Node k3s-control-plane-fgt should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729063       1 pre_filtering_processor.go:57] Node k3s-egress-vkk should not be processed by cluster autoscaler (no node group config)
I0727 21:26:44.729102       1 klogx.go:87] Node k3s-autoscaled-31a71c60c1210905 - memory requested is 0% of allocatable
I0727 21:26:44.729126       1 klogx.go:87] Node k3s-autoscaled-32c200cb2e78e7ff - memory requested is 0% of allocatable
I0727 21:26:44.729151       1 klogx.go:87] Node k3s-autoscaled-43aea29ea1d44fd0 - memory requested is 0% of allocatable
I0727 21:26:44.729172       1 klogx.go:87] Node k3s-autoscaled-703d64d52203b0ac - memory requested is 0% of allocatable
I0727 21:26:44.729186       1 cluster.go:156] Simulating node k3s-autoscaled-31a71c60c1210905 removal
I0727 21:26:44.729207       1 cluster.go:174] node k3s-autoscaled-31a71c60c1210905 may be removed
I0727 21:26:44.729214       1 cluster.go:156] Simulating node k3s-autoscaled-32c200cb2e78e7ff removal
I0727 21:26:44.729220       1 cluster.go:174] node k3s-autoscaled-32c200cb2e78e7ff may be removed
I0727 21:26:44.729225       1 cluster.go:156] Simulating node k3s-autoscaled-43aea29ea1d44fd0 removal
I0727 21:26:44.729230       1 cluster.go:174] node k3s-autoscaled-43aea29ea1d44fd0 may be removed
I0727 21:26:44.729236       1 cluster.go:156] Simulating node k3s-autoscaled-703d64d52203b0ac removal
I0727 21:26:44.729243       1 cluster.go:174] node k3s-autoscaled-703d64d52203b0ac may be removed
I0727 21:26:44.729257       1 nodes.go:84] k3s-autoscaled-31a71c60c1210905 is unneeded since 2024-07-27 21:26:24.338981682 +0000 UTC m=+29.797832167 duration 20.389447596s
I0727 21:26:44.729268       1 nodes.go:84] k3s-autoscaled-32c200cb2e78e7ff is unneeded since 2024-07-27 21:26:24.338981682 +0000 UTC m=+29.797832167 duration 20.389447596s
I0727 21:26:44.729273       1 nodes.go:84] k3s-autoscaled-43aea29ea1d44fd0 is unneeded since 2024-07-27 21:26:24.338981682 +0000 UTC m=+29.797832167 duration 20.389447596s
I0727 21:26:44.729277       1 nodes.go:84] k3s-autoscaled-703d64d52203b0ac is unneeded since 2024-07-27 21:26:24.338981682 +0000 UTC m=+29.797832167 duration 20.389447596s
I0727 21:26:44.729307       1 static_autoscaler.go:631] Scale down status: lastScaleUpTime=2024-07-27 20:26:14.338078644 +0000 UTC m=-3580.203070881 lastScaleDownDeleteTime=2024-07-27 20:26:14.338078644 +0000 UTC m=-3580.203070881 lastScaleDownFailTime=2024-07-27 20:26:14.338078644 +0000 UTC m=-3580.203070881 scaleDownForbidden=false scaleDownInCooldown=false
I0727 21:26:44.729331       1 static_autoscaler.go:652] Starting scale down
I0727 21:26:44.729347       1 nodes.go:126] k3s-autoscaled-32c200cb2e78e7ff was unneeded for 20.389447596s
I0727 21:26:44.729355       1 nodes.go:126] k3s-autoscaled-43aea29ea1d44fd0 was unneeded for 20.389447596s
I0727 21:26:44.729361       1 nodes.go:126] k3s-autoscaled-703d64d52203b0ac was unneeded for 20.389447596s
I0727 21:26:44.729366       1 nodes.go:126] k3s-autoscaled-31a71c60c1210905 was unneeded for 20.389447596s
I0727 21:26:44.729441       1 orchestrator.go:315] ScaleUpToNodeGroupMinSize: NodeGroup draining-node-pool, TargetSize 0, MinSize 0, MaxSize 0
I0727 21:26:44.729456       1 orchestrator.go:315] ScaleUpToNodeGroupMinSize: NodeGroup k3s-autoscaled, TargetSize 4, MinSize 0, MaxSize 10
I0727 21:26:44.729463       1 orchestrator.go:359] ScaleUpToNodeGroupMinSize: scale up not needed

The logs imply that the servers are going to be removed, but they aren't (they are still there).

[Screenshot from 2024-07-27 at 5:30:36 PM showing the autoscaled servers still present]
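
For reference, as far as I understand, cluster-autoscaler only deletes a node once it has been "unneeded" for --scale-down-unneeded-time (10 minutes by default), so the ~20s "unneeded" entries in the logs above are still inside that window. If that were the only factor, the window could be shortened through the cluster_autoscaler_extra_args list that is already in my kube.tf; the 5m value below is only an illustration, not something from my actual config:

  cluster_autoscaler_extra_args = [
    "--ignore-daemonsets-utilization=true",
    "--enforce-node-group-min-size=true",
    "--ok-total-unready-count=1",
    "--scale-down-unneeded-time=5m", # illustrative value, not in the original config
  ]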

Any ideas what the issue may be?
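
For clarity, the only change relative to the kube.tf below was switching min_nodes in the autoscaler nodepool from 4 to 0; a sketch of just that block (labels and taints unchanged, see the full kube.tf below):

  autoscaler_nodepools = [
    {
      name        = "autoscaled"
      server_type = "cpx21"
      location    = "ash"
      min_nodes   = 0  # was 4 on the initial apply
      max_nodes   = 10
      # labels and taints unchanged, see the full kube.tf below
    }
  ]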

Kube.tf file

locals {
  hcloud_token = "<redacted>"
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token

  source = "kube-hetzner/kube-hetzner/hcloud"

  ssh_public_key = file("~/.ssh/id_ed25519.pub")
  ssh_private_key = file("~/.ssh/id_ed25519")

  network_region = "us-east"

  control_plane_nodepools = [
    {
      name        = "control-plane"
      server_type = "cpx21"
      location    = "ash"
      labels      = []
      taints      = []
      count       = 3
    }
  ]

  agent_nodepools = [
    {
      name        = "agent"
      server_type = "cpx21"
      location    = "ash"
      labels      = []
      taints      = []
      count       = 2
    },
    {
      name        = "storage"
      server_type = "cpx31"
      location    = "ash"
      labels      = [
        "node.kubernetes.io/server-usage=storage"
      ]
      taints      = []
      count       = 1
    },
    {
      name        = "egress"
      server_type = "cpx21"
      location    = "ash"
      labels = [
        "node.kubernetes.io/role=egress"
      ]
      taints = [
        "node.kubernetes.io/role=egress:NoSchedule"
      ]
      floating_ip = true
      count = 1
    },
  ]

  load_balancer_type     = "lb11"
  load_balancer_location = "ash"

  load_balancer_algorithm_type = "least_connections"

  load_balancer_health_check_interval = "5s"
  load_balancer_health_check_timeout = "5s"
  load_balancer_health_check_retries = 2

  autoscaler_nodepools = [
    {
      name        = "autoscaled"
      server_type = "cpx21"
      location    = "ash"
      min_nodes   = 0
      max_nodes   = 10
      labels      = {
        "node.kubernetes.io/role": "peak-workloads"
      }
      taints = [{
         key: "node.kubernetes.io/role"
         value: "peak-workloads"
         effect: "NoExecute"
      }]
    }
  ]

  cluster_autoscaler_extra_args = [
    "--ignore-daemonsets-utilization=true",
    "--enforce-node-group-min-size=true",
    "--ok-total-unready-count=1",
  ]

  dns_servers = [
    "1.1.1.1",
    "8.8.8.8",
    "2606:4700:4700::1111",
  ]

  firewall_kube_api_source = ["<redacted>/32"]

  firewall_ssh_source = ["<redacted>/32"]

}

provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.43.0"
    }
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

variable "hcloud_token" {
  sensitive = true
  default   = ""
}

Screenshots

No response

Platform

Mac/Linux

dihmeetree commented 1 month ago

It seems like specifying the following fixes the issue?

  cluster_autoscaler_image = "registry.k8s.io/autoscaling/cluster-autoscaler"
  cluster_autoscaler_version = "v1.30.2"

Maybe the default is using an old/broken image or something?
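
For anyone else running into this, a minimal sketch of where those two settings go, using the module block from the kube.tf above (everything else unchanged):

  module "kube-hetzner" {
    # ... existing settings from the kube.tf above ...

    cluster_autoscaler_image   = "registry.k8s.io/autoscaling/cluster-autoscaler"
    cluster_autoscaler_version = "v1.30.2"
  }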