kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!

[Bug]: timed out waiting for the condition on deployments/system-upgrade-controller #1426

Closed lazarivkovic closed 1 month ago

lazarivkovic commented 1 month ago

Description

Hi folks, I am not sure if this is a bug. I have tried all the options, changed versions, and destroyed the cluster several times, but nothing helps. I have lost a lot of time and need to ask for help. In this configuration I am using 1 CAX11 (control plane) and 3 CX32 (agents).

I got this error:

module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 291, in resource "null_resource" "kustomization":
│  291:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1230953466.sh": Process exited with status 1

The control plane and agents work fine.

~ # kubectl get pods -n system-upgrade
NAME                                        READY   STATUS    RESTARTS   AGE
system-upgrade-controller-bb8dd57b4-4rp5t   0/1     Pending   0          18m
~ # kubectl logs -n system-upgrade system-upgrade-controller-bb8dd57b4-4rp5t

(there are no logs)
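For anyone hitting the same symptom: a Pending pod with no logs usually means the pod was never scheduled, not that it crashed. A minimal sketch of commands that might surface the scheduler's reason (the pod name is taken from the output above; look at the Events section and at node taints):

# Scheduling events for the stuck pod are listed under "Events" at the bottom
kubectl -n system-upgrade describe pod system-upgrade-controller-bb8dd57b4-4rp5t

# Recent events in the namespace, oldest first
kubectl get events -n system-upgrade --sort-by=.metadata.creationTimestamp

# Node readiness and taints, in case the pod's tolerations/nodeSelector cannot be satisfied
kubectl get nodes -o wide
kubectl describe nodes | grep -i -A3 taints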

Kube.tf file

module "kube-hetzner" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.14.0"

  providers = {
    hcloud = hcloud
  }

  hcloud_token    = var.cluster_hcloud_token
  ssh_public_key  = var.ssh_public_key != "" ? file(var.ssh_public_key) : ""
  ssh_private_key = var.ssh_private_key != "" ? file(var.ssh_private_key) : ""
  ssh_port        = var.ssh_port

  # k3s
  initial_k3s_channel = "v1.29"

  # cluster
  control_plane_nodepools = local.control_plane_nodepools
  agent_nodepools         = local.agent_nodepools
  cluster_name            = "void"

  # fw
  firewall_kube_api_source = var.source_ips

  # network
  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"
  cni_plugin             = "cilium"

  # auto updates
  automatically_upgrade_os  = false
  automatically_upgrade_k3s = false

  # lb
  use_control_plane_lb = true

}

Screenshots

/tmp/terraform_1230953466.sh

#!/bin/sh

set -ex

sed -i 's/^- |[0-9]+$/- |/g' /var/post_install/kustomization.yaml

timeout 360 bash <<EOF
until [[ "\$(kubectl get --raw='/readyz' 2> /dev/null)" == "ok" ]]; do
  echo "Waiting for the cluster to become ready..."
  sleep 2
done
EOF

kubectl apply -k /var/post_install
echo 'Waiting for the system-upgrade-controller deployment to become available...'
kubectl -n system-upgrade wait --for=condition=available --timeout=360s deployment/system-upgrade-controller
sleep 7
kubectl -n system-upgrade apply -f /var/post_install/plans.yaml

timeout 360 bash <<EOF
until [ -n "\$(kubectl get -n traefik service/traefik --output=jsonpath='{.status.loadBalancer.ingress[0].ip}' 2> /dev/null)" ]; do
  echo "Waiting for load-balancer to get an IP..."
  sleep 2
done
EOF
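The step that fails is the kubectl -n system-upgrade wait above; the 360s timeout simply expires because the pod never leaves Pending. A sketch of re-running that step by hand on the first control plane node (paths as in the script above) so you can watch it instead of waiting blindly:

# Re-apply the post-install kustomization the provisioner uses
kubectl apply -k /var/post_install

# Watch the deployment roll out interactively
kubectl -n system-upgrade get deployment system-upgrade-controller -w

# Or repeat the wait with a longer timeout
kubectl -n system-upgrade wait --for=condition=available --timeout=600s deployment/system-upgrade-controller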

Platform

Mac

mysticaltech commented 1 month ago

@lazarivkovic two things: try commenting out firewall_kube_api_source and try again. If that does not work, destroy, run terraform init -upgrade, and try again.
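For completeness, the suggested sequence as shell commands (a sketch; the first step is an edit to kube.tf, i.e. removing or commenting the firewall_kube_api_source line shown in the config above):

# 1. Comment out firewall_kube_api_source in kube.tf, then retry
terraform apply

# 2. If that still fails: tear down, refresh the module and providers, and recreate
terraform destroy
terraform init -upgrade
terraform apply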

mysticaltech commented 1 month ago

Closing as stale.