kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

[Bug]: Cluster not creating properly #1556

Open herrinternet opened 4 days ago

herrinternet commented 4 days ago

Description

For quite some time I have been trying to get the following cluster configuration to work:

My plan is to have only the WireGuard ports open in the firewall, so that the cluster is reachable via VPN only, for security reasons. I have several issues with the cluster creation. First I had a lot of problems with the automatic creation of nodes with SELinux, so I deactivated SELinux in the kube.tf. After that I get the following error:

module.kube-hetzner.null_resource.agents["0-0-worker-pool"] (remote-exec): Failed to enable unit: Unit iscsid.service does not exist

Since it did not install the package properly, I went in and installed it by hand with

transactional-update pkg install open-iscsi
transactional-update apply
reboot

But that did not fix it either.

That leads to the creation of the cluster (3 master nodes and one worker), but the Longhorn and autoscaler pods and services are not created, since I think these need open-iscsi to work properly.

I have changed so many options (enabling and disabling WireGuard, for example) that I have redone the cluster config about 30 times. I even tried the ChatGPT assistant to help me create a valid configuration for my requirements, but it always failed to produce the right syntax for the kube.tf (mainly problems with the taints and labels as a string map, for example).
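For reference, these are the two shapes I ended up using in the config below. As far as I can tell from the module's examples, agent pools and autoscaler pools expect different formats, but please correct me if I got this wrong:

```hcl
# agent_nodepools entry: labels and taints as lists of "key=value" strings
labels = ["node.kubernetes.io/role=worker"]
taints = ["longhorn-storage-exclusion=true:NoSchedule"]

# autoscaler_nodepools entry: labels as a map, taints as a list of objects
labels = { "node.kubernetes.io/role" = "worker" }
taints = [{ key = "longhorn-storage-exclusion", value = "true", effect = "NoSchedule" }]
```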

Can you help me find out what is wrong in my configuration and what I have to change to get the cluster for the private cloud to work? Thank you very much.

Additional information:

Thank you very much for your time and effort in creating such a wonderful program - I would love to use it. Best regards, Gregor

Kube.tf file

locals {
  hcloud_token = "xxxxxx"
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.15.4"  # Use the latest version from the documentation
  ssh_public_key = file("~/.ssh/id_ed25519.pub")
  ssh_private_key = file("~/.ssh/id_ed25519")
  disable_selinux = true
  network_region = "eu-central"
  enable_klipper_metal_lb = true
  load_balancer_disable_public_network = true
  ingress_controller = "none"

#  enable_wireguard = true

  # Control plane configuration
  control_plane_nodepools = [
    {
      name        = "control-plane-nbg1",
      server_type = "cx22",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1,
      backups     = true
    },
    {
      name        = "control-plane-fsn1",
      server_type = "cx22",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1,
      backups     = true
    },
    {
      name        = "control-plane-fsn1-2",
      server_type = "cx22",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1,
      backups     = true
    }
  ]

  # Worker pool configuration
  agent_nodepools = [
    {
      name        = "worker-pool",
      server_type = "cpx31",
      location    = "fsn1",
      labels      = ["node.kubernetes.io/role=worker"],
      taints      = ["longhorn-storage-exclusion=true:NoSchedule"],
      count       = 1
    }
  ]

  # Autoscaler nodepools for Longhorn
  autoscaler_nodepools = [
    {
      name        = "worker-autoscaler",
      server_type = "cpx31",
      location    = "fsn1",
      min_nodes   = 1,
      max_nodes   = 10,
      labels      = {
        "node.kubernetes.io/role" = "worker"
      },
      taints      = [
        {
          key    = "longhorn-storage-exclusion"
          value  = "true"
          effect = "NoSchedule"
        }
      ]
    },
    {
      name        = "longhorn-autoscale",
      server_type = "cpx21",                 # Smaller instance for Longhorn storage
      location    = "fsn1",
      min_nodes   = 3,                       # Minimum count for redundancy
      max_nodes   = 10,
      labels      = {
        "node.kubernetes.io/role" = "longhorn-storage"
      },
      taints      = [],                      # No taints, so Longhorn can use the storage
      backups     = true
    }
  ]

  # Longhorn configuration
  enable_longhorn = true
  enable_iscsid   = true
  longhorn_values = <<EOT
defaultSettings:
  nodeSelector:
    node.kubernetes.io/role: longhorn-storage
EOT

  # Firewall, DNS, and other settings
  dns_servers = [
    "1.1.1.1",
    "8.8.8.8",
    "2606:4700:4700::1111"
  ]
}

provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.43.0"
    }
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

variable "hcloud_token" {
  sensitive = true
  default   = ""
}

Screenshots

No response

Platform

Ubuntu Linux Server Image 24.04 on Hetzner Cloud