kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

Missing "cluster-init" option in config.yaml in the only control plane node. #1294

Open mateuszlewko opened 5 months ago

mateuszlewko commented 5 months ago

Description

According to the steps for restoring a cluster, one of the control plane nodes should have "cluster-init: true" set in /etc/rancher/k3s/config.yaml. I have a cluster with one control plane node, and inspecting its config.yaml shows that no such option is set.

Do you have an idea why that is? It's a freshly created cluster with the config below.

Kube.tf file

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = ...

  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.13.4"

  # For details on SSH see https://github.com/kube-hetzner/kube-hetzner/blob/master/docs/ssh.md
  ssh_public_key  = ...
  ssh_private_key = ...

  # For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/
  network_region = "eu-central" # change to `us-east` if location is ash

  control_plane_nodepools = [
    {
      name        = "control-plane-fsn1",
      server_type = "cax11",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
    },
    {
      name        = "control-plane-nbg1",
      server_type = "cax11",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 0
    },
    {
      name        = "control-plane-hel1",
      server_type = "cax11",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 0
    }
  ]

  agent_nodepools = [
    {
      name        = "agent-cax21-hel1",
      server_type = "cax21",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 1
    },
    {
      name        = "agent-cax11-nbg1",
      server_type = "cax11",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 0
    },
  ]

  enable_wireguard = true

  # https://www.hetzner.com/cloud/load-balancer
  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"

  # See how to configure agent nodepools for longhorn here https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/373#discussioncomment-3983159
  # Also see Longhorn best practices here https://gist.github.com/ifeulner/d311b2868f6c00e649f33a72166c2e5b
  enable_longhorn = true

  # If you want to configure additional trusted IPs for traefik, enter them here as a list of IPs (strings).
  # Example for Cloudflare:
  traefik_additional_trusted_ips = [...]

  # For all options see: https://kured.dev/docs/configuration/
  kured_options = {
    "reboot-days" : "sa",
    "start-time" : "8am",
    "end-time" : "2pm",
    "time-zone" : "Local",
    "lock-release-delay" : "30m",
    "drain-grace-period" : 180,
  }

  enable_cert_manager = true

  dns_servers = [
   ...
  ]

  use_control_plane_lb = false
  create_kubeconfig    = true
  create_kustomization = false

  etcd_s3_backup = {
    ....
  }
}

Screenshots

No response

Platform

Mac

mateuszlewko commented 5 months ago

I think the initial config created by null_resource.first_control_plane is overridden by null_resource.control_plane_config. Is this intended? Perhaps in locals.k3s-config we should add something like cluster-init: k == 0?

mysticaltech commented 5 months ago

@mateuszlewko Yes, indeed, we override it so as not to make the first control plane special. Could you please explain how your proposed solution relates to the restore flow? I'm not sure I follow.

mateuszlewko commented 5 months ago

Hey,

I was referring to the "Backup and restore a cluster" guide in https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner?tab=readme-ov-file#examples.

The postinstall_exec script there contains:

      export CLUSTERINIT=$(cat /etc/rancher/k3s/config.yaml | grep -i '"cluster-init": true')
      if [ -n "$CLUSTERINIT" ]; then
        echo indeed this is the first control plane node > /tmp/restorenotes

which kind of assumes the first control plane node is special.
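
To make the failure mode concrete, here is a minimal sketch of the same kind of check, run against two throwaway files rather than the real /etc/rancher/k3s/config.yaml. The helper name has_cluster_init and the sample file contents are made up for illustration and are not part of kube-hetzner or k3s. When the key is missing from the file, as on my control plane node, the check takes the "not set" branch, so a restore step guarded by it would presumably be skipped.

```shell
#!/bin/sh
# Sketch only: has_cluster_init is a hypothetical helper, not a kube-hetzner or k3s command.
# It succeeds when the given config file contains a line enabling cluster-init.
has_cluster_init() {
  config_file="$1"
  # Accept the key with or without double quotes and any spacing around the colon.
  grep -Eiq '^[[:space:]]*"?cluster-init"?[[:space:]]*:[[:space:]]*true' "$config_file"
}

# Two sample configs with assumed contents: one with the key, one without it.
tmpdir=$(mktemp -d)
printf '"cluster-init": true\n"node-name": "cp-0"\n' > "$tmpdir/with.yaml"
printf '"node-name": "cp-0"\n' > "$tmpdir/without.yaml"

for f in "$tmpdir/with.yaml" "$tmpdir/without.yaml"; do
  if has_cluster_init "$f"; then
    echo "$(basename "$f"): cluster-init is set"
  else
    echo "$(basename "$f"): cluster-init is not set"
  fi
done

rm -rf "$tmpdir"
```

Matching the key more loosely only helps when the key is present in some form. It does not help when the config is overridden and the key is gone, which is the case here.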

mysticaltech commented 5 months ago

@mateuszlewko Ah yes, so that needs to change. A PR to correct this example is welcome, please.