kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

Autoscaler does not scale up. predicate checking error: Insufficient ephemeral-storage. #1231

Closed schlichtanders closed 7 months ago

schlichtanders commented 7 months ago

Description

I just realized that my kube-hetzner cluster does not scale any longer. As far as I remember, it worked for a long time (though it could be that I only thought it was working and never actually saw it scale).

Inspecting journalctl logs I found that the key error seems to be inside systemd k3s-agent.service. It logs:

k3s-agent.service: Failed to locate executable /usr/local/bin/k3s: Permission denied
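
For anyone reproducing this, here is a minimal sketch of filtering the unit's journal for that message. The helper name `k3s_exec_errors` is made up for illustration; `journalctl -u` and `--no-pager` are standard options.

```shell
# Hypothetical helper: read journal text on stdin and keep only the lines
# where systemd reports that it could not execute a binary.
# `|| true` keeps the exit status 0 when nothing matches.
k3s_exec_errors() {
  grep -F 'Failed to locate executable' || true
}

# Usage on a node (not run here):
#   journalctl -u k3s-agent.service --no-pager | k3s_exec_errors
```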

I ran ls -l to inspect the directory, which shows:

# ls -l /usr/local/bin
total 58068
lrwxrwxrwx. 1 root root        3 Feb 24 09:32 crictl -> k3s
lrwxrwxrwx. 1 root root        3 Feb 24 09:32 ctr -> k3s
-rwxr-xr-x. 1 root root 59441152 Feb 24 09:32 k3s
-rwxr-xr-x. 1 root root     1565 Feb 24 09:32 k3s-agent-uninstall.sh
-rwxr-xr-x. 1 root root     2274 Feb 24 09:32 k3s-killall.sh
lrwxrwxrwx. 1 root root        3 Feb 24 09:32 kubectl -> k3s

Which looks fine, doesn't it?
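
One detail in the listing: the `.` after each permission string is how GNU `ls -l` marks a file that carries an SELinux security context, so the plain mode bits may not be the whole story. Whether SELinux is actually the cause here is an assumption, not something this thread confirms. A small sketch (the function name is hypothetical) that detects that marker, with the commands one might run on the node shown as comments:

```shell
# Hypothetical helper: succeed if an `ls -l` mode string (10 characters)
# is followed by the "." that GNU ls prints for an SELinux context.
has_selinux_context() {
  case "$1" in
    ??????????.) return 0 ;;
    *) return 1 ;;
  esac
}

# On the node itself (not run here), these show the label and the mode:
#   ls -Z /usr/local/bin/k3s
#   getenforce
```

For example, `has_selinux_context "-rwxr-xr-x."` succeeds for the k3s entry shown in the listing.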

Googling shows this issue https://github.com/k3s-io/k3s/issues/5903 which links to a workaround ln -s /data/k3s /var/lib/rancher/k3s. I have no clue how this relates.


I am really baffled to see such critical bugs in the cluster, because I thought I had pinned almost every version I could find. Hence two questions:

EDIT: New Problem, autoscaler still not working

After updating everything to the most recent kube-hetzner (including the MicroOS snapshots), the above error disappears. However, a new one still prevents autoscaling from working. The autoscaler logs show something like:

fails because of `k3s-agent.service: Failed to locate executable /usr/local/bin/k3s: Permission denied`

Which prevents the scale up.

### Kube.tf file

```terraform
locals {
  # You have the choice of setting your Hetzner API token here or define the TF_VAR_hcloud_token env
  # within your shell, such as such: export TF_VAR_hcloud_token=xxxxxxxxxxx
  # If you choose to define it in the shell, this can be left as is.

  # Your Hetzner token can be found in your Project > Security > API Token (Read & Write is required).
  hcloud_token = "xxxxxxxxxxx"

  # to get the corresponding etcd_version for a k3s version you need to
  # - start k3s
  # - run `curl -L --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key https://127.0.0.1:2379/version`
  # for details see https://gist.github.com/superseb/0c06164eef5a097c66e810fe91a9d408
  etcd_version = "v3.5.9"

  initial_k3s_channel = "v1.26"
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
  k3s_token = var.k3s_token

  # using restoration, the timing for the nodes to respond is a bit larger
  # load_balancer_health_check_interval = "45s"
  # load_balancer_health_check_timeout = "30s"
  # load_balancer_health_check_retries = "40"

  # Then fill or edit the below values. Only the first values starting with a * are obligatory; the rest can remain with their default values, or you
  # could adapt them to your needs.

  # The cluster name, by default "k3s"
  cluster_name = var.cluster_name

  ###################
  # FIXING VERSIONS #
  ###################

  # * For local dev, path to the git repo
  # source = "../../kube-hetzner/"
  # If you want to use the latest master branch
  # source = "github.com/schlichtanders/terraform-hcloud-kube-hetzner?ref=load-balancer-create"
  # source = "github.com/schlichtanders/terraform-hcloud-kube-hetzner?ref=postinstall_exec2"
  # # For normal use, this is the path to the terraform registry
  # # You can optionally specify a version number - for the registry
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.9.2"

  # If you want to use a specific Hetzner CCM and CSI version, set them below; otherwise, leave them as-is for the latest versions.
  # https://github.com/hetznercloud/hcloud-cloud-controller-manager
  hetzner_ccm_version = "v1.18.0"
  # buggy?
  # hetzner_ccm_version = "v1.17.2"
  # https://github.com/hetznercloud/csi-driver
  hetzner_csi_version = "v2.5.1"

  # If you want to specify the Kured version, set it below - otherwise it'll use the latest version available.
  # https://github.com/kubereboot/kured
  kured_version = "1.14.0"

  # Allows you to specify either stable, latest, testing or supported minor versions.
  # see https://rancher.com/docs/k3s/latest/en/upgrades/basic/ and https://update.k3s.io/v1-release/channels
  # ⚠️ If you are going to use Rancher addons for instance, it's always a good idea to fix the kube version to latest - 0.01,
  # ⚠️ Rancher currently only supports v1.25 and earlier versions: https://github.com/rancher/rancher/issues/41113
  # The default is "v1.26".
  initial_k3s_channel = local.initial_k3s_channel

  # You can choose the version of Calico that you want. By default, the latest is used.
  # More info on available versions can be found at https://github.com/projectcalico/calico/releases
  # Please note that if you are getting 403s from Github, it's also useful to set the version manually. However there is rarely a need for that!
  calico_version = "v3.26.1"

  #######################
  # END FIXING VERSIONS #
  #######################

  # Note that some values, notably "location" and "public_key" have no effect after initializing the cluster.
  # This is to keep Terraform from re-provisioning all nodes at once, which would lose data. If you want to update
  # those, you should instead change the value here and manually re-provision each node. Grep for "lifecycle".

  # Customize the SSH port (by default 22)
  # ssh_port = 2222

  # * Your ssh public key
  ssh_public_key = file(var.ssh_public_key_file)
  # * Your private key must be "ssh_private_key = null" when you want to use ssh-agent for a Yubikey-like device authentification or an SSH key-pair with a passphrase.
  # For more details on SSH see https://github.com/kube-hetzner/kube-hetzner/blob/master/docs/ssh.md
  ssh_private_key = file(var.ssh_private_key_file)
  # You can add additional SSH public Keys to grant other team members root access to your cluster nodes.
  # ssh_additional_public_keys = []

  # You can also add additional SSH public Keys which are saved in the hetzner cloud by a label.
  # See https://docs.hetzner.cloud/#label-selector
  # ssh_hcloud_key_label = "role=admin"

  # If you want to use an ssh key that is already registered within hetzner cloud, you can pass its id.
  # If no id is passed, a new ssh key will be registered within hetzner cloud.
  # It is important that exactly this key is passed via `ssh_public_key` & `ssh_private_key` vars.
  hcloud_ssh_key_id = var.hcloud_ssh_key_id

  # These can be customized, or left with the default values
  # * For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/
  network_region = "eu-central" # change to `us-east` if location is ash

  # If you must change the network CIDR you can do so below, but it is highly advised against.
  # network_ipv4_cidr = "10.0.0.0/8"

  # If you must change the cluster CIDR you can do so below, but it is highly advised against.
  # Cluster CIDR must be a part of the network CIDR!
  # cluster_ipv4_cidr = "10.42.0.0/16"

  # For the control planes, at least three nodes are the minimum for HA. Otherwise, you need to turn off the automatic upgrades (see README).
  # **It must always be an ODD number, never even!** Search the internet for "splitbrain problem with etcd" or see https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
  # For instance, one is ok (non-HA), two is not ok, and three is ok (becomes HA). It does not matter if they are in the same nodepool or not! So they can be in different locations and of various types.

  # Of course, you can choose any number of nodepools you want, with the location you want. The only constraint on the location is that you need to stay in the same network region, Europe, or the US.
  # For the server type, the minimum instance supported is cpx11 (just a few cents more than cx11); see https://www.hetzner.com/cloud.

  # IMPORTANT: Before you create your cluster, you can do anything you want with the nodepools, but you need at least one of each, control plane and agent.
  # Once the cluster is up and running, you can change nodepool count and even set it to 0 (in the case of the first control-plane nodepool, the minimum is 1).
  # You can also rename it (if the count is 0), but do not remove a nodepool from the list.

  # The only nodepools that are safe to remove from the list are at the end. That is due to how subnets and IPs get allocated (FILO).
  # You can, however, freely add other nodepools at the end of each list if you want. The maximum number of nodepools you can create combined for both lists is 255.
  # Also, before decreasing the count of any nodepools to 0, it's essential to drain and cordon the nodes in question. Otherwise, it will leave your cluster in a bad state.

  # Before initializing the cluster, you can change all parameters and add or remove any nodepools. You need at least one nodepool of each kind, control plane, and agent.
  # The nodepool names are entirely arbitrary, you can choose whatever you want, but no special characters or underscore, and they must be unique; only alphanumeric characters and dashes are allowed.

  # If you want to have a single node cluster, have one control plane nodepools with a count of 1, and one agent nodepool with a count of 0.

  # Please note that changing labels and taints after the first run will have no effect. If needed, you can do that through Kubernetes directly.

  # ⚠️ When choosing ARM cax* server types, for the moment they are only available in fsn1.
  # Muli-architecture clusters are OK for most use cases, as container underlying images tend to be multi-architecture too.

  # * Example below:

  control_plane_nodepools = [
    {
      # name = "control-plane"
      name = "control-${var.default_location}-cax41",
      server_type = "cax11",
      location = var.default_location,
      labels = [
        # no longer needed, but may be interesting when using klipper as the load balancer
        # as this seems to be the final option which counts
        # taken from https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/447#issuecomment-1350597300
        # "node.kubernetes.io/exclude-from-external-load-balancers=true",
      ],
      taints = [],
      count = 1

      # Enable automatic backups via Hetzner (default: false)
      # backups = true
    },
  ]

  agent_nodepools = [
    # Arm based nodes, currently available only in FSN location
    {
      # name = "agent-arm-largest"
      name = "agent-${var.default_location}-cax41",
      server_type = "cax41",
      location = var.default_location,
      labels = [],
      taints = [],
      count = 1,
    }
  ]

  # Add custom control plane configuration options here.
  # E.g to enable monitoring for etcd, proxy etc:
  # control_planes_custom_config = {
  #   etcd-expose-metrics = true,
  #   kube-controller-manager-arg = "bind-address=0.0.0.0",
  #   kube-proxy-arg ="metrics-bind-address=0.0.0.0",
  #   kube-scheduler-arg = "bind-address=0.0.0.0",
  # }

  # You can enable encrypted wireguard for the CNI by setting this to "true". Default is "false".
  # FYI, Hetzner says "Traffic between cloud servers inside a Network is private and isolated, but not automatically encrypted."
  # Source: https://docs.hetzner.com/cloud/networks/faq/#is-traffic-inside-hetzner-cloud-networks-encrypted
  # It works with all CNIs that we support.
  # Just note, that if Cilium with cilium_values, the responsability of enabling of disabling Wireguard falls on you.
  enable_wireguard = true

  # * LB location and type, the latter will depend on how much load you want it to handle, see https://www.hetzner.com/cloud/load-balancer
  load_balancer_type = "lb11"
  load_balancer_location = var.default_location

  ### The following values are entirely optional (and can be removed from this if unused)

  # Cluster Autoscaler
  # Providing at least one map for the array enables the cluster autoscaler feature, default is disabled
  # By default we set a compatible version with the default initial_k3s_channel, to set another one,
  # have a look at the tag value in https://github.com/kubernetes/autoscaler/blob/master/charts/cluster-autoscaler/values.yaml
  # ⚠️ Based on how the autoscaler works with this project, you can only choose either x86 instances or ARM server types for ALL autocaler nodepools.
  # Also, as mentioned above, for the time being ARM cax* instances are only available in fsn1.
  # If you are curious, it's ok to have a multi-architecture cluster, as most underlying container images are multi-architecture too.
  # * Example below:
  autoscaler_nodepools = [
    {
      # "ca", as short for "cluster-autoscaler" - this is the common abbreviation
      # this needs to be really short, because a string like "-432f51dcc918aeba" is appended,
      # and the total string must be 63 characters maximum!
      # SUPER IMPORTANT: The prefix is used internaly to distinguish autoscaler nodes from other node types
      # search for "${var.cluster_name}-ca-" and change it too if you change the name here
      name = "ca-${var.default_location}-cax41" # it seems the node
      # name = "autoscaled-arm-largest"
      server_type = "cax41"
      location = var.default_location
      min_nodes = 0 # somehow a first autoscaler node is spawned even if `min_nodes = 0`, for updates see https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/756
      max_nodes = 4
    }
  ]

  # Enable etcd snapshot backups to S3 storage.
  # Just provide a map with the needed settings (according to your S3 storage provider) and backups to S3 will
  # be enabled (with the default settings for etcd snapshots).
  # Cloudflare's R2 offers 10GB, 10 million reads and 1 million writes per month for free.
  # For proper context, have a look at https://docs.k3s.io/backup-restore.
  etcd_s3_backup = {
    etcd-s3-endpoint = var.etcd_s3_endpoint
    etcd-s3-access-key = var.etcd_s3_access_key
    etcd-s3-secret-key = var.etcd_s3_secret_key
    etcd-s3-bucket = var.etcd_s3_bucket
  }

  # To enable Hetzner Storage Box support, you can enable csi-driver-smb, default is "false".
  # enable_csi_driver_smb = true

  # To use local storage on the nodes, you can enable Longhorn, default is "false".
  # See a full recap on how to configure agent nodepools for longhorn here https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/373#discussioncomment-3983159
  # Also see Longhorn best practices here https://gist.github.com/ifeulner/d311b2868f6c00e649f33a72166c2e5b
  enable_longhorn = true

  # By default, longhorn is pulled from https://charts.longhorn.io.
  # If you need a version of longhorn which assures compatibility with rancher you can set this variable to https://charts.rancher.io.
  # longhorn_repository = "https://charts.rancher.io"

  # The namespace for longhorn deployment, default is "longhorn-system".
  # longhorn_namespace = "longhorn-system"

  # The file system type for Longhorn, if enabled (ext4 is the default, otherwise you can choose xfs).
  # longhorn_fstype = "xfs"

  # how many replica volumes should longhorn create (default is 3).
  longhorn_replica_count = 2

  # When you enable Longhorn, you can go with the default settings and just modify the above two variables OR you can add a longhorn_values variable
  # with all needed helm values, see towards the end of the file in the advanced section.
  # If that file is present, the system will use it during the deploy, if not it will use the default values with the two variable above that can be customized.
  # After the cluster is deployed, you can always use HelmChartConfig definition to tweak the configuration.

  # Also, you can choose to use a Hetzner volume with Longhorn. By default, it will use the nodes own storage space, but if you add an attribute of
  # longhorn_volume_size (⚠️ not a variable, just a possible agent nodepool attribute) with a value between 10 and 10000 GB to your agent nodepool definition, it will create and use the volume in question.
  # See the agent nodepool section for an example of how to do that.

  # To disable Hetzner CSI storage, you can set the following to "true", default is "false".
  # disable_hetzner_csi = true

  # If you want to use a specific Hetzner CCM and CSI version, set them below; otherwise, leave them as-is for the latest versions.
  # https://github.com/hetznercloud/hcloud-cloud-controller-manager
  # hetzner_ccm_version = "v1.15.0"
  # https://github.com/hetznercloud/csi-driver
  # hetzner_csi_version = "v2.3.2"

  # If you want to specify the Kured version, set it below - otherwise it'll use the latest version available.
  # https://github.com/kubereboot/kured
  # kured_version = "1.13.1"

  # If you want to enable the Nginx ingress controller (https://kubernetes.github.io/ingress-nginx/) instead of Traefik, you can set this to "nginx". Default is "traefik".
  # By the default we load optimal Traefik and Nginx ingress controller config for Hetzner, however you may need to tweak it to your needs, so to do,
  # we allow you to add a traefik_values and nginx_values, see towards the end of this file in the advanced section.
  # After the cluster is deployed, you can always use HelmChartConfig definition to tweak the configuration.
  # If you want to disable both controllers set this to "none"
  # ingress_controller = "nginx"

  # You can change the number of replicas for selected ingress controller here. The default 0 means autoselecting based on number of agent nodes (1 node = 1 replica, 2 nodes = 2 replicas, 3+ nodes = 3 replicas)
  # ingress_replica_count = 1

  # Use the klipperLB (similar to metalLB), instead of the default Hetzner one, that has an advantage of dropping the cost of the setup.
  # Automatically "true" in the case of single node cluster (as it does not make sense to use the Hetzner LB in that situation).
  # It can work with any ingress controller that you choose to deploy.
  # Please note that because the klipperLB points to all nodes, we automatically allow scheduling on the control plane when it is active.
  # enable_klipper_metal_lb = "true"

  # If you want to configure additional arguments for traefik, enter them here as a list and in the form of traefik CLI arguments; see https://doc.traefik.io/traefik/reference/static-configuration/cli/
  # They are the options that go into the additionalArguments section of the Traefik helm values file.
  # Example: traefik_additional_options = ["--log.level=DEBUG", "--tracing=true"]
  # traefik_additional_options = []

  # By default traefik is configured to redirect http traffic to https, you can set this to "false" to disable the redirection.
  # traefik_redirect_to_https = false

  # If you want to disable the metric server set this to "false". Default is "true".
  # enable_metrics_server = false

  # If you want to allow non-control-plane workloads to run on the control-plane nodes, set this to "true". The default is "false".
  # True by default for single node clusters, and when enable_klipper_metal_lb is true. In those cases, the value below will be ignored.
  # allow_scheduling_on_control_plane = true

  # If you want to disable the automatic upgrade of k3s, you can set below to "false".
  # Ideally, keep it on, to always have the latest Kubernetes version, but lock the initial_k3s_channel to a kube major version,
  # of your choice, like v1.25 or v1.26. That way you get the best of both worlds without the breaking changes risk.
  # For production use, always use an HA setup with at least 3 control-plane nodes and 2 agents, and keep this on for maximum security.

  # The default is "true" (in HA setup i.e. at least 3 control plane nodes & 2 agents, just keep it enabled since it works flawlessly).
  # automatically_upgrade_k3s = false

  # The default is "true" (in HA setup it works wonderfully well, with automatic roll-back to the previous snapshot in case of an issue).
  # IMPORTANT! For non-HA clusters i.e. when the number of control-plane nodes is < 3, you have to turn it off.
  automatically_upgrade_os = false

  # If you need more control over kured and the reboot behaviour, you can pass additional options to kured.
  # For example limiting reboots to certain timeframes. For all options see: https://kured.dev/docs/configuration/
  # The default options are: `--reboot-command=/usr/bin/systemctl reboot --pre-reboot-node-labels=kured=rebooting --post-reboot-node-labels=kured=done --period=5m`
  # Defaults can be overridden by using the same key.
  # kured_options = {
  #   "reboot-days": "su"
  #   "start-time": "3am"
  #   "end-time": "8am"
  #   "time-zone": "Local"
  # }

  # Allows you to specify either stable, latest, testing or supported minor versions.
  # see https://rancher.com/docs/k3s/latest/en/upgrades/basic/ and https://update.k3s.io/v1-release/channels
  # ⚠️ If you are going to use Rancher addons for instance, it's always a good idea to fix the kube version to latest - 0.01,
  # ⚠️ Rancher currently only supports v1.25 and earlier versions: https://github.com/rancher/rancher/issues/41113
  # The default is "v1.26".
  # initial_k3s_channel = "stable"

  # Whether to use the cluster name in the node name, in the form of {cluster_name}-{nodepool_name}, the default is "true".
  # use_cluster_name_in_node_name = false

  # Extra k3s registries. This is useful if you have private registries and you want to pull images without additional secrets.
  # Or if you want to proxy registries for various reasons like rate-limiting.
  # It will create the registries.yaml file, more info here https://docs.k3s.io/installation/private-registry.
  # Note that you do not need to get this right from the first time, you can update it when you want during the life of your cluster.
  # The default is blank.
  /* k3s_registries = <<-EOT
    mirrors:
      hub.my_registry.com:
        endpoint:
          - "hub.my_registry.com"
    configs:
      hub.my_registry.com:
        auth:
          username: username
          password: password
  EOT */

  # Additional environment variables for the host OS on which k3s runs. See for example https://docs.k3s.io/advanced#configuring-an-http-proxy .
  # additional_k3s_environment = {
  #   "CONTAINERD_HTTP_PROXY" : "http://your.proxy:port",
  #   "CONTAINERD_HTTPS_PROXY" : "http://your.proxy:port",
  #   "NO_PROXY" : "127.0.0.0/8,10.0.0.0/8,",
  # }

  # Additional commands to execute on the host OS before the k3s install, for example fetching and installing certs.
  # preinstall_exec = [
  #   "curl https://somewhere.over.the.rainbow/ca.crt > /root/ca.crt",
  #   "trust anchor --store /root/ca.crt",
  # ]
  preinstall_exec = [
    # This is adding node-taints to the config
    # it is safest to do it here, because node-taints may have already been added, and setting --node-taint attribute to k3s agent args will clean out all the other node-taints

    # we check whether the csi-node-driver can be loaded
    # if we get 403 response, this means that we got a bad IP
    # adapted from https://stackoverflow.com/questions/53526188/can-i-have-curl-print-just-the-response-code
    # simply fail if a wrong IP could be identified
    # (this prevents request to hcloud api as long as this node still exists)
    # node that `%%{` is the escaped version of a literal `%{`
    <<-EOF
    RESPONSE_CODE=$(curl -IL --silent --write-out "%%{http_code}\n" -o /dev/null https://registry.k8s.io/v2/sig-storage/csi-node-driver-registrar/manifests/v2.7.0)
    echo "hostname = '$(hostname)'. prefix = '${var.cluster_name}-ca-'. RESPONSE_CODE = '$RESPONSE_CODE'" > /tmp/iamhere
    if [ $(hostname) != "${var.cluster_name}-ca-" ]; then
      # fail immediately if not on autoscaler, because the initial nodes really need to be clean
      [ "403" != "$RESPONSE_CODE" ]
    else
      # if on an autoscaler node, just taint the node, so that autoscaler does not fail and the node is cleanup later automatically
      # we can simply add a second config to be aggregated see https://docs.k3s.io/installation/configuration
      mkdir -p /etc/rancher/k3s/config.yaml.d
      [ "403" != "$RESPONSE_CODE" ] || cat > /etc/rancher/k3s/config.yaml.d/jolin.yaml < /etc/rancher/k3s/config.yaml.d/jolin.yaml < /tmp/restorenotes

      k3s server \
        --cluster-reset \
        --etcd-s3 \
        --cluster-reset-restore-path=${var.etcd_snapshot_name} \
        --etcd-s3-endpoint=${var.etcd_s3_endpoint} \
        --etcd-s3-bucket=${var.etcd_s3_bucket} \
        --etcd-s3-access-key=${var.etcd_s3_access_key} \
        --etcd-s3-secret-key=${var.etcd_s3_secret_key}
      # renaming the k3s.yaml because it is used as a trigger for further downstream
      # changes. Better to let `k3s server` create it as expected.
      mv /etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.backup.yaml

      # download etcd/etcdctl for adapting the kubernetes config before starting k3s
      ETCD_VER=${local.etcd_version}
      case "$(uname -m)" in
        aarch64) ETCD_ARCH="arm64" ;;
        x86_64) ETCD_ARCH="amd64" ;;
      esac;
      DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
      rm -f /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz
      curl -L $DOWNLOAD_URL/$ETCD_VER/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -o /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz
      tar xzvf /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -C /usr/local/bin --strip-components=1
      rm -f /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz

      etcd --version
      etcdctl version

      # delete traefik service so that no load-balancer is accidently changed
      nohup etcd --data-dir /var/lib/rancher/k3s/server/db/etcd &
      echo $! > save_pid.txt

      etcdctl del /registry/services/specs/traefik/traefik
      etcdctl del /registry/services/endpoints/traefik/traefik

      # delete old nodes (they interfere with load balancer)
      # minions is the old name for "nodes"
      OLD_NODES=$(etcdctl get "" --prefix --keys-only | grep /registry/minions/ | cut -c 19-)
      for NODE in $OLD_NODES; do
        for KEY in $(etcdctl get "" --prefix --keys-only | grep $NODE); do
          etcdctl del $KEY
        done
      done

      kill -9 `cat save_pid.txt`
      rm save_pid.txt
    else
      echo this is not the first control plane node > /tmp/restorenotes
    fi
    EOF
  ]

  # firstcontrolplane_kubectlisready_exec = [
  #   "kubectl delete service/traefik -n traefik || true"
  # ]

  # Additional flags to pass to the k3s server command (the control plane).
  # k3s_exec_server_args = "--kube-apiserver-arg enable-admission-plugins=PodTolerationRestriction,PodNodeSelector"

  # we need to repeat the default kubelet-arg as CLI arguments take precendence. See https://docs.k3s.io/installation/configuration#configuration-file
  # default kubelet_arg = ["cloud-provider=external", "volume-plugin-dir=/var/lib/kubelet/volumeplugins"]

  # If you want to allow all outbound traffic you can set this to "false". Default is "true".
  # restrict_outbound_traffic = false

  # Adding extra firewall rules, like opening a port
  # More info on the format here https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs/resources/firewall
  extra_firewall_rules = [
    # {
    #   description = "For Postgres"
    #   direction = "in"
    #   protocol = "tcp"
    #   port = "5432"
    #   source_ips = ["0.0.0.0/0", "::/0"]
    #   destination_ips = [] # Won't be used for this rule
    # },
    # {
    #   description = "To Allow ArgoCD (or ssh-keyscan) access to resources via SSH"
    #   direction = "out"
    #   protocol = "tcp"
    #   port = "22"
    #   source_ips = [] # Won't be used for this rule
    #   destination_ips = ["0.0.0.0/0", "::/0"]
    # }
    {
      description = "Allow any outward access. To Allow ArgoCD (or ssh-keyscan) access to resources via SSH, access to Databases via special ports, etc."
      direction = "out"
      protocol = "tcp"
      port = "any"
      source_ips = [] # Won't be used for this rule
      destination_ips = ["0.0.0.0/0", "::/0"]
    }
  ]

  # If you want to configure a different CNI for k3s, use this flag
  # possible values: flannel (Default), calico, and cilium
  # As for Cilium, we allow infinite configurations via helm values, please check the CNI section of the readme over at https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/#cni.
  # Also, see the cilium_values at towards the end of this file, in the advanced section.
  # cni_plugin = "cilium"

  # You can choose the version of Calico that you want. By default, the latest is used.
  # More info on available versions can be found at https://github.com/projectcalico/calico/releases
  # Please note that if you are getting 403s from Github, it's also useful to set the version manually. However there is rarely a need for that!
  # calico_version = "v3.25.0"

  # If you want to disable the k3s default network policy controller, use this flag!
  # Both Calico and Ciliun cni_plugin values override this value to true automatically, the default is "false".
  # disable_network_policy = true

  # If you want to disable the automatic use of placement group "spread". See https://docs.hetzner.com/cloud/placement-groups/overview/
  # We advise to not touch that setting, unless you have a specific purpose.
  # The default is "false", meaning it's enabled by default.
  # placement_group_disable = true

  # By default, we allow ICMP ping in to the nodes, to check for liveness for instance. If you do not want to allow that, you can. Just set this flag to true (false by default).
  # block_icmp_ping_in = true

  # You can enable cert-manager (installed by Helm behind the scenes) with the following flag, the default is "true".
  # enable_cert_manager = false

  # IP Addresses to use for the DNS Servers, set to an empty list to use the ones provided by Hetzner, defaults to ["1.1.1.1", "8.8.8.8", "9.9.9.9"].
  # The number of different DNS servers is limited to 3 by Kubernetes itself.
  # dns_servers = []

  # When this is enabled, rather than the first node, all external traffic will be routed via a control-plane loadbalancer, allowing for high availability.
  # The default is false.
  # see https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/302
  use_control_plane_lb = false

  # Let's say you are not using the control plane LB solution above, and still want to have one hostname point to all your control-plane nodes.
  # You could create multiple A records of to let's say cp.cluster.my.org pointing to all of your control-plane nodes ips.
  # In which case, you need to define that hostname in the k3s TLS-SANs config to allow connection through it. It can be hostnames or IP addresses.
  # additional_tls_sans = ["cp.cluster.my.org"]

  # Oftentimes, you need to communicate to the cluster from inside the cluster itself, in which case it is important to set this value, as it will configure the hostname
  # at the load balancer level, and will save you from many slows downs when initiating communications from inside. Later on, you can point your DNS to the IP given
  # to the LB. And if you have other services pointing to it, you are also free to create CNAMES to point to it, or whatever you see fit.
  # If set, it will apply to either ingress controllers, Traefik or Ingress-Nginx.
  lb_hostname = var.lb_hostname

  # You can refine a base domain name to be use in this form of nodename.base_domain for setting the reserve dns inside Hetzner
  base_domain = var.lb_hostname

  # You can enable Rancher (installed by Helm behind the scenes) with the following flag, the default is "false".
  # ⚠️ Rancher currently only supports Kubernetes v1.25 and earlier, you will need to set initial_k3s_channel to a supported version: https://github.com/rancher/rancher/issues/41113
  # When Rancher is enabled, it automatically installs cert-manager too, and it uses rancher's own self-signed certificates.
  # See for options https://rancher.com/docs/rancher/v2.0-v2.4/en/installation/resources/advanced/helm2/helm-rancher/#choose-your-ssl-configuration
  # The easiest thing is to leave everything as is (using the default rancher self-signed certificate) and put Cloudflare in front of it.
  # As for the number of replicas, by default it is set to the numbe of control plane nodes.
  # You can customized all of the above by adding a rancher_values variable see at the end of this file in the advanced section.
  # After the cluster is deployed, you can always use HelmChartConfig definition to tweak the configuration.
  # IMPORTANT: Rancher's install is quite memory intensive, you will require at least 4GB if RAM, meaning cx21 server type (for your control plane).
  # ALSO, in order for Rancher to successfully deploy, you have to set the "rancher_hostname".
  # enable_rancher = true

  # If using Rancher you can set the Rancher hostname, it must be unique hostname even if you do not use it.
  # If not pointing the DNS, you can just port-forward locally via kubectl to get access to the dashboard.
  # If you already set the lb_hostname above and are using a Hetzner LB, you do not need to set this one, as it will be used by default.
  # But if you set this one explicitly, it will have preference over the lb_hostname in rancher settings.
  # rancher_hostname = "rancher.xyz.dev"

  # When Rancher is deployed, by default is uses the "latest" channel. But this can be customized.
  # The allowed values are "stable" or "latest".
  # rancher_install_channel = "stable"

  # Finally, you can specify a bootstrap-password for your rancher instance. Minimum 48 characters long!
  # If you leave empty, one will be generated for you.
  # (Can be used by another rancher2 provider to continue setup of rancher outside this module.)
  # rancher_bootstrap_password = ""

  # Separate from the above Rancher config (only use one or the other). You can import this cluster directly on an
  # an already active Rancher install. By clicking "import cluster" choosing "generic", giving it a name and pasting
  # the cluster registration url below. However, you can also ignore that and apply the url via kubectl as instructed
  # by Rancher in the wizard, and that would register your cluster too.
  # More information about the registration can be found here https://rancher.com/docs/rancher/v2.6/en/cluster-provisioning/registered-clusters/
  # rancher_registration_manifest_url = "https://rancher.xyz.dev/v3/import/xxxxxxxxxxxxxxxxxxYYYYYYYYYYYYYYYYYYYzzzzzzzzzzzzzzzzzzzzz.yaml"

  # Extra values that will be passed to the `extra-manifests/kustomization.yaml.tpl` if its present.
  # extra_kustomize_parameters={}

  # It is best practice to turn this off, but for backwards compatibility it is set to "true" by default.
  # See https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/349
  # When "false". The kubeconfig file can instead be created by executing: "terraform output --raw kubeconfig > cluster_kubeconfig.yaml"
  # Always be careful to not commit this file!
  # create_kubeconfig = false

  # Don't create the kustomize backup. This can be helpful for automation.
  # create_kustomization = false

  ### ADVANCED - Custom helm values for packages above (search _values if you want to located where those are mentioned upper in this file)
  # ⚠️ Inside the _values variable below are examples, up to you to find out the best helm values possible, we do not provide support for customized helm values.
  # Please understand that the indentation is very important, inside the EOTs, as those are proper yaml helm values.
  # We advise you to use the default values, and only change them if you know what you are doing!

  # Cilium, all Cilium helm values can be found at https://github.com/cilium/cilium/blob/master/install/kubernetes/cilium/values.yaml
  # The following is an example, please note that the current indentation inside the EOT is important.
  /* cilium_values = <
```

Screenshots

No response

Platform

Linux

schlichtanders commented 7 months ago

Just saw that this is kind of a duplicate. The error seems to happen on all nodes, not just autoscaler.

Can someone help me? Is it safe to just upgrade to the latest kube-hetzner version and run terraform apply? Will it update the autoscaler? (and hopefully not destroy anything else?)

mysticaltech commented 7 months ago

@schlichtanders If you are using v2.x it should be safe to update. Just run terraform init -upgrade and then terraform plan to confirm. And if you see it replaces the kustomization, it's safe and a good sign. If it wants to replace a node, it's the only thing that's a big no-no, but that's unlikely. And yes, that should update the autoscaler.

And after you have done that, you can also re-apply after changing initial_k3s_channel to v1.28, and that should also be safe: it will upgrade without downtime, super smooth (the system upgrade controller just replaces the k3s binary and that's it, no need for a restart even). You can watch your nodes upgrade in real time after applying by running watch kubectl get nodes.

That should hopefully fix it for you. Let us know!
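For anyone who wants to double-check the plan before applying the upgrade, here is a minimal sketch (not part of kube-hetzner) that filters saved plan output for replacements or destroys other than null_resource and local_file entries. The function name risky_changes and the file name plan.txt are made up for illustration, and the assumption that nodes show up as hcloud_server resources should be checked against your own plan.

```shell
# Hypothetical helper: reads `terraform plan -no-color` output on stdin and
# prints the header lines of resources that would be replaced or destroyed,
# skipping null_resource and local_file entries (assumed harmless here).
risky_changes() {
  grep -E '^[[:space:]]*# .* (must be replaced|will be destroyed)' \
    | grep -Ev '\.(null_resource|local_file)\.' || true
}

# Intended usage (not executed here):
#   terraform init -upgrade
#   terraform plan -no-color | tee plan.txt
#   risky_changes < plan.txt   # any hcloud_server line means: stop and review
```

If the function prints nothing, the plan only touches the kustomization-style helper resources; any remaining line deserves a manual look before running terraform apply.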

schlichtanders commented 7 months ago

Just to be super sure. Does the following terraform plan include some changes to nodes?

terraform plan output

```tf
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.kube-hetzner.hcloud_load_balancer.cluster[0] will be updated in-place
  ~ resource "hcloud_load_balancer" "cluster" {
        id   = "1471768"
      ~ name = "20231001t152028z-cloud-jolin-io" -> "20231001t152028z-cloud-jolin-io-traefik"
        # (9 unchanged attributes hidden)

        # (3 unchanged blocks hidden)
    }

  # module.kube-hetzner.hcloud_network.k3s has moved to module.kube-hetzner.hcloud_network.k3s[0]
    resource "hcloud_network" "k3s" {
        id   = "3403608"
        name = "20231001t152028z-cloud-jolin-io"
        # (4 unchanged attributes hidden)
    }

  # module.kube-hetzner.local_file.kustomization_backup[0] must be replaced
-/+ resource "local_file" "kustomization_backup" {
      ~ content              = <<-EOT # forces replacement
            "apiVersion": "kustomize.config.k8s.io/v1beta1"
            "kind": "Kustomization"
          - "patchesStrategicMerge":
          - - |
          -   apiVersion: apps/v1
          -   kind: Deployment
          -   metadata:
          -     name: system-upgrade-controller
          -     namespace: system-upgrade
          -   spec:
          -     template:
          -       spec:
          -         containers:
          -           - name: system-upgrade-controller
          -             volumeMounts:
          -               - name: ca-certificates
          -                 mountPath: /var/lib/ca-certificates
          -         volumes:
          -           - name: ca-certificates
          -             hostPath:
          -               path: /var/lib/ca-certificates
          -               type: Directory
          - - "kured.yaml"
          - - "ccm.yaml"
          + "patches":
          + - "patch": |
          +     apiVersion: apps/v1
          +     kind: Deployment
          +     metadata:
          +       name: system-upgrade-controller
          +       namespace: system-upgrade
          +     spec:
          +       template:
          +         spec:
          +           containers:
          +             - name: system-upgrade-controller
          +               volumeMounts:
          +                 - name: ca-certificates
          +                   mountPath: /var/lib/ca-certificates
          +           volumes:
          +             - name: ca-certificates
          +               hostPath:
          +                 path: /var/lib/ca-certificates
          +                 type: Directory
          +   "target":
          +     "group": "apps"
          +     "kind": "Deployment"
          +     "name": "system-upgrade-controller"
          +     "namespace": "system-upgrade"
          +     "version": "v1"
          + - "path": "kured.yaml"
          + - "path": "ccm.yaml"
            "resources":
            - "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.18.0/ccm-networks.yaml"
            - "https://github.com/kubereboot/kured/releases/download/1.14.0/kured-1.14.0-dockerhub.yaml"
            - "https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml"
            - "hcloud-csi.yml"
            - "traefik_ingress.yaml"
            - "longhorn.yaml"
            - "cert_manager.yaml"
        EOT
      ~ content_base64sha256 = "lNpUo1iYAV0KdsMXjK89vXx721X7clMRjsYcoekMh9M=" -> (known after apply)
      ~ content_base64sha512 = "qpCSZ3ofys8sSJHjJuJgfNz51WWYFeyUjxP7NpKJcxrvDNeAxukSGO7NB/+VWpDhcsrX2fTUnawkI35KCcIviA==" -> (known after apply)
      ~ content_md5          = "dfb992de46ba593c58a3be8d34bdb078" -> (known after apply)
      ~ content_sha1         = "5d34df489f16e4b4927bac5bd6a6c5b0c5f84541" -> (known after apply)
      ~ content_sha256       = "94da54a35898015d0a76c3178caf3dbd7c7bdb55fb7253118ec61ca1e90c87d3" -> (known after apply)
      ~ content_sha512       = "aa9092677a1fcacf2c4891e326e2607cdcf9d5659815ec948f13fb369289731aef0cd780c6e91218eecd07ff955a90e172cad7d9f4d49dac24237e4a09c22f88" -> (known after apply)
      ~ id                   = "5d34df489f16e4b4927bac5bd6a6c5b0c5f84541" -> (known after apply)
        # (3 unchanged attributes hidden)
    }

  # module.kube-hetzner.null_resource.agent_config["0-0-agent-nbg1-cax41"] will be created
  + resource "null_resource" "agent_config" {
      + id       = (known after apply)
      + triggers = {
          + "agent_id" = "37727681"
          + "config"   = (sensitive value)
        }
    }

  # module.kube-hetzner.null_resource.autoscaled_nodes_registries["20231001t152028z-cloud-jolin-io-ca-nbg1-cax41-2a8faee08034e953"] will be created
  + resource "null_resource" "autoscaled_nodes_registries" {
      + id       = (known after apply)
      + triggers = {
          + "registries" = " "
        }
    }

  # module.kube-hetzner.null_resource.configure_autoscaler[0] must be replaced
-/+ resource "null_resource" "configure_autoscaler" {
      ~ id       = "4444793812316690135" -> (known after apply)
      ~ triggers = { # forces replacement
          ~ "template" = (sensitive value)
        }
    }

  # module.kube-hetzner.null_resource.control_plane_config["0-0-control-nbg1-cax41"] will be created
  + resource "null_resource" "control_plane_config" {
      + id       = (known after apply)
      + triggers = {
          + "config"           = (sensitive value)
          + "control_plane_id" = "37727682"
        }
    }

  # module.kube-hetzner.null_resource.kustomization must be replaced
-/+ resource "null_resource" "kustomization" {
      ~ id       = "3574815439652385236" -> (known after apply)
      ~ triggers = { # forces replacement
          ~ "helm_values_yaml" = (sensitive value)
          ~ "options"          = <<-EOT
              -
              + period=5m
              + post-reboot-node-labels=kured=done
              + pre-reboot-node-labels=kured=rebooting
              + reboot-command=/usr/bin/systemctl reboot
              + reboot-sentinel=/sentinel/reboot-required
            EOT
          ~ "versions"         = <<-EOT
                v1.26
              - v1.27.3
              + 20231027
                v1.18.0
                v2.5.1
                1.14.0
                v3.26.1
              - v1.14.0
              + 1.15.1
              + N/A
              + N/A
            EOT
        }
    }

  # module.kube-hetzner.module.agents["0-0-agent-nbg1-cax41"].null_resource.zram will be created
  + resource "null_resource" "zram" {
      + id       = (known after apply)
      + triggers = {
          + "zram_size" = ""
        }
    }

  # module.kube-hetzner.module.control_planes["0-0-control-nbg1-cax41"].null_resource.zram will be created
  + resource "null_resource" "zram" {
      + id       = (known after apply)
      + triggers = {
          + "zram_size" = ""
        }
    }

Plan: 8 to add, 1 to change, 3 to destroy.
```

mysticaltech commented 7 months ago

@schlichtanders Nope, all seems good! It will recreate the right stuff.

mysticaltech commented 7 months ago

If the update does not solve your problem, it means that it's an SELinux issue, and that cannot be solved with a simple update because the SELinux policy is applied only during cloud-init, i.e. at first boot. So as a remedy, you can run the following shell script on each of your nodes.

#!/bin/bash

# Create the SELinux policy module file
cat <<EOF >/root/kube_hetzner_selinux.te
module kube_hetzner_selinux 1.0;

require {
  type kernel_t, bin_t, kernel_generic_helper_t, iscsid_t, iscsid_exec_t, var_run_t,
  init_t, unlabeled_t, systemd_logind_t, systemd_hostnamed_t, container_t,
  cert_t, container_var_lib_t, etc_t, usr_t, container_file_t, container_log_t,
  container_share_t, container_runtime_exec_t, container_runtime_t, var_log_t, proc_t, io_uring_t;
  class key { read view };
  class file { open read execute execute_no_trans create link lock rename write append setattr unlink getattr watch };
  class sock_file { watch write create unlink };
  class unix_dgram_socket create;
  class unix_stream_socket { connectto read write };
  class dir { add_name create getattr link lock read rename remove_name reparent rmdir setattr unlink search write watch };
  class lnk_file { read create };
  class system module_request;
  class filesystem associate;
  class bpf map_create;
  class io_uring sqpoll;
  class anon_inode create;
}

#============= kernel_generic_helper_t ==============
allow kernel_generic_helper_t bin_t:file execute_no_trans;
allow kernel_generic_helper_t kernel_t:key { read view };
allow kernel_generic_helper_t self:unix_dgram_socket create;

#============= iscsid_t ==============
allow iscsid_t iscsid_exec_t:file execute;
allow iscsid_t var_run_t:sock_file write;
allow iscsid_t var_run_t:unix_stream_socket connectto;

#============= init_t ==============
allow init_t unlabeled_t:dir { add_name remove_name rmdir };
allow init_t unlabeled_t:lnk_file create;
allow init_t container_t:file { open read };

#============= systemd_logind_t ==============
allow systemd_logind_t unlabeled_t:dir search;

#============= systemd_hostnamed_t ==============
allow systemd_hostnamed_t unlabeled_t:dir search;

#============= container_t ==============
# Basic file and directory operations for specific types
allow container_t cert_t:dir read;
allow container_t cert_t:lnk_file read;
allow container_t cert_t:file { read open };
allow container_t container_var_lib_t:file { create open read write rename lock };
allow container_t etc_t:dir { add_name remove_name write create setattr watch };
allow container_t etc_t:file { create setattr unlink write };
allow container_t etc_t:sock_file { create unlink };
allow container_t usr_t:dir { add_name create getattr link lock read rename remove_name reparent rmdir setattr unlink search write };
allow container_t usr_t:file { append create execute getattr link lock read rename setattr unlink write };

# Additional rules for container_t
allow container_t container_file_t:file { open read write append getattr setattr };
allow container_t container_file_t:sock_file watch;
allow container_t container_log_t:file { open read write append getattr setattr };
allow container_t container_share_t:dir { read write add_name remove_name };
allow container_t container_share_t:file { read write create unlink };
allow container_t container_runtime_exec_t:file { read execute execute_no_trans open };
allow container_t container_runtime_t:unix_stream_socket { connectto read write };
allow container_t kernel_t:system module_request;
allow container_t container_log_t:dir { read watch };
allow container_t container_log_t:file { open read watch };
allow container_t container_log_t:lnk_file read;
allow container_t var_log_t:dir { add_name write };
allow container_t var_log_t:file { create lock open read setattr write };
allow container_t var_log_t:dir remove_name;
allow container_t var_log_t:file unlink;
allow container_t proc_t:filesystem associate;
allow container_t self:bpf map_create;
allow container_t io_uring_t:anon_inode create;
allow container_t self:io_uring sqpoll;
EOF

# Compile the SELinux policy module
checkmodule -M -m -o /root/kube_hetzner_selinux.mod /root/kube_hetzner_selinux.te

# Package the compiled policy module
semodule_package -o /root/kube_hetzner_selinux.pp -m /root/kube_hetzner_selinux.mod

# Load the policy package into the SELinux policy store
semodule -i /root/kube_hetzner_selinux.pp

echo "SELinux policy module 'kube_hetzner_selinux' has been installed."

Alternatively, you can also regenerate the nodes if you are in HA, one by one, as explained in the 2.x update guide pinned in the discussion section. If you do that, make sure to also create fresh snapshots to get the latest and greatest (all is explained in the guide).
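To confirm on a node that the module from the script above actually got loaded, something like the following could be used. This is only a sketch: the helper name module_listed is made up, and it assumes that semodule -l prints one module per line with the module name in the first column.

```shell
# Hypothetical helper: succeeds if the module named in $1 appears as the
# first field of any line read from stdin (e.g. the output of `semodule -l`).
module_listed() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Intended usage on a node (not executed here):
#   semodule -l | module_listed kube_hetzner_selinux && echo "policy loaded"
#   ls -Z /usr/local/bin/k3s   # shows the SELinux label of the k3s binary
```

A "Permission denied" on a binary whose regular file permissions look fine, as in the original report, is the kind of symptom where checking the label and the loaded modules is a reasonable first step.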

schlichtanders commented 7 months ago

Thank you for all the hints.

Little update: I ran terraform apply with the new kube-hetzner version successfully, but the new nodes still do not connect. Could it be that the bug was actually not fixed for the autoscaler part? Just asking.

I will continue inspecting it tomorrow.

mysticaltech commented 7 months ago

@schlichtanders It could be. I will try to see if we need to update the autoscaler version. However, did you try the selinux bash script above?

Also, you can try newer autoscaler releases, depending on your kube version; I would advise running kube v1.28. See this page for the correct version: https://github.com/kubernetes/autoscaler/releases, and only consider releases with names of the form cluster-autoscaler-1.*.

Using a version older than 1.28.2 will probably not work (as those don't have an important Hetzner-related change merged), so again you will need at least kube v1.28.

You can then set those variables accordingly, e.g.:

cluster_autoscaler_image   = "registry.k8s.io/autoscaling/cluster-autoscaler"
cluster_autoscaler_version = "v1.28.2"
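
The two constraints mentioned above (autoscaler at least 1.28.2, and matching the kube minor version) can be sanity-checked before applying. The helpers below are only an illustration: their names are made up, and they rely on sort -V, which is available in GNU coreutils.

```shell
# Hypothetical helper: succeeds if version $1 is >= version $2.
# A leading "v" is optional on both arguments.
version_at_least() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Hypothetical helper: succeeds if both versions share the same major.minor.
same_minor() {
  local a="${1#v}" b="${2#v}"
  [ "$(printf '%s\n' "$a" | cut -d. -f1,2)" = "$(printf '%s\n' "$b" | cut -d. -f1,2)" ]
}

# Intended usage with the values from this thread (not executed here):
#   version_at_least v1.28.2 v1.28.2 && same_minor v1.28 v1.28.2 && echo "versions look consistent"
```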

schlichtanders commented 7 months ago

Thank you for your further support. I now changed the version to 1.28 and also fixed cluster_autoscaler_version. Unfortunately the scaling still does not work.

The autoscaler is not even deployed. It shows the following error in the logs:

Failed to create Hetzner manager: `HCLOUD_CLOUD_INIT` is not specified

Looking into the code, this HCLOUD_CLOUD_INIT variable is regarded as legacy... why? Version 1.28.2 seems to expect it? My production cluster has been down for days because of this... It would be so great to get it working again :cry:
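
One way to narrow this down is to look at which environment variables the autoscaler deployment actually receives. The sketch below is an assumption-laden illustration: the helper name has_env_name is made up, the namespace kube-system and the deployment name cluster-autoscaler are guesses to be adapted, and HCLOUD_CLUSTER_CONFIG is, to my knowledge, the newer variable that replaces HCLOUD_CLOUD_INIT in the Hetzner provider of the cluster autoscaler.

```shell
# Hypothetical helper: succeeds if the env var name in $1 appears in the
# whitespace-separated list of names read from stdin.
has_env_name() {
  tr -s '[:space:]' '\n' | grep -qx -- "$1"
}

# Intended usage (not executed here; namespace and deployment name are assumptions):
#   kubectl -n kube-system get deployment cluster-autoscaler \
#     -o jsonpath='{.spec.template.spec.containers[0].env[*].name}' \
#     | has_env_name HCLOUD_CLOUD_INIT || echo "HCLOUD_CLOUD_INIT is not passed to the autoscaler"
```

If the deployment only carries the newer variable while the running image still expects the legacy one, that mismatch would be consistent with the error above.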

My current kube.tf ```tf locals { # You have the choice of setting your Hetzner API token here or define the TF_VAR_hcloud_token env # within your shell, such as such: export TF_VAR_hcloud_token=xxxxxxxxxxx # If you choose to define it in the shell, this can be left as is. # Your Hetzner token can be found in your Project > Security > API Token (Read & Write is required). hcloud_token = "xxxxxxxxxxx" # to get the corresponding etcd_version for a k3s version you need to # - start k3s # - run `curl -L --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key https://127.0.0.1:2379/version` # for details see https://gist.github.com/superseb/0c06164eef5a097c66e810fe91a9d408 etcd_version = "v3.5.9" initial_k3s_channel = "v1.28" cluster_autoscaler_version = "v1.28.2" # should match k3s_channel version } terraform { required_version = "1.6.4" required_providers { hcloud = { source = "hetznercloud/hcloud" version = "1.45.0" } github = { source = "integrations/github" version = "5.45.0" } } } module "kube-hetzner" { ################### # FIXING VERSIONS # ################### # * For local dev, path to the git repo # source = "../../kube-hetzner/" # If you want to use the latest master branch # source = "github.com/schlichtanders/terraform-hcloud-kube-hetzner?ref=load-balancer-create" # source = "github.com/schlichtanders/terraform-hcloud-kube-hetzner?ref=postinstall_exec2" # # For normal use, this is the path to the terraform registry # # You can optionally specify a version number - for the registry source = "kube-hetzner/kube-hetzner/hcloud" version = "2.12.2" # If you want to use a specific Hetzner CCM and CSI version, set them below; otherwise, leave them as-is for the latest versions. # https://github.com/hetznercloud/hcloud-cloud-controller-manager hetzner_ccm_version = "v1.19.0" # buggy? 
# hetzner_ccm_version = "v1.17.2" # https://github.com/hetznercloud/csi-driver hetzner_csi_version = "v2.6.0" # If you want to specify the Kured version, set it below - otherwise it'll use the latest version available. # https://github.com/kubereboot/kured kured_version = "1.15.0" # Allows you to specify either stable, latest, testing or supported minor versions. # see https://rancher.com/docs/k3s/latest/en/upgrades/basic/ and https://update.k3s.io/v1-release/channels # ⚠️ If you are going to use Rancher addons for instance, it's always a good idea to fix the kube version to latest - 0.01, # ⚠️ Rancher currently only supports v1.25 and earlier versions: https://github.com/rancher/rancher/issues/41113 # The default is "v1.26". initial_k3s_channel = local.initial_k3s_channel cluster_autoscaler_image = "registry.k8s.io/autoscaling/cluster-autoscaler" cluster_autoscaler_version = local.cluster_autoscaler_version # You can choose the version of Calico that you want. By default, the latest is used. # More info on available versions can be found at https://github.com/projectcalico/calico/releases # Please note that if you are getting 403s from Github, it's also useful to set the version manually. However there is rarely a need for that! calico_version = "v3.26.1" ####################### # END FIXING VERSIONS # ####################### providers = { hcloud = hcloud } hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token k3s_token = var.k3s_token # using restoration, the timing for the nodes to respond is a bit larger # load_balancer_health_check_interval = "45s" # load_balancer_health_check_timeout = "30s" # load_balancer_health_check_retries = "40" # Then fill or edit the below values. Only the first values starting with a * are obligatory; the rest can remain with their default values, or you # could adapt them to your needs. 
# The cluster name, by default "k3s" cluster_name = var.cluster_name # Note that some values, notably "location" and "public_key" have no effect after initializing the cluster. # This is to keep Terraform from re-provisioning all nodes at once, which would lose data. If you want to update # those, you should instead change the value here and manually re-provision each node. Grep for "lifecycle". # Customize the SSH port (by default 22) # ssh_port = 2222 # * Your ssh public key ssh_public_key = file(var.ssh_public_key_file) # * Your private key must be "ssh_private_key = null" when you want to use ssh-agent for a Yubikey-like device authentification or an SSH key-pair with a passphrase. # For more details on SSH see https://github.com/kube-hetzner/kube-hetzner/blob/master/docs/ssh.md ssh_private_key = file(var.ssh_private_key_file) # You can add additional SSH public Keys to grant other team members root access to your cluster nodes. # ssh_additional_public_keys = [] # You can also add additional SSH public Keys which are saved in the hetzner cloud by a label. # See https://docs.hetzner.cloud/#label-selector # ssh_hcloud_key_label = "role=admin" # If you want to use an ssh key that is already registered within hetzner cloud, you can pass its id. # If no id is passed, a new ssh key will be registered within hetzner cloud. # It is important that exactly this key is passed via `ssh_public_key` & `ssh_private_key` vars. hcloud_ssh_key_id = var.hcloud_ssh_key_id # These can be customized, or left with the default values # * For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/ network_region = "eu-central" # change to `us-east` if location is ash # If you must change the network CIDR you can do so below, but it is highly advised against. # network_ipv4_cidr = "10.0.0.0/8" # If you must change the cluster CIDR you can do so below, but it is highly advised against. # Cluster CIDR must be a part of the network CIDR! 
# cluster_ipv4_cidr = "10.42.0.0/16" # For the control planes, at least three nodes are the minimum for HA. Otherwise, you need to turn off the automatic upgrades (see README). # **It must always be an ODD number, never even!** Search the internet for "splitbrain problem with etcd" or see https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/ # For instance, one is ok (non-HA), two is not ok, and three is ok (becomes HA). It does not matter if they are in the same nodepool or not! So they can be in different locations and of various types. # Of course, you can choose any number of nodepools you want, with the location you want. The only constraint on the location is that you need to stay in the same network region, Europe, or the US. # For the server type, the minimum instance supported is cpx11 (just a few cents more than cx11); see https://www.hetzner.com/cloud. # IMPORTANT: Before you create your cluster, you can do anything you want with the nodepools, but you need at least one of each, control plane and agent. # Once the cluster is up and running, you can change nodepool count and even set it to 0 (in the case of the first control-plane nodepool, the minimum is 1). # You can also rename it (if the count is 0), but do not remove a nodepool from the list. # The only nodepools that are safe to remove from the list are at the end. That is due to how subnets and IPs get allocated (FILO). # You can, however, freely add other nodepools at the end of each list if you want. The maximum number of nodepools you can create combined for both lists is 255. # Also, before decreasing the count of any nodepools to 0, it's essential to drain and cordon the nodes in question. Otherwise, it will leave your cluster in a bad state. # Before initializing the cluster, you can change all parameters and add or remove any nodepools. You need at least one nodepool of each kind, control plane, and agent. 
# The nodepool names are entirely arbitrary, you can choose whatever you want, but no special characters or underscore, and they must be unique; only alphanumeric characters and dashes are allowed. # If you want to have a single node cluster, have one control plane nodepools with a count of 1, and one agent nodepool with a count of 0. # Please note that changing labels and taints after the first run will have no effect. If needed, you can do that through Kubernetes directly. # ⚠️ When choosing ARM cax* server types, for the moment they are only available in fsn1. # Muli-architecture clusters are OK for most use cases, as container underlying images tend to be multi-architecture too. # * Example below: control_plane_nodepools = [ { # name = "control-plane" name = "control-${var.default_location}-cax41", server_type = "cax11", location = var.default_location, labels = [ # no longer needed, but may be interesting when using klipper as the load balancer # as this seems to be the final option which counts # taken from https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/447#issuecomment-1350597300 # "node.kubernetes.io/exclude-from-external-load-balancers=true", ], taints = [], count = 1 # Enable automatic backups via Hetzner (default: false) # backups = true }, ] agent_nodepools = [ # Arm based nodes, currently available only in FSN location { # name = "agent-arm-largest" name = "agent-${var.default_location}-cax41", server_type = "cax41", location = var.default_location, labels = [], taints = [], count = 1, } ] # Add custom control plane configuration options here. # E.g to enable monitoring for etcd, proxy etc: # control_planes_custom_config = { # etcd-expose-metrics = true, # kube-controller-manager-arg = "bind-address=0.0.0.0", # kube-proxy-arg ="metrics-bind-address=0.0.0.0", # kube-scheduler-arg = "bind-address=0.0.0.0", # } # You can enable encrypted wireguard for the CNI by setting this to "true". Default is "false". 
# FYI, Hetzner says "Traffic between cloud servers inside a Network is private and isolated, but not automatically encrypted." # Source: https://docs.hetzner.com/cloud/networks/faq/#is-traffic-inside-hetzner-cloud-networks-encrypted # It works with all CNIs that we support. # Just note, that if Cilium with cilium_values, the responsability of enabling of disabling Wireguard falls on you. enable_wireguard = true # * LB location and type, the latter will depend on how much load you want it to handle, see https://www.hetzner.com/cloud/load-balancer load_balancer_type = "lb11" load_balancer_location = var.default_location ### The following values are entirely optional (and can be removed from this if unused) # Cluster Autoscaler # Providing at least one map for the array enables the cluster autoscaler feature, default is disabled # By default we set a compatible version with the default initial_k3s_channel, to set another one, # have a look at the tag value in https://github.com/kubernetes/autoscaler/blob/master/charts/cluster-autoscaler/values.yaml # ⚠️ Based on how the autoscaler works with this project, you can only choose either x86 instances or ARM server types for ALL autocaler nodepools. # Also, as mentioned above, for the time being ARM cax* instances are only available in fsn1. # If you are curious, it's ok to have a multi-architecture cluster, as most underlying container images are multi-architecture too. # * Example below: autoscaler_nodepools = [ { # "ca", as short for "cluster-autoscaler" - this is the common abbreviation # this needs to be really short, because a string like "-432f51dcc918aeba" is appended, # and the total string must be 63 characters maximum! 
# SUPER IMPORTANT: The prefix is used internaly to distinguish autoscaler nodes from other node types # search for "${var.cluster_name}-ca-" and change it too if you change the name here name = "ca-${var.default_location}-cax41" # it seems the node # name = "autoscaled-arm-largest" server_type = "cax41" location = var.default_location min_nodes = 0 # somehow a first autoscaler node is spawned even if `min_nodes = 0`, for updates see https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/756 max_nodes = 4 } ] # Enable etcd snapshot backups to S3 storage. # Just provide a map with the needed settings (according to your S3 storage provider) and backups to S3 will # be enabled (with the default settings for etcd snapshots). # Cloudflare's R2 offers 10GB, 10 million reads and 1 million writes per month for free. # For proper context, have a look at https://docs.k3s.io/backup-restore. etcd_s3_backup = { etcd-s3-endpoint = var.etcd_s3_endpoint etcd-s3-access-key = var.etcd_s3_access_key etcd-s3-secret-key = var.etcd_s3_secret_key etcd-s3-bucket = var.etcd_s3_bucket } # To enable Hetzner Storage Box support, you can enable csi-driver-smb, default is "false". # enable_csi_driver_smb = true # To use local storage on the nodes, you can enable Longhorn, default is "false". # See a full recap on how to configure agent nodepools for longhorn here https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/373#discussioncomment-3983159 # Also see Longhorn best practices here https://gist.github.com/ifeulner/d311b2868f6c00e649f33a72166c2e5b enable_longhorn = true # By default, longhorn is pulled from https://charts.longhorn.io. # If you need a version of longhorn which assures compatibility with rancher you can set this variable to https://charts.rancher.io. # longhorn_repository = "https://charts.rancher.io" # The namespace for longhorn deployment, default is "longhorn-system". 
# longhorn_namespace = "longhorn-system" # The file system type for Longhorn, if enabled (ext4 is the default, otherwise you can choose xfs). # longhorn_fstype = "xfs" # how many replica volumes should longhorn create (default is 3). longhorn_replica_count = 2 # When you enable Longhorn, you can go with the default settings and just modify the above two variables OR you can add a longhorn_values variable # with all needed helm values, see towards the end of the file in the advanced section. # If that file is present, the system will use it during the deploy, if not it will use the default values with the two variable above that can be customized. # After the cluster is deployed, you can always use HelmChartConfig definition to tweak the configuration. # Also, you can choose to use a Hetzner volume with Longhorn. By default, it will use the nodes own storage space, but if you add an attribute of # longhorn_volume_size (⚠️ not a variable, just a possible agent nodepool attribute) with a value between 10 and 10000 GB to your agent nodepool definition, it will create and use the volume in question. # See the agent nodepool section for an example of how to do that. # To disable Hetzner CSI storage, you can set the following to "true", default is "false". # disable_hetzner_csi = true # If you want to use a specific Hetzner CCM and CSI version, set them below; otherwise, leave them as-is for the latest versions. # https://github.com/hetznercloud/hcloud-cloud-controller-manager # hetzner_ccm_version = "v1.15.0" # https://github.com/hetznercloud/csi-driver # hetzner_csi_version = "v2.3.2" # If you want to specify the Kured version, set it below - otherwise it'll use the latest version available. # https://github.com/kubereboot/kured # kured_version = "1.13.1" # If you want to enable the Nginx ingress controller (https://kubernetes.github.io/ingress-nginx/) instead of Traefik, you can set this to "nginx". Default is "traefik". 
# By the default we load optimal Traefik and Nginx ingress controller config for Hetzner, however you may need to tweak it to your needs, so to do, # we allow you to add a traefik_values and nginx_values, see towards the end of this file in the advanced section. # After the cluster is deployed, you can always use HelmChartConfig definition to tweak the configuration. # If you want to disable both controllers set this to "none" # ingress_controller = "nginx" # You can change the number of replicas for selected ingress controller here. The default 0 means autoselecting based on number of agent nodes (1 node = 1 replica, 2 nodes = 2 replicas, 3+ nodes = 3 replicas) # ingress_replica_count = 1 # Use the klipperLB (similar to metalLB), instead of the default Hetzner one, that has an advantage of dropping the cost of the setup. # Automatically "true" in the case of single node cluster (as it does not make sense to use the Hetzner LB in that situation). # It can work with any ingress controller that you choose to deploy. # Please note that because the klipperLB points to all nodes, we automatically allow scheduling on the control plane when it is active. # enable_klipper_metal_lb = "true" # If you want to configure additional arguments for traefik, enter them here as a list and in the form of traefik CLI arguments; see https://doc.traefik.io/traefik/reference/static-configuration/cli/ # They are the options that go into the additionalArguments section of the Traefik helm values file. # Example: traefik_additional_options = ["--log.level=DEBUG", "--tracing=true"] # traefik_additional_options = [] # By default traefik is configured to redirect http traffic to https, you can set this to "false" to disable the redirection. # traefik_redirect_to_https = false # If you want to disable the metric server set this to "false". Default is "true". # enable_metrics_server = false # If you want to allow non-control-plane workloads to run on the control-plane nodes, set this to "true". 
The default is "false". # True by default for single node clusters, and when enable_klipper_metal_lb is true. In those cases, the value below will be ignored. # allow_scheduling_on_control_plane = true # If you want to disable the automatic upgrade of k3s, you can set below to "false". # Ideally, keep it on, to always have the latest Kubernetes version, but lock the initial_k3s_channel to a kube major version, # of your choice, like v1.25 or v1.26. That way you get the best of both worlds without the breaking changes risk. # For production use, always use an HA setup with at least 3 control-plane nodes and 2 agents, and keep this on for maximum security. # The default is "true" (in HA setup i.e. at least 3 control plane nodes & 2 agents, just keep it enabled since it works flawlessly). automatically_upgrade_k3s = false # The default is "true" (in HA setup it works wonderfully well, with automatic roll-back to the previous snapshot in case of an issue). # IMPORTANT! For non-HA clusters i.e. when the number of control-plane nodes is < 3, you have to turn it off. automatically_upgrade_os = false # If you need more control over kured and the reboot behaviour, you can pass additional options to kured. # For example limiting reboots to certain timeframes. For all options see: https://kured.dev/docs/configuration/ # The default options are: `--reboot-command=/usr/bin/systemctl reboot --pre-reboot-node-labels=kured=rebooting --post-reboot-node-labels=kured=done --period=5m` # Defaults can be overridden by using the same key. # kured_options = { # "reboot-days": "su" # "start-time": "3am" # "end-time": "8am" # "time-zone": "Local" # } # Allows you to specify either stable, latest, testing or supported minor versions. 
# see https://rancher.com/docs/k3s/latest/en/upgrades/basic/ and https://update.k3s.io/v1-release/channels # ⚠️ If you are going to use Rancher addons for instance, it's always a good idea to fix the kube version to latest - 0.01, # ⚠️ Rancher currently only supports v1.25 and earlier versions: https://github.com/rancher/rancher/issues/41113 # The default is "v1.26". # initial_k3s_channel = "stable" # Whether to use the cluster name in the node name, in the form of {cluster_name}-{nodepool_name}, the default is "true". # use_cluster_name_in_node_name = false # Extra k3s registries. This is useful if you have private registries and you want to pull images without additional secrets. # Or if you want to proxy registries for various reasons like rate-limiting. # It will create the registries.yaml file, more info here https://docs.k3s.io/installation/private-registry. # Note that you do not need to get this right from the first time, you can update it when you want during the life of your cluster. # The default is blank. /* k3s_registries = <<-EOT mirrors: hub.my_registry.com: endpoint: - "hub.my_registry.com" configs: hub.my_registry.com: auth: username: username password: password EOT */ # Additional environment variables for the host OS on which k3s runs. See for example https://docs.k3s.io/advanced#configuring-an-http-proxy . # additional_k3s_environment = { # "CONTAINERD_HTTP_PROXY" : "http://your.proxy:port", # "CONTAINERD_HTTPS_PROXY" : "http://your.proxy:port", # "NO_PROXY" : "127.0.0.0/8,10.0.0.0/8,", # } # Additional commands to execute on the host OS before the k3s install, for example fetching and installing certs. 
# preinstall_exec = [ # "curl https://somewhere.over.the.rainbow/ca.crt > /root/ca.crt", # "trust anchor --store /root/ca.crt", # ] preinstall_exec = [ # This is adding node-taints to the config # it is safest to do it here, because node-taints may have already been added, and setting --node-taint attribute to k3s agent args will clean out all the other node-taints # we check whether the csi-node-driver can be loaded # if we get 403 response, this means that we got a bad IP # adapted from https://stackoverflow.com/questions/53526188/can-i-have-curl-print-just-the-response-code # simply fail if a wrong IP could be identified # (this prevents request to hcloud api as long as this node still exists) # node that `%%{` is the escaped version of a literal `%{` <<-EOF RESPONSE_CODE=$(curl -IL --silent --write-out "%%{http_code}\n" -o /dev/null https://registry.k8s.io/v2/sig-storage/csi-node-driver-registrar/manifests/v2.7.0) echo "hostname = '$(hostname)'. prefix = '${var.cluster_name}-ca-'. RESPONSE_CODE = '$RESPONSE_CODE'." > /tmp/iamhere if [ $(hostname) != "${var.cluster_name}-ca-" ]; then # fail immediately if not on autoscaler, because the initial nodes really need to be clean [ "403" != "$RESPONSE_CODE" ] else # if on an autoscaler node, just taint the node, so that autoscaler does not fail and the node is cleanup later automatically # we can simply add a second config to be aggregated see https://docs.k3s.io/installation/configuration mkdir -p /etc/rancher/k3s/config.yaml.d [ "403" != "$RESPONSE_CODE" ] || cat > /etc/rancher/k3s/config.yaml.d/jolin.yaml < /tmp/restorenotes k3s server \ --cluster-reset \ --etcd-s3 \ --cluster-reset-restore-path=${var.etcd_snapshot_name} \ --etcd-s3-endpoint=${var.etcd_s3_endpoint} \ --etcd-s3-bucket=${var.etcd_s3_bucket} \ --etcd-s3-access-key=${var.etcd_s3_access_key} \ --etcd-s3-secret-key=${var.etcd_s3_secret_key} # renaming the k3s.yaml because it is used as a trigger for further downstream # changes. 
Better to let `k3s server` create it as expected. mv /etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.backup.yaml # download etcd/etcdctl for adapting the kubernetes config before starting k3s ETCD_VER=${local.etcd_version} case "$(uname -m)" in aarch64) ETCD_ARCH="arm64" ;; x86_64) ETCD_ARCH="amd64" ;; esac; DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download rm -f /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz curl -L $DOWNLOAD_URL/$ETCD_VER/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -o /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz tar xzvf /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -C /usr/local/bin --strip-components=1 rm -f /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz etcd --version etcdctl version # delete traefik service so that no load-balancer is accidently changed nohup etcd --data-dir /var/lib/rancher/k3s/server/db/etcd & echo $! > save_pid.txt etcdctl del /registry/services/specs/traefik/traefik etcdctl del /registry/services/endpoints/traefik/traefik # delete old nodes (they interfere with load balancer) # minions is the old name for "nodes" OLD_NODES=$(etcdctl get "" --prefix --keys-only | grep /registry/minions/ | cut -c 19-) for NODE in $OLD_NODES; do for KEY in $(etcdctl get "" --prefix --keys-only | grep $NODE); do etcdctl del $KEY done done kill -9 `cat save_pid.txt` rm save_pid.txt else echo this is not the first control plane node > /tmp/restorenotes fi EOF ] # firstcontrolplane_kubectlisready_exec = [ # "kubectl delete service/traefik -n traefik || true" # ] # Additional flags to pass to the k3s server command (the control plane). # k3s_exec_server_args = "--kube-apiserver-arg enable-admission-plugins=PodTolerationRestriction,PodNodeSelector" # we need to repeat the default kubelet-arg as CLI arguments take precendence. 
See https://docs.k3s.io/installation/configuration#configuration-file # default kubelet_arg = ["cloud-provider=external", "volume-plugin-dir=/var/lib/kubelet/volumeplugins"] # If you want to allow all outbound traffic you can set this to "false". Default is "true". # restrict_outbound_traffic = false # Adding extra firewall rules, like opening a port # More info on the format here https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs/resources/firewall extra_firewall_rules = [ # { # description = "For Postgres" # direction = "in" # protocol = "tcp" # port = "5432" # source_ips = ["0.0.0.0/0", "::/0"] # destination_ips = [] # Won't be used for this rule # }, # { # description = "To Allow ArgoCD (or ssh-keyscan) access to resources via SSH" # direction = "out" # protocol = "tcp" # port = "22" # source_ips = [] # Won't be used for this rule # destination_ips = ["0.0.0.0/0", "::/0"] # } { description = "Allow any outward access. To Allow ArgoCD (or ssh-keyscan) access to resources via SSH, access to Databases via special ports, etc." direction = "out" protocol = "tcp" port = "any" source_ips = [] # Won't be used for this rule destination_ips = ["0.0.0.0/0", "::/0"] } ] # If you want to configure a different CNI for k3s, use this flag # possible values: flannel (Default), calico, and cilium # As for Cilium, we allow infinite configurations via helm values, please check the CNI section of the readme over at https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/#cni. # Also, see the cilium_values at towards the end of this file, in the advanced section. # cni_plugin = "cilium" # You can choose the version of Calico that you want. By default, the latest is used. # More info on available versions can be found at https://github.com/projectcalico/calico/releases # Please note that if you are getting 403s from Github, it's also useful to set the version manually. However there is rarely a need for that! 
# calico_version = "v3.25.0" # If you want to disable the k3s default network policy controller, use this flag! # Both Calico and Ciliun cni_plugin values override this value to true automatically, the default is "false". # disable_network_policy = true # If you want to disable the automatic use of placement group "spread". See https://docs.hetzner.com/cloud/placement-groups/overview/ # We advise to not touch that setting, unless you have a specific purpose. # The default is "false", meaning it's enabled by default. # placement_group_disable = true # By default, we allow ICMP ping in to the nodes, to check for liveness for instance. If you do not want to allow that, you can. Just set this flag to true (false by default). # block_icmp_ping_in = true # You can enable cert-manager (installed by Helm behind the scenes) with the following flag, the default is "true". # enable_cert_manager = false # IP Addresses to use for the DNS Servers, set to an empty list to use the ones provided by Hetzner, defaults to ["1.1.1.1", "8.8.8.8", "9.9.9.9"]. # The number of different DNS servers is limited to 3 by Kubernetes itself. # dns_servers = [] # When this is enabled, rather than the first node, all external traffic will be routed via a control-plane loadbalancer, allowing for high availability. # The default is false. # see https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/302 use_control_plane_lb = false # Let's say you are not using the control plane LB solution above, and still want to have one hostname point to all your control-plane nodes. # You could create multiple A records of to let's say cp.cluster.my.org pointing to all of your control-plane nodes ips. # In which case, you need to define that hostname in the k3s TLS-SANs config to allow connection through it. It can be hostnames or IP addresses. 
# additional_tls_sans = ["cp.cluster.my.org"] # Oftentimes, you need to communicate to the cluster from inside the cluster itself, in which case it is important to set this value, as it will configure the hostname # at the load balancer level, and will save you from many slows downs when initiating communications from inside. Later on, you can point your DNS to the IP given # to the LB. And if you have other services pointing to it, you are also free to create CNAMES to point to it, or whatever you see fit. # If set, it will apply to either ingress controllers, Traefik or Ingress-Nginx. lb_hostname = var.lb_hostname # You can refine a base domain name to be use in this form of nodename.base_domain for setting the reserve dns inside Hetzner base_domain = var.lb_hostname # You can enable Rancher (installed by Helm behind the scenes) with the following flag, the default is "false". # ⚠️ Rancher currently only supports Kubernetes v1.25 and earlier, you will need to set initial_k3s_channel to a supported version: https://github.com/rancher/rancher/issues/41113 # When Rancher is enabled, it automatically installs cert-manager too, and it uses rancher's own self-signed certificates. # See for options https://rancher.com/docs/rancher/v2.0-v2.4/en/installation/resources/advanced/helm2/helm-rancher/#choose-your-ssl-configuration # The easiest thing is to leave everything as is (using the default rancher self-signed certificate) and put Cloudflare in front of it. # As for the number of replicas, by default it is set to the numbe of control plane nodes. # You can customized all of the above by adding a rancher_values variable see at the end of this file in the advanced section. # After the cluster is deployed, you can always use HelmChartConfig definition to tweak the configuration. # IMPORTANT: Rancher's install is quite memory intensive, you will require at least 4GB if RAM, meaning cx21 server type (for your control plane). 
# ALSO, in order for Rancher to successfully deploy, you have to set the "rancher_hostname". # enable_rancher = true # If using Rancher you can set the Rancher hostname, it must be unique hostname even if you do not use it. # If not pointing the DNS, you can just port-forward locally via kubectl to get access to the dashboard. # If you already set the lb_hostname above and are using a Hetzner LB, you do not need to set this one, as it will be used by default. # But if you set this one explicitly, it will have preference over the lb_hostname in rancher settings. # rancher_hostname = "rancher.xyz.dev" # When Rancher is deployed, by default is uses the "latest" channel. But this can be customized. # The allowed values are "stable" or "latest". # rancher_install_channel = "stable" # Finally, you can specify a bootstrap-password for your rancher instance. Minimum 48 characters long! # If you leave empty, one will be generated for you. # (Can be used by another rancher2 provider to continue setup of rancher outside this module.) # rancher_bootstrap_password = "" # Separate from the above Rancher config (only use one or the other). You can import this cluster directly on an # an already active Rancher install. By clicking "import cluster" choosing "generic", giving it a name and pasting # the cluster registration url below. However, you can also ignore that and apply the url via kubectl as instructed # by Rancher in the wizard, and that would register your cluster too. # More information about the registration can be found here https://rancher.com/docs/rancher/v2.6/en/cluster-provisioning/registered-clusters/ # rancher_registration_manifest_url = "https://rancher.xyz.dev/v3/import/xxxxxxxxxxxxxxxxxxYYYYYYYYYYYYYYYYYYYzzzzzzzzzzzzzzzzzzzzz.yaml" # Extra values that will be passed to the `extra-manifests/kustomization.yaml.tpl` if its present. 
# extra_kustomize_parameters={} # It is best practice to turn this off, but for backwards compatibility it is set to "true" by default. # See https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/349 # When "false". The kubeconfig file can instead be created by executing: "terraform output --raw kubeconfig > cluster_kubeconfig.yaml" # Always be careful to not commit this file! # create_kubeconfig = false # Don't create the kustomize backup. This can be helpful for automation. # create_kustomization = false ### ADVANCED - Custom helm values for packages above (search _values if you want to located where those are mentioned upper in this file) # ⚠️ Inside the _values variable below are examples, up to you to find out the best helm values possible, we do not provide support for customized helm values. # Please understand that the indentation is very important, inside the EOTs, as those are proper yaml helm values. # We advise you to use the default values, and only change them if you know what you are doing! # Cilium, all Cilium helm values can be found at https://github.com/cilium/cilium/blob/master/install/kubernetes/cilium/values.yaml # The following is an example, please note that the current indentation inside the EOT is important. /* cilium_values = <
schlichtanders commented 7 months ago

I hesitate to apply the SELinux script, because this is for now related to autoscaler, and I don't like to manually interact with every autoscaled node. That is not feasible.

I would also like to prevent this kind of hard-to-fix production cluster failure. I have already pinned almost every version in kube.tf and was looking for the one I missed. Am I right that this lethal update occurred because of the default value automatically_upgrade_k3s = true?

schlichtanders commented 7 months ago

Thank you for your further support. I now changed the version to 1.28 and also fixed cluster_autoscaler_version. Unfortunately the scaling still does not work.

The autoscaler is not even deployed. It shows the following error in the logs

Failed to create Hetzner manager: `HCLOUD_CLOUD_INIT` is not specified

Looking into the code, this HCLOUD_CLOUD_INIT variable is regarded as legacy... why? Version 1.28.2 seems to expect it? My production cluster has been down for days because of this... It would be so great to get it working again 😢

My current kube.tf

I now updated to 1.29 and at least the cluster autoscaler seems to start again. One step further.

schlichtanders commented 7 months ago

I think I have run into a new ~bug~ difficulty, which might just have occurred as part of the update process.

The autoscaler logs mention:

Couldn't get autoscaling options for ng

I think this means the new node could register, but the cluster still fails to autoscale...

schlichtanders commented 7 months ago

I think I understand that problem now... I manually deleted an autoscaling node on Hetzner because it was not cleaned up (I thought this happened because the autoscaler was restarted at the wrong moment).

I think by doing this I also deleted the node placement group which might be critical for the autoscaling to work

schlichtanders commented 7 months ago

I have now removed the autoscaling group from kube.tf, ran terraform apply, added it again, and ran terraform apply once more, but no placement group is created on Hetzner.

But the autoscaler still expects it to be there somehow... any help is highly appreciated.

schlichtanders commented 7 months ago

It might be that this error log is actually not telling anything. I found a recent blog post which also has the log, but as part of a verification which is evaluated successful, so this might not be the problem.

Maybe it is just because of the newer autoscaler version... that it does not scale up somehow

schlichtanders commented 7 months ago

Back to the logs: I think I have now narrowed it down to the following message as the most descriptive of why it does not scale:

I0226 11:12:14.198771       1 orchestrator.go:542] Pod mypod/mynamespace can't be scheduled on NAME_OF_AUTO_SCALING_GROUP, predicate checking error: Insufficient ephemeral-storage; predicateName=NodeResourcesFit; reasons: Insufficient ephemeral-storage; debugInfo=

This is only an info-level message, but it might actually be the key error. It seems like the autoscaler still thinks that there is a node with the same name as the autoscaling group itself. But there is nothing like this.

schlichtanders commented 7 months ago

This is so complicated... I do not get it to work.

Please help - Autoscaling does not work with kube-hetzner, even worse, this bug appeared without me doing anything and destroyed my cluster.

Here are the key logs from the autoscaler, put together:

I0226 11:32:54.763746       1 klogx.go:87] Pod mypod is unschedulable
I0226 11:32:54.763783       1 orchestrator.go:108] Upcoming 0 nodes
I0226 11:32:54.763806       1 orchestrator.go:440] Skipping node group draining-node-pool - max size reached
E0226 11:32:54.763812       1 orchestrator.go:446] Couldn't get autoscaling options for ng: NAME_OF_AUTOSCALING_GROUP
I0226 11:32:54.763861       1 orchestrator.go:542] Pod mypod/mynamespace can't be scheduled on NAME_OF_AUTOSCALING_GROUP, predicate checking error: Insufficient ephemeral-storage; predicateName=NodeResourcesFit; reasons: Insufficient ephemeral-storage; debugInfo=
I0226 11:32:54.763903       1 orchestrator.go:150] No pod can fit to NAME_OF_AUTOSCALING_GROUP
I0226 11:32:54.763913       1 orchestrator.go:164] No expansion options

Here are the important parts of kube.tf:

module "kube-hetzner" {
  # ...
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.12.2"
  cluster_autoscaler_image = "registry.k8s.io/autoscaling/cluster-autoscaler"
  cluster_autoscaler_version = "v1.29.0"
  initial_k3s_channel = "v1.29"

  autoscaler_nodepools = [
    {
      name        = "ca-group"
      server_type = "cax41"
      location    = "nbg1"
      min_nodes   = 0 
      max_nodes   = 6
    }
  ]
  # ...
}

Just create some load and see how it fails.

schlichtanders commented 7 months ago

@mysticaltech Which autoscaler - k3s version combination actually works? I couldn't get any to work... my cluster is still down

schlichtanders commented 7 months ago

(I tried also by setting the minSize of the autoscaler group to 1 - it is ignored somehow, and no error is apparent in the logs actually, super weird.)

mysticaltech commented 7 months ago

@schlichtanders The automatic upgrade of k3s is locked to the channel you choose for it in initial_k3s_channel, so that's definitely not the issue.

So if you are on the latest version of k3s, v1.29, you should use:

cluster_autoscaler_image   = "registry.k8s.io/autoscaling/cluster-autoscaler"
cluster_autoscaler_version = "v1.29.0"

Please try the above and let us know, and also apply with min of 0 and max 0, then apply again with your real values. @Silvest89 Your input would be super valued here too if you can.

Silvest89 commented 7 months ago

@schlichtanders The ephemeral storage error seems to be a bug that has been in the autoscaler for ages. Can you post the pod spec it is trying to schedule? I think it requests ephemeral storage.

mysticaltech commented 7 months ago

@schlichtanders Please see @Silvest89's answer above. Also, seeing your other issues, as a hotfix, set initial_k3s_channel to v1.28, with automatically_upgrade_k3s = true, and delete the lines with cluster_autoscaler_image and cluster_autoscaler_version to force the use of the default values. This should force the system upgrade controller to downgrade your cluster to v1.28, which is compatible with the default autoscaler config that we have.

Silvest89 commented 7 months ago

    return apiv1.ResourceList{
        // TODO somehow determine the actual pods that will be running
        apiv1.ResourcePods:    *resource.NewQuantity(defaultPodAmountsLimit, resource.DecimalSI),
        apiv1.ResourceCPU:     *resource.NewQuantity(int64(typeInfo.Cores), resource.DecimalSI),
        apiv1.ResourceMemory:  *resource.NewQuantity(int64(typeInfo.Memory*1024*1024*1024), resource.DecimalSI),
        apiv1.ResourceStorage: *resource.NewQuantity(int64(typeInfo.Disk*1024*1024*1024), resource.DecimalSI),
    }, nil

I just looked through the autoscaler code: apiv1.ResourceEphemeralStorage is not implemented for Hetzner.

Silvest89 commented 7 months ago

@mysticaltech EphemeralStorage can be assigned the same *resource.NewQuantity(int64(typeInfo.Disk*1024*1024*1024), resource.DecimalSI), and that will be a one-line fix :P
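The effect of the missing entry can be sketched with a simplified model. This is not the autoscaler's code: `ResourceList`, `templateCapacity`, and `insufficient` are hypothetical stand-ins that use plain maps instead of the Kubernetes API types, and the server sizes are made-up illustration values. The assumption is that a resource absent from the node template counts as zero, so any pod requesting ephemeral storage fails the fit check until the template advertises it.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for apiv1.ResourceList:
// resource name -> quantity (bytes for memory and storage, cores for cpu).
type ResourceList map[string]int64

const gib = int64(1024 * 1024 * 1024)

// templateCapacity mirrors the shape of the capacity list quoted above.
// withEphemeral toggles the proposed extra entry, which reuses the disk size.
func templateCapacity(cores, memoryGiB, diskGiB int64, withEphemeral bool) ResourceList {
	c := ResourceList{
		"cpu":     cores,
		"memory":  memoryGiB * gib,
		"storage": diskGiB * gib,
	}
	if withEphemeral {
		c["ephemeral-storage"] = diskGiB * gib
	}
	return c
}

// insufficient returns the requested resources the template cannot satisfy.
// A resource missing from the template reads as zero from the map.
func insufficient(requests, capacity ResourceList) []string {
	var missing []string
	for _, name := range []string{"cpu", "memory", "ephemeral-storage"} {
		if requests[name] > capacity[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Hypothetical pod that requests 1 GiB of ephemeral storage.
	pod := ResourceList{"cpu": 1, "memory": 2 * gib, "ephemeral-storage": 1 * gib}

	// Without the entry, the pod never fits, however large the disk is.
	fmt.Println(insufficient(pod, templateCapacity(16, 32, 320, false)))
	// With the entry, nothing is reported as insufficient.
	fmt.Println(insufficient(pod, templateCapacity(16, 32, 320, true)))
}
```

In this model the first call reports `ephemeral-storage` as insufficient and the second reports nothing, which matches the "Insufficient ephemeral-storage" message in the logs above.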

mysticaltech commented 7 months ago

@Silvest89 You are the man, please do (I'm not good at golang)! So at least our own fork will, and then later if you could submit a PR to the mother repo it would be great, as you did last time 🙏

Silvest89 commented 7 months ago

@mysticaltech But I like to test my shit as well :P, testing the autoscaler is a ****. So it will probably take longer than this one line fix :P

mysticaltech commented 7 months ago

@Silvest89 If you push the fix and do a successful release, I can do the testing lol. And I'm sure @schlichtanders will be quicker than me, he is waiting for it!

Silvest89 commented 7 months ago

@mysticaltech The workflow is running. When it is done, you will see a new tag (current date)

mysticaltech commented 7 months ago

Thanks a lot @Silvest89! @schlichtanders So no need to downgrade, keep k3s at v1.29, fix incoming. I will tell you what changes need to be made, just two lines in your kube.tf. Will keep you posted as soon as the build is successful.

mysticaltech commented 7 months ago

@schlichtanders Here we go, in your kube.tf with initial_k3s_channel = "v1.29", remove the line with cluster_autoscaler_image so as to force it to use the default value. And change cluster_autoscaler_version to "20240226".

Also, make sure your kube.tf follows the same autoscaler nodepools definition format as in the latest kube.tf.example. Change to min 0 and max 0, apply. Let it scale down if needed, if already scaled down, set your wanted values for min and max and apply again.

Let us know please 🙏

mysticaltech commented 7 months ago

@schlichtanders @Silvest89 Works like a charm on my end. Tried with both kube v1.28 and v1.29, it's autoscaling correctly!

@Silvest89 I can make this one the default, yes? On your side, if you could PR to the mother repo it would be dope! 🚀

Screenshot from 2024-02-26 21-23-05

schlichtanders commented 7 months ago

Thank you for all your help. Still I am facing problems

Not sure what changed, but this is a new behaviour I haven't seen on the previous autoscalers... debugging further...

EDIT: it is this affinity

affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists

This does not make sense on an arbitrary kube-hetzner cluster, does it? I don't have any nodes that are tainted with node-role.kubernetes.io/master.

schlichtanders commented 7 months ago

I unfortunately get the same affinity problem with 20231027 version.

EDIT: @mysticaltech how has this worked for you? have you manually tainted the nodes? or is it about manual labels?

mysticaltech commented 7 months ago

@schlichtanders It's one replica on my end. And what I do to test is I use a dummy pod definition, give it a lot of memory needs, enough for a full node, like 1500Mi for cpx11, and it works as in a node gets created and the pod is scheduled after the node joins.

You need to inspect the logs of the autoscaler pod. Also, I set my log level to 5, in the variable for this.

Also, like I said many times, you need to let it scale down completely first, to clear the old config, so start with min 0 and max 0.

schlichtanders commented 7 months ago

My autoscaler pod is not even starting. It is stuck in Pending because of this unfulfilled affinity restriction. EDIT: the affinity is defined here

I am really surprised that the same autoscaler has different parameters on your side. I am using

  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.12.2"

which version are you using?

mysticaltech commented 7 months ago

@schlichtanders It's one replica on my end. And what I do to test is I use a dummy pod definition, give it a lot of memory needs, enough for a full node, like 1500Mi for cpx11, and it works as in the node gets created and the pod is scheduled after the node joins.

You need to inspect the logs of the autoscaler pod. Also, I set my log level to 5, in the variable for this.

If you see two replicas it's probably two distinct deployments, you need to delete the old one.

mysticaltech commented 7 months ago

I am using the same latest version. You need to debug by increasing your log level, and by describing the autoscaler pod to make sure it's not an old version.

schlichtanders commented 7 months ago

I inspected the yaml on kubernetes and it indeed says replicas: 1 under the definition, but under status it is still replicas: 2. Weird...

The log level is not useful yet, as the autoscaler pod is not even starting (sorry for repeating myself again).

I have now deleted the cluster-autoscaler manually in the cluster and reapplied terraform. Result: Only one replica, but as expected, the affinity is still leaving it in Pending. There are no resources with this taint.

Have you tainted your nodes with node-role.kubernetes.io/master? How do your nodes get the correct taint? EDIT: After googling I think requiredDuringSchedulingIgnoredDuringExecution is checking against labels, so how do your nodes get the correct labels?

schlichtanders commented 7 months ago

Researching the specific taint it looks like the cluster autoscaler should be scheduled on the control plane https://kubernetes.io/docs/reference/labels-annotations-taints/#node-role-kubernetes-io-master-taint

It is a deprecated taint and should be replaced by node-role.kubernetes.io/control-plane

EDIT: Indeed my control plane nodes have this new taint

Taints:             node-role.kubernetes.io/control-plane:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule

But I am confused about the distinction between taints and labels here...
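The distinction between taints and labels can be sketched with a simplified model. This is not scheduler code: `Node`, `Pod`, `affinityMatches`, and `taintsTolerated` are hypothetical names, the model covers a single nodeSelectorTerm with `Exists` expressions only, and the label on the control-plane node is an assumption about what the cluster sets. The point it illustrates is that node affinity is evaluated against node labels, tolerations are evaluated against node taints, and a pod needs to pass both.

```go
package main

import "fmt"

// Taint is a reduced form of a node taint: key and effect only.
type Taint struct{ Key, Effect string }

// Node carries the two independent properties the scheduler looks at.
type Node struct {
	Name   string
	Labels map[string]string
	Taints []Taint
}

// Pod holds only the fields relevant to this example.
type Pod struct {
	RequiredLabelKeys []string // nodeAffinity matchExpressions with operator Exists
	ToleratedKeys     []string // tolerations with operator Exists, any effect
}

// affinityMatches checks node LABELS: every required key must be present.
func affinityMatches(p Pod, n Node) bool {
	for _, key := range p.RequiredLabelKeys {
		if _, ok := n.Labels[key]; !ok {
			return false
		}
	}
	return true
}

// taintsTolerated checks node TAINTS: every blocking taint needs a toleration.
func taintsTolerated(p Pod, n Node) bool {
	for _, t := range n.Taints {
		if t.Effect == "PreferNoSchedule" {
			continue // a soft preference, does not block scheduling
		}
		tolerated := false
		for _, key := range p.ToleratedKeys {
			if key == t.Key {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

// schedulable requires both checks to pass.
func schedulable(p Pod, n Node) bool {
	return affinityMatches(p, n) && taintsTolerated(p, n)
}

func main() {
	const master = "node-role.kubernetes.io/master"
	const controlPlane = "node-role.kubernetes.io/control-plane"

	// Assumed control-plane node: new-style label and taint, no "master" label.
	node := Node{
		Name:   "control-plane-1",
		Labels: map[string]string{controlPlane: "true"},
		Taints: []Taint{{Key: controlPlane, Effect: "NoSchedule"}},
	}

	oldPod := Pod{RequiredLabelKeys: []string{master}, ToleratedKeys: []string{controlPlane}}
	newPod := Pod{RequiredLabelKeys: []string{controlPlane}, ToleratedKeys: []string{controlPlane}}

	fmt.Println("affinity on master label:", schedulable(oldPod, node))
	fmt.Println("affinity on control-plane label:", schedulable(newPod, node))
}
```

In this model the pod with the `master` affinity stays unschedulable even though it tolerates the control-plane taint, because no node carries the `master` label.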

schlichtanders commented 7 months ago

This is definitely the next bug :+1: : Looking at the tolerations it should apparently be scheduled on the control plane.

Hence the affinity to node-role.kubernetes.io/master should be replaced with node-role.kubernetes.io/control-plane

schlichtanders commented 7 months ago

@mysticaltech can you include that in your new pull request?

mysticaltech commented 7 months ago

@schlichtanders Thanks for in-depth debugging. @Silvest89 what do you think?

Silvest89 commented 7 months ago

@mysticaltech The autoscaler yaml should be changed. All Kubernetes versions now use control-plane, since "master" is considered harmful language today.

And there is no node with a master label anymore.

Change in autoscaler.yaml.tpl:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists

schlichtanders commented 7 months ago

@mysticaltech please give me a ping once it is included in your pull request, then I can test whether it works for me

mysticaltech commented 7 months ago

@Silvest89 Thanks for clarifying. On it. @schlichtanders Will ping you ASAP.

mysticaltech commented 7 months ago

@schlichtanders I just updated #1239, please try now, and let us know! 🙏

schlichtanders commented 7 months ago

Thank you very much. It still does not work :grin:

But this one might be specific to my slightly older control-plane node. As already mentioned, it also has the taint node.kubernetes.io/unschedulable:NoSchedule.

k8s docs suggest that this taint is rather temporary, "to avoid race conditions". However for me it is still there...

Any ideas, why it is there?

schlichtanders commented 7 months ago

I just created a fresh cluster to see whether this taint appears and, surprisingly, it appeared on one of the two agent nodes of my cluster. The control-plane does not have it this time...

I have no clue yet how to debug why this taint node.kubernetes.io/unschedulable:NoSchedule is not cleaned up (or why it is set initially)...

EDIT: Good news: The taint is gone now. Hence I assume it was some mistake/bug/interaction half a year ago which left it there. I will try to delete the taint manually.

mysticaltech commented 7 months ago

@schlichtanders Yes, try with kubectl taint nodes <node-name> node.kubernetes.io/unschedulable:NoSchedule-

schlichtanders commented 7 months ago

Your command does not seem to have any effect. The node is also marked as Unschedulable: true, maybe this is stronger.
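One possible explanation, sketched as a toy model: Kubernetes keeps the node.kubernetes.io/unschedulable taint in sync with the node's spec.unschedulable field, so a manually removed taint comes back for as long as the node is cordoned. The `Node` type and the `reconcile` and `removeTaint` helpers below are hypothetical and only imitate that behaviour. They are not the controller's actual code.

```go
package main

import "fmt"

const unschedulableTaint = "node.kubernetes.io/unschedulable"

// Node is a reduced model: the cordon flag plus the keys of its taints.
type Node struct {
	Unschedulable bool     // stands in for spec.unschedulable (cordon / uncordon)
	Taints        []string // taint keys only
}

// removeTaint imitates removing a taint by hand.
func removeTaint(n *Node, key string) {
	var kept []string
	for _, k := range n.Taints {
		if k != key {
			kept = append(kept, k)
		}
	}
	n.Taints = kept
}

// hasTaint reports whether the node currently carries the given taint key.
func hasTaint(n Node, key string) bool {
	for _, k := range n.Taints {
		if k == key {
			return true
		}
	}
	return false
}

// reconcile imitates a controller that derives the taint from the cordon flag:
// present while the node is unschedulable, absent otherwise.
func reconcile(n *Node) {
	removeTaint(n, unschedulableTaint)
	if n.Unschedulable {
		n.Taints = append(n.Taints, unschedulableTaint)
	}
}

func main() {
	node := Node{Unschedulable: true}
	reconcile(&node)

	// Removing the taint by hand does not last while the node is cordoned.
	removeTaint(&node, unschedulableTaint)
	reconcile(&node)
	fmt.Println("after manual taint removal:", hasTaint(node, unschedulableTaint))

	// Clearing the cordon flag is what makes the taint go away for good.
	node.Unschedulable = false
	reconcile(&node)
	fmt.Println("after clearing the cordon flag:", hasTaint(node, unschedulableTaint))
}
```

Under this assumption, the Unschedulable: true flag is the cause and the taint is only its symptom.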

schlichtanders commented 7 months ago

I've found another command, kubectl uncordon <node name>, to make a node schedulable. @mysticaltech is this safe to do on the control plane?