@vinnytwice Please delete your hcloud_token and create a new one. You've exposed it in your kube.tf file.
I adjusted your comment, but it will still be available in the history.
@M4t7e thanks for pointing it out, of course I'll rotate it.
OK, I started fresh with new Hetzner and kube-hetzner projects so I could run some tests and see what leads to what results. Generally speaking, setting initial_k3s = "stable" causes the error above.
In all other cases Terraform hangs while destroying the cluster, either on the control-plane subnet or the agent subnet, depending on how the load balancer is set up.
Here is the step-by-step tweaking I made and the results:
Setup:
terraform init --upgrade
Initializing the backend...
Upgrading modules...
Downloading registry.terraform.io/kube-hetzner/kube-hetzner/hcloud 2.3.2 for kube-hetzner...
- kube-hetzner in .terraform/modules/kube-hetzner
- kube-hetzner.agents in .terraform/modules/kube-hetzner/modules/host
- kube-hetzner.control_planes in .terraform/modules/kube-hetzner/modules/host
Initializing provider plugins...
- Finding latest version of hashicorp/random...
- Finding latest version of hashicorp/cloudinit...
- Finding hetznercloud/hcloud versions matching ">= 1.41.0"...
- Finding hashicorp/local versions matching ">= 2.0.0"...
- Finding tenstad/remote versions matching ">= 0.0.23"...
- Finding integrations/github versions matching ">= 4.0.0"...
- Finding latest version of hashicorp/null...
- Installing hashicorp/cloudinit v2.3.2...
- Installed hashicorp/cloudinit v2.3.2 (signed by HashiCorp)
- Installing hetznercloud/hcloud v1.42.0...
- Installed hetznercloud/hcloud v1.42.0 (signed by a HashiCorp partner, key ID 5219EACB3A77198B)
- Installing hashicorp/local v2.4.0...
- Installed hashicorp/local v2.4.0 (signed by HashiCorp)
- Installing tenstad/remote v0.1.2...
- Installed tenstad/remote v0.1.2 (self-signed, key ID 0696D656FC3AC5FA)
- Installing integrations/github v5.32.0...
- Installed integrations/github v5.32.0 (signed by a HashiCorp partner, key ID 38027F80D7FD5FB2)
- Installing hashicorp/null v3.2.1...
- Installed hashicorp/null v3.2.1 (signed by HashiCorp)
- Installing hashicorp/random v3.5.1...
- Installed hashicorp/random v3.5.1 (signed by HashiCorp)
Configuration tests
Updated the default kube.tf file to create just 1 cax11 control plane + 1 cax11 agent node.
Apply: success in 3’41”. Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.agent[0]
Network and Load Balancer are not destroyed.
Added:
@L419 automatically_upgrade_k3s = false
@L423 automatically_upgrade_os = false
@L441 initial_k3s = "stable"
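For context, those flags sit directly in the kube-hetzner module block of kube.tf; a minimal sketch, with the variable names copied verbatim from the lines above (double-check them against the kube.tf.example of your module version):

```hcl
module "kube-hetzner" {
  # ...source, hcloud_token, node pools, etc. as in the default kube.tf...

  automatically_upgrade_k3s = false    # ~line 419 of the default kube.tf
  automatically_upgrade_os  = false    # ~line 423
  initial_k3s               = "stable" # ~line 441
}
```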
Apply: Error in 3’25”
│ Error: remote-exec provisioner error
│
│ with module.kube-hetzner.null_resource.kustomization,
│ on .terraform/modules/kube-hetzner/init.tf line 285, in resource "null_resource" "kustomization":
│ 285: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_815743714.sh": Process exited with status 1
Destroy: success
Removed: @L441 initial_k3s = "stable"
Apply: success in 3’30”. Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.agent[0]
Network and Load Balancer are not destroyed.
Added: @L370 ingress_controller = "nginx"
Apply: success in 7’30” (takes way longer!). Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.control_plane[0]: Still destroying... [id=3156572-10.255.0.0/16, 9m40s elapsed]
Network and Load Balancer are not destroyed.
Added:
@L204 control_planes_custom_config = {
  etcd-expose-metrics         = true,
  kube-controller-manager-arg = "bind-address=0.0.0.0",
  kube-proxy-arg              = "metrics-bind-address=0.0.0.0",
  kube-scheduler-arg          = "bind-address=0.0.0.0",
}
Apply: success in 4’08”. Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.agent[0]: Still destroying... [id=3156635-10.0.0.0/16, 6m0s elapsed]
Network and Load Balancer are not destroyed.
Added:
@L541 use_control_plane_lb = true
Apply: success in 7’30”
Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.control_plane[0]: Still destroying... [id=3156672-10.255.0.0/16, 9m20s elapsed]
Added:
nginx_values = <<EOT
controller:
  watchIngressWithoutClass: "true"
  kind: "DaemonSet"
  config:
    "use-forwarded-headers": "true"
    "compute-full-forwarded-for": "true"
    "use-proxy-protocol": "true"
  service:
    annotations:
      "load-balancer.hetzner.cloud/name": "k3s"
      "load-balancer.hetzner.cloud/use-private-ip": "true"
      "load-balancer.hetzner.cloud/disable-private-ingress": "true"
      "load-balancer.hetzner.cloud/location": "fsn1"
      "load-balancer.hetzner.cloud/type": "lb11"
      "load-balancer.hetzner.cloud/uses-proxyprotocol": "true"
  extraArgs:
    default-ssl-certificate: "default/tls-secret" # Only difference from sample config
EOT
Apply: success
Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.control_plane[0]: Still destroying... [id=3156860-10.255.0.0/16, 3m30s elapsed]
I can't seem to find a stable configuration. I'm basically trying to set up the cluster as:
1 control plane, 3 agents (Node.js server, MongoDB, Neo4j), and the Nginx ingress controller with some TCP port mappings to services (mainly to expose the Neo4j Browser).
Right now HA is not a priority; I'm just testing Hetzner Cloud, as I want to move away from Azure.
What is the correct configuration for this cluster?
@vinnytwice See the readme on how to destroy. We've created a cleanupkh command to help you out. Basically, as soon as the destroy reaches the subnets, you need to run it in a separate terminal tab or window. It will delete the LB and the autoscaled nodes that are not part of Terraform (they are what causes the hang on destroy).
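A minimal sketch of that flow, assuming the cleanupkh alias from the readme is set up:

```sh
# Terminal 1: run the destroy as usual; it will hang on the subnet resources
terraform destroy

# Terminal 2: once the destroy is stuck on the subnets, run the cleanup helper.
# Per the comment above, it removes the load balancer and the autoscaled nodes
# that Terraform does not manage, which unblocks the destroy.
cleanupkh
```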
So basically you do that and you should have it stable! :)
@mysticaltech Oh I see, thanks. I'll check this solution for the hanging part.
By the way, while we're at it, there is a part of the kube.tf file which I don't fully get. Sorry, I'm still quite inexperienced in setting up Kubernetes clusters and infrastructure, and could use some help here.
The kube.tf comment says the value matters for pods that "communicate to the cluster from inside the cluster itself, in which case it is important to set this value, as it will configure the hostname". Is setting this parameter related to external communications using a DNS record instead?
AFAIK Kubernetes internal cluster communication is performed using Service objects, but the DNS A record "hetzner_cloud" that I set on my domain "mydomain.com", pointing to the load balancer IP address, should give me a "hetzner_cloud.mydomain.com" URL to access the cluster.
As I understand it, if I set lb_hostname = "hetzner_cloud.mydomain.com", I am then able to use "hetzner_cloud.mydomain.com" as a URL to access the cluster; if I leave it unset, then I have to use the load balancer IP address directly to access the cluster.
Or is this parameter used instead of, or in combination with, declaring the TLS hosts in the Ingress manifest, as follows?
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
    - hosts:
        - hetzner_cloud.mydomain.com
Thank you very much again
@vinnytwice Apologies for the late reply, just saw this now. I understand this can be confusing. Basically, it's only necessary if you have internal pods that are going to use full-blown domain names, instead of service names, to talk to other in-cluster pods.
Let's say you have pod A with an associated service A, exposed as a.mycluster.domain.com. If, instead of using service A, pods B or C try to reach A at a.mycluster.domain.com, this can cause slowdowns when the Hetzner LB that service A uses is not associated with the domain name in question.
So basically, if you are in this scenario, it's best to set lb_hostname to mycluster.domain.com, create an A record pointing that name to your LB IP, and CNAME a.mycluster.domain.com to mycluster.domain.com. That way, when an internal pod talks to A via a.mycluster.domain.com, there are no slowdowns.
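In zone-file terms, the above would look something like this (203.0.113.10 standing in for your LB IP):

```
mycluster.domain.com.    IN A      203.0.113.10
a.mycluster.domain.com.  IN CNAME  mycluster.domain.com.
```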
lb_hostname just sets load-balancer.hetzner.cloud/hostname in the Hetzner LB definition, whether you use Nginx or Traefik.
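Concretely, in terms of the nginx_values block earlier in this thread, setting it would amount to one more annotation on the controller service; a sketch with a placeholder hostname:

```yaml
controller:
  service:
    annotations:
      "load-balancer.hetzner.cloud/hostname": "mycluster.domain.com"
```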
Honestly, I do not use it, because I just have internal services use the other services' names. If a service is in another namespace, you just prefix its name with the namespace, like my-service.my-namespace (or, in full, my-service.my-namespace.svc.cluster.local).
More context here: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/478
Hope this clarifies it enough. This section in the docs is definitely confusing. PR welcome!
@vinnytwice Please check the kube.tf.example again, I clarified the section. Hopefully it's even clearer than the above.
Description
I'm getting inconsistent results so I can't pinpoint the causes of the issues I'm facing.
I was trying to set up a multi-node cluster (1 control plane, 3 agents, and an autoscaler) on a working cluster that was using a single-node ARM server. The first issue I faced is that adding labels to the autoscaler nodes throws an error:
so I tried not including them. After a few tries I started getting the "image has incompatible architecture" error for all servers (I had pinned version = "2.0.9"; without it, 2.3.4 is used). I destroyed the infra and started again.
Not specifying a version number fixed the incompatible-architecture issue, and the infra started deploying. I reduced the number of servers created to 2, so only the control plane and one agent, both cax21, in case it was a quota problem since I'm using free credits, but (after creating all resources, as I can see in the cloud console) Terraform gets stuck until it throws an error:
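For reference, the version pin mentioned above sits on the module block; a minimal sketch (source taken from the init log earlier in this thread):

```hcl
module "kube-hetzner" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.0.9" # pinning this old version triggered the ARM "incompatible architecture" error; omitting it pulls the latest release
  # ...token, node pools, etc...
}
```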
I then destroyed the infra once again, set create_kustomization = false, and deployed the infra again. No change, still the same errors:
What can I check to solve this? Many thanks
Kube.tf file
Screenshots
No response
Platform
Mac