Unofficial Terraform module to build a viable dual-stack Kubernetes cluster in Hetzner Cloud.
Creates a Kubernetes cluster on the Hetzner cloud, with the following features:
etcd)
Services get a private (ULA) IPv6 address
LoadBalancer services provision Hetzner load balancers, and deleted nodes are cleaned up.
Configure the Hetzner Cloud provider according to the documentation and provide a Hetzner Cloud SSH key resource to access the cluster machines:
resource "hcloud_ssh_key" "key" {
  name       = "key"
  public_key = file("~/.ssh/id_rsa.pub")
}
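The provider configuration itself might look like the following. This is a minimal sketch, assuming the API token is passed in through a variable named `hetzner_token` (the name the cluster example in this README refers to); how the token is actually supplied (a `.tfvars` file, an environment variable, a secrets manager) is up to you.

```hcl
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

# Assumed variable name, matching var.hetzner_token in the cluster example.
variable "hetzner_token" {
  type      = string
  sensitive = true
}

provider "hcloud" {
  token = var.hetzner_token
}
```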
Create a simple Kubernetes cluster:
module "cluster" {
  source  = "tibordp/dualstack-k8s/hcloud"
  version = "2.3.0"

  name           = "k8s"
  hcloud_ssh_key = hcloud_ssh_key.key.id
  hcloud_token   = var.hetzner_token
  location       = "hel1"
}

module "worker_nodes" {
  source  = "tibordp/dualstack-k8s/hcloud//modules/worker-node"
  version = "2.3.0"

  cluster = module.cluster
  count   = 2

  name           = "k8s-worker-${count.index}"
  hcloud_ssh_key = hcloud_ssh_key.key.id
  location       = "hel1"
}

output "kubeconfig" {
  value     = module.cluster.kubeconfig
  sensitive = true
}
When the cluster is deployed, the kubeconfig needed to reach the cluster is available from the output. There are many ways to continue, but you can store it in a file:
terraform output -raw kubeconfig > kubeconfig.conf
and check the access by viewing the created cluster nodes:
$ kubectl get nodes --kubeconfig=kubeconfig.conf
NAME                  STATUS   ROLES           AGE   VERSION
k8s-control-plane-0   Ready    control-plane   31m   v1.31.1
k8s-worker-0          Ready    <none>          31m   v1.31.1
k8s-worker-1          Ready    <none>          31m   v1.31.1
The module should work on most major RPM and DEB distros. It has been tested on these base images:
ubuntu-24.04
debian-12
centos-stream-9
rocky-9
fedora-40
Others may work as well, but have not been tested.
This module can create a highly available control plane with multiple control plane nodes. There are two options available:
control_plane_endpoint will be used as the API server endpoint, and it is up to you to make sure requests are routed to the control plane nodes (see example).
It is recommended to set up control_plane_endpoint (e.g. a DNS record) even if a single control plane node is used, as doing so allows additional control plane nodes to be added later. If this is not done, the cluster will have to be manually reconfigured (e.g. like this) to use the new endpoint when new control plane nodes are added.
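As an illustration of the recommendation above, the endpoint can be set when the cluster is first created. This is only a sketch: `k8s.example.com` is a hypothetical DNS name that you would have to point at the control plane node(s) yourself, and the exact value format the variable expects (for example, whether a port is included) is an assumption to check against the module's variable documentation.

```hcl
module "cluster" {
  source  = "tibordp/dualstack-k8s/hcloud"
  version = "2.3.0"

  name           = "k8s"
  hcloud_ssh_key = hcloud_ssh_key.key.id
  hcloud_token   = var.hetzner_token
  location       = "hel1"

  # Hypothetical DNS name; routing requests to the control plane
  # nodes behind it is not handled by this setting.
  control_plane_endpoint = "k8s.example.com"
}
```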
The first step before removing a control plane node is to remove its membership in the etcd cluster. Read this section carefully before removing control plane nodes! If the etcd membership is not removed before the node is shut down, the whole cluster can become inoperable. If the control plane node that is being removed is still functional, the easiest way to remove its membership is to run the following command on the node:
kubeadm reset --force
If the node is already defunct, there are two cases to consider:
If the etcd cluster still has quorum (i.e. a majority, floor(N/2)+1, of the N members are still functional), the membership of the defunct member can be manually removed with etcdctl, e.g.:
$ kubectl exec -n kube-system etcd-surviving-control-plane-node -- etcdctl \
    --endpoints=https://[::1]:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key member list
2a51630843ac2da6, started, defunct-control-plane-node, https://[2a01:db8:2::1]:2380, https://[2a01:db8:2::1]:2379, false
7f196e4d62a04497, started, surviving-control-plane-node, https://[2a01:db8:1::1]:2380, https://[2a01:db8:1::1]:2379, false
$ kubectl exec -n kube-system etcd-surviving-control-plane-node -- etcdctl \
    --endpoints=https://[::1]:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key member remove 2a51630843ac2da6
Member 2a51630843ac2da6 removed from cluster 46b13f81dcebb93d
It is important to remove failed members from etcd even if quorum is still present, as new control plane nodes will not be able to join until the etcd cluster is healthy.
If the etcd cluster no longer has quorum, e.g. one control plane node out of a 2-node cluster is gone, the etcd cluster will need to be rebuilt from a snapshot, following the steps for disaster recovery. Data loss may have occurred.
You may also need to manually remove the Node object, as the Hetzner Cloud Controller that is responsible for deleting defunct nodes may have been running on this very node (this should not be an issue if kubectl drain was done first):
kubectl delete node <node name>
The first control plane node is special in that it is used by the provisioning process (e.g. to get the bootstrap tokens for other nodes). If the first node is deleted, another server must be specified; otherwise provisioning operations will fail.
module "k8s" {
  source  = "tibordp/dualstack-k8s/hcloud"
  version = "2.3.0"
  ...

  kubeadm_host = "<ip address of another control plane node>"
}
Afterwards, the node can be replaced as usual, e.g.
terraform taint module.k8s.module.control_plane_nodes[0].hcloud_server.instance
terraform apply
TLS certificate credentials from the output can be used to chain other Terraform modules, such as the Kubernetes provider:
provider "kubernetes" {
  host = module.k8s.apiserver_url

  # For a cluster with a single control plane node, this will be an IPv6 URL.
  # To connect over IPv4, the following can be used instead:
  # host = "https://${module.k8s.control_plane_nodes[0].ipv4_address}:6443"

  client_certificate     = module.k8s.client_certificate_data
  client_key             = module.k8s.client_key_data
  cluster_ca_certificate = module.k8s.certificate_authority_data
}
Once the control plane is set up, the module exposes an output called join_user_data that contains a cloud-init script which can be used to join additional worker nodes outside of Terraform (e.g. for use with cluster autoscaler).
The generated join configuration will be valid for 10 years, after which the bootstrap token will need to be regenerated (but you should probably rebuild the cluster with something better by then).
See example for how it can be used to manage workers separately from this module.
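To make the idea concrete, the script can be passed as user data to any server created outside the worker-node submodule. The sketch below uses a plain `hcloud_server` resource and assumes the cluster module is named `k8s` and the SSH key resource is named `key`, as in the other snippets in this README. The server name and the `cx22` server type are placeholders, and a real worker may need further settings that the linked example covers.

```hcl
resource "hcloud_server" "extra_worker" {
  name        = "k8s-extra-worker-0" # placeholder name
  image       = "ubuntu-24.04"       # one of the tested base images
  server_type = "cx22"               # placeholder; pick a type available to you
  location    = "hel1"
  ssh_keys    = [hcloud_ssh_key.key.id]

  # cloud-init script generated by the cluster module; it joins the
  # server to the cluster on first boot.
  user_data = module.k8s.join_user_data
}
```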
This module can be configured to use Hetzner Cloud private networks by specifying the use_hcloud_network, hcloud_network_id and hcloud_subnet_id variables. In this case native routing will be used for IPv4 traffic and the Wigglenet overlay will only be used for IPv6 traffic (Hetzner private networks are IPv4-only). Note that Hetzner private networks are not encrypted, just segregated.
See example for more details.
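A rough sketch of how the three variables might be wired up is shown below. The network and subnet resources are standard `hcloud` provider resources; the resource names and IP ranges are arbitrary placeholders, and passing the resources' `id` attributes directly to the module is an assumption about the variable types, so check it against the linked example.

```hcl
resource "hcloud_network" "private" {
  name     = "k8s-private" # placeholder name
  ip_range = "10.0.0.0/16" # placeholder range
}

resource "hcloud_network_subnet" "nodes" {
  network_id   = hcloud_network.private.id
  type         = "cloud"
  network_zone = "eu-central" # the zone that contains hel1
  ip_range     = "10.0.1.0/24"
}

module "k8s" {
  source  = "tibordp/dualstack-k8s/hcloud"
  version = "2.3.0"

  name           = "k8s"
  hcloud_ssh_key = hcloud_ssh_key.key.id
  hcloud_token   = var.hetzner_token
  location       = "hel1"

  # Assumed value types; see the module's variable documentation.
  use_hcloud_network = true
  hcloud_network_id  = hcloud_network.private.id
  hcloud_subnet_id   = hcloud_network_subnet.nodes.id
}
```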
Read these notes carefully before using this module in production.
NetworkPolicy is not supported.
The load-balancer.hetzner.cloud/hostname: <hostname> annotation must be set on all LoadBalancer services, otherwise health checks will fail and the service will not be accessible from outside the cluster (see this issue for more details).
In addition, some caveats for dual-stack clusters in general:
Services are single-stack by default. Since IPv6 is the primary IP family of clusters created with this module, the ClusterIP will be IPv6-only, which causes issues for workloads that only bind on IPv4. Pass ipFamilyPolicy: PreferDualStack when creating services to assign both IPv4 and IPv6 ClusterIPs. You can use the prefer-dual-stack-webhook admission controller to change the default to PreferDualStack for all newly created services that don't specify an IP family policy.
The API server Service (kubernetes.default.svc.cluster.local) has to be single-stack, as --apiserver-advertise-address does not support dual-stack yet. The default address family for the cluster can be selected with the primary_ip_family variable (defaults to ipv6).
Some parts, including this README, are adapted from JWDobken/terraform-hcloud-kubernetes by Joost Döbken.
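
As an illustration of the LoadBalancer and dual-stack caveats noted above, a Service managed through the Terraform Kubernetes provider could set both the hostname annotation and the IP family policy. This is a sketch only: the service name, selector, ports and `web.example.com` hostname are hypothetical, and it assumes a Kubernetes provider version whose `kubernetes_service_v1` resource supports `ip_family_policy`.

```hcl
resource "kubernetes_service_v1" "web" {
  metadata {
    name = "web" # hypothetical service name

    annotations = {
      # Required on LoadBalancer services in clusters built with this module.
      "load-balancer.hetzner.cloud/hostname" = "web.example.com"
    }
  }

  spec {
    type = "LoadBalancer"

    # Request both IPv4 and IPv6 ClusterIPs instead of the single-stack default.
    ip_family_policy = "PreferDualStack"

    selector = {
      app = "web" # hypothetical pod label
    }

    port {
      port        = 80
      target_port = 8080
    }
  }
}
```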