kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!

[Bug]: autoscaled nodes have more than 3 DNS entries #1422

Open tobiasehlert opened 1 month ago

tobiasehlert commented 1 month ago

Description

This is the warning reported for multiple pods on my autoscaler nodes:

Message: Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 185.12.64.2 185.12.64.1 2a01:4ff:ff00::add:2
Reason: DNSConfigForming
Source: kubelet k3s-01-autoscaler-cax21-nbg1-302f5cae41733fd6
Type: Warning

Apparently k8s only supports 3 entries for DNS servers (source: #689).

Looking at both worker and autoscaler nodes, here is how /etc/resolv.conf looks on two servers for comparison.

k3s-01-worker-cax21-nbg1-cet:/ # cat /etc/resolv.conf 
nameserver 185.12.64.1
nameserver 185.12.64.2
nameserver 2a01:4ff:ff00::add:1
k3s-01-autoscaler-cax21-nbg1-302f5cae41733fd6:/ # cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver 185.12.64.2
nameserver 185.12.64.1
nameserver 2a01:4ff:ff00::add:2
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 2a01:4ff:ff00::add:1

So my initial thought is that something is missing when booting an autoscaler node that gets bootstrapped with cloudinit_config, but I haven't figured out what yet.

Kube.tf file

module "kube-hetzner" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.14.1"

  // provider and hcloud token config
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token

  // ssh key parameters
  ssh_public_key    = hcloud_ssh_key.tibiadata_ssh_key["tobias_ed25519"].public_key
  ssh_private_key   = null
  hcloud_ssh_key_id = hcloud_ssh_key.tibiadata_ssh_key["tobias_ed25519"].id

  // network parameters
  existing_network_id = [hcloud_network.net.id]
  network_ipv4_cidr   = module.net_k3s.base_cidr_block
  cluster_ipv4_cidr   = module.net_k3s.network_cidr_blocks.cluster
  service_ipv4_cidr   = module.net_k3s.network_cidr_blocks.service
  cluster_dns_ipv4    = cidrhost(module.net_k3s.network_cidr_blocks.service, 10)

  // control plane nodepools
  control_plane_nodepools = [
    for location in ["fsn1", "hel1", "nbg1", ] : {
      name        = "control-plane-${location}",
      server_type = "cax11",
      location    = location,
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]

  agent_nodepools = concat(
    # egress nodepool
    [for location in [for dc in data.hcloud_datacenter.ds : dc.location.name] : {
      // [for location in ["fsn1", "hel1", "nbg1"] : {
      name        = "egress-cax11-${location}",
      server_type = "cax11",
      location    = location,
      labels = [
        "node.kubernetes.io/role=egress"
      ],
      taints = [
        "node.kubernetes.io/role=egress:NoSchedule"
      ],
      floating_ip = true
      count       = 1
    }],

    # worker nodepools (dynamically created)
    [for location in [for dc in data.hcloud_datacenter.ds : dc.location.name] : {
      // [for location in ["fsn1", "hel1", "nbg1"] : {
      name        = "worker-cax21-${location}",
      server_type = "cax21",
      location    = location,
      labels      = [],
      taints      = [],
      count       = 2
    }]
  )

  autoscaler_nodepools = concat(
    [for location in [for dc in data.hcloud_datacenter.ds : dc.location.name] : {
      // [for location in ["fsn1", "hel1", "nbg1"] : {
      name        = "autoscaler-cax21-${location}",
      server_type = "cax21",
      location    = location,
      min_nodes   = 1,
      max_nodes   = 2,
      labels = {
        "node.kubernetes.io/role" : "autoscaler",
      },
      taints = [{
        key : "node.kubernetes.io/role",
        value : "autoscaler",
        effect : "NoSchedule",
      }],
    }]
  )

  # firewall whitelisting (for Kube API and SSH)
  firewall_kube_api_source = [for ip in tolist(var.firewall_whitelisting.kube) : "${ip}/32"]
  firewall_ssh_source      = [for ip in tolist(var.firewall_whitelisting.ssh) : "${ip}/32"]

  # cluster generic
  cluster_name        = "k3s-01"
  additional_tls_sans = ["k3s-01.${var.fqdn_domain}"]
  base_domain         = "k3s-01.${var.fqdn_domain}"
  cni_plugin          = "cilium"
  disable_kube_proxy  = true # kube-proxy is replaced by cilium (set in cilium_values)

  # cilium parameters
  cilium_version = "v1.15.1"
  cilium_values  = <<EOT
ipam:
  mode: kubernetes
k8s:
  requireIPv4PodCIDR: true
kubeProxyReplacement: true
kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"
k8sServiceHost: "127.0.0.1"
k8sServicePort: "6444"
routingMode: "native"
ipv4NativeRoutingCIDR: "${module.net_k3s.network_cidr_blocks.cluster}"
installNoConntrackIptablesRules: true
endpointRoutes:
  enabled: true
loadBalancer:
  acceleration: native
bpf:
  masquerade: true
encryption:
  enabled: true
  nodeEncryption: true
  type: wireguard
egressGateway:
  enabled: true
MTU: 1450
  EOT

  # Hetzner delete protection
  enable_delete_protection = {
    floating_ip = true
  }

  # various parameters
  ingress_controller   = "none"
  enable_cert_manager  = false
  block_icmp_ping_in   = true
  create_kubeconfig    = false
  create_kustomization = false
}

Screenshots

No response

Platform

Linux

mysticaltech commented 1 month ago

@tobiasehlert Thanks for reporting this. I had no idea about the nameserver count limitation.

Here are the steps to fix it. I will try to address this later, but if you want to shoot a PR, please don't hesitate.


Let's recap the solution to your DNS configuration issue with kube-hetzner:

  1. Problem: Your autoscaler nodes have 4 nameserver entries (2 IPv4 + 2 IPv6) in their /etc/resolv.conf, which exceeds Kubernetes' limit of 3 total nameservers.

  2. Solution: Modify the DNS configuration for autoscaler nodes to include only 3 nameserver entries total, combining both IPv4 and IPv6 addresses.

  3. Steps to implement:

    a. Review your kube-hetzner configuration, focusing on autoscaler node provisioning.

    b. Locate the part of the configuration (likely in cloudinit_config or user-data scripts) that sets up DNS for autoscaler nodes.

    c. Modify this configuration to limit nameservers to 3 entries (a cloud-init sketch of this appears below). For example:

      nameserver 185.12.64.2
      nameserver 185.12.64.1
      nameserver 2a01:4ff:ff00::add:2

    d. Ensure this modified configuration is applied when new autoscaler nodes are created.

    e. After making changes, you may need to recreate existing autoscaler nodes to apply the new configuration.

  4. Expected outcome: Once implemented, your autoscaler nodes should have the same number of nameserver entries as your worker nodes, resolving the "Nameserver limits were exceeded" error.

  5. Verification: After applying changes, check the /etc/resolv.conf on new or recreated autoscaler nodes to confirm it contains only 3 nameserver entries.

Remember, the key is to ensure that your DNS configuration provides adequate resolution for both IPv4 and IPv6 as needed by your cluster, while staying within the 3-nameserver limit imposed by Kubernetes.
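
One way to express step 3c above in cloud-init form would be a write_files stanza like the sketch below. This is only an illustration, not the module's actual template; the particular mix of two IPv4 and one IPv6 resolvers is simply taken from the recap above.

# cloud-config sketch: write a resolv.conf with exactly three nameservers
# (two IPv4 + one IPv6, matching step 3c above)
write_files:
- content: |
    nameserver 185.12.64.2
    nameserver 185.12.64.1
    nameserver 2a01:4ff:ff00::add:2
  path: /etc/resolv.conf
  permissions: '0644'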

tobiasehlert commented 1 month ago

hi @mysticaltech,

I've made some progress on the cloud-init stuff, but want to check a few things with you.

It seems like it takes a while before the autoscaler node gets updated and ends up with different DNS settings than it has right after start. I guess the k8s components start before the networking is fully settled, but I'm not sure about that part.

Anyhow, it looks to me like /etc/resolv.conf gets edited through cloudinit_write_files_common (in addition to modifying /etc/NetworkManager/conf.d/dns.conf, on which more after the snippet below) with this:

write_files:
- content: |
    nameserver 185.12.64.1
    nameserver 185.12.64.2
    nameserver 2a01:4ff:ff00::add:1
  path: /etc/resolv.conf
  permissions: '0644'
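
For context, that /etc/NetworkManager/conf.d/dns.conf drop-in presumably exists to stop NetworkManager from regenerating /etc/resolv.conf, which would also explain why the autoscaler node's file says "# Generated by NetworkManager" and changes a while after boot. A minimal sketch of such a drop-in, assuming the usual dns=none setting rather than the module's actual content:

# cloud-config sketch (assumed content): tell NetworkManager not to manage DNS,
# so it stops rewriting /etc/resolv.conf with its own nameserver list
write_files:
- content: |
    [main]
    dns=none
  path: /etc/NetworkManager/conf.d/dns.conf
  permissions: '0644'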

I read a bit in the cloud-init documentation, and there are a couple of ways forward:

  1. Use the resolv-conf module (link to docs). Comment: maybe not the best option, since many distros have moved away from manually editing /etc/resolv.conf, even though that's exactly what we do today.
  2. Use the network config option through cloud-init (link to docs). Comment: this maybe makes a little more sense, but it would probably also cover configuring all network-related things, like the use of floating IPs (which we handle slightly differently today), so it's probably a larger rewrite, I guess (a rough sketch follows the option 1 example below).

If using option 1, it would look something like this:

manage_resolv_conf: true
resolv_conf:
  nameservers:
    - 185.12.64.1
    - 185.12.64.2
    - 2a01:4ff:ff00::add:1
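
For comparison, option 2 could look roughly like the following network config v2 sketch. The interface name eth0 and DHCP for IPv4 are assumptions, and on Hetzner the datasource normally supplies the network config, so this would have to take precedence over it:

# network config v2 sketch (assumptions: interface name eth0, DHCP for IPv4)
version: 2
ethernets:
  eth0:
    dhcp4: true
    nameservers:
      addresses:
        - 185.12.64.1
        - 185.12.64.2
        - 2a01:4ff:ff00::add:1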

What would you prefer? I've changed some things; you can take a look and see what you think. Link to my fork: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/compare/master...tobiasehlert:terraform-hcloud-kube-hetzner:fix-dns-configuration-with-cloud-init