kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

[Bug]: DNS issue on egress node #1094

Closed. zypriafl closed this issue 12 months ago.

zypriafl commented 1 year ago

Description

I am trying to use egress with Cilium. However, the egress node seems to have a DNS issue and therefore is not able to pull the Cilium image (see screenshots).

This might be the same as discussed here: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/830. Unfortunately, this discussion doesn't make it clear how to resolve this.

Thank you for your help.

Kube.tf file

locals {
  # First and foremost set your Hetzner API token here. It can be found in your Project > Security > API Token (Read & Write is required).  
  hcloud_token = "********"
  cluster_ipv4_cidr = "172.17.0.0/16"
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  version = "v2.9.3"
  hcloud_token = local.hcloud_token
  source = "kube-hetzner/kube-hetzner/hcloud"
  ssh_public_key = file("****************")
  ssh_private_key = file("****************")
  network_region = "eu-central"
  control_plane_nodepools = [
    {
      name        = "control-plane-fsn1",
      server_type = "cx21",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 2
    }
  ]
  agent_nodepools = [
    {
      name        = "agent-middle",
      server_type = "cx31",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 0
    },
    {
      name        = "agent-large",
      server_type = "cx51",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 0
    },    
    {
      name        = "agent-middle-large",
      server_type = "cx41",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 0
    },        
    {
      name        = "egress-pool",
      server_type = "cx21",
      location    = "fsn1",
      labels = [
        "node.kubernetes.io/role=egress"
      ],
      taints = [
        "node.kubernetes.io/role=egress:NoSchedule"
      ],
      floating_ip = true,
      count       = 1
    },
  ]
  extra_firewall_rules = [
    {
      description     = "For outgoing."
      direction       = "out"
      protocol        = "tcp"
      port            = "any"
      source_ips      = []
      destination_ips = ["0.0.0.0/0", "::/0"]
    },
    {
      description     = "For metrics on 30007."
      direction       = "in"
      protocol        = "tcp"
      port            = "30007"
      source_ips      = ["0.0.0.0/0", "::/0"]
      destination_ips = []
    },
  ]
  network_ipv4_cidr = "172.16.0.0/12"
  cluster_ipv4_cidr = local.cluster_ipv4_cidr
  disable_hetzner_csi = true
  automatically_upgrade_k3s = false
  automatically_upgrade_os = false
  cni_plugin = "cilium"
  cilium_egress_gateway_enabled = true
  enable_cert_manager = true
  cluster_name = "app-scaler"
  allow_scheduling_on_control_plane = false
  autoscaler_nodepools = [
    {
      name        = "auto-scaler"
      server_type = "cx41"
      location    = "fsn1"
      min_nodes   = 1
      max_nodes   = 10
    }
  ]
}

provider "hcloud" {
  token = local.hcloud_token
}

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.44.1"
    }   
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

Screenshots

(screenshots: the egress node failing to pull the Cilium image due to DNS resolution errors)

Platform

Linux

mysticaltech commented 1 year ago

@zypriafl Please see the debug section in the readme and run some tests at the node and cluster level. Also have a look at the Cilium logs, and see the Cilium example in the examples section of the readme.

@M4t7e Any ideas on this DNS failure for an Egress node?

zypriafl commented 1 year ago

Could it be some kind of deadlock? "NetworkPluginNotReady: cni plugin not initialized" because the Cilium container cannot be pulled, which in turn is because the network is not ready?

(screenshot: node status showing "NetworkPluginNotReady: cni plugin not initialized")

mysticaltech commented 1 year ago

@zypriafl You are affected by the infamous Iranian IP blocks (your IPs are at least perceived as such), which are blocked from pulling from GCR and from every service that depends on it. So, two solutions:

Either use k3s_registries to proxy all container pull requests through another, unblocked registry, or (probably the easiest solution):

Cordon the node and brutally delete it with hcloud server delete xxx; that frees up its IP. Then, with hcloud or the UI, temporarily register a floating IP so it claims the liberated address. Then terraform apply again to deploy the missing node with a new IP that is not blocked from GCR, and finally release the temporarily reserved IP.

The latter trick is a 5-minute op and should work! Good luck.
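
For the registry route, a minimal sketch of what the k3s_registries option could look like in the module block (the mirror endpoint is purely illustrative; mirror.example.com stands in for any reachable pull-through cache or alternative registry):

module "kube-hetzner" {
  # ... existing configuration as in the kube.tf above ...

  # registries.yaml content handed to k3s, redirecting image pulls to a mirror.
  k3s_registries = <<-EOT
    mirrors:
      docker.io:
        endpoint:
          - "https://mirror.example.com"
      quay.io:
        endpoint:
          - "https://mirror.example.com"
  EOT
}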

M4t7e commented 1 year ago

It looks like the node itself has no working DNS...

Regarding #830: this is not valid anymore. The Cilium CNI is no longer limited to a single interface; it detects the needed interfaces automatically. This is mandatory for BPF-based NAT scenarios.

@zypriafl Can you execute the following commands in bash and provide me with the output, please? Just copy & paste all of it and hit Enter.

set -o xtrace
ip a show eth0
ip a show eth1
nmcli device show eth0
nmcli device show eth1
ip route
ip route get 1.1.1.1
ping -c 5 1.1.1.1
cat /etc/resolv.conf
cat /etc/NetworkManager/conf.d/dns.conf 
dig google.com @185.12.64.1 +short
dig google.com @185.12.64.2 +short
dig google.com @2a01:4ff:ff00::add:1 +short
dig google.com @2a01:4ff:ff00::add:2 +short
dig google.com @1.1.1.1 +short
dig google.com @8.8.8.8 +short
dig google.com @9.9.9.9 +short
curl -v google.com
set +o xtrace

zypriafl commented 1 year ago

Thanks @M4t7e, here are the outputs:

app-scaler-egress-pool-nfe:~ # set -o xtrace
+ set -o xtrace
app-scaler-egress-pool-nfe:~ # ip a show eth0
+ ip --color=auto a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 96:00:02:b7:b2:4f brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 5.75.209.253/32 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 91.107.195.41/32 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::352f:e891:5704:18f9/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
app-scaler-egress-pool-nfe:~ # ip a show eth1
+ ip --color=auto a show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 86:00:00:67:45:19 brd ff:ff:ff:ff:ff:ff
    altname enp0s10
    altname ens10
    inet 172.16.48.101/32 scope global dynamic noprefixroute eth1
       valid_lft 86233sec preferred_lft 86233sec
    inet6 fe80::b3a2:e260:cd40:5d04/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
app-scaler-egress-pool-nfe:~ # nmcli device show eth0
+ nmcli device show eth0
GENERAL.DEVICE:                         eth0
GENERAL.TYPE:                           ethernet
GENERAL.HWADDR:                         96:00:02:B7:B2:4F
GENERAL.MTU:                            1500
GENERAL.STATE:                          100 (connected)
GENERAL.CONNECTION:                     eth0
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/ActiveConnection/4
WIRED-PROPERTIES.CARRIER:               on
IP4.ADDRESS[1]:                         91.107.195.41/32
IP4.ADDRESS[2]:                         5.75.209.253/32
IP4.GATEWAY:                            172.31.1.1
IP4.ROUTE[1]:                           dst = 172.31.1.1/32, nh = 0.0.0.0, mt = 20100
IP4.ROUTE[2]:                           dst = 0.0.0.0/0, nh = 172.31.1.1, mt = 20100
IP6.ADDRESS[1]:                         fe80::352f:e891:5704:18f9/64
IP6.GATEWAY:                            --
IP6.ROUTE[1]:                           dst = fe80::/64, nh = ::, mt = 1024
app-scaler-egress-pool-nfe:~ # nmcli device show eth1
GENERAL.DEVICE:                         eth1
GENERAL.TYPE:                           ethernet
GENERAL.HWADDR:                         86:00:00:67:45:19
GENERAL.MTU:                            1450
GENERAL.STATE:                          100 (connected)
GENERAL.CONNECTION:                     eth1
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/ActiveConnection/3
WIRED-PROPERTIES.CARRIER:               on
IP4.ADDRESS[1]:                         172.16.48.101/32
IP4.GATEWAY:                            172.16.0.1
IP4.ROUTE[1]:                           dst = 0.0.0.0/0, nh = 172.16.0.1, mt = 20101
IP4.ROUTE[2]:                           dst = 172.16.0.0/12, nh = 172.16.0.1, mt = 101
IP4.ROUTE[3]:                           dst = 172.16.0.1/32, nh = 0.0.0.0, mt = 101
IP6.ADDRESS[1]:                         fe80::b3a2:e260:cd40:5d04/64
IP6.GATEWAY:                            --
IP6.ROUTE[1]:                           dst = fe80::/64, nh = ::, mt = 1024
app-scaler-egress-pool-nfe:~ # 

app-scaler-egress-pool-nfe:~ # ip route
+ ip --color=auto route
default via 172.31.1.1 dev eth0 proto static metric 20100 
default via 172.16.0.1 dev eth1 proto dhcp src 172.16.48.101 metric 20101 
172.16.0.0/12 via 172.16.0.1 dev eth1 proto dhcp src 172.16.48.101 metric 101 
172.16.0.1 dev eth1 proto dhcp scope link src 172.16.48.101 metric 101 
172.31.1.1 dev eth0 proto static scope link metric 20100 
app-scaler-egress-pool-nfe:~ # ip route get 1.1.1.1
+ ip --color=auto route get 1.1.1.1
1.1.1.1 via 172.31.1.1 dev eth0 src 5.75.209.253 uid 0 
    cache 
app-scaler-egress-pool-nfe:~ # ping -c 5 1.1.1.1
+ ping -c 5 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=6.61 ms

64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=6.45 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=55 time=5.64 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=55 time=5.73 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=55 time=5.84 ms

--- 1.1.1.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 5.638/6.055/6.608/0.396 ms
app-scaler-egress-pool-nfe:~ # cat /etc/resolv.conf
+ cat /etc/resolv.conf
# Generated by NetworkManager
app-scaler-egress-pool-nfe:~ # cat /etc/NetworkManager/conf.d/dns.conf 
+ cat /etc/NetworkManager/conf.d/dns.conf
cat: /etc/NetworkManager/conf.d/dns.conf: No such file or directory
app-scaler-egress-pool-nfe:~ # dig google.com @185.12.64.1 +short
+ dig google.com @185.12.64.1 +short
216.58.212.142
app-scaler-egress-pool-nfe:~ # dig google.com @185.12.64.2 +short
+ dig google.com @185.12.64.2 +short
216.58.206.46
app-scaler-egress-pool-nfe:~ # dig google.com @2a01:4ff:ff00::add:1 +short
+ dig google.com @2a01:4ff:ff00::add:1 +short
;; UDP setup with 2a01:4ff:ff00::add:1#53(2a01:4ff:ff00::add:1) for google.com failed: network unreachable.
;; UDP setup with 2a01:4ff:ff00::add:1#53(2a01:4ff:ff00::add:1) for google.com failed: network unreachable.
;; UDP setup with 2a01:4ff:ff00::add:1#53(2a01:4ff:ff00::add:1) for google.com failed: network unreachable.
app-scaler-egress-pool-nfe:~ # dig google.com @2a01:4ff:ff00::add:2 +short
+ dig google.com @2a01:4ff:ff00::add:2 +short
;; UDP setup with 2a01:4ff:ff00::add:2#53(2a01:4ff:ff00::add:2) for google.com failed: network unreachable.
;; UDP setup with 2a01:4ff:ff00::add:2#53(2a01:4ff:ff00::add:2) for google.com failed: network unreachable.
;; UDP setup with 2a01:4ff:ff00::add:2#53(2a01:4ff:ff00::add:2) for google.com failed: network unreachable.
app-scaler-egress-pool-nfe:~ # dig google.com @1.1.1.1 +short
+ dig google.com @1.1.1.1 +short
142.250.185.110
app-scaler-egress-pool-nfe:~ # dig google.com @8.8.8.8 +short
+ dig google.com @8.8.8.8 +short
216.58.206.46
app-scaler-egress-pool-nfe:~ # dig google.com @9.9.9.9 +short
+ dig google.com @9.9.9.9 +short
142.250.186.142
app-scaler-egress-pool-nfe:~ # curl -v google.com
+ curl -v google.com
* Could not resolve host: google.com
* Closing connection 0
curl: (6) Could not resolve host: google.com
app-scaler-egress-pool-nfe:~ # set +o xtrace
+ set +o xtrace
app-scaler-egress-pool-nfe:~ #

zypriafl commented 1 year ago

@mysticaltech the issue persists even with another IP.

M4t7e commented 1 year ago

Thx for the output @zypriafl

That's strange, you have no DNS servers configured as far as I can see. Was that a fresh server installation, or is there some history for that node?

Please reboot the node and try to reach google.com again. If that does not help, please destroy and redeploy the node, or try to configure the DNS servers explicitly in kube.tf with either dns_servers = ["1.1.1.1", "8.8.8.8", "9.9.9.10"] (Cloudflare, Google & Quad9) or dns_servers = ["185.12.64.1", "185.12.64.2"] (Hetzner DNS).
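
For reference, a minimal sketch of where that setting would live in the module block (only the new line is shown; everything else stays as in the kube.tf above):

module "kube-hetzner" {
  # ... existing configuration as in the kube.tf above ...

  # Explicit DNS servers for the nodes, so name resolution does not depend
  # on resolvers obtained via DHCP (Cloudflare, Google & Quad9 here).
  dns_servers = ["1.1.1.1", "8.8.8.8", "9.9.9.10"]
}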

Btw, it looks like your IPv6 configuration is not okay: you have no publicly routable IPv6 addresses on your egress node. Nowadays, operating systems usually prefer IPv6 over IPv4, but that only explains why you can't resolve DNS records via IPv6.

zypriafl commented 1 year ago

Thank you. The node was freshly created. Setting the DNS servers explicitly in kube.tf fixed it, and the Cilium image can now be pulled: dns_servers = ["1.1.1.1", "8.8.8.8", "9.9.9.10"]

For me the issue is resolved. However, maybe there is still something to be fixed so that it works without setting dns_servers explicitly...

M4t7e commented 1 year ago

@mysticaltech maybe we have an unfortunate combination here... My theory:

If a floating IP is used, ipv4.method manual disables DHCP for IPv4 entirely, and therefore also prevents obtaining IPv4 DNS servers via DHCP: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/9bf9edc1a579c097b10954ddab2d5a5aea7067e4/agents.tf#L210-L215

If IPv6 is enabled, that can solve the problem, because DNS via IPv6 is still possible. But here IPv6 was not working either. This needs to be tested, but so far it is the only explanation I have.

A solution could be explicit pre-configuration of DNS servers in variables.tf with the Hetzner resolvers (see the sketch below).

Then we have full redundancy without depending on NetworkManager + DHCP.
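
For illustration, a minimal sketch of what such a default could look like (the exact variable shape is an assumption; the addresses are Hetzner's recursive resolvers used in the dig tests above, plus one of their IPv6 resolvers):

variable "dns_servers" {
  type        = list(string)
  description = "DNS servers to configure on the nodes instead of relying on values obtained via DHCP."
  # Assumed default: Hetzner's IPv4 resolvers plus an IPv6 resolver for redundancy.
  default = [
    "185.12.64.1",
    "185.12.64.2",
    "2a01:4ff:ff00::add:1",
  ]
}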

mysticaltech commented 1 year ago

@M4t7e It makes sense, let's add those default DNS resolvers. I will do it tomorrow, but don't hesitate to PR 🙏

mysticaltech commented 12 months ago

@zypriafl It should be fixed by default in v2.10.0. I followed @M4t7e's fix suggestion.