k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Strange DNS behavior in some containers including cert-manager #5045

Closed: macklenc closed this issue 2 years ago

macklenc commented 2 years ago

Environmental Info: K3s Version: v1.22.5+k3s2 (for compatibility with Rancher)

Node(s) CPU architecture, OS, and Version:

All nodes:

Cluster Configuration:

3 control plane nodes with integrated etcd, 2 worker nodes. Feel free to look at the deployment scripts: https://gitlab.com/macklenc/ha-k3s-ansible-deployment (yes, the secrets are placeholders).

Describe the bug:

In some containers, e.g. cert-manager and ubuntu, DNS seems to append my home network's domain name to every request (my network's domain is home.net). For example, running an nslookup against google.com shows that the address actually looked up was google.com.home.net, which works fine for HTTP but breaks SSL when using, e.g., Let's Encrypt. I realize from the linked issues below that there are some workarounds: remove (in my case) home.net from /etc/resolv.conf inside the running container, change ndots:5 back to the default of 1 in the same file, or append a . to the domain name to force resolution as an FQDN. But all of these solutions are a bit hacky; it seems to me that deploying a fresh K3s cluster and then deploying a container to, e.g., curl an https domain should work out of the box.
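
For illustration, the trailing-dot workaround looks something like this (just a sketch; the pod name tmp is arbitrary, and the final dot marks the name as a FQDN so the resolver skips the search list):

# Trailing dot forces an absolute query, bypassing the resolv.conf search domains
kubectl run -it --rm --restart=Never tmp --image=nicolaka/netshoot -- nslookup google.com.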

Now, strangely, using a busybox container doesn't seem to reproduce the issue, even though the resolv.conf file in that container is identical to the one in the problematic containers. See the reproduction section for examples.

Seemingly related issues:

Steps To Reproduce:

As an FYI, I'm running pfSense for my router. I did try resetting it to factory defaults, which didn't seem to help the issue.

Launching a network debugging test image based on Debian:

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
> nslookup google.com

will return:

Server:         10.43.0.10
Address:        10.43.0.10#53

Name:   google.com.home.net
Address: 163.237.192.146

However, using the busybox image as recommended in the guide, DNS resolves just fine:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com

will return:

Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      google.com
Address 1: 2607:f8b0:400f:803::2004 den08s06-in-x04.1e100.net
Address 2: 142.250.72.68 den16s09-in-f4.1e100.net

Expected behavior:

pings/curls/nslookups for a domain, e.g. google.com, should return results for google.com.

Actual behavior:

pings/curls/nslookups for a domain, e.g. google.com, return results for google.com.home.net.

Additional context / logs:

Host OS resolv.conf (maintained by systemd-resolved):

nameserver 127.0.0.53
options edns0 trust-ad
search home.net

Control plane systemd service:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
Environment=K3S_TOKEN=hXx9S1A

ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s server --data-dir /var/lib/rancher/k3s --cluster-init --node-taint CriticalAddonsOnly=true:NoExecute --tls-san 10.0.1.1 --disable servicelb --disable traefik 
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

Agent systemd service:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s agent --server https://10.0.1.1:6443 --token K10b9f1d540b53277cbebbb8695322e4a65a5fcca92c697f4c96fd1e0a67c7c5c5b::server:hXx9S1A   
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target


macklenc commented 2 years ago

Just for some context, I'm still pretty new to K3s/Kubernetes so feel free to let me know if I left out an important config or details.

manuelbuil commented 2 years ago

Could you check which node your coredns pod is running on? Then verify whether nslookup only works in pods that are running on that same node.

macklenc commented 2 years ago

Sure thing. I had to get a little fancy since coredns was running on a control plane node. I created the following manifest:

---
apiVersion: v1
kind: Pod
metadata:
  name: netshoot
spec:
  restartPolicy: OnFailure
  nodeSelector:
    kubernetes.io/hostname: queen0
  tolerations:
    - key: CriticalAddonsOnly
      operator: Equal
      value: "true"
      effect: NoExecute
  containers:
    - name: netshoot
      image: nicolaka/netshoot
      imagePullPolicy: IfNotPresent
      command: ["nslookup"]
      args: ["google.com"]

This resulted in the same .home.net getting appended:

~ 
❯ k -n kube-system get pods coredns-5cdc799f68-vfkd8 -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
coredns-5cdc799f68-vfkd8   1/1     Running   0          21h   10.42.0.12   queen0   <none>           <none>

~ 
❯ k get pods -o wide                                        
NAME       READY   STATUS             RESTARTS      AGE   IP           NODE     NOMINATED NODE   READINESS GATES
netshoot   0/1     Completed          2 (18s ago)   34s   10.42.0.20   queen0   <none>           <none>

~ 
❯ k logs netshoot
Server:         10.43.0.10
Address:        10.43.0.10#53

Name:   google.com.home.net
Address: 163.237.192.146

I don't know if it's relevant, but this showed up in the events:

  Type     Reason       Age              From               Message
  ----     ------       ----             ----               -------
Warning  FailedMount  1s (x3 over 3s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-wvxn9" : object "default"/"kube-root-ca.crt" not registered

Also, it may be worth noting that the IP resolved belongs to some website out on the internet called home.net, instead of resolving to my router's external IP. I tried adding a firewall rule to intercept external DNS queries and re-route them to my network's DNS server, but it still resolves to the internet IP instead of my router's IP.

manuelbuil commented 2 years ago


Could you run cat /etc/resolv.conf in both the "working" pod and the non-working pod?

macklenc commented 2 years ago

Here's output from some of the containers:

~ 
❯ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com      
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      google.com
Address 1: 2607:f8b0:400f:807::200e den16s09-in-x0e.1e100.net
Address 2: 142.250.72.78 den16s09-in-f14.1e100.net
pod "busybox" deleted

~ 
❯ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- cat /etc/resolv.conf     
search default.svc.cluster.local svc.cluster.local cluster.local home.net
nameserver 10.43.0.10
options ndots:5
pod "busybox" deleted

~ 
❯ kubectl run -it --rm --restart=Never netshoot --image=nicolaka/netshoot -- nslookup google.com
Server:         10.43.0.10
Address:        10.43.0.10#53

Non-authoritative answer:
Name:   google.com.home.net
Address: 163.237.192.146

pod "netshoot" deleted

~ 
❯ kubectl run -it --rm --restart=Never netshoot --image=nicolaka/netshoot -- cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local home.net
nameserver 10.43.0.10
options ndots:5
pod "netshoot" deleted

The ubuntu image doesn't seem to have any networking tools, but apt update shows that the same incorrect IP is being resolved:

~ took 2s 
❯ kubectl run -it --rm --restart=Never ubuntu --image=ubuntu -- apt update             
Ign:1 http://security.ubuntu.com/ubuntu focal-security InRelease
Ign:2 http://archive.ubuntu.com/ubuntu focal InRelease
Err:3 http://security.ubuntu.com/ubuntu focal-security Release
  404  Not Found [IP: 163.237.192.146 80]
Ign:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Ign:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Err:6 http://archive.ubuntu.com/ubuntu focal Release
  404  Not Found [IP: 163.237.192.146 80]
Err:7 http://archive.ubuntu.com/ubuntu focal-updates Release
  404  Not Found [IP: 163.237.192.146 80]
Err:8 http://archive.ubuntu.com/ubuntu focal-backports Release
  404  Not Found [IP: 163.237.192.146 80]
Reading package lists... Done
E: The repository 'http://security.ubuntu.com/ubuntu focal-security Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu focal Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu focal-updates Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu focal-backports Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
pod "ubuntu" deleted
pod default/ubuntu terminated (Error)

~ took 3s 
❯ kubectl run -it --rm --restart=Never ubuntu --image=ubuntu -- cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local home.net
nameserver 10.43.0.10
options ndots:5
pod "ubuntu" deleted

EDIT: Looks like curlimages/curl also works and has the same resolv.conf. It is an Alpine-based image, if that helps.

manuelbuil commented 2 years ago

Your search domains in /etc/resolv.conf are default.svc.cluster.local svc.cluster.local cluster.local home.net. Probably home.net is your hostname.

When you look up google.com, the resolver will try google.com.default.svc.cluster.local, google.com.svc.cluster.local, google.com.cluster.local, google.com.home.net, and finally google.com. The first one that comes back with an answer wins. For some strange reason google.com.home.net actually exists, so what you are seeing is not wrong behaviour.
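
Purely as an illustration, you can walk that same candidate list by hand from inside the pod; the trailing dot on each name forces an absolute query, mimicking one step of the search-list expansion:

# Each trailing dot makes the query absolute (no further search expansion)
for d in default.svc.cluster.local svc.cluster.local cluster.local home.net; do
  nslookup "google.com.${d}."
done
nslookup "google.com."   # the bare name, tried last because of ndots:5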

manuelbuil commented 2 years ago

I recommend using the tool dig instead of nslookup. Unless you pass the +search flag, dig just tries to resolve the name you provide, without building FQDNs from your resolv.conf.
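
For example (sketch):

dig +short google.com           # absolute query only; the search list is ignored
dig +short +search google.com   # expands the name using the resolv.conf search domains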

macklenc commented 2 years ago

I'm not sure what you mean by home.net being my hostname. I double-checked that the VMs running the cluster have different names, and the containers created have the same hostname as the name of the container when I created it, e.g. netshoot.

Out of curiosity, if that is normal behavior, why do the busybox and alpine images seem to work while the distroless coredns and debian-based images do not? You're correct that dig works as expected by bypassing resolv.conf, but that unfortunately doesn't solve my problem (since e.g. apt still uses resolv.conf). I tried adding a domain override to my router to point home.net to itself, as well as choosing a network name that I own, and in both cases the new network names were still appended even though those names don't actually resolve.

E.g. overriding home.net in my DNS:

❯ kubectl run -it --rm --restart=Never netshoot --image=nicolaka/netshoot -- nslookup google.com
Server:         10.43.0.10
Address:        10.43.0.10#53

Non-authoritative answer:
*** Can't find google.com.home.net: No answer

pod "netshoot" deleted

And using a domain I own, but has no DNS records (A, CNAME, or otherwise):

❯ kubectl run -it --rm --restart=Never netshoot --image=nicolaka/netshoot -- nslookup google.com
Server:         10.43.0.10
Address:        10.43.0.10#53

Non-authoritative answer:
*** Can't find google.com.continuumtek.com: No answer

pod "netshoot" deleted

This problem doesn't seem to exist on any other host on my network, bare metal, VM, docker, etc. Just when running on my new k3s cluster.

manuelbuil commented 2 years ago

Sorry, I should not have used the word hostname, ignore that. Let me go through the basics (apologies if you already know this).

The resolv.conf of your pods is injected by kubelet. You can get more information about it here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/. By default, kubelet keeps the search domains it finds in the node's /etc/resolv.conf and adds some extra ones related to the deployment (e.g. svc.cluster.local). Note that this is configurable, as you can read in the previous link.

In your case, you must have search home.net in the resolv.conf file of your node. Judging by the resolv.conf files of your pods, kubelet is working correctly and injecting the right resolv.conf. Therefore, in my opinion, k3s is working correctly.

Now, regarding google.com.home.net: I am not sure how each OS implements reading resolv.conf. According to its manual, https://www.man7.org/linux/man-pages/man5/resolv.conf.5.html, the resolver will always try the different search domains before trying google.com alone (i.e. an initial absolute query), because it assumes the name is not a complete FQDN unless it contains at least 5 dots. This comes from kubelet's default ndots:5 option, which you can see in the pod's resolv.conf. This link has very good information about it and a couple of potential solutions you can apply: https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html
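
For instance, one of the solutions from that article is overriding ndots per pod via dnsConfig; a minimal sketch (the pod name and the value are just examples):

apiVersion: v1
kind: Pod
metadata:
  name: netshoot-ndots1
spec:
  restartPolicy: Never
  dnsConfig:
    options:
      - name: ndots
        value: "1"   # a name with at least one dot is tried as absolute first
  containers:
    - name: netshoot
      image: nicolaka/netshoot
      command: ["nslookup"]
      args: ["google.com"]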

One thing that surprises me is that you get problems when running apt. I think it is just bad luck that google.com.home.net exists, but I'd be surprised if you also got a "collision" with other URLs. Could you show me the output you get when using apt?

In any case, given what I explained, I don't think this is a problem with K3s or Kubernetes; in my opinion, what you are seeing is the expected behaviour.

manuelbuil commented 2 years ago


It is really weird that the netshoot image never tries the "initial absolute query", i.e. google.com. If I understand correctly, according to https://www.man7.org/linux/man-pages/man5/resolv.conf.5.html, at some point it must try google.com without appending any extra domain (which form is tried first depends on the ndots config).

macklenc commented 2 years ago

I always appreciate setting a good foundation, so thanks for that! I had actually found that ndots article earlier, and it was super helpful. I was able to "recreate" what I'm seeing by setting ndots:5 on a bare-metal host. Do you know why 5 is the default in kubelet, and whether it can be changed? No matter how hard I try to block DNS traffic for the internal network name I choose, it still gets through and finds, e.g., that home.net IP address.

I have experimented with setting ndots to 1 and with removing the home.net search path, and both fixed the problem, so ndots:5 doesn't seem like the best default from my point of view. Unfortunately, removing the home.net search path from the host nodes breaks the nodes' ability to pull down images. Not sure if I'm messing up a k3s config or what; it's a pretty vanilla install. Maybe I'll give k8s a try with kubespray to see if I get the same behavior. I have also tried resetting my pfSense router back to factory defaults, which didn't fix the issue either.
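
One idea I haven't verified yet: k3s seems to expose kubelet's --resolv-conf flag, so the nodes could keep search home.net for themselves while kubelet hands pods a clean upstream config. A sketch, assuming that flag and a hypothetical file path and placeholder DNS IP:

# Hypothetical: a resolv.conf for kubelet only, without the LAN search domain
printf 'nameserver 10.0.1.1\n' | sudo tee /etc/rancher/k3s/pod-resolv.conf   # placeholder LAN DNS IP

# then append to the ExecStart line of the k3s systemd unit:
#   --resolv-conf /etc/rancher/k3s/pod-resolv.conf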

I appreciate the help. I think I see now how this isn't necessarily a k3s problem, but I'm at a total loss on how to fix it or where to seek help at this point.

manuelbuil commented 2 years ago

Here is the reasoning for that: https://github.com/kubernetes/kubernetes/issues/33554#issuecomment-266251056

But apart from the google.com problem, do you have problems with other URLs?

macklenc commented 2 years ago

Thanks for the link, I'll take a look.

Yeah, that issue comes up with every URL I've tried. It gets really fun when accessing internal resources, e.g. pve0.home.net turns into pve0.home.net.home.net.

Oh... interesting update: Ubuntu's apt update is working now when using my own domain name instead of home.net. And I was mistaken: it looks like netshoot is Alpine-based, not Debian-based (and it still isn't resolving as expected). Interestingly, installing nslookup in the ubuntu container shows the same behavior.

macklenc commented 2 years ago

I really appreciate your assistance. I wasn't able to get home.net working as my home network name, but after I moved my host from Cloudflare back to Google Domains, that domain started to work. It's super weird that the pod's resolv.conf lookups are able to bypass my local DNS overrides, but at least it works now.

macklenc commented 2 years ago

Once again, thanks for your help. Unless you have further input, I'll go ahead and close the issue. For anyone who comes along in the future: the problem seems to be related to how the Unbound service in pfSense handles transparent traffic. If you want the external domain to be blocked, you can set the system domain's local zone type to static to block the outbound request, though I'm not sure what the side effects are. Here's the relevant doc:

static
       If there is a match from local data, the query is answered. Otherwise,
       the query is answered with nodata or nxdomain. For a negative answer
       a SOA is included in the answer if present as local-data for the zone
       apex domain.

It seems that the general recommendation is to choose an internal TLD that doesn't exist on the internet, e.g. home.lan instead.

macklenc commented 2 years ago

Figured I'd follow up with a better solution than what I implemented if anyone else has this issue: https://forums.lawrencesystems.com/t/google-domains-vs-cloudflare-dns-with-ndots-5-in-resolv-conf/12887/3?u=macklenc

hansaya commented 2 years ago

@macklenc Thank you so much. Your research fixed a bunch of my issues. Adding this to the pfSense DNS config made it easy:

server:
local-zone: "example.com" static