k3d-io / k3d

Little helper to run CNCF's k3s in Docker
https://k3d.io/
MIT License

[BUG] DNS not resolving #209

Closed luisdavim closed 3 years ago

luisdavim commented 4 years ago

What did you do?

Start a pod and try a DNS query:

$ export KUBECONFIG="$(k3d get-kubeconfig --name='mycluster')"
$ kubectl run --restart=Never --rm -i --tty tmp --image=alpine -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup www.gmail.com
Server:         10.43.0.10
Address:        10.43.0.10:53

;; connection timed out; no servers could be reached

/ # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
/ # exit

Exec into the k3d container and do the same DNS query:

docker exec -it k3d-endpoint-server sh
/ # nslookup www.gmail.com
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
www.gmail.com   canonical name = mail.google.com
mail.google.com canonical name = googlemail.l.google.com
Name:   googlemail.l.google.com
Address: 172.217.164.101

Non-authoritative answer:
www.gmail.com   canonical name = mail.google.com
mail.google.com canonical name = googlemail.l.google.com
Name:   googlemail.l.google.com
Address: 2607:f8b0:4005:80b::2005

/ # cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
/ # exit

What did you expect to happen? I would expect the pods in the k3d cluster to be able to resolve DNS names.

Which OS & Architecture? macOS 10.15.3

Which version of docker?

Server: Docker Engine - Community
 Engine:
  Version:      19.03.5
  API version:  1.40 (minimum version 1.12)
  Go version:   go1.12.12
  Git commit:   633a0ea
  Built:        Wed Nov 13 07:29:19 2019
  OS/Arch:      linux/amd64
  Experimental: true
 containerd:
  Version:      v1.2.10
  GitCommit:    b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:      1.0.0-rc8+dev
  GitCommit:    3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:      0.18.0
  GitCommit:    fec3683

iwilltry42 commented 4 years ago

Hi there, thanks for opening this issue. As mentioned in the related issue (https://github.com/rancher/k3d/issues/101#issuecomment-603668428), CoreDNS is doing the name resolution inside the cluster. I couldn't reproduce this on Linux (with the same docker and k3d versions), so I guess that the difference in DNS settings is caused by Docker for Desktop.

luisdavim commented 4 years ago

To work around this issue, I'm patching the coredns configmap.

iwilltry42 commented 4 years ago

@luisdavim what's the patch that you apply?

consideRatio commented 4 years ago

@luisdavim I'm also interested in what patch you applied

irizzant commented 4 years ago

I'm pretty sure that adapting the forward . /etc/resolv.conf part is enough.

This lets k8s use the machine's DNS when the name cannot be resolved internally.

I think this should be the default behaviour.

irizzant commented 4 years ago

After more investigation, I found that this could be related to the way k3d creates the docker network.

Indeed, k3d creates a custom docker network for each cluster, and when this happens, resolving is done through the docker daemon. The requests are actually forwarded to the DNS servers configured in your host's resolv.conf, but through a single DNS server (the embedded one of docker).

This means that if your daemon.json is, like mine, not configured to provide extra DNS servers, it defaults to 8.8.8.8, which does not resolve any company address, for example.

It would be useful to have a custom option to pass to k3d when it starts the cluster and specify the DNS servers there, as proposed in https://github.com/rancher/k3d/issues/165
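For reference, a minimal /etc/docker/daemon.json sketch of that configuration (the first IP is a placeholder for a company DNS server; the docker daemon must be restarted afterwards):

{
  "dns": ["10.10.10.10", "8.8.8.8"]
}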

iwilltry42 commented 4 years ago

Thanks for your additional input @irizzant. How would you add additional DNS servers here on k3d's side? The network opts of docker don't seem to have such an option (after a quick glance over what's available there) :thinking:

irizzant commented 4 years ago

Personally, I fixed this by injecting a custom ConfigMap for CoreDNS, changing

forward . /etc/resolv.conf

to:

forward . /etc/resolv.conf xxx.xxx.xxx.xxx

replacing the x's with the IPs of your DNS servers.

I had a quick look at the docker options and I confirm that I don't see an option to configure custom DNS servers on the docker network.

Maybe a feasible option would be to add a custom flag to the k3d command which adds the custom DNS servers to the CoreDNS ConfigMap directly.

iwilltry42 commented 4 years ago

Currently, k3d doesn't interact with any Kubernetes resources inside the cluster (i.e. in k3s), and I tend to avoid this because of the huge dependencies on Kubernetes libraries it could draw in. Upon cluster creation this could work, however, by modifying the Chart that's being auto-deployed by k3s. Not sure if this could go into k3s itself instead :thinking:

irizzant commented 4 years ago

Maybe interacting with k8s itself isn't needed. k3s deploys whatever is under /var/lib/rancher/k3s/server/manifests, so you could add a valid CoreDNS configuration, just customizing the DNS part according to the command-line flag (see the sketch below).
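For illustration, a minimal sketch of that idea (mycluster and coredns-custom.yaml are placeholder names): bind-mount a manifest into the auto-deploy directory at creation time, and k3s will apply it on startup.

k3d cluster create mycluster \
  --volume "$PWD/coredns-custom.yaml:/var/lib/rancher/k3s/server/manifests/coredns-custom.yaml"

A k3d flag could then generate that manifest with the requested DNS servers before creating the cluster.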

dminca commented 4 years ago

This only happens to me if I deploy something in the default namespace; in other namespaces it worked just fine, I don't know why.

Later edit: I just noticed it doesn't matter which namespace you deploy stuff to, it's about the network you're in. So if I'm in the office (company LAN) I get this issue, but when I try it from home, it simply works. And I cannot say what network restrictions they applied in the company 😄

Also, @Athosone's solution works for me now.

YAMLcase commented 4 years ago

I seem to be running into this issue about every couple of weeks. This is the only workaround that seems to "just work" so I can get back to the job I'm paid to do:

sudo iptables -t nat -A PREROUTING -p udp -d 8.8.8.8  --dport 53 -j DNAT --to <your DNS server IP>
eigood commented 4 years ago

This seems to be broken because the coredns pod does not have an /etc/resolv.conf in it, while the ConfigMap is configured to forward to that. All the docs have told me that coredns will use the $HOST resolv.conf, but when I use k3d, which uses k3s, the coredns "pod" doesn't run as a docker container, or as a process on the $HOST. It runs as a process under containerd, and therefore it doesn't get any of the correct settings.

Athosone commented 4 years ago

For those who have the problem, a simple fix is to mount your /etc/resolv.conf onto the cluster:

k3d cluster create --volume /etc/resolv.conf:/etc/resolv.conf
YAMLcase commented 4 years ago

What does that --volume command do exactly? I've used it to take advantage of other things (registries.yaml, etc.) but haven't taken the time to dig into what it all gets mapped to.

Athosone commented 4 years ago

From what I understand, there is nothing special. It just mounts the volume into the container running the k3s server. Thus I guess you could mount anything and maybe even do some air-gapped setup.

iwilltry42 commented 4 years ago

Hey folks, sorry for the radio silence, just getting back to k3d now... Let me reply to some of the messages here.


@irizzant

Maybe interacting with k8s itself it's not needed. k3s deploys whatever is under /var/lib/rancher/k3s/server/manifests, so you could add a valid CoreDns configuration just customizing the DNS part according to the command line flag.

That's a good starting point. Unfortunately, this would require us to write the file to disk and bind-mount it into the container, as exec'ing into it afterwards to update the ConfigMap manifest wouldn't update the actual thing inside the cluster (IIRC, there is no loop to do so). It's definitely doable, but we'd need to keep state somewhere and react to changes k3s makes in the auto-deploy manifests.


@dminca

This only happens to me if I deploy something in the default namespace, in other namespaces it worked just fine, idk why

This is for real the weirdest thing on this thread that you're experiencing :thinking: No clue, what's going on there..


@YAMLcase

I seem to be running into this issues about every couple of weeks. This is the only workaround that seems to "just work" so I can get back to the job I'm paid to do:

sudo iptables -t nat -A PREROUTING -p udp -d 8.8.8.8  --dport 53 -j DNAT --to <your DNS server IP>

Are you executing this on your local host (I assume so because of the sudo) to just route all the Google-DNS traffic (default Docker DNS) to your own DNS server?


@eigood & @Athosone

This seems to be broken because the coredns pod does not have an /etc/resolv.conf in it, while the ConfigMap is configured to forward to that. All the docs have told me that coredns will use the $HOST resolv.conf, but when I use k3d, which uses k3s, the coredns "pod" doesn't run as a docker container, or as a process on the $HOST. It runs as a process under containerd, and therefore it doesn't get any of the correct settings.

What do you mean by "it doesn't run as a container"? It surely is running in a container :thinking:

For those who have the problem a simple fix is to mount your /etc/resolve.conf onto the cluster:

k3d cluster create --volume /etc/resolv.conf:/etc/resolv.conf

Also, those two statements seem to conflict, right?


@YAMLcase

What does that --volume command do exactly? I've used it to take advantage of other things (registries.yaml, etc.) but haven't taken the time to dig into what it all gets mapped to.

It's doing basically the same as docker's --volume flag: bind-mounting a file or a directory into one or more containers, overlaying anything that might already be there (you can specify the access mode).
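For example, a minimal sketch (assuming the @server[0] node-filter suffix of the k3d v3/v4 --volume syntax; the paths are placeholders):

k3d cluster create mycluster \
  --volume "$PWD/registries.yaml:/etc/rancher/k3s/registries.yaml@server[0]"

Omitting the node filter, as in the /etc/resolv.conf example above, applies the mount to all nodes.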


Is anyone experiencing this on a Linux machine or in WSL2, i.e. in Docker setups which do not run inside a VM? I could imagine that we can modify the CoreDNS ConfigMap in the running cluster, like we might do to inject the host.k3d.internal entry in #360. However, just mounting in your resolv.conf might be the easiest solution :thinking:

Athosone commented 4 years ago

Well, maybe it conflicts, but for me it solves the problem. For information, I am running in WSL2. I don't know why it doesn't manage to use the right resolv.conf if I don't mount it.

iwilltry42 commented 4 years ago

@Athosone, I meant that CoreDNS does indeed pick up the resolv.conf, as your solution works :+1: I guess the problem is more that the "default" resolv.conf in the k3s container (node) does not work for it (and that one you're effectively replacing with the volume mount) :+1:

irizzant commented 4 years ago

Just as a clarification: if you modify the docker daemon configuration (daemon.json) to add the company DNS and then launch the k3d cluster, you can "docker exec" into the containers where k8s is running and you'll see that nslookup finds DNS entries served by the company DNS. Consequently, the docker container in which k8s is running picks up the right resolv.conf configuration.

The problem is that the default CoreDNS configuration in k8s just contains:

forward . /etc/resolv.conf

which forces CoreDNS to use its own resolv.conf, bypassing the docker one.

eigood commented 4 years ago

When I run k3d cluster create, I can certainly volume-mount $HOST files into the docker containers (agent+server). However, containerd is then used to start the CoreDNS pod running inside k8s. It is this internal container that needs to have an /etc/resolv.conf. Nothing I do to the k3d command will allow me to adjust the internal containerd pod that is created.

I did a bit of research to figure out that this was the case, by finding out where containerd stores its filesystems.

eshepelyuk commented 3 years ago

I am running k3d in a Docker that runs in a VM in VirtualBox (actually I'm using the Docker Toolbox for Windows product, which does all the setup), and I'm experiencing the same problem.

I can see that my VirtualBox VM and all containers running inside it are using the proper DNS configuration, but the pods are not.

avodaqstephan commented 3 years ago

Mounting the resolv.conf works, but this can't be the best solution. If you missed that opportunity at the beginning, you need to re-create the k3d cluster just to mount that volume.

forward . /etc/resolv.conf xxx.xxx.xxx.xxx

This one did not work for me.

Edit: Forward is working, but I need to place my DNS server in front:

forward . xxx.xxx.xxx.xxx /etc/resolv.conf

szapps commented 3 years ago

For those who have the problem, a simple fix is to mount your /etc/resolv.conf onto the cluster:

k3d cluster create --volume /etc/resolv.conf:/etc/resolv.conf

VERY HELPFUL HINT

iwilltry42 commented 3 years ago

Can someone give those binaries a try and use the experimental --enable-dns-magic flag to try to solve this problem similar to how kind does it? They are built from feature/dns:

https://drive.google.com/drive/folders/1Hu4Z0fAYr4ktw7r07naBUxOJY9EYr0Bq?usp=sharing

eshepelyuk commented 3 years ago

@iwilltry42

$ ./k3d-windows-amd64.exe --verbose cluster create k3s --enable-dns-magic --wait --no-lb --api-port 192.168.99.103:6550
DEBU[0000] Selected runtime is 'docker.Docker'
DEBU[0000] '--update-default-kubeconfig set: enabling wait-for-server
INFO[0000] Network with name 'k3d-k3s' already exists with ID '856209b2e35cf43c5d5b51d78757c4e57b828c2d8c24c7888a8c3ea2e420816a'
INFO[0000] Created volume 'k3d-k3s-images'
INFO[0001] Creating node 'k3d-k3s-server-0'
DEBU[0001] Created container k3d-k3s-server-0 (ID: 4711af3bcb9c3efb41d4e5bee47ce846e96afda76d9c69fc703d3ea72c656612)
DEBU[0002] Created node 'k3d-k3s-server-0'
DEBU[0002] Waiting for server node 'k3d-k3s-server-0' to get ready
DEBU[0066] Finished waiting for log message 'k3s is up and running' from node 'k3d-k3s-server-0'
INFO[0066] (Optional) Trying to get IP of the docker host and inject it into the cluster as 'host.k3d.internal' for easy access
DEBU[0066] Executing command '[sh -c nslookup host.docker.internal]' in node 'k3d-k3s-server-0'
DEBU[0067] Exec Process Failed, but we still got logs, so we're at least trying to get the IP from there...
ERRO[0067] Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
WARN[0067] Failed to get HostIP: Failed to read address for 'host.docker.internal' from nslookup response
INFO[0067] Enabling DNS Magic...
DEBU[0067] Executing command '[sh -c nslookup host.docker.internal]' in node 'k3d-k3s-server-0'
DEBU[0069] Exec Process Failed, but we still got logs, so we're at least trying to get the IP from there...
ERRO[0069] Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
WARN[0069] Failed to enable DNS Magic: %!w(*errors.errorString=&{Failed to get HostIP: Failed to read address for 'host.docker.internal' from nslookup response})
INFO[0069] Cluster 'k3s' created successfully!
DEBU[0069] Updating default kubeconfig with a new context for cluster k3s
DEBU[0069] Setting new current-context 'k3d-k3s'
DEBU[0070] Wrote kubeconfig to 'C:\Users\eshep/.kube/config'
INFO[0070] You can now use it like this:
kubectl cluster-info

Apparently DNS resolution is not working from pods, since the magic was not applied. And DNS resolution still works from the k3s docker container itself.

iwilltry42 commented 3 years ago

@eshepelyuk that seems to be quite a special case. Can you please try again with --trace enabled? (You can trim the log output after the DNS magic.) I have no experience with Docker Toolbox for Windows. Do you know how it refers to the host system? We're trying docker-machine ip and host.docker.internal on Windows.

eshepelyuk commented 3 years ago

Hello @iwilltry42

  1. Adding the requested logs (see below).

Please note that I ran docker-machine ip before k3d and it returned the proper address of the VirtualBox VM with Docker inside, the same address I've passed via --api-port.

  2. Another question: is there a way to pass that host.docker.internal to the k3d command?
Here are the logs:

$ docker-machine.exe ip
192.168.99.103

$ ./k3d-windows-amd64.exe --trace --verbose cluster create k3s --enable-dns-magic --wait --no-lb --api-port $(docker-machine ip):6550
DEBU[0000] Selected runtime is 'docker.Docker'
TRAC[0000] PortFilterMap: map[]
TRAC[0000] LabelFilterMap: map[]
TRAC[0000] EnvFilterMap: map[]
DEBU[0000] '--update-default-kubeconfig set: enabling wait-for-server
INFO[0000] Network with name 'k3d-k3s' already exists with ID '856209b2e35cf43c5d5b51d78757c4e57b828c2d8c24c7888a8c3ea2e420816a'
INFO[0000] Created volume 'k3d-k3s-images'
INFO[0001] Creating node 'k3d-k3s-server-0'
TRAC[0001] Creating node from spec &{Name:k3d-k3s-server-0 Role:server Image:docker.io/rancher/k3s:v1.19.3-k3s3 Volumes:[k3d-k3s-images:/k3d/images] Env:[K3S_TOKEN=jWiJFvqWengrHwISsBKh] Cmd:[] Args:[] Ports:[192.168.99.103:6550:6443/tcp] Restart:false Labels:map[k3d.cluster:k3s k3d.cluster.imageVolume:k3d-k3s-images k3d.cluster.network:856209b2e35cf43c5d5b51d78757c4e57b828c2d8c24c7888a8c3ea2e420816a k3d.cluster.network.external:true k3d.cluster.token:jWiJFvqWengrHwISsBKh k3d.cluster.url:https://k3d-k3s-server-0:6443] Network:856209b2e35cf43c5d5b51d78757c4e57b828c2d8c24c7888a8c3ea2e420816a ExtraHosts:[] ServerOpts:{IsInit:false ExposeAPI:{Host:192.168.99.103 HostIP:192.168.99.103 Port:6550}} AgentOpts:{} GPURequest: State:{Running:false Status:}}
TRAC[0001] Creating docker container with translated config &{ContainerConfig:{Hostname:k3d-k3s-server-0 Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[6443/tcp:{}] Tty:false OpenStdin:false StdinOnce:false Env:[K3S_TOKEN=jWiJFvqWengrHwISsBKh K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[server --tls-san 192.168.99.103] Healthcheck: ArgsEscaped:false Image:docker.io/rancher/k3s:v1.19.3-k3s3 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:k3s k3d.cluster.imageVolume:k3d-k3s-images k3d.cluster.network:856209b2e35cf43c5d5b51d78757c4e57b828c2d8c24c7888a8c3ea2e420816a k3d.cluster.network.external:true k3d.cluster.token:jWiJFvqWengrHwISsBKh k3d.cluster.url:https://k3d-k3s-server-0:6443 k3d.role:server k3d.server.api.host:192.168.99.103 k3d.server.api.hostIP:192.168.99.103 k3d.server.api.port:6550] StopSignal: StopTimeout: Shell:[]} HostConfig:{Binds:[k3d-k3s-images:/k3d/images] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[6443/tcp:[{HostIP:192.168.99.103 HostPort:6550}]] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness: OomKillDisable: PidsLimit: Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc0002cc06a} NetworkingConfig:{EndpointsConfig:map[856209b2e35cf43c5d5b51d78757c4e57b828c2d8c24c7888a8c3ea2e420816a:0xc000194840]}}
DEBU[0001] Created container k3d-k3s-server-0 (ID: c8d4f2fd7253433d4fb94be5015743882750e31ebdf6a4f1bad599da67efe720)
DEBU[0002] Created node 'k3d-k3s-server-0'
DEBU[0002] Waiting for server node 'k3d-k3s-server-0' to get ready
DEBU[0059] Finished waiting for log message 'k3s is up and running' from node 'k3d-k3s-server-0'
INFO[0059] (Optional) Trying to get IP of the docker host and inject it into the cluster as 'host.k3d.internal' for easy access
TRAC[0059] Runtime GOOS: windows
DEBU[0059] Executing command '[sh -c nslookup host.docker.internal]' in node 'k3d-k3s-server-0'
TRAC[0059] Exec process '[sh -c nslookup host.docker.internal]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0060] Exec Process Failed, but we still got logs, so we're at least trying to get the IP from there...
TRAC[0060] -> Exec Process Error was: Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
TRAC[0060] Scanning Log Line 'Server: 127.0.0.11'
TRAC[0060] Scanning Log Line 'Address: 127.0.0.11:53'
TRAC[0060] Scanning Log Line ''
TRAC[0060] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0060] Scanning Log Line ''
TRAC[0060] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0060] Scanning Log Line ''
ERRO[0060] Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
WARN[0060] Failed to get HostIP: Failed to read address for 'host.docker.internal' from nslookup response
INFO[0060] Enabling DNS Magic...
TRAC[0060] Runtime GOOS: windows
DEBU[0060] Executing command '[sh -c nslookup host.docker.internal]' in node 'k3d-k3s-server-0'
TRAC[0060] Exec process '[sh -c nslookup host.docker.internal]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0061] Exec Process Failed, but we still got logs, so we're at least trying to get the IP from there...
TRAC[0061] -> Exec Process Error was: Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
TRAC[0061] Scanning Log Line 'Server: 127.0.0.11'
TRAC[0061] Scanning Log Line 'Address: 127.0.0.11:53'
TRAC[0061] Scanning Log Line ''
TRAC[0061] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0061] Scanning Log Line ''
TRAC[0061] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0061] Scanning Log Line ''
ERRO[0061] Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
WARN[0061] Failed to enable DNS Magic: %!w(*errors.errorString=&{Failed to get HostIP: Failed to read address for 'host.docker.internal' from nslookup response})
INFO[0061] Cluster 'k3s' created successfully!
..........
iwilltry42 commented 3 years ago

@eshepelyuk, thank you! I just realized that for the DNS magic (and all other related parts), we're not trying docker-machine ip. I added this now and uploaded a new build to the Google Drive folder (or you can build it yourself). Thanks for testing!

eshepelyuk commented 3 years ago

Hello @iwilltry42

Just downloaded a new build from the link provided above.

The error and behaviour are the same.

$ ./k3d-windows-amd64.exe --version
k3d version v3.3.0-2-g296bd5c
k3s version v1.19.3-k3s3 (default)

$ ./k3d-windows-amd64.exe --trace --verbose cluster create k3s --enable-dns-magic --wait --no-lb --api-port $(docker-machine ip):6550

.....

DEBU[0002] Waiting for server node 'k3d-k3s-server-0' to get ready
DEBU[0062] Finished waiting for log message 'k3s is up and running' from node 'k3d-k3s-server-0'
INFO[0062] (Optional) Trying to get IP of the docker host and inject it into the cluster as 'host.k3d.internal' for easy access
TRAC[0062] Runtime GOOS: windows
DEBU[0062] Docker Machine found: default
DEBU[0062] Executing command '[sh -c nslookup host.docker.internal]' in node 'k3d-k3s-server-0'
TRAC[0063] Exec process '[sh -c nslookup host.docker.internal]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0064] Exec Process Failed, but we still got logs, so we're at least trying to get the IP from there...
TRAC[0064] -> Exec Process Error was: Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
TRAC[0064] Scanning Log Line 'Server:           127.0.0.11'
TRAC[0064] Scanning Log Line 'Address:  127.0.0.11:53'
TRAC[0064] Scanning Log Line ''
TRAC[0064] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0064] Scanning Log Line ''
TRAC[0064] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0064] Scanning Log Line ''
ERRO[0064] Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
WARN[0064] Failed to get HostIP: Failed to read address for 'host.docker.internal' from nslookup response
INFO[0064] Enabling DNS Magic...
TRAC[0064] Runtime GOOS: windows
DEBU[0064] Docker Machine found: default
DEBU[0064] Executing command '[sh -c nslookup host.docker.internal]' in node 'k3d-k3s-server-0'
TRAC[0064] Exec process '[sh -c nslookup host.docker.internal]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0065] Exec Process Failed, but we still got logs, so we're at least trying to get the IP from there...
TRAC[0065] -> Exec Process Error was: Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
TRAC[0065] Scanning Log Line 'Server:           127.0.0.11'
TRAC[0065] Scanning Log Line 'Address:  127.0.0.11:53'
TRAC[0065] Scanning Log Line ''
TRAC[0065] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0065] Scanning Log Line ''
TRAC[0065] Scanning Log Line '** server can't find host.docker.internal: NXDOMAIN'
TRAC[0065] Scanning Log Line ''
ERRO[0065] Exec process in node 'k3d-k3s-server-0' failed with exit code '1'
WARN[0065] Failed to enable DNS Magic: Failed to get HostIP: Failed to read address for 'host.docker.internal' from nslookup response
INFO[0065] Cluster 'k3s' created successfully!

....
iwilltry42 commented 3 years ago

@eshepelyuk thanks again for testing! Apparently my workday before this was too long and stupid me forgot a return there :roll_eyes: I uploaded a new build of feature/dns now, which should actually use the docker-machine IP after looking it up (which worked, according to the logs) :+1:

eshepelyuk commented 3 years ago

Hello @iwilltry42

Unfortunately, there's no progress. No errors in logs, though.

DEBU[0001] Created container k3d-k3s-server-0 (ID: f34492b4b1bf149d86f0957a5b20aa82d56ef3213cfb4f65fc08bb32bf1104d8)
DEBU[0002] Created node 'k3d-k3s-server-0'
DEBU[0002] Waiting for server node 'k3d-k3s-server-0' to get ready
DEBU[0080] Finished waiting for log message 'k3s is up and running' from node 'k3d-k3s-server-0'
INFO[0080] (Optional) Trying to get IP of the docker host and inject it into the cluster as 'host.k3d.internal' for easy access
TRAC[0080] Runtime GOOS: windows
DEBU[0080] Docker Machine found: default
DEBU[0081] Adding extra host entry '192.168.99.103 host.k3d.internal'...
DEBU[0081] Executing command '[sh -c echo '192.168.99.103 host.k3d.internal' >> /etc/hosts]' in node 'k3d-k3s-server-0'
TRAC[0081] Exec process '[sh -c echo '192.168.99.103 host.k3d.internal' >> /etc/hosts]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0082] Exec process in node 'k3d-k3s-server-0' exited with '0'
DEBU[0082] Executing command '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' in node 'k3d-k3s-server-0'
TRAC[0082] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0083] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0084] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0085] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0086] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0087] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0088] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
TRAC[0089] Exec process '[sh -c test=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\s+host\.k3d\.internal$/!p' -e '$a192.168.99.103 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$test"]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0090] Exec process in node 'k3d-k3s-server-0' exited with '0'
INFO[0090] Successfully added host record to /etc/hosts in 1/1 nodes and to the CoreDNS ConfigMap
INFO[0090] Enabling DNS Magic...
TRAC[0090] Runtime GOOS: windows
DEBU[0090] Docker Machine found: default
DEBU[0091] Executing command '[sh -c iptables-save | sed -e 's/--to-source :53/--to-source 192.168.99.103:53/g' -e 's/-A OUTPUT \(.*\) -j DOCKER_OUTPUT/\0\n-A PREROUTING \1 -j DOCKER_OUTPUT/g' -e 's/-d 127.0.0.11/-d 192.168.99.103/g' | iptables-restore && cp /etc/resolv.conf /etc/resolv.conf.original && sed -e 's/127.0.0.11/192.168.99.103/g' /etc/resolv.conf.original >/etc/resolv.conf]' in node 'k3d-k3s-server-0'
TRAC[0091] Exec process '[sh -c iptables-save | sed -e 's/--to-source :53/--to-source 192.168.99.103:53/g' -e 's/-A OUTPUT \(.*\) -j DOCKER_OUTPUT/\0\n-A PREROUTING \1 -j DOCKER_OUTPUT/g' -e 's/-d 127.0.0.11/-d 192.168.99.103/g' | iptables-restore && cp /etc/resolv.conf /etc/resolv.conf.original && sed -e 's/127.0.0.11/192.168.99.103/g' /etc/resolv.conf.original >/etc/resolv.conf]' still running in node 'k3d-k3s-server-0'.. sleeping for 1 second...
DEBU[0092] Exec process in node 'k3d-k3s-server-0' exited with '0'
INFO[0092] Successfully applied DNS Magic to 1/1 nodes
INFO[0092] Cluster 'k3s' created successfully!
iwilltry42 commented 3 years ago

Yeah, so it seems like it at least achieved what should've been achieved. The change is similar to what kind does to "fix" DNS resolution on some systems: set the host IP as a nameserver in /etc/resolv.conf and configure the iptables rules accordingly. Seems like this doesn't get us anywhere though :thinking: I'll research a bit more on this. Any input is highly appreciated :)

The problem behind this is that in custom docker networks, docker uses the embedded DNS server (which also forwards DNS requests). This one is reachable from within the containers via special routes (which is why names can properly be resolved inside the k3s containers), but not from within the pods running inside those containers (which is why the forward . /etc/resolv.conf in CoreDNS doesn't help us much here). When you run a container in docker's default bridge network, however, the containers receive a copy of the host's /etc/resolv.conf, making name resolution fairly straightforward. We could try to mirror this behavior when creating the custom docker networks; however, I don't know how this translates to Docker for Desktop and docker-machine/docker-toolbox :thinking: Any ideas?
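To see the difference described above for yourself (a sketch; mycluster is a placeholder cluster name):

# Inside the k3s node container: docker's embedded DNS on 127.0.0.11, reachable via special routes
docker exec k3d-mycluster-server-0 cat /etc/resolv.conf

# Inside a pod: the CoreDNS service IP, whose 'forward . /etc/resolv.conf' ends up
# pointing at that 127.0.0.11 address, which pods cannot reach
kubectl run tmp --rm -i --restart=Never --image=alpine -- cat /etc/resolv.conf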

therealmikz commented 3 years ago

I am having the exact same issue @eshepelyuk has: inside the k3s container all custom domains are resolved without any issue, but inside any pod they're not. Linux here, newest k3d (installed today). My host is using a custom NetworkManager connection with a few domains set up to use the VPN DNS. The forward trick worked, but it's annoying to do this every time I create a cluster.

avodaqstephan commented 3 years ago

Workaround: Edit the coredns configmap

forward . xxx.xxx.xxx.xxx /etc/resolv.conf

Problem: This is not persistent if you restart k3d. That is actually the real issue! Which component is responsible for re-writing / resetting changes in the configmap of coredns?

iwilltry42 commented 3 years ago

@avodaqstephan

Workaround: Edit the coredns configmap

forward . xxx.xxx.xxx.xxx /etc/resolv.conf

Problem: This is not persistent if you restart k3d. That is actually the real issue! Which component is responsible for re-writing / resetting changes in the configmap of coredns?

That's k3s itself, which deploys some manifests automatically upon startup: https://rancher.com/docs/k3s/latest/en/advanced/#auto-deploying-manifests

eshepelyuk commented 3 years ago

Just tried v4.0.0 and the error still exists.

eshepelyuk commented 3 years ago

But I have to say that mounting the /etc/resolv.conf volume solved the issue completely and doesn't require patching CoreDNS.

My config file

apiVersion: k3d.io/v1alpha1
kind: Simple
name: k3s
volumes:
  - volume: /c/Users/eshep/.k3s/manifests/cm.yaml:/var/lib/rancher/k3s/server/manifests/cm.yaml
    nodeFilters:
      - all
  - volume: /etc/resolv.conf:/etc/resolv.conf
    nodeFilters:
      - all
options:
  k3d:
    wait: true
    disableLoadbalancer: false
cro commented 3 years ago

I seem to be having a similar issue with Docker Desktop for macOS and the Palo Alto Networks GlobalProtect VPN.

When the VPN is up, regular Docker containers don't have any problems resolving DNS via the resolver that is pushed to the client at VPN connect. However, under k3d, no containers in any pods can resolve those same addresses.

I haven't tried to map the resolv.conf into the coredns container yet. Why doesn't k3d/k3s just use the host resolver?

ericis commented 3 years ago

I've attempted quite a few of the solutions recommended above, with no success.

I've documented the setup and logs here: https://gist.github.com/ericis/c3c0b9c2fbb78fdf56b2193a88c35f6f

INFO[0005] Starting Node 'k3d-dev-serverlb'             
INFO[0005] (Optional) Trying to get IP of the docker host and inject it into the cluster as 'host.k3d.internal' for easy access 
WARN[0006] Failed to patch CoreDNS ConfigMap to include entry '172.28.0.1 host.k3d.internal': Exec process in node 'k3d-dev-server-0' failed with exit code '1' 
iwilltry42 commented 3 years ago

@cro

Why doesn't k3d/k3s just use the host resolver?

How would you suggest that k3d achieves this? We could search for e.g. the /etc/resolv.conf, but that's not platform-agnostic. Then people may as well use different name resolvers, which k3d would have to account for, so we'd probably only have a workaround for some people and not for others. Additionally, there's the challenge of reacting to changes during the cluster runtime, e.g. when someone (de-)activates a VPN connection that pushes DNS settings. I'd be really happy in general to finally find a generic fix for this :thinking:

@ericis That warning in your logs is not critical for the cluster and does not affect the general DNS issues (it's just an optional tweak that allows you to address your host system from inside the cluster by name). However, the fact that the exec process fails may be a hint that something else is going wrong there. Can you share the logs of the server container (docker logs k3d-dev-server-0) in the Gist?

ericis commented 3 years ago

Thanks @iwilltry42

I've updated to share the logs:

Also, I have the corporate Zscaler internet firewall running. I tried it with Zscaler completely disabled. I also tried with it enabled, copying the Zscaler root CA certificate into both docker containers using docker cp ..., verifying the certificate file was copied, appending it to "/etc/ssl/certs/ca-certificates.crt", and then restarting the containers. But I get the same errors.

I also tried pinging:

k3d-dev-server-0

/ # ping www.google.com
ping: bad address 'www.google.com'
/ # nslookup www.google.com
;; connection timed out; no servers could be reached

k3d-dev-serverlb

/ # ping www.google.com
PING www.google.com (142.250.68.164): 56 data bytes
64 bytes from 142.250.68.164: seq=0 ttl=37 time=50.173 ms
64 bytes from 142.250.68.164: seq=1 ttl=37 time=50.178 ms
64 bytes from 142.250.68.164: seq=2 ttl=37 time=50.706 ms
64 bytes from 142.250.68.164: seq=3 ttl=37 time=50.309 ms
^C
--- www.google.com ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 50.173/50.341/50.706 ms
/ # nslookup www.google.com
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
Name:   www.google.com
Address: 2607:f8b0:4008:800::2004

Non-authoritative answer:
Name:   www.google.com
Address: 142.250.68.164
ericis commented 3 years ago

Also, here is a new attempt using Kubernetes configuration YAML files: https://gist.github.com/ericis/c3c0b9c2fbb78fdf56b2193a88c35f6f

Same errors as above...

k3d cluster create dev \
    --config dev-cluster.yaml

kubectl apply -f nginx-deployment.yaml

kubectl create service clusterip nginx --tcp=80:80

kubectl apply -f nginx.yaml

kubectl describe pods
iwilltry42 commented 3 years ago

@ericis , in your case, the cluster does not even come up (i.e. system pods don't start), as it cannot even resolve registry-1.docker.io. This might be worth a separate issue/discussion, especially since there's a corporate firewall in place as well :thinking:

cro commented 3 years ago

Why doesn't k3d/k3s just use the host resolver?

How would you suggest that k3d achieves this? We could search for e.g. the /etc/resolv.conf, but that's not platform-agnostic...

I'm sorry, that question was kind of shortsighted, wasn't it? I note that the k8s cluster provided inside Docker for Desktop on the Mac doesn't seem to have this problem, but that is decidedly not a platform-agnostic solution. I would be content with a workaround of any kind, since I much prefer k3d/k3s; it uses fewer resources.

itaylor commented 3 years ago

For those like me for whom mounting k3d cluster create --volume /etc/resolv.conf:/etc/resolv.conf doesn't fix the issue: you may be running on a host system that uses systemd-resolved. If you are, the /etc/resolv.conf on the host system will have its nameserver set to a 127.0.0.x address for the systemd-resolved cache server, which won't resolve properly inside the k3d container. Instead, you need to mount the resolv.conf that systemd-resolved uses. For me, this was: /run/systemd/resolve/resolv.conf

This makes the command I had to run to fix this issue: k3d cluster create --volume /run/systemd/resolve/resolv.conf:/etc/resolv.conf
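To check whether your host is set up like this (a sketch; systemd-resolved's stub resolver lives on a loopback address):

# A 127.0.0.x nameserver here means the local stub resolver is in use
grep nameserver /etc/resolv.conf

# These are the upstream servers systemd-resolved actually talks to
cat /run/systemd/resolve/resolv.conf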

eshepelyuk commented 3 years ago

Hi all,

One interesting observation: I'm on Win10, using Docker Toolbox (i.e. a VirtualBox VM running Docker). After removing all the VPNs, I discovered that mounting /etc/resolv.conf is breaking DNS resolution from within pods :)

As soon as I removed the /etc/resolv.conf mount, resolution worked fine.

danielefranceschi commented 3 years ago

Confirming that the issue is present in 4.0.0 on CentOS 7 + docker 20.10.6.

Furthermore, injected registries don't go into the coredns configmap under NodeHosts, while k3d-created ones do.

The only workaround was, as said earlier, to patch the coredns configmap:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        forward . 10.0.0.1
        cache 30
        loop
        reload
        loadbalance
    }
  NodeHosts: |
    192.168.176.3 k3d-local-server-0
    192.168.176.1 host.k3d.internal
    192.168.176.2 k3d-local-registry
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
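Saved to a file, the patched ConfigMap can be applied with something like the following sketch (coredns.yaml is a placeholder name; the reload plugin in the Corefile above should pick up the change on its own, the restart just forces it immediately):

kubectl apply -f coredns.yaml
kubectl -n kube-system rollout restart deployment coredns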
danielefranceschi commented 3 years ago

Some quick-and-dirty bash for patching the coredns configmap with the nameservers from /etc/resolv.conf:

# Collect the host's nameserver IPs into a space-separated list
NAMESERVERS=$(grep nameserver /etc/resolv.conf | awk '{print $2}' | xargs)

# Rewrite the Corefile's 'forward' line, re-escape the newlines as \n for the
# JSON patch payload, and patch the ConfigMap in place
cmpatch=$(kubectl get cm coredns -n kube-system --template='{{.data.Corefile}}' \
  | sed "s/forward.*/forward . $NAMESERVERS/g" \
  | tr '\n' '^' \
  | xargs -0 printf '{"data": {"Corefile":"%s"}}' \
  | sed -E 's%\^%\\n%g') \
  && kubectl patch cm coredns -n kube-system -p="$cmpatch"

(code adapted from here)

srstsavage commented 3 years ago

Another request for this:

add a custom flag to k3d command which adds the custom DNS servers to the CoreDns ConfigMap directly.

and another solution for editing the coredns configmap (this one adds a known set of needed DNS servers, 10.0.99.1 and 10.0.99.2):

KUBE_EDITOR="sed -i 's|forward.*|forward . 10.0.99.1 10.0.99.2|'" \
  kubectl edit -n kube-system cm coredns

Ideally, this could be done using something like:

k3d cluster create --servers 3 --dns 10.0.99.1 --dns 10.0.99.2