Open ikus060 opened 6 years ago
I ran into this exact same issue with a fresh installation of OCP 3.7 on a RHEL 7.4 VM.
Outbound networking worked from the VM. It also worked when I ran a container out of band from Kubernetes (using docker run). When OCP ran the container, outbound networking broke, but it could be fixed by removing the options ndots:5 or "search josborne.com". I couldn't figure out where "search josborne.com" was even coming from, because I didn't set that anywhere in the Ansible advanced installation. I changed my /etc/hostname file from openshift.josborne.com to openshift and rebooted. At that point "search josborne.com" was removed from the pod's /etc/resolv.conf and everything started working. Is this user error or a bug? I've installed every release of OCP from scratch using an FQDN in my /etc/hostname file, and it first broke in either 3.6 or 3.7, so I think something has changed in the platform.
Right, so the problem is that if the domain that gets listed in the search line does wildcard matching, then because of the ndots:5, basically all hostnames will end up being treated as subdomains of the default domain. E.g., *.josborne.com appears to resolve to a particular AWS hostname, so if you look up, say, github.com, it ends up matching as github.com.josborne.com, which resolves to the AWS IP.
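To make the lookup order concrete, here is a small Python sketch of how a stub resolver with a search list and ndots:5 orders its queries. This is a simplification of the glibc behavior, not OpenShift code; the domain is the one from the example above.

```python
def candidate_names(name, search_domains, ndots=5):
    """Order in which a stub resolver tries names (simplified sketch)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{d}" for d in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Fewer dots than ndots: every search domain is tried *before* the bare name.
    return expanded + [name]

print(candidate_names("github.com", ["josborne.com"]))
# ['github.com.josborne.com', 'github.com']
```

With a wildcard *.josborne.com record, the first candidate gets an answer, so the resolver never falls through to the real github.com.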
I guess the search field in the pod resolv.conf is set automatically from the node hostname?
What we really want is to make service name lookups behave like ndots:5, but make other lookups not do that. We can't make the libc resolver do that, but in cases where we're running a DNS server inside the cluster, we could do the ndots-like special-casing inside that server, and then we could give the pods a resolv.conf without ndots.
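A rough sketch of that idea in Python (this is not how any actual OpenShift or Kubernetes DNS component is implemented; the suffix list, dot threshold, and lookup callbacks are made up for illustration):

```python
# Cluster-internal suffixes the server would try for short names; illustrative only.
CLUSTER_SUFFIXES = ["svc.cluster.local.", "cluster.local."]

def resolve(name, cluster_lookup, upstream_lookup):
    # Short names are likely service names: expand them against the cluster
    # suffixes on the server side, the way ndots:5 did on the client side.
    if name.count(".") < 2 and not name.endswith("."):
        for suffix in CLUSTER_SUFFIXES:
            answer = cluster_lookup(f"{name}.{suffix}")
            if answer:
                return answer
    # Everything else is forwarded untouched, so a lookup of github.com never
    # gets the node's search domain appended to it.
    return upstream_lookup(name)
```

The pods' resolv.conf could then point at such a server and drop ndots entirely.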
The other possibility would be to stop including the node's domain in the pod resolv.conf's search field, but that would break any existing pods that were depending on the current behavior, so we'd need some sort of compatibility option.
Since the way to install OpenShift is via the Ansible playbook, I would add extra validation in Ansible to make sure the provided DNS domain behaves as expected. If not, the playbook should fail and warn the user.
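As an illustration of what such a validation could check, here is a minimal sketch in standalone Python rather than an Ansible task; the domain and the way the failure is reported are placeholders. The idea is that a random label under the proposed DNS domain should not resolve; if it does, the zone has a wildcard record that will misdirect pod lookups once it lands in the search path with ndots:5.

```python
import socket
import uuid

def domain_has_wildcard(domain):
    # Probe a label that cannot legitimately exist; if it still resolves,
    # the zone is answering with a wildcard record.
    probe = f"{uuid.uuid4().hex}.{domain}"
    try:
        socket.gethostbyname(probe)
        return True
    except socket.gaierror:
        return False

if domain_has_wildcard("josborne.com"):  # placeholder domain
    raise SystemExit("DNS domain uses wildcard records; pod lookups with "
                     "ndots:5 would be misdirected to it.")
```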
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
This is still an issue. /remove-lifecycle rotten
For Minishift this is an issue with some hypervisors that force a search entry via the DHCP offer. E.g., Hyper-V on the "Default Switch" uses search mshome.net, which can cause lookups of github.com during S2I to fail.
Note: options ndots:5 has been part of Kubernetes since about 2015 => https://github.com/kubernetes/kubernetes/pull/10266/commits/23caf446ae69236641da0fdc432d4cfb5fff098d#diff-0db82891d463ba14dd59da9c77f4776eR66 (ref: https://github.com/kubernetes/kubernetes/pull/10266)
Same issue with an Ansible install of OpenShift 3.10.
Same for me: ndots:5 makes the resolver append a domain name (from the search line) before trying the original address.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Hello, is there a workaround for this? I seem to be facing the same issue with k8s 1.19, CoreDNS, and my external domain, which is part of the DNS search path and has a wildcard match.
Name resolution from inside the pod seems to be broken because of multiple factors.
Version
Steps To Reproduce
Looks like the /etc/resolv.conf file generated by OpenShift is not working in every scenario. Just to show it's working with something...
This is the /etc/resolv.conf generated in the pod: not working.
If I remove my domain name patrikdufresne.com: working. Also working if I remove ndots:5.