Closed sdodson closed 6 years ago
It was CentOS 7.2 on Azure; I can provide more input if needed.
@FilipVozar after #2112 is merged, can you see if this works on a clean install? It's a workaround, but it should address the issue for now.
@FilipVozar is there a particular centos image you used? It doesn't look like there's an official centos image.
@sdodson I'm using image provided by OpenLogic http://www.openlogic.com/products-services/services/cloud-services/azure, that's the only "plain" centos image I could find in the Azure Marketplace.
I created a new host using the same image and did a clean install. dnsmasq was restarted and is using the correct config, and DNS works.
[root@node1 ~]# systemctl status dnsmasq
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-07-05 15:58:27 UTC; 4min 54s ago
 Main PID: 6852 (dnsmasq)
   CGroup: /system.slice/dnsmasq.service
           └─6852 /usr/sbin/dnsmasq -k

Jul 05 15:58:27 node1 systemd[1]: Started DNS caching server..
Jul 05 15:58:27 node1 systemd[1]: Starting DNS caching server....
Jul 05 15:58:27 node1 dnsmasq[6852]: started, version 2.66 cachesize 150
Jul 05 15:58:27 node1 dnsmasq[6852]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth
Jul 05 15:58:27 node1 dnsmasq[6852]: using nameserver 172.30.0.1#53 for domain cluster.local
Jul 05 15:58:27 node1 dnsmasq[6852]: read /etc/hosts - 2 addresses

[root@node1 ~]# dig docker-registry.default.svc.cluster.local @localhost

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> docker-registry.default.svc.cluster.local @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24506
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;docker-registry.default.svc.cluster.local. IN A

;; ANSWER SECTION:
docker-registry.default.svc.cluster.local. 30 IN A 172.30.28.58

;; Query time: 2 msec
;; SERVER: ::1#53(::1)
;; WHEN: Tue Jul 05 15:59:20 UTC 2016
;; MSG SIZE rcvd: 75
Actually, it looks like the description of this issue is wrong, at least based on my testing. Here are the NM debug logs: https://gist.github.com/sdodson/1034166301747486549d015991fa40e7 ipv4.dns is set but IP4_NAMESERVERS is empty. I need to see how to get access to that.
@FilipVozar Can you verify that you can still resolve external hosts via dnsmasq?
@sdodson I can't (and I couldn't before, this hasn't changed). /etc/resolv.conf inside the pod has 2 nameservers: the host IP and the nameserver IP inherited from the host. Resolving external domains using the host's dnsmasq fails.
@FilipVozar Thanks, it's designed such that dnsmasq should be capable of answering all queries and it should become the only nameserver at the host level, but due to a bug in NetworkManager I believe it's not working properly on Azure. This will be required in the future but it's not fatal today. I'll keep trying to figure out why this isn't working on Azure.
@FilipVozar I think this is another manifestation of https://bugzilla.redhat.com/show_bug.cgi?id=1316138 Can you try updating to NetworkManager-1.0.6-29.el7_2 or later and then rebooting your machine? What I'm looking for is that the node's /etc/resolv.conf points at itself (dnsmasq) and that /etc/dnsmasq.d/origin-upstream-dns.conf lists the otherwise default nameservers. At that point dnsmasq should be able to resolve all hostnames, both cluster.local and external (e.g. google.com).
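The expected state described above can be checked with a couple of small shell helpers. This is only a minimal sketch and not part of openshift-ansible: the function names `list_nameservers` and `list_upstream_servers` are hypothetical, and it assumes the upstream file holds plain dnsmasq `server=<ip>` lines.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a node's DNS setup.

# Print the nameserver addresses from a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Print the general upstream servers from a dnsmasq config file,
# skipping domain-specific entries such as server=/cluster.local/172.30.0.1.
list_upstream_servers() {
    sed -n 's/^server=\([^/].*\)$/\1/p' "$1"
}
```

Running `list_nameservers /etc/resolv.conf` on a correctly configured node should print only the node's own address, while `list_upstream_servers /etc/dnsmasq.d/origin-upstream-dns.conf` should print the nameservers that were there originally.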
NM 1.0.6-29.el7_2 was installed since the beginning (or openshift-ansible installed it, I only logged in after running ansible).
[root@node1 etc]# yum info NetworkManager
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Installed Packages
Name        : NetworkManager
Arch        : x86_64
Epoch       : 1
Version     : 1.0.6
Release     : 29.el7_2
Size        : 9.1 M
Repo        : installed
From repo   : CentOS-Updates
....

[root@node1 etc]# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search o2pdvy5yxbcu1ds0vouklfrdlg.cx.internal.cloudapp.net
nameserver 168.63.129.16
Also, there is no /etc/dnsmasq.d/origin-upstream-dns.conf, only /etc/dnsmasq.d/origin-dns.conf. In /etc/dnsmasq.conf and /etc/dnsmasq.d/origin-dns.conf, these are the only uncommented lines:
[root@node1 etc]# grep -r "^[^#]" /etc/dnsmasq*
/etc/dnsmasq.conf:conf-dir=/etc/dnsmasq.d
/etc/dnsmasq.d/origin-dns.conf:strict-order
/etc/dnsmasq.d/origin-dns.conf:no-resolv
/etc/dnsmasq.d/origin-dns.conf:domain-needed
/etc/dnsmasq.d/origin-dns.conf:server=/cluster.local/172.30.0.1
possibly a common thing on azure?
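With `no-resolv` set and only the `cluster.local` server configured, dnsmasq has no upstream to forward external queries to, which matches the failure reported earlier in the thread. As a manual stopgap, the missing upstream file could be generated from the nameservers currently in /etc/resolv.conf. This is a sketch under assumptions: the function name `gen_upstream_conf` is hypothetical, and this is not necessarily how the installer populates the file.

```shell
#!/bin/sh
# Hypothetical helper: emit one dnsmasq "server=" line per nameserver
# found in a resolv.conf-style file.
gen_upstream_conf() {
    awk '$1 == "nameserver" { print "server=" $2 }' "$1"
}
```

For example, `gen_upstream_conf /etc/resolv.conf > /etc/dnsmasq.d/origin-upstream-dns.conf` followed by `systemctl restart dnsmasq` would give dnsmasq the Azure resolver as its upstream. This has to run while /etc/resolv.conf still lists the original nameservers, before it is pointed at the node itself.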