mesosphere / mesos-dns

DNS-based service discovery for Mesos.
https://mesosphere.github.com/mesos-dns
Apache License 2.0

mesos-dns not resolving external DNS #343

Open amulyas opened 8 years ago

amulyas commented 8 years ago

My config is this:

ip-172-18-8-189 core # cat /opt/mesosphere/etc/mesos-dns.json
{
  "zk": "zk://127.0.0.1:2181/mesos",
  "refreshSeconds": 30,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["10.0.0.2","172.1.1.2"],
  "timeout": 5,
  "listener": "0.0.0.0",
  "email": "root.mesos-dns.mesos"
}

However, jobs running in Chronos cannot resolve DNS names served by the DNS server at 172.1.1.2. I upgraded mesos-dns to version 0.4, but the issue persists. I'm not sure how to fix it.

What version of the project are you using? Mesosphere for AWS CloudFormation
What operating system and processor architecture are you using? CoreOS on AWS
What did you do? Added the external DNS server to my mesos-dns config, restarted the mesos-dns process, and upgraded mesos-dns. DNS names still cannot be resolved inside jobs on Chronos.

What did you expect to see? What did you see instead?

sargun commented 8 years ago

@amulyas What do you mean, it cannot resolve the DNS name at DNS 172.1.1.2?

Can you please post the output of the following commands:

dig @localhost google.com
dig @localhost leader.mesos
dig @10.0.0.2 google.com
dig @172.1.1.2 google.com

-Thanks!
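
Where dig is not available on the host, the same kind of check can be approximated with the Python standard library. The following is a minimal sketch, not part of Mesos-DNS: the function names (build_query, parse_status, query) are hypothetical, and it only sends a single A-record query over UDP and reads the status and answer count from the response header.

```python
import random
import socket
import struct
from typing import Optional, Tuple

# Response codes from the DNS header (RFC 1035 and updates).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: id, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def parse_status(response: bytes) -> Tuple[str, int]:
    """Return (status, answer_count) read from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount


def query(server: str, name: str, timeout: float = 5.0) -> Tuple[str, int]:
    """Send one UDP query to server:53, roughly like `dig @server name`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(4096)
    return parse_status(response)
```

Calling query("10.0.0.2", "google.com") would return a status string and the number of answers, or raise socket.timeout if the server does not reply in time.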

mindscratch commented 8 years ago

@amulyas Is the Chronos job running in Docker? If so, is the Docker client on the host where the Chronos job runs configured with --dns ipaddr, where ipaddr is the IP address of the Mesos End server?

amulyas commented 8 years ago

@mindscratch I tried that, but the IP address I passed to --dns is the external DNS server's IP address. What is the Mesos End server? I have a three-master cluster where any one of them can act as the master.

amulyas commented 8 years ago

@sargun Thank you. I have a DNS server running outside Mesos at a 172.x address. I added it to the resolvers list in mesos-dns.json on all three masters and restarted mesos-dns, and even upgraded to the latest release, but domain names like example.com still do not resolve within Chronos jobs. The jobs run in Docker containers under Chronos.

sargun commented 8 years ago

@amulyas Can you please run the commands I posted on your Mesos-DNS server and then post their output?

amulyas commented 8 years ago

@sargun Thanks for the help. mesos-dns is installed as part of the Mesosphere AWS CloudFormation setup, and all masters run CoreOS. Do you know how I can get dig there?

sargun commented 8 years ago

@amulyas You can use the CoreOS toolbox to do this: https://coreos.com/os/docs/latest/install-debugging-tools.html

amulyas commented 8 years ago

@sargun Thanks once again. I have not had time to try what you suggested yet; I will send you the output of these commands. However, if I manually add the DNS IP to /etc/resolv.conf, I can ping the internal hostname, but if I add the DNS IP to the resolvers in the mesos-dns config, I cannot.
The goal is for jobs running in Docker under Chronos to be able to resolve internal hostnames.

amulyas commented 8 years ago

@sargun Here are the command outputs:

[root@ip-172-18-8-189 ~]# dig @localhost google.com

; <<>> DiG 9.10.3-RedHat-9.10.3-2.fc23 <<>> @localhost google.com
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61104
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 285 IN A 216.58.217.142

;; Query time: 1 msec
;; SERVER: ::1#53(::1)
;; WHEN: Sat Nov 14 03:50:12 UTC 2015
;; MSG SIZE rcvd: 55

[root@ip-172-18-8-189 ~]# dig @localhost leader.mesos

; <<>> DiG 9.10.3-RedHat-9.10.3-2.fc23 <<>> @localhost leader.mesos
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2632
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;leader.mesos. IN A

;; ANSWER SECTION:
leader.mesos. 60 IN A 172.18.8.189

;; Query time: 0 msec
;; SERVER: ::1#53(::1)
;; WHEN: Sat Nov 14 03:50:18 UTC 2015
;; MSG SIZE rcvd: 46

[root@ip-172-18-8-189 ~]# dig @10.0.0.2 google.com

I have changed the CIDR blocks from 10.x to 172.18.x.

; <<>> DiG 9.10.3-RedHat-9.10.3-2.fc23 <<>> @172.18.8.2 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31489
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 257 IN A 216.58.217.142

;; Query time: 0 msec
;; SERVER: 172.18.8.2#53(172.18.8.2)
;; WHEN: Sat Nov 14 03:50:40 UTC 2015
;; MSG SIZE rcvd: 55

[root@ip-172-18-8-189 ~]# dig 172.16.8.2

; <<>> DiG 9.10.3-RedHat-9.10.3-2.fc23 <<>> 172.16.8.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 7527
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;172.16.8.2. IN A

;; AUTHORITY SECTION:
. 60 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2015111301 1800 900 604800 86400

;; Query time: 3 msec
;; SERVER: 172.18.8.187#53(172.18.8.187)
;; WHEN: Sat Nov 14 03:51:06 UTC 2015
;; MSG SIZE rcvd: 114

[root@ip-172-18-8-189 ~]# dig @172.16.8.2 google.com

; <<>> DiG 9.10.3-RedHat-9.10.3-2.fc23 <<>> @172.16.8.2 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54645
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 7 IN A 173.194.33.133
google.com. 7 IN A 173.194.33.128
google.com. 7 IN A 173.194.33.131
google.com. 7 IN A 173.194.33.136
google.com. 7 IN A 173.194.33.134
google.com. 7 IN A 173.194.33.132
google.com. 7 IN A 173.194.33.135
google.com. 7 IN A 173.194.33.130
google.com. 7 IN A 173.194.33.142
google.com. 7 IN A 173.194.33.129
google.com. 7 IN A 173.194.33.137

;; Query time: 86 msec
;; SERVER: 172.16.8.2#53(172.16.8.2)
;; WHEN: Sat Nov 14 03:51:26 UTC 2015
;; MSG SIZE rcvd: 215

sargun commented 8 years ago

So, from the dig commands, it looks like your state is healthy. Can you explain in a bit more detail what you're trying to do, and where you're running into a problem?

amulyas commented 8 years ago

@sargun Sure, thanks for your reply. I have a Mesosphere cluster running in AWS, connected to a remote colo via IPsec. I would like the Mesos slaves to resolve DNS entries in the colo, so there is a DNS server there that I would like to add as an external resolver. I added it to the mesos-dns config and restarted mesos-dns, but when I ping I see "ping: unknown host".

This is from the debug Docker container:

[root@ip-172-18-8-189 ~]# ping asc.xyx.com
ping: unknown host asc.xyx.com
[root@ip-172-18-8-189 ~]# dig asc.xyx.com

; <<>> DiG 9.10.3-RedHat-9.10.3-2.fc23 <<>> asd.xyx.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46422
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;devapp1.synarc.com. IN A

;; AUTHORITY SECTION:
xyx.com. 52 IN SOA NS87.WORLDNIC.com. namehost.WORLDNIC.com. 114070314 10800 3600 604800 3600

;; Query time: 1 msec
;; SERVER: 172.18.8.187#53(172.18.8.187)
;; WHEN: Sat Nov 14 06:09:23 UTC 2015
;; MSG SIZE rcvd: 106

[root@ip-172-18-8-189 ~]# ping google.com
PING google.com (74.125.228.194) 56(84) bytes of data.
64 bytes from iad23s23-in-f2.1e100.net (74.125.228.194): icmp_seq=1 ttl=54 time=1.93 ms
64 bytes from iad23s23-in-f2.1e100.net (74.125.228.194): icmp_seq=2 ttl=54 time=1.97 ms

amulyas commented 8 years ago

@sargun dig @ domain.name.com works, but ping domain.name.com does not. I have set the DNS IP in the resolvers of Mesos-DNS.

sargun commented 8 years ago

@amulyas Can you post your /etc/resolv.conf?

amulyas commented 8 years ago

@sargun Sure, here it is. 187 and 189 are where the Mesos masters are running (I have nameserver[:2] in gen_resolveconf.py), and 172.18.8.2 is the second IP from the base range of the Mesos VPC in AWS.

options timeout:1
options attempts:3
nameserver 172.18.8.187
nameserver 172.18.8.189
nameserver 172.18.8.2
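
The order of the nameserver lines matters here. As far as I know, the glibc stub resolver generally tries nameservers in the order listed and moves to the next one only on a timeout or error, so an NXDOMAIN from the first server is normally treated as final. That would be consistent with the dig output above, which reports SERVER: 172.18.8.187 for queries sent without an explicit @server. A minimal sketch for reading that order from resolv.conf-style text follows; parse_resolv_conf is a hypothetical helper name, and SAMPLE reproduces the file posted above.

```python
from typing import List, Tuple


def parse_resolv_conf(text: str) -> Tuple[List[str], List[str]]:
    """Return (nameservers in listed order, options) from resolv.conf text."""
    nameservers: List[str] = []
    options: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (resolv.conf allows '#' and ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            options.extend(fields[1:])
    return nameservers, options


# The resolv.conf contents posted in this thread.
SAMPLE = """\
options timeout:1
options attempts:3
nameserver 172.18.8.187
nameserver 172.18.8.189
nameserver 172.18.8.2
"""

servers, opts = parse_resolv_conf(SAMPLE)
print("first server queried:", servers[0])
```

On a host, the same function could be fed the contents of /etc/resolv.conf to confirm which server a plain ping or dig would ask first.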

amulyas commented 8 years ago

Here is my config.json for Mesos-DNS. I bumped the timeout in case that is the problem:

{
  "zk": "zk://127.0.0.1:2181/mesos",
  "refreshSeconds": 30,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["172.18.8.2","172.16.8.2","172.16.10.5"],
  "timeout": 25,
  "listener": "0.0.0.0",
  "email": "root.mesos-dns.mesos"
}
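
One cheap thing to rule out is a malformed entry in the resolvers list. The following is a minimal sketch, assuming only that the config is plain JSON with a "resolvers" array of IP address strings as shown above; load_resolvers is a hypothetical helper and is not part of Mesos-DNS.

```python
import ipaddress
import json
from typing import List


def load_resolvers(config_text: str) -> List[str]:
    """Parse a Mesos-DNS style JSON config and return its resolvers list.

    Raises ValueError if the JSON is invalid or an entry is not an IP address.
    """
    config = json.loads(config_text)
    resolvers = config.get("resolvers", [])
    for entry in resolvers:
        # ip_address() raises ValueError for anything that is not a valid IP.
        ipaddress.ip_address(entry)
    return resolvers


# The config posted in this thread.
SAMPLE_CONFIG = """
{
  "zk": "zk://127.0.0.1:2181/mesos",
  "refreshSeconds": 30,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["172.18.8.2", "172.16.8.2", "172.16.10.5"],
  "timeout": 25,
  "listener": "0.0.0.0",
  "email": "root.mesos-dns.mesos"
}
"""

resolvers = load_resolvers(SAMPLE_CONFIG)
print(resolvers)
```

This only checks syntax. It says nothing about whether each resolver is reachable from the Mesos-DNS host or in which order Mesos-DNS consults them.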

amulyas commented 8 years ago

Nov 14 23:42:18 ip-172-18-8-188.ec2.internal mesos-dns[15911]: VERY VERBOSE: 2015/11/14 23:42:18 logging.go:75: {MesosRequests:372 MesosSuccess:371 MesosNXDomain:1 MesosFailed:0 NonMesosRequests:0 NonMesosSuccess:0 NonMesosNXDomain:0 NonMesosFailed:0 NonMesosForwarded:0}

NonMesosForwarded should not be 0, I think.
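
To compare these counters across the masters, the stats line can be parsed into a dictionary. This is a minimal sketch with a hypothetical parse_stats helper; it assumes the log format shown above, a brace-delimited list of Name:integer pairs. Note that NonMesosRequests is also 0 in this line, which, if the counter names mean what they suggest, would indicate that no non-Mesos queries reached this instance (ip-172-18-8-188) at all, rather than that forwarding failed.

```python
import re
from typing import Dict


def parse_stats(line: str) -> Dict[str, int]:
    """Extract the {Name:int ...} counters from a Mesos-DNS stats log line."""
    match = re.search(r"\{([^}]*)\}", line)
    if not match:
        return {}
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", match.group(1))}


# The log line posted in this thread.
LOG_LINE = (
    "Nov 14 23:42:18 ip-172-18-8-188.ec2.internal mesos-dns[15911]: "
    "VERY VERBOSE: 2015/11/14 23:42:18 logging.go:75: "
    "{MesosRequests:372 MesosSuccess:371 MesosNXDomain:1 MesosFailed:0 "
    "NonMesosRequests:0 NonMesosSuccess:0 NonMesosNXDomain:0 "
    "NonMesosFailed:0 NonMesosForwarded:0}"
)

stats = parse_stats(LOG_LINE)
for name, value in stats.items():
    if name.startswith("NonMesos"):
        print(name, value)
```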

sargun commented 8 years ago

@amulyas Can you provide a PCAP of all UDP port 53 packets on the machine running Mesos-DNS?