mesosphere / mesos-dns

DNS-based service discovery for Mesos.
https://mesosphere.github.com/mesos-dns
Apache License 2.0

Should we return NXDOMAIN + AA during initialization #344

Open sargun opened 8 years ago

sargun commented 8 years ago

So, when Mesos-DNS starts, before it has loaded state.json from Mesos, it answers a query for leader.mesos as follows:

3c075477e55e:mesos-dns sdhillon$ dig -p8053 -t A @127.0.0.1 leader.mesos

; <<>> DiG 9.8.3-P1 <<>> -p8053 -t A @127.0.0.1 leader.mesos
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 62516
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;leader.mesos.          IN  A

;; AUTHORITY SECTION:
leader.mesos.       60  IN  SOA ns1.mesos. root.ns1.mesos. 1447404611 60 600 86400 60

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8053(127.0.0.1)
;; WHEN: Fri Nov 13 00:50:18 2015
;; MSG SIZE  rcvd: 75

In some environments, loading state.json can take up to 30 seconds. During that window, hosts querying a freshly started mesos-dns can get an incorrect answer: an authoritative NXDOMAIN for a name that does exist. I don't know whether this is the right behaviour.
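The SOA record in the authority section makes the window longer than the load time itself. Under RFC 2308, a resolver may cache a negative answer for the lesser of the SOA record's own TTL and its MINIMUM field, so a client that hits mesos-dns during startup can keep seeing the NXDOMAIN for that long even after the zone is loaded. A minimal sketch that parses the SOA line from the dig output and computes that bound (the helper name is hypothetical, and it only handles dig's single-line presentation format):

```python
def negative_cache_ttl(soa_line: str) -> int:
    """Return the RFC 2308 negative-caching TTL for an SOA record in
    dig's one-line presentation format: min(record TTL, SOA MINIMUM)."""
    fields = soa_line.split()
    # name, TTL, class, type, MNAME, RNAME, SERIAL, REFRESH, RETRY, EXPIRE, MINIMUM
    if len(fields) != 11 or fields[2] != "IN" or fields[3] != "SOA":
        raise ValueError(f"not a one-line SOA record: {soa_line!r}")
    record_ttl = int(fields[1])
    soa_minimum = int(fields[10])
    return min(record_ttl, soa_minimum)


soa = "leader.mesos.       60  IN  SOA ns1.mesos. root.ns1.mesos. 1447404611 60 600 86400 60"
print(negative_cache_ttl(soa))  # 60
```

With the values mesos-dns returned here, a caching resolver may hold the wrong NXDOMAIN for up to 60 seconds.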

Questions:

  1. Should we respond to DNS queries at all?
  2. Should we be authoritative for .mesos before the zone is loaded?
  3. Should we return a DNS error code instead, such as SERVFAIL or REFUSED?
  4. Is it acceptable that restarting a mesos-dns daemon may make all Mesos service discovery unavailable?
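One way to frame questions 2 and 3 is as a readiness gate on the response code. The sketch below is a hypothetical policy, not mesos-dns's actual behaviour: until the zone has been built from state.json at least once, it answers SERVFAIL without the AA flag instead of an authoritative NXDOMAIN. The `decide` function and its inputs are illustrative names; the numeric codes are the RCODE values from RFC 1035. Swapping SERVFAIL for REFUSED would be a one-line change.

```python
from enum import IntEnum
from typing import NamedTuple


class Rcode(IntEnum):
    # Response codes from RFC 1035, section 4.1.1
    NOERROR = 0
    SERVFAIL = 2
    NXDOMAIN = 3
    REFUSED = 5


class Decision(NamedTuple):
    rcode: Rcode
    authoritative: bool  # whether to set the AA flag in the reply header


def decide(zone_loaded: bool, name_found: bool) -> Decision:
    """Hypothetical policy: only claim authority for .mesos once the
    zone has been generated from state.json at least once."""
    if not zone_loaded:
        # Not ready: report a transient failure rather than an
        # authoritative "this name does not exist".
        return Decision(Rcode.SERVFAIL, authoritative=False)
    if name_found:
        return Decision(Rcode.NOERROR, authoritative=True)
    return Decision(Rcode.NXDOMAIN, authoritative=True)
```

The intent is that a non-authoritative error tells the client "ask again or ask someone else" instead of "this record does not exist", which is the claim the current startup behaviour makes.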
jdef commented 8 years ago

mesos-dns can be configured to pull masters from ZK, or with a static list of masters. Why is leader.mesos resolution dependent upon loading state.json at all? If we have a static list of masters we should be able to respond right away. The next question becomes: if we don't have a static list of masters, and we haven't received a list from the ZK detector -- what's the best course of action?
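The cases jdef distinguishes can be sketched as a fallback chain for leader.mesos before state.json arrives. Everything here is a hypothetical illustration: `leader_answer`, its parameters, and the choice to return every static master when the leader is unknown are assumptions, not mesos-dns code. Returning `None` marks the unresolved case (ZK mode with no leader reported yet), which a caller would map to an error such as SERVFAIL.

```python
from typing import List, Optional, Sequence


def leader_answer(zk_leader: Optional[str],
                  static_masters: Sequence[str]) -> Optional[List[str]]:
    """Hypothetical: addresses to return for leader.mesos before
    state.json has been loaded. None means 'no information yet'."""
    if zk_leader is not None:
        # The ZK detector has already reported the leading master.
        return [zk_leader]
    if static_masters:
        # Assumption: with only a static list we do not know which master
        # leads, so we answer with every configured master.
        return list(static_masters)
    # ZK mode and no leader reported yet: the open question in this thread.
    return None
```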


sargun commented 8 years ago

@jdef leader.mesos is only one example; the same applies to any of the records generated from the master's state.

sargun commented 8 years ago

I'm going to delegate this to @brndnmtthws and co. to determine whether this is a concern.