mesosphere / mesos-dns

DNS-based service discovery for Mesos.
https://mesosphere.github.com/mesos-dns
Apache License 2.0

Review system DNS resolver usage #252

Open tsenart opened 9 years ago

tsenart commented 9 years ago

From https://github.com/mesosphere/mesos-dns/pull/251#issuecomment-138824388

  1. Go 1.4 and Go 1.5 have a global sync.Mutex protecting the static host cache read from /etc/hosts. This can hurt our throughput with large numbers of concurrent queries.
  2. Go 1.4 and Go 1.5 have a global static host cache timeout of 5 minutes, so changes to /etc/hosts can take up to that long to become visible. For the availability of a recursing DNS server, this can go pretty badly. If operators need to change the static hosts configuration, they'd have to restart Mesos-DNS anyway for it to pick up the new entries immediately.
  3. If the given name isn't found in the static hosts cache, a DNS query is issued against the system-configured DNS servers (read from /etc/resolv.conf), only to find the IP of the resolver we want to forward our query to. Hence, for such a simple query, we have to perform one extra round-trip (or even two, in case of truncation, where we have to fall back to TCP), which, besides hurting performance, can hurt our availability.
  4. From an operational simplicity standpoint, having configuration be static throughout the lifetime of a process is a major advantage. Side-effect-free programs are invariably easier to debug than ones with lots of side effects and global mutable state.
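
To make points 3 and 4 concrete, here is a minimal sketch of what "static" resolver configuration could look like: the configured upstream resolvers are validated once at startup and must already be IP literals, so forwarding a query never goes through /etc/hosts or /etc/resolv.conf. The function name `staticResolvers`, its parameters, and the sample addresses are hypothetical and not taken from the Mesos-DNS codebase.

```go
package main

import (
	"fmt"
	"net"
)

// staticResolvers (hypothetical helper) normalizes the configured upstream
// resolvers to "ip:port" once, at startup. Entries that are not IP literals
// are rejected, so no system resolver lookup is ever needed to reach them.
func staticResolvers(configured []string, defaultPort string) ([]string, error) {
	out := make([]string, 0, len(configured))
	for _, c := range configured {
		host, port, err := net.SplitHostPort(c)
		if err != nil {
			// No usable host:port form (e.g. "8.8.8.8" or a bare "::1"):
			// treat the whole entry as the host and use the default port.
			host, port = c, defaultPort
		}
		if port == "" {
			port = defaultPort
		}
		if net.ParseIP(host) == nil {
			return nil, fmt.Errorf("resolver %q is not an IP literal", c)
		}
		out = append(out, net.JoinHostPort(host, port))
	}
	return out, nil
}

func main() {
	// Sample addresses are illustrative only.
	addrs, err := staticResolvers([]string{"8.8.8.8", "10.0.0.1:5353"}, "53")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(addrs)

	// A hostname would require deferred resolution, so it is refused up front.
	if _, err := staticResolvers([]string{"ns1.example.com"}, "53"); err != nil {
		fmt.Println("config error:", err)
	}
}
```

The trade-off is the one described above: operators lose the ability to name a resolver by hostname, but a bad entry fails at startup instead of costing an extra round-trip on every forwarded query.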

In sum, in my view, the extra flexibility of deferred resolver resolution isn't worth the costs. I'll also be revisiting these same conclusions in other places in the codebase where we're relying on the system resolver!

sargun commented 8 years ago

I think points 1 and 2 will become apparent during scale testing. xref #351.