vagrant-landrush / landrush

A Vagrant plugin that provides a simple DNS server for Vagrant guests
MIT License

Support recursive DNS #281

Open JPvRiel opened 7 years ago

JPvRiel commented 7 years ago

TL;DR: I think landrush's lack of recursive DNS can cause pain on Linux when dnsmasq is used with libvirt and NetworkManager and guests are, by default, redirected via iptables to the landrush DNS server.

Key issue from VM guest with landrush defaults

$ dig -p 10053 @127.0.0.1 www.google.com
...
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 11678
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
...

Workaround

config.landrush.guest_redirect_dns = false avoids the pain!
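For context, here is a minimal Vagrantfile sketch showing where the workaround sits next to the other landrush settings used in this report. It only uses option names that appear in this issue; the box and provider settings are deliberately left out and would need to be filled in for a real setup.

```ruby
# Vagrantfile (sketch): only the landrush-related lines from this report.
Vagrant.configure('2') do |config|
  config.landrush.enabled = true
  config.landrush.tld = 'vagrant.test'

  # Workaround: do not redirect the guest's DNS traffic (port 53) to
  # landrush (port 10053) via iptables; guest queries then go to the
  # DNS server listed in the guest's /etc/resolv.conf as usual.
  config.landrush.guest_redirect_dns = false
end
```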

But then, when set to false:

Here's an example where, with false, a lookup from within a guest doesn't resolve the VM host server's IP correctly.

$ nslookup <VM server hostname>
Server:     192.168.121.1
Address:    192.168.121.1#53

Name:   <VM server hostname>
Address: 127.0.1.1

And the default true:

$ nslookup <VM server hostname>
Server:     192.168.121.1
Address:    192.168.121.1#53

** server can't find <VM server hostname>: SERVFAIL

Potentially Related Issues

Originally, I had the same symptoms as #198. No matter which host I ping, landrush seemed to end up 'wildcarding' the FQDNs of external hosts and appending the configured 'local' TLD (in my case, vagrant.test). It might have to do with search vagrant.test being put into /etc/resolv.conf for guests...

And then there are extra complications noted... which relate more to #252 and possibly #174.

More or less default / minimal config causes this upstream DNS resolution bug

A fair bit of verbose context/info - jump down to the dig command that backs up what I saw in network packet captures. landrush DNS (with my stack) can't handle recursive queries.

Vagrantfile:

  config.landrush.enabled = true
  config.landrush.tld = 'vagrant.test'

/etc/NetworkManager/dnsmasq.d/vagrant-landrush (because Ubuntu, like Fedora, ships with NetworkManager, which already has dnsmasq plugged in)

server=/vagrant.test/127.0.0.1#10053

libvirt provides DNS on the virbr1 network spooled up by the vagrant libvirt provider. On the guest VM:

$ cat /etc/resolv.conf 
# Generated by NetworkManager
search vagrant.test
nameserver 192.168.121.1

libvirt is also using dnsmasq... So yay, three layers of dnsmasq that need to play nice together, landrush -> libvirt -> NetworkManager :-/

On the host, various DNS services are listening

$ sudo netstat -lntp | grep 53
tcp        0      0 0.0.0.0:10053           0.0.0.0:*               LISTEN      5179/ruby       
tcp        0      0 192.168.121.1:53        0.0.0.0:*               LISTEN      5155/dnsmasq    
tcp        0      0 127.0.1.1:53            0.0.0.0:*               LISTEN      3321/dnsmasq    

By the way, I'm not sure why landrush listens on all interfaces (0.0.0.0) rather than just the network Vagrant is provisioning (i.e. 192.168.121.1). Maybe it has something to do with config.landrush.host_redirect_dns (I should probably file a separate bug for this, but I digress).

Checking what happened with iptables on the VM host shows another potential mess with multiple allows for both UDP and TCP.

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:53
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:53
...

And on the guest

# iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
DNAT       tcp  --  0.0.0.0/0            192.168.121.1        tcp dpt:53 to:192.168.121.1:10053
DNAT       udp  --  0.0.0.0/0            192.168.121.1        udp dpt:53 to:192.168.121.1:10053

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
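To make the effect of those two DNAT rules concrete, here is a toy Ruby model of the guest's nat OUTPUT chain. It is not real netfilter, just a sketch of the rewrite; the struct, constant, and function names are made up for illustration, and 10.0.0.2 is a hypothetical second address used only to show a non-matching packet.

```ruby
# Toy model of the guest's nat OUTPUT chain shown above (not real netfilter).
DnatRule = Struct.new(:proto, :match_ip, :match_port, :to_ip, :to_port)

GUEST_RULES = [
  DnatRule.new(:tcp, '192.168.121.1', 53, '192.168.121.1', 10_053),
  DnatRule.new(:udp, '192.168.121.1', 53, '192.168.121.1', 10_053)
].freeze

# Returns the [ip, port] a locally generated packet is delivered to after DNAT.
def apply_dnat(proto, dst_ip, dst_port, rules = GUEST_RULES)
  rule = rules.find do |r|
    r.proto == proto && r.match_ip == dst_ip && r.match_port == dst_port
  end
  rule ? [rule.to_ip, rule.to_port] : [dst_ip, dst_port]
end

p apply_dnat(:udp, '192.168.121.1', 53) # rewritten to landrush's port
p apply_dnat(:udp, '10.0.0.2', 53)      # hypothetical address: no rule matches
```

In this model every guest query addressed to 192.168.121.1:53 ends up at port 10053, so the libvirt dnsmasq on port 53 never sees it, which fits the packet capture described further down.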

On the host, external names resolve fine via libvirt's DNS, e.g.

$ nslookup www.google.co.za 192.168.121.1
Server:     192.168.121.1
Address:    192.168.121.1#53

Non-authoritative answer:
Name:   www.google.co.za
Address: 216.58.223.3

On the libvirt guest, it fails, oddly, with the TLD appended:

# nslookup www.google.com
Server:     192.168.121.1
Address:    192.168.121.1#53

** server can't find www.google.com.vagrant.test: SERVFAIL
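One possible explanation for the appended TLD is the search vagrant.test line in the guest's /etc/resolv.conf rather than anything landrush does itself. The Ruby sketch below is a simplified model of how a stub resolver expands a name using its search list; it assumes glibc-like behaviour with the default ndots of 1, and the function name is made up for illustration.

```ruby
# Simplified model of a stub resolver's search-list handling
# (assumption: glibc-like behaviour with the default ndots of 1).
def candidate_names(name, search_domains, ndots: 1)
  # A trailing dot marks the name as fully qualified: no search list is used.
  return [name] if name.end_with?('.')

  with_search = search_domains.map { |domain| "#{name}.#{domain}" }
  if name.count('.') >= ndots
    [name] + with_search # enough dots: try the name as-is first
  else
    with_search + [name] # too few dots: try the search domains first
  end
end

p candidate_names('www.google.com', ['vagrant.test'])
```

Under this model, a failed lookup of www.google.com is followed by a retry for www.google.com.vagrant.test, and the error message reports the last name tried.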

When doing a packet trace on the virbr1 (vagrant provisioned) interface of the VM host, with nslookup from the guest (192.168.121.102), I observed multiple DNS query attempts:

  1. 1st go (doesn't append the landrush TLD), e.g. 192.168.121.102 -> 192.168.121.1:10053
    • DNS query from guest IP for www.google.com: type A, class IN to landrush DNS on host (port 10053) listening on all interfaces, including the VM host interface (192.168.121.1)
      • has 0x0100 flags
      • asking for recursion
      • indicating non-authenticated data is unacceptable
    • DNS response from landrush DNS on host seems to suggest that a recursive DNS query is not permitted
      • has 0x8502 flags
      • recursion not allowed
      • answer not authenticated
  2. 2nd go (does append the landrush TLD)
    • Same as above, except now DNS query from guest IP for www.google.com.vagrant.test: type A, class IN
    • probably default code logic that tries appending the landrush TLD if the first attempt failed?
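
The two flag values from the capture can be decoded by hand. The Ruby sketch below splits the 16-bit DNS header flags word into its fields, using the bit layout from RFC 1035 plus the AD/CD bits added by the DNSSEC RFCs; the function and constant names are made up for illustration.

```ruby
# Decode the 16-bit DNS header flags word.
# Bit 15 is QR, bits 14-11 the opcode, then AA, TC, RD, RA, Z, AD, CD,
# and the low four bits are the response code.
RCODE_NAMES = %w[NOERROR FORMERR SERVFAIL NXDOMAIN NOTIMP REFUSED].freeze

def decode_dns_flags(flags)
  {
    qr: flags[15] == 1,          # true for a response, false for a query
    opcode: (flags >> 11) & 0xF, # 0 is a standard query
    aa: flags[10] == 1,          # authoritative answer
    tc: flags[9] == 1,           # truncated
    rd: flags[8] == 1,           # recursion desired
    ra: flags[7] == 1,           # recursion available
    ad: flags[5] == 1,           # answer authenticated
    cd: flags[4] == 1,           # checking disabled (unauthenticated data OK)
    rcode: flags & 0xF
  }
end

[0x0100, 0x8502].each do |flags|
  fields = decode_dns_flags(flags)
  name = RCODE_NAMES[fields[:rcode]] || fields[:rcode].to_s
  puts format('0x%04x: %s rcode=%s', flags, fields.inspect, name)
end
```

Decoded this way, 0x0100 is a plain query with only RD set, and 0x8502 is a response with AA and RD set, RA clear, and response code 2 (SERVFAIL), which agrees with the qr aa rd flags and SERVFAIL status that dig reports.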

Queries didn't make it to 127.0.1.1:53 (NetworkManager's dnsmasq; later I also tested upstream servers).

When using nslookup, from the host, I noticed this (working) behaviour where queries did make it to 127.0.1.1:53 (the NetworkManager's dnsmasq):

  1. DNS query from host via host to itself on the virbr1 interface 192.168.121.1 -> 192.168.121.1:53
    • flags in response from DNS service say recursion is allowed!
  2. Triggers a forwarded (recursive) DNS query from the dnsmasq part on 127.0.0.1 to 127.0.1.1:53
    • 127.0.1.1 must have then queried the upstream DNS (as managed by NetworkManager) and responded correctly
  3. 192.168.121.1 responds to itself.

Reading the man page for dnsmasq, I noticed the following:

Dnsmasq is a DNS query forwarder: it is not capable of recursively answering arbitrary queries starting from the root servers but forwards such queries to a fully recursive upstream DNS server which is typically provided by an ISP

So at a guess, landrush -> libvirt -> NetworkManager causes issues with a recursive DNS query? To confirm this, I also poked at landrush from the VM host:

$ dig -p 10053 @127.0.0.1 www.google.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -p 10053 @127.0.0.1 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 11678
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.google.com.            IN  A

;; Query time: 2003 msec
;; SERVER: 127.0.0.1#10053(127.0.0.1)
;; WHEN: Thu Oct 20 22:59:54 SAST 2016
;; MSG SIZE  rcvd: 32
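
The same check can be scripted in Ruby with the standard library's resolv and socket modules. This is only a sketch of the dig probe above, not anything landrush ships: the function names are made up, and the server address and port in the usage comment are the ones from this report.

```ruby
require 'resolv'
require 'socket'

# Build a standard A query with the RD (recursion desired) bit set.
def build_query(name)
  msg = Resolv::DNS::Message.new
  msg.rd = 1
  msg.add_question(name, Resolv::DNS::Resource::IN::A)
  msg.encode
end

# Send the query over UDP and summarise the reply header.
# Returns nil on timeout or if nothing is listening on the port.
def probe_recursion(server, port, name, timeout: 3)
  sock = UDPSocket.new
  sock.connect(server, port)
  sock.send(build_query(name), 0)
  return nil unless IO.select([sock], nil, nil, timeout)

  reply = Resolv::DNS::Message.decode(sock.recvfrom(512).first)
  {
    recursion_available: reply.ra == 1,
    rcode: reply.rcode, # 2 is SERVFAIL
    answers: reply.answer.size
  }
rescue SystemCallError
  nil
ensure
  sock&.close
end

# Usage (against the landrush server from this report):
#   p probe_recursion('127.0.0.1', 10_053, 'www.google.com')
```

For the behaviour reported here, one would expect recursion_available to be false and rcode to be 2, matching the dig output.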

I also hacked in config.landrush.upstream '127.0.1.1' to explicitly get landrush to target NetworkManager's dnsmasq, but no luck. Also tried real upstream DNS servers found via:

for d in $(nmcli device show | grep -E "^IP4.DNS" | grep -oP '(\d{1,3}\.){3}\d{1,3}'); do echo $d; done

Doesn't work. Seems landrush doesn't pass on recursive DNS, even directly to upstream!

All of the above was with the following setup (I try to keep to base/stable repos as far as possible):

madhavajay commented 7 years ago

Thanks so much, this fixed my issue where landrush can't resolve outside domains from inside the guest.

config.landrush.guest_redirect_dns = false

Seems like there are a few issues right now with landrush and Vagrant etc.; it would be nice when they are resolved.

For now at least I can confirm that this works with: Vagrant 1.8.7, Version 5.1.10 r112026 (Qt5.6.2), landrush (1.2.0)