vagrant-landrush / landrush

A Vagrant plugin that provides a simple DNS server for Vagrant guests
MIT License

Stop using dnsmasq and a non-standard DNS port on Linux #252

Open bexelbie opened 8 years ago

bexelbie commented 8 years ago

Users of libvirt on linux boxes run into the following problems:

  1. The default port for landrush on linux is 10053, therefore the entry in resolv.conf isn't usable as it assumes port 53

See: https://github.com/vagrant-landrush/landrush/blob/master/lib/landrush/server.rb#L37

  2. Entries in /etc/dnsmasq.d are ignored because dnsmasq has already been started by libvirt and it is using a private configuration directory.

We could resolve this by just using port 53. If port 53 is unavailable on 127.0.0.1, we could bind to another IP in 127.0.0.0/8, which is reserved for loopback per https://tools.ietf.org/html/rfc3330

Note: This is a change to behavior. Today we try to bind to all interfaces, see: https://github.com/vagrant-landrush/landrush/blob/master/lib/landrush/server.rb#L55

This IP address could then be written to resolv.conf and we won't need dnsmasq or firewall rules. This seems to be a general solution.
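A minimal sketch of the address selection proposed above, in Python rather than the plugin's Ruby. The function names, the limit of 16 candidates, and the injectable `probe` argument are all assumptions made for illustration; none of this is landrush code. The probe only tries UDP, which is a simplification since DNS also uses TCP, and binding to port 53 on Linux normally requires root.

```python
import ipaddress
import socket
from itertools import islice
from typing import Callable, Iterable, Optional


def can_bind_udp(ip: str, port: int) -> bool:
    """Return True if a UDP socket can be bound to ip:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError:
            return False
    return True


def loopback_candidates(limit: int = 16) -> Iterable[str]:
    """Yield the first `limit` host addresses of 127.0.0.0/8, starting at 127.0.0.1."""
    hosts = ipaddress.ip_network("127.0.0.0/8").hosts()
    return (str(ip) for ip in islice(hosts, limit))


def pick_loopback(port: int = 53,
                  probe: Callable[[str, int], bool] = can_bind_udp) -> Optional[str]:
    """Return the first loopback address on which `port` is free, or None."""
    for ip in loopback_candidates():
        if probe(ip, port):
            return ip
    return None


def resolv_conf_entry(ip: str) -> str:
    """Render the line that would be written to resolv.conf (no port is needed for 53)."""
    return f"nameserver {ip}\n"
```

Passing a fake `probe` makes the selection logic testable without root or a real socket.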

Extensions that may solve different versions of this problem are:

  1. We could detect libvirt and libvirt's dnsmasq and use it. (This may have unintended consequences and creates a dependency)
  2. We could check for dnsmasq running and using /etc/dnsmasq.d and use it in lieu of the above.
  3. We could use NetworkManager to add a global DNS configuration for our server and port combination

As part of this we could also extend server.rb to continuously check /etc/resolv.conf for its entry and to restore it whenever NetworkManager rewrites resolv.conf. The more elegant solution is to talk to NetworkManager directly.
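The "check and replace" idea could be built around an idempotent helper like the one below. This is only a sketch: `ensure_nameserver` is a hypothetical name, and the polling loop or file watcher that would call it, as well as the privileged write back to /etc/resolv.conf, are left out.

```python
def ensure_nameserver(resolv_conf: str, ip: str) -> str:
    """Return resolv.conf text in which `nameserver <ip>` is the first nameserver line."""
    wanted = f"nameserver {ip}"
    lines = resolv_conf.splitlines()
    # Drop any existing copy of our entry so it is never duplicated.
    kept = [line for line in lines if line.split() != ["nameserver", ip]]
    # Insert before the first remaining nameserver line, or at the end if there is none.
    index = next(
        (i for i, line in enumerate(kept) if line.split()[:1] == ["nameserver"]),
        len(kept),
    )
    kept.insert(index, wanted)
    return "\n".join(kept) + "\n"
```

Because the function returns its input unchanged when the entry is already first, a watcher can compare the result with the file contents and only write when they differ.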

Feedback and comments welcomed.

njam commented 8 years ago

I agree it would simplify many things. We just need sudo then I guess?

About binding to another IP. I understand the current way of resolving from inside virtualbox VMs is to forward all DNS traffic to 10.0.2.3:10053 using iptables. Would we still be able to reach landrush this way?
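For readers unfamiliar with the forwarding mentioned here, the sketch below builds (but does not run) one plausible iptables command for it. The address and port come from the comment above; the choice of the OUTPUT chain, the DNAT target and UDP are assumptions, and this is not necessarily the exact rule landrush installs in the guest.

```python
from typing import List


def dns_redirect_rule(host_ip: str = "10.0.2.3",
                      host_port: int = 10053,
                      protocol: str = "udp") -> List[str]:
    """Build an iptables argument list that redirects outgoing DNS queries.

    Nothing is executed; the caller decides whether to run it (as root).
    """
    return [
        "iptables", "-t", "nat", "-A", "OUTPUT",
        "-p", protocol, "--dport", "53",
        "-j", "DNAT", "--to-destination", f"{host_ip}:{host_port}",
    ]
```

A second rule with `protocol="tcp"` would be needed to cover DNS over TCP.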

bexelbie commented 8 years ago

We just need sudo then I guess?

We already need it to make the entry in /etc/resolv.conf so this isn't new. We may need to choose to keep privileges if we don't want to use NetworkManager so we can make sure /etc/resolv.conf stays accurate.

About binding to another IP. I understand the current way of resolving from inside virtualbox VMs is to forward all DNS traffic to 10.0.2.3:10053 using iptables. Would we still be able to reach landrush this way?

You're right :). My mind was elsewhere. I was also forgetting to put the -t nat in sudo iptables -t nat -L -n when looking at iptables rules. It was a bad day :)

That said, I think this should be modified to bind only to the single network address used for the NAT, rather than to every interface, and still move back to port 53.

njam commented 8 years ago

Ok makes sense!

Maybe we could do the port 53 binding additionally to keep it backward compatible.

erickeller commented 8 years ago

Same issue here. As a workaround I use the following dirty commands to start dnsmasq before the libvirt-bin service:

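#!/bin/sh
# Restart dnsmasq ahead of libvirt so the system-wide instance starts first.
if ! systemctl is-active --quiet dnsmasq.service
then
    sudo systemctl stop dnsmasq.service
    sudo systemctl stop libvirt-bin.service
    sleep 1
    # Terminate any dnsmasq processes that are still running.
    sudo pkill --signal=TERM dnsmasq
    sudo systemctl start dnsmasq.service
    sudo systemctl start libvirt-bin.service
fi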

It seems you have a better idea. Do you have a manual workaround to share for configuring libvirt not to start dnsmasq?

Cheers

bexelbie commented 8 years ago

@erickeller my thoughts are around stopping the use of dnsmasq entirely on the host. This way we don't have the libvirt issue at all. (which you can solve by modifying the dnsmasq.conf to use "interface=lo" and "bind-interfaces", iirc)

This could, potentially, also remove the need for iptables rules in the virtual machine.

hferentschik commented 7 years ago

Ok, trying to catch up here. What is the actual problem, and how does it manifest itself?

The default port for landrush on linux is 10053, therefore the entry in resolv.conf isn't usable as it assumes port 53

I don't get this part. The entry in resolv.conf points to the dnsmasq server which in turn has the Landrush DNS configured. It is in the dnsmasq config where we point to the Landrush server running on port 10053.
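To make that chain concrete: dnsmasq accepts `server=/<domain>/<ip>#<port>` lines that forward queries for one domain to a server on a non-standard port. The sketch below renders such a line. The domain `vagrant.test` and the helper name are assumptions for illustration and may differ from what landrush actually writes under /etc/dnsmasq.d.

```python
def dnsmasq_forward_rule(domain: str, ip: str = "127.0.0.1", port: int = 10053) -> str:
    """Render a dnsmasq line that forwards queries for `domain` to ip#port."""
    return f"server=/{domain}/{ip}#{port}\n"
```

With a line like this in place, resolv.conf only needs to point at dnsmasq on port 53, and dnsmasq handles the hop to port 10053.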

Entries in /etc/dnsmasq.d are ignored because dnsmasq has already been started by libvirt and it is using a private configuration directory.

The scripts restart dnsmasq, so the config should get picked up, right?

I thought @praveenkumar tested this and it worked for him.

Automating the host DNS configuration is a new thing and I am all for improvements. However, we should not go overboard here. OS X and Windows are "simple", since there is only one way of doing it. With Linux and all its different versions and flavors, I want to avoid using a different approach for each. Using dnsmasq was my first cut, since this approach was also mentioned in the Landrush docs. If someone has a better idea, let's create a prototype. I fear, though, that on Linux we are opening a can of worms here. We might be better off providing a simple "works most of the time" solution and, in other cases, letting the user turn off the automatic configuration (which they can already do) and do whatever they like.

That said, what are the steps to re-produce this issue?

hferentschik commented 7 years ago

We could resolve this by just using port 53.

I don't think that a user process should use ports in this range. These are privileged ports. Can resolv.conf not handle custom ports as well?
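A small sketch of the two constraints in play. It assumes the usual Linux convention that ports below 1024 are privileged, and that a resolv.conf `nameserver` value is a bare IP address with no port suffix, which is how the glibc resolver documents it as far as I know. Both helper names are hypothetical.

```python
import ipaddress


def is_privileged_port(port: int) -> bool:
    """Ports 1-1023 traditionally require root (or an equivalent capability) to bind."""
    return 0 < port < 1024


def valid_nameserver_value(value: str) -> bool:
    """Accept only a bare IPv4/IPv6 address, as a resolv.conf nameserver line expects."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True
```

Under these assumptions `127.0.0.1:10053` is not a usable nameserver value, which is why the thread keeps coming back to either port 53 or an intermediary such as dnsmasq.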

praveenkumar commented 7 years ago

That said, what are the steps to re-produce this issue?

Below you can see that the dnsmasq service is not running, but libvirtd is active and its own dnsmasq instances are already listening on the DNS port, which blocks it.

$ systemctl status dnsmasq
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

$ systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-09-16 17:56:59 IST; 14min ago
     Docs: man:libvirtd(8)
           http://libvirt.org

$ ps aux | grep dnsmasq
nobody    1701  0.0  0.0  51140  1528 ?        S    Jul21   0:05 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/docker-machines.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1702  0.0  0.0  51112  1592 ?        S    Jul21   0:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/docker-machines.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
nobody    1791  0.0  0.0  51140  1620 ?        S    Jul21   0:10 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1793  0.0  0.0  51140  1692 ?        S    Jul21   0:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
nobody    1896  0.0  0.0  51140  1928 ?        S    Jul21   0:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/vagrant.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1897  0.0  0.0  51140   356 ?        S    Jul21   0:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/vagrant.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
prkumar  31284  0.0  0.0 118496   848 pts/18   S+   17:57   0:00 grep --color=auto dnsmasq

# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 192.168.100.1:53        0.0.0.0:*               LISTEN      1896/dnsmasq        
tcp        0      0 192.168.110.1:53        0.0.0.0:*               LISTEN      1791/dnsmasq        
tcp        0      0 192.168.42.1:53         0.0.0.0:*               LISTEN      1701/dnsmasq        
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1316/sshd           
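The listing above can also be checked mechanically. The sketch below parses `netstat -ntpl` style output and reports which programs are listening on a given port. The function name is made up for this example, and the parser assumes the column layout shown above. Note that `-t` lists TCP sockets only, so UDP listeners would need a separate check.

```python
from typing import List, Tuple


def listeners_on_port(netstat_output: str, port: int = 53) -> List[Tuple[str, str]]:
    """Return (local_ip, program) pairs for LISTEN sockets bound to `port`."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local_ip, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            program = fields[6].partition("/")[2]
            found.append((local_ip, program))
    return found
```

Run against the output above, this would report dnsmasq on the three libvirt network addresses and nothing on 127.0.0.1.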

I thought @praveenkumar tested this and it worked for him.

I did test it and followed what @erickeller put in the script.

hferentschik commented 7 years ago

So the problem seems to be that libvirt actually uses dnsmasq to service the virtual networks which can lead to conflicts - http://wiki.libvirt.org/page/Libvirtd_and_dnsmasq.

It seems one can indeed work around this by forcing the global dnsmasq instance to a specific interface, for example the loopback one. However, this requires changes to the global dnsmasq.conf.
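For reference, the two options named earlier in the thread ("interface=lo" and "bind-interfaces") can be rendered as a config fragment like this. The helper name is hypothetical, and where the fragment should be written (the global dnsmasq.conf or a drop-in file) is left to the reader.

```python
def loopback_only_options(interface: str = "lo") -> str:
    """Render dnsmasq options that restrict the global instance to one interface."""
    return f"interface={interface}\nbind-interfaces\n"
```

Restricting the global instance this way is meant to keep it from competing with libvirt's per-network dnsmasq processes for port 53.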

@bexelbie,

my thoughts are around stopping the use of dnsmasq entirely on the host. This way we don't have the libvirt issue at all. (which you can solve by modifying the dnsmasq.conf to use "interface=lo" and "bind-interfaces", iirc)

This could, potentially, also remove the need for iptables rules in the virtual machine.

Can you elaborate on that?