trailofbits / algo

Set up a personal VPN in the cloud
https://blog.trailofbits.com/2016/12/12/meet-algo-the-vpn-that-works/
GNU Affero General Public License v3.0

DNS resolution fails #489

Closed jimmycuadra closed 6 years ago

jimmycuadra commented 7 years ago

OS / Environment

macOS Sierra 10.12.4

Ansible version

2.2.0.0

Version of components from requirements.txt

ansible (2.2.0.0) apache-libcloud (1.5.0) appdirs (1.4.3) asn1crypto (0.21.1) azure (2.0.0rc5) azure-batch (0.30.0rc5) azure-common (1.1.4) azure-graphrbac (0.30.0rc5) azure-mgmt (0.30.0rc5) azure-mgmt-authorization (0.30.0rc5) azure-mgmt-batch (0.30.0rc5) azure-mgmt-cdn (0.30.0rc5) azure-mgmt-cognitiveservices (0.30.0rc5) azure-mgmt-commerce (0.30.0rc5) azure-mgmt-compute (0.30.0rc5) azure-mgmt-keyvault (0.30.0rc5) azure-mgmt-logic (0.30.0rc5) azure-mgmt-network (0.30.0rc5) azure-mgmt-notificationhubs (0.30.0rc5) azure-mgmt-nspkg (1.0.0) azure-mgmt-powerbiembedded (0.30.0rc5) azure-mgmt-redis (0.30.0rc5) azure-mgmt-resource (0.30.0rc5) azure-mgmt-scheduler (0.30.0rc5) azure-mgmt-storage (0.30.0rc5) azure-mgmt-web (0.30.0rc5) azure-nspkg (1.0.0) azure-servicebus (0.20.2) azure-servicemanagement-legacy (0.20.3) azure-storage (0.32.0) boto (2.46.1) boto3 (1.4.4) botocore (1.5.24) certifi (2017.1.23) cffi (1.9.1) chardet (2.3.0) cryptography (1.8.1) docutils (0.13.1) dopy (0.3.5) enum34 (1.1.6) futures (3.0.5) idna (2.5) ipaddress (1.0.18) isodate (0.5.4) Jinja2 (2.8) jmespath (0.9.2) keyring (10.3) MarkupSafe (1.0) msrest (0.4.1) oauthlib (2.0.1) packaging (16.8) paramiko (2.1.2) pip (9.0.1) pyasn1 (0.2.3) pycparser (2.17) pycrypto (2.6.1) pyOpenSSL (16.2.0) pyparsing (2.2.0) python-dateutil (2.6.0) PyYAML (3.12) requests (2.13.0) requests-oauthlib (0.8.0) s3transfer (0.1.10) setuptools (34.3.2) six (1.10.0) wheel (0.29.0)

Summary of the problem

DNS resolution fails when connected to the VPN.

Steps to reproduce the behavior

Install algo on a DigitalOcean server following the README. Opt in to the ad blocking DNS server but no other optional components.

The way of deployment (cloud or local)

Cloud (DigitalOcean)

Expected behavior

DNS resolution works.

Actual behavior

All DNS resolution fails:

$ nslookup google.com
;; connection timed out; no servers could be reached

/etc/resolv.conf no longer exists:

$ cat /etc/resolv.conf
cat: /etc/resolv.conf: No such file or directory

As soon as I disconnect from the VPN, /etc/resolv.conf reappears with the DNS servers configured for the Wi-Fi interface in the system's network preferences.

I tried manually setting the DNS server to 172.16.0.1 in the system's network preferences for both my main Wi-Fi interface and the Algo VPN, and also tried with only one or the other configured, but DNS resolution fails in all cases.

I also noticed that my machine is not assigned an IPv4 address when connected to the VPN (only IPv6, as the ifconfig output below shows), whereas my Internet connection at my hotel in London provides my machine with only an IPv4 address.

$ ifconfig
# ... snip ...
ipsec0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1400
    inet6 fe80::f60f:24ff:fe3a:827a%ipsec0 prefixlen 64 scopeid 0xf
    inet6 fd9d:bc11:4020::102 prefixlen 64
    nd6 options=201<PERFORMNUD,DAD>

I believe everything worked correctly at first and began failing at some point later: I used the VPN from my iPhone 6 (installed from the same mobileconfig file) earlier and it worked as expected, but later, after I noticed problems on macOS, it no longer worked on the phone either.

Full log

I didn't save the full log when I installed algo, because I wasn't expecting anything to go wrong. I did save the final output, though:

ok: [REDACTED_IPV4_ADDRESS] => {
    "msg": [
        [
            "\"#                          Congratulations!                            #\"",
            "\"#                     Your Algo server is running.                     #\"",
            "\"#    Config files and certificates are in the ./configs/ directory.    #\"",
            "\"#              Go to https://whoer.net/ after connecting               #\"",
            "\"#        and ensure that all your traffic passes through the VPN.      #\"",
            "\"#               Local DNS resolver 172.16.0.1              #\"",
            ""
        ],
        "    \"#                The p12 and SSH keys password is REDACTED     #\"\n",
        "    ",
        "    \"#      Shell access: ssh -i configs/algo.pem root@REDACTED_IPV4_ADDRESS        #\"\n"
    ]
}
ghost commented 7 years ago

I've been having this issue as well, with the exact same configuration. It seemed to happen intermittently: restarting the DigitalOcean droplet fixed it for a while, then DNS resolution would fail again (I can tell it is a DNS failure because my phone and laptop cannot find the servers of the websites I try to reach). I also noticed I was receiving only an IPv6 address from Algo; I don't know whether that is related, but it's worth noting.

salvage1 commented 7 years ago

Me too. Multiple configurations tried on Digital Ocean but all drop after a few hours on my Mac and iOS devices. Restarting the droplet fixes it for a few more hours.

dguido commented 7 years ago

Looks like we might have a bug in the DNS adblocking role. I wonder if it's related to our removal of the proxy adblocking role a few days ago.

purduepete commented 7 years ago

I've had the same exact problem with both Digital Ocean and Azure, on iOS.

I've noticed that when I first connect to the VPN, the allocated IP address is IPv4; however, after a day, upon reconnecting, the allocated address is IPv6, which seems to cause the DNS resolution problems.

sgasean commented 7 years ago

Yes, I have the same issue, using it on a dedicated Ubuntu 16.04 server. At first it works great, but then it shows the same issue of assigning IPv6 addresses. Hope this can be solved soon. Thanks for the great work.

jimmycuadra commented 7 years ago

I destroyed the VPN server and created a new one without opting in to the DNS server and experienced the same problem from both macOS and iOS: It worked at first, but sometime roughly 12 hours later all DNS resolution began failing.

This time I saved the full Ansible output: https://gist.github.com/jimmycuadra/3aea518d8237cff498e78e9fe7bed36d

jackivanov commented 7 years ago

This seems related to #437; as I warned, we may be depleting the IPv4 pool. Folks, check how many connections you see with:

$ ipsec status

Then, without restarting strongSwan, watch the logs in real time, try to connect, and post the logs here:

$ journalctl -f -u strongswan

jackivanov commented 7 years ago

Please also show the output of:

$ journalctl -u strongswan | grep "pool"

sgasean commented 7 years ago

macarne@7781010-2201:~$ journalctl -u strongswan | grep "pool"
Apr 27 04:38:55 7781010-2201 charon[9243]: 12[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 04:43:19 7781010-2201 charon[9243]: 10[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 04:54:20 7781010-2201 charon[9243]: 11[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 04:58:05 7781010-2201 charon[9243]: 06[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 05:09:47 7781010-2201 charon[9243]: 15[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 05:12:48 7781010-2201 charon[9243]: 12[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 05:22:20 7781010-2201 charon[9243]: 07[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 05:27:32 7781010-2201 charon[9243]: 09[CFG] pool '10.19.48.0/24' is full, unable to assign address
Apr 27 05:37:57 7781010-2201 charon[9243]: 08[CFG] pool '10.19.48.0/24' is full, unable to assign address
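When comparing logs like the excerpt above across several servers, it can help to summarize how often each pool was reported full. The function below is an illustrative sketch, not part of Algo; count_full_pools is a hypothetical name, and it assumes the message wording "pool '<name>' is full" seen in this log, which other strongSwan versions may phrase differently.

```shell
#!/bin/sh
# count_full_pools (hypothetical helper): read charon/strongSwan log lines on
# stdin and print how many "pool '<name>' is full" messages each pool logged.
# ASSUMPTION: the message wording matches the log excerpt in this thread.
count_full_pools() {
  grep -o "pool '[^']*' is full" \
    | sed "s/^pool '\([^']*\)' is full$/\1/" \
    | sort \
    | uniq -c
}
```

Usage on the VPN server: journalctl -u strongswan | count_full_pools. Any nonzero count means clients were refused a virtual IP, which matches the DNS failures described above.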

bradleyhd commented 7 years ago

^ I'm experiencing the same issue (I set it up on Digital Ocean last night, and DNS stopped working a little while ago). My output of journalctl -u strongswan | grep "pool" is identical to @sgasean's.

dguido commented 7 years ago

Thanks, we correctly predicted the issue. The virtual IP pool is getting filled up. This is due to https://github.com/trailofbits/algo/pull/437 and the fix is https://github.com/trailofbits/algo/issues/494.
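To see how close a pool is to filling up before clients start failing, strongSwan's ipsec leases command reports pool usage. The function below is a sketch, not part of Algo; pool_usage_warn is a hypothetical name, and it assumes each pool's summary line has the form "Leases in pool '<name>', usage: <used>/<size>, <n> online". Verify that format against the output of your strongSwan version before relying on it.

```shell
#!/bin/sh
# pool_usage_warn (hypothetical helper): read `ipsec leases` output on stdin
# and print a warning for each pool whose usage is at or above a threshold
# percentage (first argument, default 90).
# ASSUMPTION: summary lines look like
#   Leases in pool '<name>', usage: <used>/<size>, <n> online
pool_usage_warn() {
  threshold=${1:-90}
  awk -v t="$threshold" -v sq="'" '
    /^Leases in pool/ {
      # $4 is the quoted pool name plus a trailing comma; $6 is "used/size,"
      name = $4
      gsub(/,/, "", name)
      gsub(sq, "", name)
      split($6, u, "/")
      used = u[1] + 0
      size = u[2] + 0
      if (size > 0 && used * 100 >= t * size)
        printf "WARNING: pool %s at %d/%d\n", name, used, size
    }'
}
```

Usage: ipsec leases | pool_usage_warn 80. Comparing with integer arithmetic (used * 100 against threshold * size) avoids floating-point rounding in the percentage check.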

jackivanov commented 7 years ago

You can use Ubuntu 17.04 until we fix that

purduepete commented 7 years ago

After upgrade to Ubuntu 17.04 on Azure, everything is working well, and this is no longer an issue for me.

Thanks for everyone's input!

JohnTroony commented 7 years ago

Did anyone get a solid fix for this issue other than moving to Ubuntu 17.04?

GordeevD commented 7 years ago

I'm experiencing the same issue: DNS stops resolving overnight. Any ideas how to fix it on Ubuntu 16.04?

sachitv commented 6 years ago

I'm having this issue on the latest version. Does anyone have a viable fix for this without upgrading to 17.04?

giooootis commented 6 years ago

It seems the issue is still present on Ubuntu 16.04. Any clue how to fix it without upgrading to a non-LTS version?

jbwhaley commented 6 years ago

Yes, this is still an issue, and upgrading to 17.04 is no longer even possible on Digital Ocean. Algo doesn't work (yet?) with 17.10, so apparently we're out of luck if we want to use Digital Ocean.

giooootis commented 6 years ago

I can confirm the above: I upgraded to 17.10 today to resolve this issue, and the problem persists. Basically, Digital Ocean is not an option at the moment.

jackivanov commented 6 years ago

@jbwhaley @giooootis Can you provide some debug (you can take the commands above as a reference)?

jbwhaley commented 6 years ago

@gunph1ld For a while I was using Algo on a Digital Ocean box running Ubuntu 17.04, because 16.04 simply didn't work when used with more than one device. Building on 17.04 completely resolved the issue. However, the last time I tried to build a new 17.04 instance there were many errors, and I eventually discovered that: A) Digital Ocean no longer allows creating droplets with 17.04, though it does allow 17.10; B) Algo dropped 17.04 support without adding support for 17.10. This left me with no working alternative on Digital Ocean.

At this point, I went with an AWS EC2 instance, because they allow 17.04, which seems to be the minimum usable Ubuntu release for use with multiple devices...but it isn't great for my needs in a lot of ways (mainly the fact that tons of hosts block EC2 traffic), and I wish I could go back to Digital Ocean.

I wish I still had some of the error logs so I could help you track down the root cause of the issue, but I do know that there is no issue at all on Ubuntu 17.04, and presumably not on 17.10 either. Any chance we might see 17.10 support anytime soon?

davidemyers commented 6 years ago

The way I avoid lease depletion issues is to create separate certificates for each of my devices and set uniqueids=yes in /etc/ipsec.conf on the VPN server. I've had no issues running with 16.04 on DigitalOcean for weeks at a time.
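For reference, uniqueids is set in the config setup section of strongSwan's /etc/ipsec.conf. A minimal excerpt, assuming the rest of the Algo-generated file is left unchanged and that every device has been issued its own certificate: with a shared certificate, uniqueids=yes would make devices using the same identity replace each other's connections. Restart strongSwan afterwards (for example with ipsec restart) so the setting takes effect.

```
# /etc/ipsec.conf (excerpt): only the config setup section is shown.
config setup
    # Keep each client identity unique: a new IKE_SA from the same identity
    # replaces the old one, so stale connections from a reconnecting device
    # do not keep holding extra addresses in the virtual IP pool.
    uniqueids=yes
```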