mullvad / mullvadvpn-app

The Mullvad VPN client app for desktop and mobile
https://mullvad.net/
GNU General Public License v3.0
Internet is blocked after sleep #2477

Open yukkyma opened 3 years ago

yukkyma commented 3 years ago

Issue report

Operating system: Mac OS Big Sur 11.1

App version: 2021.1

Issue description

When I wake my MacBook Pro from sleep, Mullvad blocks the internet and won’t restore access unless I force quit mullvad-daemon in Activity Monitor or restart my computer. This started when I updated from 2020.4, and it has happened in every version since 2020.5. I have tried leaving the computer for about 10 minutes after waking it from sleep, but it still wouldn’t connect. I have also tried leaving Mullvad disconnected when putting the computer to sleep, but it still said that the internet was blocked.

pinkisemils commented 3 years ago

This is not what we've come to expect from macOS - there is an expected delay after wakeup where the captive portal check has to time out before the default route actually gets published in the routing table. But for this timeout to be an issue you have to be in a connected or a blocked state when suspending the machine - do you have block when disconnected enabled? Have you sent a problem report with the logs? I'd be interested in reading the logs because it may well be an issue that we have not seen before.

General misbehavior around suspend/wakeup with macOS is a known issue. This is because macOS relies on working DNS after wakeup before it allows connectivity. Since we try to block DNS requests when in the connected or blocked state to prevent leaks, the user is forced to endure a timeout. But this most definitely shouldn't be an issue if you disconnect before suspending the machine, unless you've enabled block when disconnected. We have an idea of how to fix this, but it'd be a lot of effort, so we don't have it on our roadmap just yet.

yukkyma commented 3 years ago

I have block when disconnected disabled. I can send logs in a few hours, and I can reproduce it a few more times beforehand so that it is perhaps easier to see. The strange thing is that it worked without problems on version 2020.4. The problem began as soon as I updated Mullvad to 2020.8 (I think), when the older version stopped working.

pinkisemils commented 3 years ago

The next time you encounter these issues, could you also please save the output of route get default? This is behavior we have not seen, and the changes we made from 2020.4 to 2020.5 were there to actually improve the offline detection. You can also disable offline detection altogether to circumvent this by running our daemon with TALPID_DISABLE_OFFLINE_MONITOR set to 1. I can write a more detailed guide later in the day.

yukkyma commented 3 years ago

I can reproduce these issues whenever. All I have to do is put the computer to sleep and when I wake it up it doesn't work. The output of route get default is:

   route to: default
destination: default
       mask: default
    gateway: rt-ac88u-9a08
  interface: en7
      flags: <UP,GATEWAY,DONE,STATIC,PRCLONING>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500         0

I would like a guide for disabling offline detection

pinkisemils commented 3 years ago

To fix this for a single boot, you can just set the environment variable and restart our service like so -

sudo launchctl setenv TALPID_DISABLE_OFFLINE_MONITOR 1
sudo launchctl unload /Library/LaunchDaemons/net.mullvad.daemon.plist
sudo launchctl load /Library/LaunchDaemons/net.mullvad.daemon.plist

You can also edit /Library/LaunchDaemons/net.mullvad.daemon.plist directly to set the environment variable in a more permanent fashion. If you update the app, you'll have to make this change again, but it would survive reboots.

Bear in mind, this isn't so much a fix as a workaround - setting the environment variable disables offline detection, so if you don't disconnect the tunnel when you've lost connectivity, the app will try to connect to a tunnel forever.

rakshazi commented 3 years ago

Hello, same problem on Linux

LSB Version:    n/a
Distributor ID: ManjaroLinux
Description:    Manjaro Linux
Release:    20.2.1
Codename:   Nibia

pinkisemils commented 3 years ago

@rakshazi Would you mind posting the output of the following the next time you end up in this situation?

ip route get 193.138.218.78 mark 0x6d6f6c65
ip route
ip -6 route

easiestripes commented 3 years ago

The same issue started happening to me roughly 2 months ago on Ubuntu 20.04. Now, every single time the computer wakes from suspend, I have to reconnect in Mullvad to get the internet working.

Output from the above commands after waking from suspend:

$ ip route get 193.138.218.78 mark 0x6d6f6c65
193.138.218.78 via 192.168.1.1 dev eno1 src 192.168.1.57 mark 0x6d6f6c65 uid 1000 
    cache

$ ip route
default via 192.168.1.1 dev eno1 proto dhcp metric 100 
10.64.0.1 dev wg-mullvad proto static 
169.254.0.0/16 dev eno1 scope link metric 1000 
192.168.1.0/24 dev eno1 proto kernel scope link src 192.168.1.57 metric 100 

$ ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2603:8000:7f03:1ff5::/64 dev eno1 proto ra metric 100 pref medium
fc00:bbbb:bbbb:bb01::1 dev wg-mullvad proto static metric 1024 pref medium
fc00:bbbb:bbbb:bb01::3:8ce dev wg-mullvad proto kernel metric 256 pref medium
fe80::/64 dev eno1 proto kernel metric 100 pref medium
default via fe80::f85b:3bff:fe61:8c4e dev eno1 proto ra metric 100 pref high

And if I turn Mullvad off before suspending, my internet works perfectly fine immediately after waking my computer up.

pinkisemils commented 3 years ago

@easiestripes I suspect you're not stuck in offline mode but instead just have a wonky DNS config because NM is overwriting our config. Would you mind posting a problem report and possibly opening a separate issue so we can track this down appropriately? In the meantime, we have some fixes for wonky DNS behavior in an upcoming release. It's really around the corner, it's just waiting on me removing some nasty bugs I introduced myself a week ago :)

easiestripes commented 3 years ago

@pinkisemils oh ok interesting. Sounds good, I submitted a problem report just now. And that's great to hear about the DNS updates coming soon!

pinkisemils commented 3 years ago

@easiestripes the bug you're experiencing should be fixed in an upcoming release. We're testing it internally, and soon there should be a beta.

Please do note that the upcoming release will not fix the issues with macOS and sleep.

Aitchy commented 3 years ago

+1 for having this issue on an Arch Linux box using 2021.2 (an Ubuntu 20.04 box running the same version has no issues).

The interesting thing is that if I "switch location", that fixes it. Just mentioning this to make sure it's the same issue you have addressed in the upcoming release.

DrChr commented 3 years ago

I have this issue with 2021.2 on Ubuntu 20.04.1. I think I remember this issue from 2021.1 as well. Using "Reconnect" via the small icon in the toolbar fixes the problem after resuming from sleep.

Like @Aitchy, I'd also like confirmation that it's the same issue. Please let me know if you'd like me to provide some logs or do some other kind of troubleshooting.

pinkisemils commented 3 years ago

@DrChr, @Aitchy please open a separate GitHub issue for Linux sleep issues :slightly_smiling_face: . Regardless, the issue you are experiencing is not that the daemon is blocking traffic after sleep, but instead NetworkManager is overwriting the DNS config of an interface it's not managing, which clears our config, and since our daemon only allows our DNS servers to be reached, all of your DNS requests time out. This has been fixed in the latest beta (2021.3-beta1), so I suggest you try and use it, if you are comfortable using beta versions.
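For Linux users who want to check whether their DNS configuration was overwritten after resume, a minimal sketch follows. It assumes the resolver configuration is readable in resolv.conf format and that the tunnel's DNS server is 10.64.0.1, an address taken from the routing table posted earlier in this thread rather than from anything stated by the developers. The function names are hypothetical. On systems using systemd-resolved, /etc/resolv.conf may list only a local stub address, so this check may not be meaningful there.

```shell
#!/bin/sh
# Print the nameserver addresses found in resolv.conf-style text on stdin.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Succeed if the given address is among the nameservers read from stdin.
# $1: address to look for. Using 10.64.0.1 for this is an assumption based on
#     the routing table posted earlier in this thread.
has_nameserver() {
    list_nameservers | grep -qxF -- "$1"
}
```

A possible use after resuming from suspend would be `has_nameserver 10.64.0.1 < /etc/resolv.conf || echo "tunnel DNS server not listed"`. If the address is missing while the app reports being connected, that would be consistent with the overwritten DNS config described above, though it is not proof of it.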

DrChr commented 3 years ago

@pinkisemils I'm now using the 2021.3-beta1. I can confirm it's fixed the reconnect issue after sleep for me. So I'm good with this - thank you! Just to confirm: If you'd like, I can still create a separate GitHub issue for this? (Maybe it'd benefit other users while waiting for the next release. The issue would of course also suggest that they can try the beta version).

pinkisemils commented 3 years ago

This issue already contains enough information for people to come to the right conclusions, IMO, and with the impending release of 2021.3, I hope it won't be necessary. But if you do still experience any issues that you think would best be reported here rather than via the problem report, we're always happy to see clear and descriptive issues.

Aitchy commented 3 years ago

@DrChr @pinkisemils I've also now loaded 2021.3-beta1 from AUR and can confirm the issue is gone. Thanks to the Mullvad team for being on the ball once more :+1:

yaomtc commented 3 years ago

I have 2021.4-1 installed, from the AUR on Arch since June 30, and I'm experiencing this issue. EDIT: I've tried replacing it with the beta channel version (2021.4.stable-1) and it has this same issue for me.

pinkisemils commented 3 years ago

@yaomtc please create a new issue. This one is macOS-specific :slightly_smiling_face: Would you mind also sending us a problem report and showing the output of ip route when the issue happens?

As of just now, I cannot reproduce this with NetworkManager and 2021.4.

ph00lt0 commented 2 years ago

Just to be sure, is this reconnect feature dependent on the captive portal check? For security reasons I disabled the captive portal services on my devices. I wonder if this affects the reconnect problem.

pinkisemils commented 2 years ago

@ph00lt0 if you're using macOS, then yes. I don't believe that offline issues are tied to captive portal checks on Linux.

ph00lt0 commented 2 years ago

@pinkisemils it would be great if that check could be skipped when captive portal is disabled. Many security-minded people disable this, as captive portals are an attack surface.

pinkisemils commented 2 years ago

@ph00lt0 The connectivity check workaround doesn't leak any extra traffic as it currently stands - in the offline state the system is configured to resolve DNS via our client, and our client only responds to queries for macOS's portal check domain with a non-routable IP address, to which traffic will be dropped immediately due to our firewall rules. This tricks the captive portal enough to make it work faster. This currently isn't disable-able behavior, and it will remain so for the foreseeable future as it's just a lot simpler and easier to support. I've written pretty much the same thing here, but there may be more detail there.

ph00lt0 commented 2 years ago

It's not about leaking data. Captive portals can be used in attacks by replicating Wi-Fi networks and automatically opening websites as captive portals, thereby executing web pages within the insecure browsers used by the operating system.

faern commented 2 years ago

We are not letting the captive portal check reach the network. Nor any subsequent actual captive portal load. So this attack does not work against our app currently.

ph00lt0 commented 2 years ago

@faern the attack is different than you describe here. The idea is that you execute code (JavaScript), usually in a less secure browser which the OS makes you use. The page will open automatically if you don't disable captive portal, and you would have no option to stop this. This is why many people disable captive portal: you can always open these pages yourself. Even if you block any loading of captive portals in the Mullvad app, people will need to use captive portals from time to time to get on a network. It is better when users open this themselves in a more secure browser than letting it auto-execute. It's simply reducing attack surface.

Everyone who has captive portal check disabled is now stuck with having to kill the mullvad-daemon to reconnect, while it should just do this automatically.

pinkisemils commented 2 years ago

When you're stuck in the offline state, what does route get default return? If you don't have a default route, the daemon won't be able to connect.

pinkisemils commented 2 years ago

And just to be clear, our client isn't blocking anything in this case. Our client depends on the routing table having a default route. macOS has a connectivity check that seemingly involves a captive portal check that needs to pass before it publishes a default route it has received from a DHCP server. Since the captive portal check normally requires a working DNS configuration and our client tries its best to never leak DNS queries when its target state is to be secure, there's a slight deadlock - our client will block all traffic because it can't start a tunnel since there's no default route, and macOS won't publish the default route until its connectivity check is finished. The connectivity check will eventually give up and macOS will publish the default route anyway, but this usually takes a lot of time.
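To get a rough idea of how long macOS takes to publish the default route after wakeup, one could poll for it. The following is only a sketch of a generic polling helper. The name wait_until and its arguments are hypothetical and not part of any Mullvad tooling.

```shell
#!/bin/sh
# Run a probe command repeatedly until it succeeds or the attempts run out.
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: the probe command and its arguments
# Prints the number of attempts made; returns 0 on success, 1 on timeout.
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        attempt=$((attempt + 1))
        # Discard the probe's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            echo "$attempt"
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "$attempt"
    return 1
}
```

On macOS this might be invoked right after wakeup as `wait_until 60 5 sh -c 'route -n get default | grep -q "gateway:"'`, which would probe every 5 seconds for up to 60 attempts. The printed attempt count multiplied by the delay gives an approximate waiting time.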

I am still testing to see what's macOS's behavior here when the captive portal check is disabled.

@ph00lt0 may I ask, how exactly have you disabled the captive portal check on macOS? I disabled the captive portal service via sudo defaults write /Library/Preferences/SystemConfiguration/com.apple.captive.control Active -boolean false and confirmed that when not using our client and connecting to a new wireless network, I don't see any DNS queries for captive.apple.com. With the captive portal check in macOS disabled, I couldn't reproduce issues with our daemon being stuck in the offline state.

ph00lt0 commented 2 years ago

Yeah, I used the same command to disable it. Strangely, I haven't experienced the problem yesterday or today. In the days before, I kept having to kill the application whenever I had left my Mac in standby for a bit and it just wouldn't reconnect. I read this issue and since then I have been wondering if this was due to my modification to captive portals. I am not saying that is the case, it is purely an assumption, as you confirmed earlier that the reconnect depends on it. It could definitely be caused by something else.

Edit:

"Everyone who has captive portal check disabled is now stuck with having to kill the mullvad-daemon to reconnect, while it should just do this automatically."

I see now that I wrote this. I should probably have been clear that it was based on your earlier comment ("if you're using macOS, then yes.") suggesting that the check actually is required.

pinkisemils commented 2 years ago

Whenever you do get into that state, please check whether a default route exists. If there is a default route, a problem report would be greatly appreciated, as it would be if the daemon ever seems to be stuck and not responding to mullvad status or mullvad reconnect --wait commands. However, if route get default doesn't return a route, it's expected behavior for the client to be stuck in the offline state. This might still be an issue if it happens for too long, and we'll try our best to fix such behavior, but it's often impossible for us to reproduce all the environments in which this takes place.
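The default route check can be scripted. The sketch below assumes that route get default prints a "gateway:" line when a default route with a next hop exists, as in the output posted earlier in this thread. The function name default_gateway is hypothetical.

```shell
#!/bin/sh
# Extract the gateway from `route get default`-style output on stdin.
# Prints the gateway and returns 0 if a "gateway:" line is present,
# prints nothing and returns 1 otherwise.
default_gateway() {
    awk '$1 == "gateway:" { print $2; found = 1 } END { exit !found }'
}
```

On macOS it could be used as `route -n get default 2>/dev/null | default_gateway || echo "no default route published"`. If a gateway is printed while the app still shows a blocked connection, that is the case where a problem report, together with the output of mullvad status, would be most useful.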

ph00lt0 commented 2 years ago

@pinkisemils just got the same problem again and tried your commands.

image

The only way for me to fix this is to force kill the mullvad-daemon process. When it restarts, it connects again without problems.

pinkisemils commented 2 years ago

As you can see, route get default doesn't actually present the next hop. This means that macOS is not publishing the default route. Under these circumstances, does it ever unblock itself after some time, or is it just stuck in the offline state? In both cases, a problem report would be appreciated.

ph00lt0 commented 2 years ago

@pinkisemils no, it doesn't unblock itself and stays stuck. I tested this earlier and it stayed like this for hours. Where exactly should I report this other than here?

pinkisemils commented 2 years ago

You can send a problem report from our app - this will send anonymized logs from our daemon to our support team. You can also choose to post raw logs here, but a problem report from our app would be preferable.

micdonato commented 2 years ago

I am having the same issue:

macOS 12.3.1 (21E258)
Mullvad 2022.1

"Blocked connection" after sleep. The "Cancel" and refresh buttons are unresponsive.

faern commented 2 years ago

Maybe try our latest beta, 2022.2-beta1, and see if that helps. Or send a problem report from within the app. Please reference this GH issue in the report message.