peacey / split-vpn

A split tunnel VPN script for Unifi OS routers (UDM, UXG, UDR) with policy based routing.
GNU General Public License v3.0
802 stars 56 forks

Multiple Wireguard connections with Domain Based routing #171

Closed patelniket closed 1 year ago

patelniket commented 1 year ago

Hello,

Currently I have this split-vpn setup on my UDM Pro SE with one WireGuard connection with domain-based routing enabled (using dnsmasq), and everything is working perfectly fine.

Now I want to enable another wireguard connection, let's say wg1, and the idea is to force certain domains to go through this tunnel.

I have also looked at #146 & #147, which note that multiple tunnels need to be set up in their own directories with unique custom configs.

The questions I have are around the feasibility of multiple tunnels with domain-based routing.

I plan to test this out in the coming days, but any pointers beforehand would be helpful if such a setup was ever tested. Thank you in advance!

peacey commented 1 year ago

Hi @patelniket.

So FORCED/EXEMPT_IPSETS forces or exempts those ipsets/domains through the VPN tunnel for ALL clients, not just the ones you set in the FORCED_SOURCE options. If you want those ipset rules to apply only to specific clients or interfaces, then you do need to use the CUSTOM_FORCED_RULES option.

Is it okay to keep FORCED_SOURCE_INTERFACE="br0" in the vpn.conf files for both tunnels?

No, you cannot have the same interface forced for both WireGuard tunnels, because that option forces all traffic from that interface. That would conflict if you used it for both tunnels.

If you want to use one tunnel to force all traffic from br0 through wg0, but use another tunnel to force only certain domains from br0, that is possible with CUSTOM_FORCED_RULES_IPV4, though.
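As a rough sketch of that approach (assuming the CUSTOM_FORCED_RULES_IPV4 format from the project's sample config, where each line holds iptables match options; the ipset name and interface here mirror the ones used elsewhere in this thread):

```shell
# Hypothetical fragment for the second tunnel's vpn.conf:
# force only traffic coming in on br0 whose destination address
# is in the VPN_FORCED ipset through this tunnel. Each line is a
# set of iptables match options, per the sample config format.
CUSTOM_FORCED_RULES_IPV4="
	-i br0 -m set --match-set VPN_FORCED dst
"
```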

I assume I have to create another VPN_domains.conf file and the new one will have the FORCE_DOMAINS list?

You can create another VPN2_domains.conf, for example. Or, if you are not using the FORCE_DOMAINS from the first VPN_domains.conf, you can just define the forced and exempt domains in one file. Then you can reference the exempt ipset in wg0's vpn.conf and the forced ipset in wg1's vpn.conf.

If you can explain exactly what you want to do (which clients you want to force, all clients or just certain ones; whether you want to force all traffic or just certain domains; and to which of wg0 or wg1), I can help you craft the custom rules.

patelniket commented 1 year ago

Hi @peacey, thank you for your response.

If you can explain exactly what you want to do (which clients you want to force, all clients or just certain ones; whether you want to force all traffic or just certain domains; and to which of wg0 or wg1), I can help you craft the custom rules.

So in essence below is what I am trying to do:

Hope that makes sense. Thank you again for your help!

peacey commented 1 year ago

Okay, that seems simple enough. You can try configuring it this way:

Just make one VPN_domains.conf with the exempt domains for wg0, and the forced domains for wg1.

Then make two folders for the wireguard configs, one for wg0, and one for wg1.

Then start wg0 first, then wg1. They have to be started in this order.
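For reference, a single domains file covering both ipsets could look like this in dnsmasq's ipset syntax (the domain names here are placeholders; the ipsets themselves must exist before dnsmasq can populate them):

```shell
# Hypothetical VPN_domains.conf for dnsmasq (substitute your own domains).
# dnsmasq adds the resolved addresses of these domains to the named ipsets,
# which the vpn.conf EXEMPT_IPSETS / FORCED_IPSETS options then match on.
ipset=/exempt-site.example.com/VPN_EXEMPT
ipset=/forced-site.example.com/VPN_FORCED
```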

patelniket commented 1 year ago

Thank you. Yeah, that's what I was planning to test. I will report back on how it goes in a couple of days, when I can actually take down my home network, because I also need to upgrade everything to the new version as well.

One question on the above: if I use a unique PREFIX=VPN2_ for wg1, don't I need to set FORCED_IPSETS="VPN2_FORCED:dst"?

peacey commented 1 year ago

No, you don't need to. The ipset prefix is not connected to the vpn.conf prefix; it's just for your reference. For example, if you needed to make multiple forced ipsets (VPN_FORCED, VPN2_FORCED, etc.), you could do so by adding another VPN2_domains.conf with that new prefix. But in this case you only use VPN_FORCED in one config and VPN_EXEMPT in the other, so we don't need to do that.

patelniket commented 1 year ago

Hello @peacey, so this setup works perfectly fine when the UDM Pro SE console first boots (as part of a systemd service). But if I restart any of the VPN tunnels for whatever reason, the setup somehow breaks and all the traffic is always routed via wg0 by default, even though the IP addresses that are supposed to be routed via wg1 are added to the ipset list VPN_FORCED4.

Below is what the startup script /etc/split-vpn/run-vpn.sh looks like:

#!/bin/sh

# Run WireGuard VPN

# Set up the WireGuard kernel module and tools
cd /mnt/data/wireguard
./setup_wireguard.sh >setup-wireguard.log 2>&1

# Add dnsmasq ipsets for domain-based routing
cd /mnt/data/split-vpn/ipsets
./add-dnsmasq-ipset.sh >add-dnsmasq-ipset.log 2>&1

# Set up the wireguard vpn tunnels
/etc/split-vpn/wireguard/wg0/run-vpn.sh
/etc/split-vpn/wireguard/wg1/run-vpn.sh

wg0 vpn.conf:

FORCED_SOURCE_INTERFACE="br0"
FORCED_LOCAL_INTERFACE="eth8"

FORCED_IPSETS=""
EXEMPT_IPSETS="VPN_EXEMPT:dst"

DNS_IPV4_INTERFACE="br0"

KILLSWITCH=0
REMOVE_KILLSWITCH_ON_EXIT=1

ROUTE_TABLE=101
MARK=0x169
PREFIX="VPN0_"
PREF=98
DEV=wg0

wg1 vpn.conf:

FORCED_SOURCE_INTERFACE=""
FORCED_LOCAL_INTERFACE=""

FORCED_IPSETS="VPN_FORCED:dst"
EXEMPT_IPSETS=""

DNS_IPV4_INTERFACE=""

KILLSWITCH=1
REMOVE_KILLSWITCH_ON_EXIT=0

ROUTE_TABLE=102
MARK=0x170
PREFIX="VPN1_"
PREF=99
DEV=wg1

The individual wg0|1 vpn start and stop scripts are like so:

run-vpn.sh:

#!/bin/sh

# Load configuration and run wireguard
cd /mnt/data/split-vpn/wireguard/wg0
. ./vpn.conf
# /etc/split-vpn/vpn/updown.sh ${DEV} pre-up >pre-up.log 2>&1
wg-quick up ./${DEV}.conf >wireguard.log 2>&1
cat wireguard.log

stop-vpn.sh:

#!/bin/sh

# Load configuration and stop wireguard
cd /etc/split-vpn/wireguard/wg1
. ./vpn.conf

wg-quick down ./${DEV}.conf >wireguard.log 2>&1
cat wireguard.log

On a separate note: during the stop of wg0 I have always gotten the below error (even before this new setup). I don't think it's related, but I just want to mention it.

Error: any valid prefix is expected rather than "".
[#] ip link delete dev wg0

I searched & looked at issue #54. I have:

VPN_ENDPOINT_IPV4=""
VPN_ENDPOINT_IPV6=""

but I am unsure why it is still complaining. The tunnel is stopped successfully, though.

I am not sure why the setup is not able to survive restarts of the VPN tunnels and only works after a reboot. I don't want to have to restart the whole console whenever I want to restart the VPN tunnels. I would appreciate it if we could dig into this. My guess is that it is perhaps something related to ip rules not being respected or refreshed properly?

peacey commented 1 year ago

Hey @patelniket,

So the problem is that the first VPN's rules need to be added before the second VPN's (due to how marking the packets works: we need to make sure the domain-forced packets override any mark from the first VPN, so those rules need to be last). However, because of the killswitch and REMOVE_KILLSWITCH_ON_EXIT=0 on the second VPN, the rules for the second VPN are not deleted on exit. This means that when you re-run the VPN start script, the first VPN's rules are added after the second's (since the second's rules are already there).

So basically, you have two options:

  1. Either remove the Killswitch rules from VPN2.
  2. Or, if you want to keep the Killswitch for VPN2, force VPN2 down when you restart the VPN so that the Killswitch is removed. Doing a force-down is different from just down, which doesn't delete the rules if there is a Killswitch. For this option, just add this line to your run-vpn.sh for VPN2 right before the wg-quick up line (but after the . ./vpn.conf line):
/etc/split-vpn/vpn/updown.sh ${DEV} force-down

Note that even with option 2, the Killswitch will be disabled for a brief moment. Just a few milliseconds, but it could be enough time for traffic to leak to wg0 if anything is happening at that time. Just keep that in mind.
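Option 2 can be sketched as follows (a minimal sketch only, reusing the paths and scripts from this thread; it is meant to run on the router itself, not elsewhere):

```shell
#!/bin/sh
# Sketch of VPN2's run-vpn.sh with the force-down step added.

cd /mnt/data/split-vpn/wireguard/wg1
. ./vpn.conf

# Remove VPN2's leftover Killswitch/policy rules first, so that they are
# re-added after VPN1's rules and the required rule order is preserved.
/etc/split-vpn/vpn/updown.sh ${DEV} force-down >force-down.log 2>&1

# Bring the tunnel up; split-vpn re-applies VPN2's rules at this point.
wg-quick up ./${DEV}.conf >wireguard.log 2>&1
cat wireguard.log
```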

Also, one more thing. I noticed in your first VPN config you have DNS_IPV4_INTERFACE=br0 but no DNS_IPV4_IP set. Is that on purpose, or did you just forget to show the other option? DNS_IPV4_INTERFACE won't work without DNS_IPV4_IP if you're trying to force DNS to a local interface.

Also, I noticed you used FORCED_LOCAL_INTERFACE. Keep in mind this is an experimental option that forces UDM local traffic, not routed traffic, out the VPN. But it hasn't been tested extensively and might cause problems with Unifi features like remote login. So if anything is weird, keep that in mind and remove that option.

patelniket commented 1 year ago

Hey @peacey, thank you for the detailed explanation and the suggested options. That makes much more sense now. I have now included the force-down before the start of VPN2 in run-vpn.sh, and it's working as expected with VPN restarts as well!

Note that even with option 2, the Killswitch will be disabled for a brief moment. Just a few milliseconds, but it could be enough time for traffic to leak to wg0 if anything is happening at that time. Just keep that in mind.

I will definitely keep this in mind when I am doing any manual vpn restarts.

Also, one more thing. I noticed in your first VPN config, you have DNS_IPV4_INTERFACE=br0 but no DNS_IPV4_IP set. Is that on purpose or you just forgot to show the other option? Because DNS_IPV4_INTERFACE won't work without DNS_IPV4_IP if you're trying to force DNS to a local interface.

Yes, this is something I had included when I was tinkering around with this setup, but it is the same thing I was thinking about as well. I will set DNS_IPV4_INTERFACE="" in the first VPN config now, as it is not needed, because I already have NextDNS set up locally within the dnsmasq config (pointing to server=127.0.0.1#5553).

patelniket commented 1 year ago

Hi @peacey, I have one separate question on the above setup. What is the best way to enable blackhole routes so that no traffic leaks from either VPN during startup, before the policy rules are added?

I read the "How can I block Internet access until after this script runs at boot?" in README.

After adding blackhole routes in the Unifi network settings and enabling REMOVE_STARTUP_BLACKHOLES=1 in both tunnels, will I need to add the pre-up script to both VPN1 and VPN2?

Is there a way to ensure that when the blackhole routes are removed, the traffic for domains forced to VPN2 doesn't leak through VPN1, even for a short period of time (even microseconds)? (I am assuming this can happen because by the time the pre-up step in VPN1 removes the blackhole routes, the VPN2 policy rules are yet to be set up as a next step.)

Or should I add the pre-up step just in the VPN2 run script? By not adding the pre-up step in VPN1, I am unsure whether that will cause any other issues, or whether that script will fail because it won't have any internet connection until pre-up is executed in VPN2.

peacey commented 1 year ago

So the system blackholes are removed at the pre-up or up phase, but only after the killswitch and the other iptables forcing/exempting rules are added. You do need the pre-up when you want to make sure the killswitch/iptables rules are added before the VPN starts (because the up phase is only called after the VPN starts). So in your case, yes, you should make sure to add the pre-up to both run configs.

Then, in order to get what you want, I think you should set REMOVE_STARTUP_BLACKHOLES=0 in VPN1, but set it to 1 only in VPN2. This way the system blackholes will only be removed after VPN2's rules are added.

But this only helps prevent leaks to VPN1 or WAN when the system restarts. It doesn't help if you manually restart the VPN yourself, where traffic can leak since the system blackhole routes have already been removed. However, if you add the blackhole routes yourself in your restart script (after you run down on both VPNs, but before you run force-down on VPN2), then you can also prevent leaks at manual restart. You can add the blackhole routes like this:

ip route add blackhole 0.0.0.0/1
ip route add blackhole 128.0.0.0/1
ip route add blackhole ::/1
ip route add blackhole 8000::/1

Also keep in mind that the system blackhole routes disable internet access for the whole system, not just for forced clients. So if anything goes wrong and the script doesn't start for whatever reason, like a config error, you won't have internet anymore and will have to manually delete the blackhole routes or make the script start properly.
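Put together, a manual restart helper following that order might look like this (a sketch only, assuming the per-tunnel run/stop scripts from this thread; it is meant to run on the router itself):

```shell
#!/bin/sh
# Hypothetical restart helper sketching the order described above.

# 1. Bring both tunnels down first.
/etc/split-vpn/wireguard/wg0/stop-vpn.sh
/etc/split-vpn/wireguard/wg1/stop-vpn.sh

# 2. Re-add the blackhole routes before any rules are rebuilt, so
#    nothing leaks to WAN or the wrong tunnel during the restart.
ip route add blackhole 0.0.0.0/1
ip route add blackhole 128.0.0.0/1
ip route add blackhole ::/1
ip route add blackhole 8000::/1

# 3. Start wg0 first, then wg1 (wg1's run script does the force-down),
#    so that wg1's domain-forcing rules end up last again.
/etc/split-vpn/wireguard/wg0/run-vpn.sh
/etc/split-vpn/wireguard/wg1/run-vpn.sh
```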

patelniket commented 1 year ago

Hi @peacey, I am facing the below issues when I try this setup. I have added these blackhole routes:

[image: screenshot of the blackhole routes configured in the Unifi network settings]

But now it seems like VPN1 is not able to connect, so the main run script doesn't get past that step.

wg0 vpn.conf:

FORCED_SOURCE_INTERFACE="br0"
FORCED_LOCAL_INTERFACE="eth8"

FORCED_IPSETS=""
EXEMPT_IPSETS="VPN_EXEMPT:dst"

KILLSWITCH=1
REMOVE_KILLSWITCH_ON_EXIT=0

REMOVE_STARTUP_BLACKHOLES=0
DISABLE_BLACKHOLE=1

ROUTE_TABLE=101
MARK=0x169
PREFIX="VPN0_"
PREF=98
DEV=wg0

wg1 vpn.conf:

FORCED_SOURCE_INTERFACE=""
FORCED_LOCAL_INTERFACE=""

FORCED_IPSETS="VPN_FORCED:dst"
EXEMPT_IPSETS=""

KILLSWITCH=1
REMOVE_KILLSWITCH_ON_EXIT=0

REMOVE_STARTUP_BLACKHOLES=1
DISABLE_BLACKHOLE=1

ROUTE_TABLE=102
MARK=0x170
PREFIX="VPN1_"
PREF=99
DEV=wg1

Upon system startup: ip route:

blackhole 0.0.0.0/1 proto static metric 1
<redacted>/24 dev eth8 proto kernel scope link src <redacted>
blackhole 128.0.0.0/1 proto static metric 1
192.168.1.0/24 dev br0 proto kernel scope link src 192.168.1.1

cat wireguard/wg0/wireguard.log:

[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
Try again: `<redacted>'. Trying again in 1.00 seconds...
Try again: `<redacted>'. Trying again in 1.20 seconds...
Try again: `<redacted>'. Trying again in 1.44 seconds...
Try again: `<redacted>'. Trying again in 1.73 seconds...
Try again: `<redacted>'. Trying again in 2.07 seconds...
Try again: `<redacted>'. Trying again in 2.49 seconds...
Try again: `<redacted>'. Trying again in 2.99 seconds...
Try again: `<redacted>'. Trying again in 3.58 seconds...
Try again: `<redacted>'. Trying again in 4.30 seconds...
Try again: `<redacted>'. Trying again in 5.16 seconds...
Try again: `<redacted>'. Trying again in 6.19 seconds...
Terminated
[#] ip link delete dev wg0
[#] ip link delete dev wg0
Cannot find device "wg0"

When I then manually start wg1/run-vpn.sh, it is able to remove the blackhole routes because of REMOVE_STARTUP_BLACKHOLES=1. Then I stop it, and when I start both VPNs via the main run script, everything works fine. But this fails during boot.

Also, for manual restarts, I tried adding the below in VPN2's run-vpn script right before the force-down.

ip route replace blackhole 0.0.0.0/1
ip route replace blackhole 128.0.0.0/1
ip route replace blackhole ::/1
ip route replace blackhole 8000::/1

The problem with that, though, is that when I stop both VPNs and then start them using the main run-vpn.sh script, VPN1 comes up first, which leads to traffic leaking through VPN1 even for domains that are forced via VPN2, because no blackhole routes are set up until the VPN2 script starts.

So to overcome that, I figured I need to add the blackhole routes after VPN2 goes down but before I start any VPN tunnels at all. I tried testing this by adding those routes as the first step in the main run-vpn.sh script. But if I do just a normal down of the tunnels, the rule watcher is still running and removes the blackhole routes as soon as it sees them. If I do a force-down of both VPNs, the rule watcher is also killed, so the script is able to add the blackhole routes, but that again leads me to the same issue I face during boot, where VPN1 is not able to establish a connection at all.

patelniket commented 1 year ago

Hey @peacey - so I figured this one out. It turns out I need to tightly couple both VPNs' setup into a single script: apply all the routing policies/rules first, and then bring up both tunnels as the last step.

Below is the main /etc/split-vpn/run-vpn.sh script I now have working, which gets called during boot:

#!/bin/sh

########### Add startup blackhole routes ##########
# This helps prevent leaking any traffic to WAN during any manual vpn restarts
ip route replace blackhole 0.0.0.0/1
ip route replace blackhole 128.0.0.0/1
ip route replace blackhole ::/1
ip route replace blackhole 8000::/1
############################################

########### Set up the WireGuard kernel module and tools ###########
cd /mnt/data/wireguard
./setup_wireguard.sh >setup-wireguard.log 2>&1
############################################

########### Add dnsmasq ipsets for domain-based routing ###########
cd /mnt/data/split-vpn/ipsets
./add-dnsmasq-ipset.sh >add-dnsmasq-ipset.log 2>&1
############################################

########## Set up the wireguard wg0 vpn tunnel rules ##########
# Load configuration and run wireguard
cd /mnt/data/split-vpn/wireguard/wg0
. ./vpn.conf

# Add killswitch/iptables rules
/etc/split-vpn/vpn/updown.sh ${DEV} pre-up >pre-up.log 2>&1
############################################

########### Set up the wireguard wg1 vpn tunnel rules ###########
# Load configuration and run wireguard
cd /mnt/data/split-vpn/wireguard/wg1
. ./vpn.conf

# This is needed to remove KILLSWITCH & re-apply (overwrite) policy rules (pre-up) for wg1 during manual vpn restarts
/etc/split-vpn/vpn/updown.sh ${DEV} force-down >force-down.log 2>&1

# Add killswitch/iptables rules
# and delete system blackhole routes - REMOVE_STARTUP_BLACKHOLES=1
/etc/split-vpn/vpn/updown.sh ${DEV} pre-up >pre-up.log 2>&1
############################################

########### Connect vpn tunnels ###########
# Connect wg0 tunnel
cd /mnt/data/split-vpn/wireguard/wg0
. ./vpn.conf

wg-quick up ./${DEV}.conf >wireguard.log 2>&1
cat wireguard.log

# Connect wg1 tunnel
cd /mnt/data/split-vpn/wireguard/wg1
. ./vpn.conf

wg-quick up ./${DEV}.conf >wireguard.log 2>&1
cat wireguard.log
############################################

The only minor caveat is that during manual VPN restarts (stop-vpn.sh and then run-vpn.sh), for a few seconds the traffic that is supposed to go through VPN2 does leak through VPN1 (since we need a force-down on VPN2, which removes the KILLSWITCH, because we need to make sure the rules are overwritten). But that's not a big issue, as I will rarely need to restart the tunnels.

This is working during startup and it's great! It blocks all internet traffic (blackhole) until both VPN tunnels are up, and then traffic is routed via the respective VPN tunnel according to the domain-based routing policy rules, as expected.
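For completeness, a matching stop script under this scheme might look like the sketch below (hypothetical; it reuses the paths from this thread, and runs force-down on wg1 because its Killswitch otherwise keeps its rules in place, as discussed above):

```shell
#!/bin/sh
# Hypothetical /etc/split-vpn/stop-vpn.sh matching the run script above.

# Stop wg1 first; force-down also removes its Killswitch/policy rules
# and kills its rule watcher.
cd /mnt/data/split-vpn/wireguard/wg1
. ./vpn.conf
wg-quick down ./${DEV}.conf >wireguard.log 2>&1
/etc/split-vpn/vpn/updown.sh ${DEV} force-down >force-down.log 2>&1

# Then stop wg0 (its killswitch is already removed on exit per its config).
cd /mnt/data/split-vpn/wireguard/wg0
. ./vpn.conf
wg-quick down ./${DEV}.conf >wireguard.log 2>&1
```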