Script needs manual restart each day

Miniclubbin commented 2 years ago

I have this running on a UDMP via next hop to an externally managed USG and it works awesome.

However, I'm finding that it drops at some point during the day, and I need to reissue both the mount command and the up command to restore it. This has happened every day for the past few days since I set it up. The UDMP doesn't reboot, but I have noticed that it renegotiates PPPoE once a day. If you can tell me where to look, I can check the logs to see if that triggers it.

I have not set up any boot scripts.

Is there some type of keepalive setting I might have missed? Let me know what you need to see.

peacey commented 2 years ago

Hi @Miniclubbin,

That's very weird because nexthop on your LAN shouldn't have anything to do with PPPoE. Can I see your vpn.conf? You can upload it as a .txt to your post.

Can you trigger the problem by unplugging/replugging your WAN connection or bringing the WAN interface down/up?

When the problem happens and before you restart the script, can you show me the output of these commands?

ip rule
ip route show table 101
iptables -t mangle -S | grep VPN
ps aux | grep updown
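
A minimal sketch of how the four checks above could be captured in one go while the connection is down, so the state is saved before the script is restarted. The helper names `run_diag` and `collect_diagnostics` and the output path are hypothetical, not part of split-vpn.

```shell
#!/bin/sh
# Sketch only: run_diag and collect_diagnostics are hypothetical helpers.

# Print a "# <command>" header, then the command's output (stdout and stderr).
run_diag() {
    echo "# $1"
    # "|| true" keeps going when a command fails or grep finds no match.
    sh -c "$1" 2>&1 || true
    echo
}

# Run the four commands requested in this thread, with a capture time on top.
collect_diagnostics() {
    echo "Captured: $(date '+%Y-%m-%d %H:%M:%S')"
    run_diag "ip rule"
    run_diag "ip route show table 101"
    run_diag "iptables -t mangle -S | grep VPN"
    run_diag "ps aux | grep updown"
}

# Usage on the router (the path is an assumption):
#   collect_diagnostics > /tmp/split-vpn-diag.txt
```

The resulting file can then be attached to a reply as a .txt.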

Thanks!

Miniclubbin commented 2 years ago

Happened again today. I have no idea when since I wasn't home, but let me know if there's a log that watches the status.

# ip rule
0:      from all lookup local
99:     from all fwmark 0x9 lookup 101
220:    from all lookup 220
32000:  from all lookup main
32500:  from [PUBLIC IP] lookup 201
32766:  from all lookup 201
32767:  from all lookup default

# ip route show table 101
blackhole default

# iptables -t mangle -S | grep VPN
-N VPN_FORWARD
-N VPN_OUTPUT
-N VPN_POSTROUTING
-N VPN_PREROUTING
-A PREROUTING -j VPN_PREROUTING
-A FORWARD -j VPN_FORWARD
-A OUTPUT -j VPN_OUTPUT
-A POSTROUTING -j VPN_POSTROUTING
-A VPN_PREROUTING -i br76 -j MARK --set-xmark 0x9/0xffffffff
-A VPN_PREROUTING -d 192.168.4.1/32 -m mark --mark 0x9 -j MARK --set-xmark 0x0/0xffffffff

# ps aux | grep updown
12904 root     {updown.sh} /bin/sh /etc/split-vpn/vpn/updown.sh tun1 up site1
24016 root     grep updown

### SPLIT VPN OPTIONS ###
FORCED_SOURCE_INTERFACE="br76"
FORCED_SOURCE_IPV4=""
FORCED_SOURCE_IPV6=""
FORCED_SOURCE_MAC=""
FORCED_SOURCE_IPV4_PORT=""
FORCED_SOURCE_IPV6_PORT=""
FORCED_SOURCE_MAC_PORT=""
FORCED_DESTINATIONS_IPV4=""
FORCED_DESTINATIONS_IPV6=""
FORCED_LOCAL_INTERFACE=""
EXEMPT_SOURCE_IPV4=""
EXEMPT_SOURCE_IPV6=""
EXEMPT_SOURCE_MAC=""
EXEMPT_SOURCE_IPV4_PORT=""
EXEMPT_SOURCE_IPV6_PORT=""
EXEMPT_SOURCE_MAC_PORT=""
EXEMPT_DESTINATIONS_IPV4=""
EXEMPT_DESTINATIONS_IPV6=""
FORCED_IPSETS=""
EXEMPT_IPSETS=""
PORT_FORWARDS_IPV4=""
PORT_FORWARDS_IPV6=""
DNS_IPV4_IP="DHCP"
DNS_IPV4_PORT=53
DNS_IPV4_INTERFACE=""
DNS_IPV6_IP="REJECT"
DNS_IPV6_PORT=53
DNS_IPV6_INTERFACE=""
BYPASS_MASQUERADE_IPV4="ALL"
BYPASS_MASQUERADE_IPV6=""
KILLSWITCH=0
REMOVE_KILLSWITCH_ON_EXIT=1
REMOVE_STARTUP_BLACKHOLES=1
VPN_PROVIDER="nexthop"
VPN_ENDPOINT_IPV4="192.168.4.2"
VPN_ENDPOINT_IPV6=""
GATEWAY_TABLE="disabled"
MSS_CLAMPING_IPV4=""
MSS_CLAMPING_IPV6=""
WATCHER_TIMER=1
ROUTE_TABLE=101
MARK=0x9
PREFIX="VPN_"
PREF=99
DEV=tun1

AFTER RESET: Unplugging the WAN cable has no effect. The internet drops but comes right back, and the routing table keeps its routes while the WAN is unplugged.

# ip route show table 101
0.0.0.0/1 via 192.168.4.2 dev tun1
blackhole default
128.0.0.0/1 via 192.168.4.2 dev tun1

peacey commented 2 years ago

Thanks @Miniclubbin. I know what the problem is now. Your VPN tunnel tun1 is getting deleted every time your PPPoE renegotiates or the WAN resets. So we just need to re-add the routes whenever they're deleted.
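
The idea of re-adding deleted routes can be sketched roughly as follows. This is not the actual split-vpn code or the fix in #74. It is an illustration that assumes the two half-default routes and the values (table 101, nexthop 192.168.4.2, device tun1) shown in the output posted above. The function names are hypothetical, and the one-second sleep is an assumption.

```shell
#!/bin/sh
# Sketch only, not the split-vpn implementation. Values come from the
# vpn.conf and routing output posted in this thread.
ROUTE_TABLE=101
VPN_ENDPOINT_IPV4="192.168.4.2"
DEV=tun1

# Read routing-table text on stdin. Return 0 if both half-default routes
# via the nexthop are present, 1 otherwise.
routes_present() {
    table_text=$(cat)
    for prefix in 0.0.0.0/1 128.0.0.0/1; do
        echo "$table_text" |
            grep -qF -- "${prefix} via ${VPN_ENDPOINT_IPV4} dev ${DEV}" || return 1
    done
    return 0
}

# Re-add both routes. "ip route replace" adds the route or overwrites it.
add_routes() {
    for prefix in 0.0.0.0/1 128.0.0.0/1; do
        ip route replace "$prefix" via "$VPN_ENDPOINT_IPV4" dev "$DEV" table "$ROUTE_TABLE"
    done
}

# Poll the table and restore the routes when they are missing.
# The 1-second interval is an assumption.
watch_routes() {
    while true; do
        ip route show table "$ROUTE_TABLE" | routes_present || add_routes
        sleep 1
    done
}

# Usage on the router (needs root): watch_routes &
```

With the broken state posted above (only `blackhole default` in table 101), `routes_present` returns 1 and the loop would try to add the routes again.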

I have implemented a fix in #74. Can you update your script to the nexthop-autoadd branch by issuing the following command on your UDMP?

curl -LSsf https://raw.githubusercontent.com/peacey/split-vpn/main/vpn/install-split-vpn.sh | sed s/"main"/"nexthop-autoadd"/g | sh

Then bring the script down/up again and check if the problem still happens.
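
For anyone who prefers to look at the installer before piping it into `sh`, the one-liner above can be split into steps. This is a sketch under assumptions: `rewrite_branch` and `fetch_installer` are hypothetical helper names, the output path is arbitrary, and the `nexthop-autoadd` branch may no longer exist once it has been merged.

```shell
#!/bin/sh
# Sketch only: helper names and the /tmp path are assumptions.

# Same substitution as the one-liner: every occurrence of "main" on stdin
# becomes "nexthop-autoadd", so download URLs inside the installer point
# at the branch instead of main.
rewrite_branch() {
    sed 's/main/nexthop-autoadd/g'
}

# Download the installer from main and rewrite it, without running it.
fetch_installer() {
    curl -LSsf https://raw.githubusercontent.com/peacey/split-vpn/main/vpn/install-split-vpn.sh |
        rewrite_branch
}

# Usage on the router:
#   fetch_installer > /tmp/install-split-vpn.sh
#   cat /tmp/install-split-vpn.sh      # review it first
#   sh /tmp/install-split-vpn.sh
```

Note that the substitution is purely textual, so it would also change "main" inside any longer word in the installer.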

Miniclubbin commented 2 years ago

Alrighty, implemented the update and it went down again. I had left PuTTY open, and a message has appeared since I ran the script yesterday that says 'Cannot find dev "tun1"', but there is no timestamp or other ID.

peacey commented 2 years ago

Sorry @Miniclubbin. I forgot to add error checking. I've updated it again. Can you issue the same curl command to update and try again?

Now the errors will be recorded in rule-watcher.log, in the same folder as the vpn.conf. The script will also keep retrying to add the routes until it succeeds, instead of dying on the first error.
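
The retry-and-log behaviour described here could look roughly like the sketch below. It is not the code from the branch. `retry_until_ok`, `LOG_FILE` and `RETRY_DELAY` are hypothetical names, and the default log path is only a placeholder for wherever vpn.conf lives.

```shell
#!/bin/sh
# Sketch only: names and defaults are assumptions, not split-vpn's own.
LOG_FILE="${LOG_FILE:-./rule-watcher.log}"
RETRY_DELAY="${RETRY_DELAY:-1}"

# Run the given command until it exits 0. Error output from each failed
# attempt is appended to LOG_FILE instead of ending the script.
retry_until_ok() {
    until "$@" 2>>"$LOG_FILE"; do
        sleep "$RETRY_DELAY"
    done
}

# Usage on the router, with the values posted in this thread:
#   retry_until_ok ip route replace 0.0.0.0/1 via 192.168.4.2 dev tun1 table 101
```

While tun1 does not exist, each attempt appends an error line to the log. The loop ends on the first attempt after the device comes back.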

Miniclubbin commented 2 years ago

Done, I'll let you know how it goes. Thanks!

Miniclubbin commented 2 years ago

OK, I ran a tail on that watcher log and saw a bunch of 'Cannot find device "tun1"' errors after a short ISP drop, but it came back up on its own. There must be some sort of process reload in the UDMP, and your new script retries until it takes. I'm gonna watch it for another day before I mark it closed, but I have a good feeling about this one.

Is there a way to get timestamps for the log messages?

peacey commented 2 years ago

It's possible to add timestamps for the errors, but it is a bit convoluted due to how the output is redirected. I've updated the commit again to add timestamps. You can issue the curl command again to update for timestamps on the messages.
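
One common way to timestamp redirected output is to pipe it through a small filter that prefixes every line. The sketch below shows that general technique and is not necessarily how the commit does it. `add_timestamps` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: add_timestamps is a hypothetical helper.

# Prefix every line read from stdin with the current date and time.
add_timestamps() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage: send both stdout and stderr of a command through the filter.
#   some_command 2>&1 | add_timestamps >> rule-watcher.log
```

Because the filter sits at the end of a pipeline, the exit status seen by the caller is the filter's rather than the command's, which is one reason this kind of redirection gets convoluted.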

Miniclubbin commented 2 years ago

All good! Thank you for all your help.

peacey commented 2 years ago

No problem @Miniclubbin. I have merged this to the main branch now, so you can update normally going forward.

Have a nice day!

peacey / split-vpn

Script needs manual restart each day #72