runfalk / synology-wireguard

WireGuard support for some Synology NAS drives
MIT License

Synology DSM 6.2.2-24922 breaking WireGuard? #10

Closed: runfalk closed this issue 4 years ago

runfalk commented 5 years ago

Two days ago my NAS restarted and upgraded to 6.2.2-24922 (from 6.2.1-23824-6) by itself. Since then I can no longer connect using WireGuard.

The changelog doesn't seem to list anything obvious (https://www.synology.com/en-global/releaseNote/DS218j#ver_24922). My kernel compile is fresh, but the kernel version is the same:

Linux Poseidon 3.10.105 #24922 SMP Fri May 10 02:48:35 CST 2019 armv7l GNU/Linux synology_armada38x_ds218j

Does anyone else experience the same?

mistersandman commented 3 years ago

I debugged this issue for a bit and found something interesting. These are the commands that wg-quick up wg0 issues when I run it with my wg0.conf:

ip link add wg0 type wireguard
wg setconf wg0 /dev/fd/63
ip -4 address add 10.0.0.1/24 dev wg0
ip link set mtu 1420 up dev wg0

So I experimented with issuing these commands manually and found that swapping the last two avoids the issue. When the address is added (ip -4 address add) only after the link has been brought up (ip link set ... up), the following route is added to the routing table and stays there as desired (output from ip route):
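A minimal sketch of the swapped order as a POSIX shell function, assuming the interface name, address and MTU from the commands above. The peer config path is hypothetical and the file is assumed to be in plain wg(8) format, i.e. without wg-quick-only keys such as Address or MTU, which wg setconf does not accept (wg-quick itself passes wg setconf a stripped copy of wg0.conf, which is what /dev/fd/63 is). The run helper and the DRY_RUN variable are illustrative additions that print the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: bring the WireGuard interface up BEFORE assigning its address,
# i.e. wg-quick's last two commands swapped.
set -eu

# Illustrative helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments (defaults from the thread; the config path is hypothetical):
#   $1 interface, $2 address/prefix, $3 MTU, $4 plain wg(8) config file
wg_up_reordered() {
  iface="${1:-wg0}"
  addr="${2:-10.0.0.1/24}"
  mtu="${3:-1420}"
  conf="${4:-/etc/wireguard/wg0.peers.conf}"

  run ip link add "$iface" type wireguard
  run wg setconf "$iface" "$conf"
  run ip link set mtu "$mtu" up dev "$iface"   # link up first...
  run ip -4 address add "$addr" dev "$iface"   # ...then the address
}
```

After sourcing the file, running wg_up_reordered as root should be followed by a look at ip route to confirm the 10.0.0.0/24 route is still there; DRY_RUN=1 wg_up_reordered only prints the four commands.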

10.0.0.0/24 dev wg0  proto kernel  scope link  src 10.0.0.1

Moreover, I monitored the routing table using ip monitor all while running wg-quick up wg0 and this gave me the following output:

[LINK]70: wg0: <POINTOPOINT,NOARP> mtu 1420 qdisc noop state DOWN
    link/none
[ADDR]70: wg0    inet 10.0.0.1/24 scope global wg0
       valid_lft forever preferred_lft forever
[ROUTE]local 10.0.0.1 dev wg0  table local  proto kernel  scope host  src 10.0.0.1
[LINK]70: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN
    link/none
[ROUTE]10.0.0.0/24 dev wg0  proto kernel  scope link  src 10.0.0.1
[ROUTE]broadcast 10.0.0.0 dev wg0  table local  proto kernel  scope link  src 10.0.0.1
[ROUTE]broadcast 10.0.0.255 dev wg0  table local  proto kernel  scope link  src 10.0.0.1
[LINK]70: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420
    link/none
[ROUTE]Deleted 10.0.0.0/24 dev wg0  proto kernel  scope link  src 10.0.0.1

Here the first [LINK] entry comes from the ip link add command, the following [ADDR] and [ROUTE] entries come from the ip -4 address add command, and the remaining [LINK] and [ROUTE] entries, except for the last [ROUTE] entry, come from the ip link set command. The last [ROUTE] entry always appears with a delay, well after the wg-quick command has finished. It deletes exactly the route we're missing, the one that the ip link set command had previously and correctly set up when bringing up the wg0 interface!
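Because the deletion only arrives after wg-quick has returned, a check for it has to wait before inspecting the routing table. A minimal sketch, assuming the 10.0.0.0/24 subnet and wg0 interface from the log above; the function names and the 10-second default wait are hypothetical choices, not values from this thread.

```shell
#!/bin/sh
# Sketch: detect whether the connected route survived after wg-quick returned.
set -eu

# Reads `ip route` output on stdin; succeeds if a line starts with
# "<subnet> dev <iface>".
has_connected_route() {
  awk -v net="$1" -v dev="$2" \
    '$1 == net && $2 == "dev" && $3 == dev { found = 1 } END { exit !found }'
}

# Hypothetical wrapper: wait, then report whether the route is still present.
check_route_later() {
  subnet="${1:-10.0.0.0/24}"
  iface="${2:-wg0}"
  wait_s="${3:-10}"
  sleep "$wait_s"
  if ip route show | has_connected_route "$subnet" "$iface"; then
    echo "route $subnet dev $iface present"
  else
    echo "route $subnet dev $iface missing (deleted after wg-quick finished?)"
    return 1
  fi
}
```

Running wg-quick up wg0 followed by check_route_later in the same shell should, in the scenario described here, report the route as missing, while running it after the swapped or delayed sequence should report it as present.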

Finally, issuing the four commands manually shows that the route deletion only happens when the last command follows the second-to-last one quickly enough. When I wait 3-5 seconds after the ip -4 address add command before issuing the final ip link set command, the route is not deleted and stays in the routing table as desired.
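The delay observation can be sketched the same way: keep wg-quick's original order but pause between adding the address and bringing the link up. This assumes the same interface, address and MTU as above; the config path is hypothetical (plain wg(8) format), and the 5-second default simply takes the upper end of the 3-5 seconds reported here, so it may need adjusting on other models or DSM versions. The run helper and DRY_RUN variable are again illustrative.

```shell
#!/bin/sh
# Sketch: wg-quick's original command order with a pause before "link up".
set -eu

# Illustrative helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 interface, $2 address/prefix, $3 MTU, $4 wg(8) config file (hypothetical),
# $5 delay in seconds
wg_up_delayed() {
  iface="${1:-wg0}"
  addr="${2:-10.0.0.1/24}"
  mtu="${3:-1420}"
  conf="${4:-/etc/wireguard/wg0.peers.conf}"
  delay="${5:-5}"

  run ip link add "$iface" type wireguard
  run wg setconf "$iface" "$conf"
  run ip -4 address add "$addr" dev "$iface"
  run sleep "$delay"   # a 3-5 s pause here was observed to prevent the deletion
  run ip link set mtu "$mtu" up dev "$iface"
}
```

Swapping the order avoids the fixed wait entirely, so the delay variant is mainly useful for reproducing the timing dependence described above.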

In conclusion, wg-quick appears to set up all its routing correctly, but something triggers the removal of this one route. I suspect that some running process (likely unrelated to WireGuard) monitors the routing tables for [ADDR] events and then queues this delayed route deletion. I haven't yet been able to identify which process this might be, so that last part remains a guess without hard evidence.

nApucco commented 2 years ago

@mistersandman

I debugged this issue for a bit and found something interesting. These are the commands that wg-quick up wg0 issues when I run it with my wg0.conf: ...

Thank you so much for this hint! After hours of trying to understand why routing didn't work for me, I finally found your comment and fixed the issue by performing the steps of wg-quick manually and understanding each of them.