yggdrasil-network / yggdrasil-go

An experiment in scalable routing as an encrypted IPv6 overlay network
https://yggdrasil-network.github.io

path selection is wonky, routing packets over the internet instead of keeping them on the local lan #998

Open dilinger opened 1 year ago

dilinger commented 1 year ago

So I'm at home on my local lan. I've got an access point that I don't control (provided by my ISP) that doesn't seem to properly handle multicast for wifi hosts. Connected to that access point are a mix of wireless hosts running yggdrasil, as well as 2 other devices also running yggdrasil that are on wired ethernet.

All of my yggdrasil hosts have a single entry for Peers in yggdrasil.conf: a VPS out on the internet running yggdrasil, so that when I'm not at home I can still reach all of my hosts at home by routing through that VPS. However, when I'm at home and communicating with another host on my local network, I obviously don't want to route through the VPS.

On a random wireless host, 'yggdrasilctl getPeers' shows exactly 3 peers:
1 [wired device 1, a fast amd ryzen computer]
2 [wired device 2, a raspberry pi 4]
3 [the remote VPS]

Directly pinging those hosts, I see 1-4ms for the two local machines, and 80-100ms for the remote VPS. So clearly there are some latency differences there.

If I attempt to connect from one wireless device to another via its yggdrasil IPv6 address, the traffic will bounce through one of those 3 peers, since without working multicast the wireless hosts don't peer with each other directly. Which peer it picks seems completely random, however.

Here are the paths (with pub keys and IPs truncated):

dilinger@5410:~$ sudo yggdrasilctl getpaths
                           Public Key                                             IP Address                 Path   
77  8138:125a   [1 4 0] 
ff  8138:125a   []      
74  d1d4:ebd1   [3 0]   
6b  c711:a531   [3 7 0] 
6b  2d5d:a8e6   [1 2 0] 
ff  2d5d:a8e6   []      
a4  808f:320b   [3 6 0] 
ff  808f:320b   []   

So 8138:125a is bouncing through the fast ryzen machine, that's fine and good. d1d4:ebd1 is the remote VPS, and that's a direct peer connection. c711:a531 is actually yggdrasil running inside a virtualbox instance, that can be ignored because it's not even seeing the wired hosts, it only knows about the VPS. 2d5d:a8e6 is also being routed through the fast ryzen machine.

And then we come to 808f:320b, which is my android phone, and.. for some reason that's going out through the internet, to the VPS, and then coming back. That should never happen.

It's not just my phone, either. I saw this happen between two laptops, earlier.

I can force a path change by restarting yggdrasil on my local machine. Now, the routes between wifi devices are bouncing through the raspberry pi. If I bring down yggdrasil on the raspberry pi, rather than switching to the ryzen machine, some routes are now going through the VPS. I understand that the route selection will switch over time to whichever route has the lowest latency, but switching to a peer with an order of magnitude higher latency and then sticking with that is not great.

Here's what it looks like after I shut down the pi (ryzen is still peer 1, and the VPS is still peer 3):

                           Public Key                                             IP Address                 Path   
77  8138:125a   [3 1 0] 
74  d1d4:ebd1   [3 0]   
6b  c711:a531   [3 7 0] 
6b  2d5d:a8e6   [1 2 0] 
ff  2d5d:a8e6   []  

2d5d:a8e6 and 8138:125a are both laptops that are on wifi, but packets to one are routed locally through the ryzen machine, and packets to the other bounce out through the VPS. Even around 30 minutes later (long after getpaths no longer shows the route until I manually ping the host), packets bound for 8138:125a are being routed out through the internet.

I don't know if this is a bug with the latency selection, or it's expected behavior and after some amount of time it will fix itself, but.. Ideally, I'd like yggdrasil to either recognize that a peer has an order of magnitude more latency than the other peers and only use that slower peer as a last resort, OR I'd specifically like a way to tell yggdrasil "use this peer as a last resort only", OR yggdrasil might be smart enough to realize that peers discovered via multicast are probably faster/preferable to peers manually specified in yggdrasil.conf. (That last one is kind of iffy.)

dilinger commented 1 year ago

Oh, and as far as yggdrasil versions: all hosts are using 0.4.7-1~bpo11+1 on Debian except for the pi, which has 0.4.6-1~bpo11+1, and the phone, which has 0.4.3-15-g42d4298 from f-droid.

It's now been 35 mins since I filed the bug, and at least an hour since I shut down the pi's yggdrasil, and the path for 8138:125a STILL routes it over the internet rather than staying on the local lan. :(

xlmnxp commented 1 year ago

If one of your peers connects to the wider Yggdrasil network, then all of the other peers will be reachable from the wider network as well.

dilinger commented 1 year ago

Thanks, but I don't want to join the wide network; this is a private network.

15 hours later, I boot up the laptop using 8138:125a without touching yggdrasil on any of the peer machines, and:

77  8138:125a   [3 4 0] 

It's still going out over the internet.