Open arhue opened 2 years ago
I am working around the issue with VRRP, and it seems to work well, but having failover at L3 in ZeroTier would be much better.
This seems like something that would also be valuable with a priority, so you could say 192.168.192.80 is the primary (say priority 10) and 192.168.192.217 is the failover (priority 0). ZeroTier would then only ever keep the route via the highest-priority node that is currently online.
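To make the priority idea concrete, here is a minimal sketch of the selection rule. It is not how ZeroTier behaves today; `RouteCandidate`, `select_route`, and the node labels `gw-primary` / `gw-failover` are hypothetical names made up for illustration, and the set of online nodes is assumed to come from somewhere else.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RouteCandidate:
    via: str       # hypothetical label for the gateway node
    priority: int  # higher value wins, as proposed in the comment above


def select_route(candidates: Iterable[RouteCandidate],
                 online: set) -> Optional[RouteCandidate]:
    """Keep only the highest-priority route whose gateway is online."""
    live = [c for c in candidates if c.via in online]
    # default=None covers the case where no gateway is online at all
    return max(live, key=lambda c: c.priority, default=None)


candidates = [
    RouteCandidate(via="gw-primary", priority=10),
    RouteCandidate(via="gw-failover", priority=0),
]

# Both gateways online: the primary is chosen.
both_up = select_route(candidates, {"gw-primary", "gw-failover"})
# Primary offline: the route falls back to the failover node.
primary_down = select_route(candidates, {"gw-failover"})
# Nothing online: no route is kept.
all_down = select_route(candidates, set())
```

The hard part is not the selection itself but deciding what goes into `online`, which the sketch deliberately leaves open.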
One challenge is that ZeroTier itself doesn't know if a node is up or down. It's a peer-to-peer system. Down according to whom? I think VRRP, OSPF, etc. are a good workaround.
Hm... Last night I was thinking this might be useful even if the failover was in a peer-to-peer scope (e.g., peer A sees that its route to X via peer B is unavailable because there's not a working connection to peer B, so peer A switches over to peer C).
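A rough sketch of what peer-scoped failover could look like, assuming each peer only consults its own view of which gateways it can currently reach. The function `next_hop` and the peer names are hypothetical; nothing here reflects an existing ZeroTier API.

```python
from typing import Optional, Sequence


def next_hop(gateways: Sequence[str], reachable_from_me: set) -> Optional[str]:
    """Return the first gateway, in preference order, that this peer can reach."""
    for gw in gateways:
        if gw in reachable_from_me:
            return gw
    return None  # no gateway to X is reachable from this peer


# Route to X is offered via B (preferred) and C (fallback).
gateways_to_x = ["B", "C"]

# Peer A has lost its connection to B, so it switches to C.
hop_for_a = next_hop(gateways_to_x, {"C"})
# Another peer, D, can still reach B and keeps using it.
hop_for_d = next_hop(gateways_to_x, {"B", "C"})
```

Note that A and D end up with different next hops at the same time. That is the "down according to who?" question in practice: with a purely local decision, there is no single network-wide answer.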
In the morning light, VRRP and OSPF might lead to a better result than what ZeroTier can directly provide anyways (because the routing of the remote LAN into ZeroTier -- assuming this isn't just a masquerade -- would need to know which node is up). That said, I've never used those tools, and it's not super clear how to make such a solution work -- do you run two ZeroTier nodes that impersonate each other but only one is ever on? How do you make that happen with VRRP? etc.
This might be something worth writing a guide for. As a general point of feedback, I think ZeroTier, while often superior in function, is lacking guides. E.g., https://tailscale.com/kb/1115/subnet-failover/ describes how this can be accomplished with Tailscale -- in a masquerading setup, which, granted, is not what I want. This is closer to what I want (and have running on ZeroTier): https://tailscale.com/kb/1214/site-to-site/ ... however, I'd like failover (because of unforeseen circumstances like https://github.com/zerotier/ZeroTierOne/issues/2105).
i.e., it seems one quality option would be to have a guide introducing a site-to-site setup and a guide on using site-to-site setup with a failover (and any ZeroTier specific integrations/pieces).
Another might be having this functionality built in on the ZeroTier side (as it seems to be in Tailscale) and requiring, for site-to-site setups, that whatever router is there be responsible for managing its own routing table.
iBGP with a route reflector or two (or a route server if using eBGP) would be one way of doing it, but a node would have to be designated as the reflector, which means a SPOF, so you would ideally need two of them. There is also the hassle of configuration, although using things like peer groups and a BGP listen range simplifies the config somewhat.
see also #2223
I have ZeroTier deployed on 2 routers, both with the same managed route pointing to them, since I want one to take over when the other fails. But when I take down one of the routers, the managed route is not withdrawn.
Screenshot 1: Member down.
Screenshot 2: Route on ZeroTier UI still active.
Screenshot 3: Route on Windows still pointing to member which is down, even after restart.
Expected behavior: The route is automatically removed when a member goes offline.
Actual behavior: The route is not removed when a member goes offline.
To reproduce: Set up ZeroTier on 2 member nodes with the same managed route pointing to them. Then turn off one of them.
See above.
ZeroTier version 1.6.6 on MikroTik routers. ZeroTier 1.10.1 on Windows host.