Open JonTheNiceGuy opened 3 years ago
After having a chance to look at this, I can see where the issue lies now!
The way the DO Floating IP is implemented means that the packets initiated at the interface exit with the IP based on the next hop route (e.g. if you've got an IP of 192.0.2.1 and an IP of 203.0.113.1, and your default route would use the 203.0.113.1 address, then the packet will have a source of 203.0.113.1).
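For anyone who wants to see which source address the kernel would pick for a given destination, one option is to connect a UDP socket and read back its local address. This is only a sketch: `source_ip_for` is a made-up helper name and 4242 is just a placeholder port. `ip route get <destination>` on the host should report the same `src` address.

```python
import socket


def source_ip_for(destination: str, port: int = 4242) -> str:
    """Return the local IP the kernel would use to reach `destination`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends nothing; it only makes the kernel
        # do a route lookup and fix the socket's local (source) address.
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # 127.0.0.1 is used so the example runs anywhere; substitute a remote
    # node's public IP to see which address the host would send from.
    print(source_ip_for("127.0.0.1"))
```

On a droplet whose default route uses the non-floating address, this would be expected to return that address rather than the floating IP or its anchor IP.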
I suspect that because we're using UDP with Nebula, it's treating each response packet as a new connection, and thus selects the non-floating IP as the source.
As I don't write code in Go, I don't know how Nebula packets are presented in the code, but it feels like maybe the response is just saying "reply out of that interface" rather than "reply using that IP", and so a possible fix (if that is the case) would be to check what the target IP is, and then create the response using that IP?
Of course, I might be entirely wrong, but DO's work-around is to change the default route on your host, from the device-specific IP, to the floating IP's "anchor IP" (basically a NAT-interface IP). This does, indeed, solve the issue in the short-term (if you want to use a floating IP rather than a dedicated IP)... but YMMV.
One final thought for those who really don't want to mess with things like this, is that you could use Policy Based Routing for anything targeting the floating IP and the port for your nebula service... but ugh, I can't even imagine how you'd feel about troubleshooting that at 2AM when Bob from accounting can't access his management portal.
Is it possible to explicitly bind to a specific or series of specific IP addresses to test this out?
listen:
  host: 1.1.1.1
Nebula isn't selecting the source IP to send from; the kernel is. Listening on the specific IP or adjusting the routes in use for Nebula sounds like the appropriate fix.
Floating IPs have a few drawbacks as well: this is one, and another is that you can apparently attach them to multiple machines, which will also break Nebula traffic.
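As a rough illustration of why listening on a specific IP helps (plain Python sockets, not Nebula's code; `reply_source` is a hypothetical helper): a UDP socket bound to one address sends its replies from that address, whereas a socket bound to 0.0.0.0 leaves the choice to the kernel's route lookup.

```python
import socket


def reply_source(bind_ip: str) -> str:
    """Bind a UDP 'server' to bind_ip, answer one packet, and return the
    source IP the client observed on the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind((bind_ip, 0))  # port 0: let the OS pick a free port
        port = server.getsockname()[1]
        server.settimeout(2)
        client.settimeout(2)

        client.sendto(b"ping", (bind_ip, port))
        _, peer = server.recvfrom(1024)
        # The reply leaves from the address the server socket is bound to.
        server.sendto(b"pong", peer)

        _, (src_ip, _src_port) = client.recvfrom(1024)
        return src_ip


if __name__ == "__main__":
    # Loopback keeps the demo self-contained; on a droplet, bind_ip would
    # be the address you want replies to come from.
    print(reply_source("127.0.0.1"))
```

Going by DigitalOcean's reply quoted in the original report, the address to bind on a droplet would presumably be the anchor IP rather than the floating IP itself.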
I ended up reverting the routing change I made, because outbound email stopped working at that point! But, for the short-term it worked fine. I'm now addressing each node directly, by using the non-floating IP. Is it worth having a "Troubleshooting FAQ" somewhere, and adding this as an item?
Is it worth having a "Troubleshooting FAQ" somewhere, and adding this as an item?
Absolutely! The question really comes down to where we want to host the said FAQ. We will be discussing this internally next week.
This floating IP routing bit me as well. As this lighthouse is a dedicated machine, I added a netplan config to route all traffic through the floating IP. After this, Nebula worked like a charm.
# /etc/netplan/99-custom-route.yaml
network:
  version: 2
  ethernets:
    eth0:
      routes:
        - to: 0.0.0.0/0
          via: <the actual default route>
          from: <floating-ip-here>
Reconfiguring routing fixed me up, too.
I couldn't use the netplan tip, but it led me to the official DigitalOcean docs for routing outbound traffic over a droplet's reserved IP. (Note that "floating IPs" have been renamed to "reserved IPs".) Thanks for the lead, @Troyhy!
https://docs.digitalocean.com/products/networking/reserved-ips/how-to/outbound-traffic/
I have an issue with Digital Ocean with a Floating IP address, where packets bound for the internally presented address for the floating IP (an RFC1918 address in the range 10.16.0.0/16) are answered from the public (non-RFC1918) "real" address of the VPS.
----- Full detail -----
As mentioned on slack, I have the following Nebula environment:
Lighthouse ("VPS") configuration file:
Remote node ("debianqnap") configuration file here:
Note: For sanitization purposes, assume "REAL_IP_AS_0.0.0.0_FORMAT" and "FLOATING_IP_AS_0.0.0.0_FORMAT" are both non-RFC1918 addresses (e.g. 123.123.123.123), but "FLOATING_INTERNAL_IP_AS_0.0.0.0_FORMAT" is an RFC1918 address (e.g. 10.1.1.1). "REMOTE_IP_AS_0.0.0.0_FORMAT" is the IP address outside the NAT home DSL for the remote node. The fingerprint has also been masked by replacing the bulk of its start with "1"s.
When the remote node tries to connect to the lighthouse with the floating IP address in the configuration file, using journalctl -xefu nebula, I see in the debianqnap:
However, if I change this to the "real" IP (avoiding the floating IP), I get this:
With the floating IP as the target, and running tcpdump, I see the following:
With the real IP as the target, running tcpdump:
Here's the result of running ip -4 addr on the VPS:
Digital Ocean replied as follows when I asked for their advice:
Our floating ip addresses are set up in a way that the droplet operates without actually recognizing that the floating ip address exists. It sits in front of this droplet and is connected via an anchor ip that is pre-configured on each droplet. The most common reason that users run into errors with connecting via floating ip is due to the application set up to listen to this floating ip, rather than listening to this anchor ip address.