IMHO it's the LAN nodes (behind a firewall? NAT?) that should have punch_back set to true. I have a similar setup and initially had similar problems. Setting punch_back to true on the LAN nodes solved the problem.
All the nodes have the same config (except the lighthouse and the certificates), so punch_back is set to true.
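For reference, a minimal sketch of the hole-punching settings being discussed (flat key names as used in Nebula releases of this era; later versions nest these under `punchy:`):

```yaml
# Send keepalive-style punches so the NAT mapping to the lighthouse
# stays open.
punchy: true
# When punched, punch back toward the initiator; helps nodes behind
# stricter NATs establish tunnels.
punch_back: true
```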
I noticed the following in your node C config:

```yaml
# IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
hosts:
  - "192.168.42.99"
```

Since node C is not a lighthouse, you should comment out these lines (as it says in the comment above). At least... that's what I did on all non-lighthouse nodes, and my setup works.
It says it should be empty on lighthouse nodes. C is not a lighthouse.
I'll give it a try anyway.
Please disregard my last comment, I was mistaken. You DID comment out the static host on your lighthouse node and left it in on the other nodes. This is similar to what I have. My apologies.
I have the same issue. The lighthouse has a public IP at DigitalOcean. I have two nodes at different locations, both behind NAT. If I try to ping the Nebula private address of one from the other, I can see in the logs that they are both trying to create a tunnel, and they are both trying to send handshake messages to each other's internal Docker addresses.
```
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="192.168.200.2:4242" vpnIp=10.22.0.21        <-- LAN interface
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="172.17.0.1:4242" vpnIp=10.22.0.21           <-- Docker interface
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="172.18.0.1:4242" vpnIp=10.22.0.21           <-- Docker interface
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="<external-ip-here>:60892" vpnIp=10.22.0.21  <-- external interface
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="192.168.200.2:60892" vpnIp=10.22.0.21       <-- LAN interface
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="172.17.0.1:60892" vpnIp=10.22.0.21          <-- Docker interface
level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3364254650 remoteIndex=0 udpAddr="172.18.0.1:60892" vpnIp=10.22.0.21          <-- Docker interface
```
And that is probably what you see in your logs too, @lamarios: the 172.x.x.x addresses.
If you have a lot of Docker containers running, setting local_range and punch_back is a must. Otherwise it would take forever for one machine to find a path to another.
@nfam local_range has to be set to the overlay subnet, right?
I don't run Nebula in a container. I tried anyway to set local_range to 192.168.1.0/24, but that didn't help. On which node should I set local_range? Only the lighthouse?
@lamarios That was not what I meant. Do you run other containers on the Nebula node?
Only on my node A. The lighthouse and B (the other nodes on the LAN) are not, and the node from outside the LAN (C) is not running any either. C can only reach the lighthouse, but not A nor B.
I managed to "fix" the issue by setting a fixed port on the node I want to connect to often, opening that port on my router, and adding the node to the host list in node C's config.
Could it be an issue if the network of C is using the same IP range as my home LAN? (192.168.1.0/24)
@Kerwood local_range is the real (usually physical) network that you want your Nebula traffic to run through.
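For example, a minimal sketch (`local_range` was a top-level key in Nebula releases of this era; the range below is the home LAN from this thread):

```yaml
# Hint about the local underlay network; Nebula prefers peer addresses
# inside this range when handshaking, which speeds up finding a direct
# LAN path instead of cycling through e.g. Docker bridge addresses.
local_range: "192.168.1.0/24"
```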
@lamarios

> Could it be an issue if the network of C is using the same IP range as my home LAN? (192.168.1.0/24)

The Nebula network cannot use the same IP range as your home LAN.
It is not. Maybe I was not clear, sorry.

Nebula IPs:
- A: 192.168.42.198
- B: 192.168.42.200
- C: 192.168.42.10
- Lighthouse: 192.168.42.99

Physical LAN ranges:
- C: 192.168.1.0/24 (office network)
- A, B, lighthouse: 192.168.1.0/24 (home network)
@nfam Setting local_range on a fairly static server is easy, but setting it on a more dynamic machine like a laptop, where the subnet changes from place to place, is not optimal.
@Kerwood That's why the interface whitelist/blacklist proposed in #52 is superior to local_range. From the network interface, Nebula should easily be able to derive the IP range.
@nfam So whitelisting/blacklisting interfaces is an upcoming feature?
How does Nebula handle local_ranges that overlap private IP space but are not the same network? If I have 3 nodes on network A (192.168.1.0/24) and 2 nodes on network B (also 192.168.1.0/24), how does it handle discovery?
I would think you would want a unique network ID as well as a CIDR address, something like `local_range: "7:192.168.1.0/24"` and `local_range: "4:192.168.1.0/24"`, where 7 and 4 are the network IDs, so that two distinct but overlapping private ranges can be used. That would allow nodes on network 4 to discover each other without trying to discover network-7 addresses, since they are not on the same LAN.
Nebula will notice and attempt the best-looking local path first; if it fails to stand up a tunnel, it will begin handshaking with the other known/learned IP addresses. This is why having a lighthouse on the internet is effectively a requirement, unless you use static_host_map.
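For reference, a minimal sketch of that static_host_map alternative, using this thread's addresses (the second entry and its forwarded port 4243 are assumptions for illustration):

```yaml
# Map a peer's Nebula IP straight to a routable underlay address,
# so no lighthouse lookup is needed to reach it.
static_host_map:
  "192.168.42.99": ["ftpix.com:4242"]    # the lighthouse
  "192.168.42.198": ["ftpix.com:4243"]   # node A via a router port-forward (assumed)
```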
@nbrownus Why would Nebula fail to set up a tunnel? I can see my nodes attempting a lot of handshakes on some public and private IP addresses, but they never connect. They only connect to the lighthouse.
Is Nebula meant to be used to connect node A (LAN A, with internet) and node B (LAN B, with internet) to each other?
@lamarios Your issue is potentially two-fold.
The way I've troubleshot my Nebula setup is by looking at each node's log for handshake messages sent vs. handshake messages received. If A and B are sending handshake messages and C doesn't receive them (and vice versa), then that specific Nebula flow is not going to work.
@lamarios Set your C config section from

```yaml
static_host_map:
  "192.168.42.99": ["ftpix.com:4242"]
am_lighthouse: false
interval: 60
hosts:
  - "192.168.42.99"
```

to

```yaml
static_host_map:
  "192.168.42.99": ["ftpix.com:4242"]
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.42.99"
```
Hmm, not sure why my original post is like that; I must have messed up my copy/paste. My config is actually set the way you're telling me to set it.
@lamarios Have you taken a look at the A, B, and C logs for handshake messages? The ones going to the lighthouse should show success, but what about the messages to the other nodes?
Ping from C (outside the home network) to B. Logs on C:
```
Jan 06 10:39:44 gz-t480 nebula[6656]: time="2020-01-06T10:39:44+08:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3435170973 remoteIndex=0 udpAddr="172.17.0.1:51930" vpnIp=192.168.42.203
Jan 06 10:39:45 gz-t480 nebula[6656]: time="2020-01-06T10:39:45+08:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3435170973 remoteIndex=0 udpAddr="10.244.1.0:51930" vpnIp=192.168.42.203
Jan 06 10:39:46 gz-t480 nebula[6656]: time="2020-01-06T10:39:46+08:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3435170973 remoteIndex=0 udpAddr="10.244.1.1:51930" vpnIp=192.168.42.203
Jan 06 10:39:48 gz-t480 nebula[6656]: time="2020-01-06T10:39:48+08:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3435170973 remoteIndex=0 udpAddr="192.168.1.203:4242" vpnIp=192.168.42.203
```
Logs on B:
```
Jan 06 02:39:36 k8-node-3 nebula[7026]: time="2020-01-06T02:39:36Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1608714499 remoteIndex=0 udpAddr="183.171.67.174:40536" vpnIp=192.168.42.10
Jan 06 02:39:38 k8-node-3 nebula[7026]: time="2020-01-06T02:39:38Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1608714499 remoteIndex=0 udpAddr="172.20.10.2:4242" vpnIp=192.168.42.10
Jan 06 02:39:41 k8-node-3 nebula[7026]: time="2020-01-06T02:39:41Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1608714499 remoteIndex=0 udpAddr="172.18.0.1:4242" vpnIp=192.168.42.10
Jan 06 02:39:45 k8-node-3 nebula[7026]: time="2020-01-06T02:39:45Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2437276409 remoteIndex=0 udpAddr="202.187.183.124:4242" vpnIp=192.168.42.10
Jan 06 02:39:45 k8-node-3 nebula[7026]: time="2020-01-06T02:39:45Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2437276409 remoteIndex=0 udpAddr="202.187.183.124:4242" vpnIp=192.168.42.10
Jan 06 02:39:46 k8-node-3 nebula[7026]: time="2020-01-06T02:39:46Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2437276409 remoteIndex=0 udpAddr="202.187.183.124:4242" vpnIp=192.168.42.10
```
(202.187.183.124 being C's public address.)
I initially prepared a few questions, but then went back through the history of this issue and noticed you mentioned that the lighthouse is on the same LAN as nodes A and B. I take this to mean that A and B don't have to traverse NAT to reach the lighthouse. This could be problematic; here's why:
Based on my observations and testing, I think that as nodes connect to the lighthouse, they send information about their local IP addresses to it. If they traverse a NAT to reach the lighthouse, the lighthouse also keeps track of the public IP and port the handshake came from. Then, when nodes want to connect to other nodes outside their LAN, they can get the reachability info for those nodes from the lighthouse. They'll then try all the IPs in succession, hoping that one of them will be reachable and respond to a handshake request.

However, if you've got some nodes on the same LAN as the lighthouse, they will never use NAT to reach the lighthouse, and therefore the lighthouse can never learn what NAT'ted IP those nodes might be behind in order to relay it to nodes outside the LAN. I think this is why they suggest putting the lighthouse out onto the internet, so that every node that needs to talk to it goes through a NAT.
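To make that concrete, here is a minimal sketch of an internet-facing lighthouse config (key names match the snippets earlier in this thread; an illustration, not the poster's actual file):

```yaml
# Internet-facing lighthouse: it answers reachability queries and
# never needs to query anyone itself.
static_host_map: {}
lighthouse:
  am_lighthouse: true
  hosts: []          # IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
listen:
  host: 0.0.0.0
  port: 4242         # must be reachable from the internet over UDP
```

With this placement, every NAT'ted node handshakes across its NAT, so the lighthouse learns each node's public IP:port mapping and can hand it out to peers.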
That being said, I think only one of the nodes in a pair that wants to communicate needs to be reachable; if one can reach the other successfully, they should be able to set up a bidirectional tunnel. Therefore, we can focus on B trying to reach C, as we know the reverse is likely not going to work based on the explanation above.
Is C's public IP of 202.187.183.124 configured directly on C, or is it a NAT'ted IP address that C uses when it goes out onto the internet? You can see in B's logs that it is trying to reach C at 202.187.183.124:4242. Is this an address:port combo that you know for a fact will lead directly to C? Have you run tcpdump on C to determine whether you actually receive UDP handshake traffic from B?
I see; explained like this, it makes a lot of sense.

As for C's public address: it is a NAT'ted IP address that C uses when it goes out on the internet.

I just ran tcpdump and I don't receive anything on C from B. So C's network probably doesn't let C receive traffic on port 4242.
Thanks for the help, it was very informative
From the lighthouse's point of view, C connected to it using source IP 202.187.183.124 and source port 4242, so it basically tells any node asking how to reach C: "You can try reaching C at IP address 202.187.183.124 and UDP port 4242; maybe that'll work for you." So that's what B is trying to do. Now, depending on the type of NAT/firewall behind which C sits, that may or may not work out (it doesn't in your case). In the case of a full-cone NAT and a permissive firewall it might work, but with other types of NAT it's not likely (see https://en.wikipedia.org/wiki/Network_address_translation#Methods_of_translation for the various NAT types).
One thing that would likely work, if you have control of the NAT/firewall box behind which C sits, is to set up port forwarding so that incoming traffic destined to external IP 202.187.183.124 and UDP port 4242 is redirected to C's internal IP and port 4242. However, if you've got more than one Nebula node behind that NAT/firewall you'll have to set up additional port forwards, and you obviously won't be able to re-use port 4242, so you'll have to configure Nebula on those other nodes to bind to another port and hope that the NAT you're behind will keep that source port intact once the packet goes out on the internet.
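For the "bind to another port" part, a minimal sketch of the relevant section on a second node behind the same NAT (port 4243 is just an illustrative value, not from this thread):

```yaml
# Bind Nebula to a non-default UDP port so this node can get its own
# port-forwarding rule on the shared NAT/firewall.
listen:
  host: 0.0.0.0
  port: 4243   # then forward external UDP 4243 to this host's 4243
```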
Yeah, that's more or less what I've done for the nodes I need to access the most; for the ones I can't access, it's only to SSH to them, so I use proxy jumps. I don't control the network for C, but I do control the network for the lighthouse, A, and B.
So I opened a different port on the router for A, which I need to access often, and added it as a known host in C's config.
Thanks for your help!
I have a bunch of computers on my LAN with one lighthouse that is accessible from the outside world, all using the 192.168.42.0 IPs:

- Lighthouse: 192.168.42.99 (mydomain.com:4242)
- LAN machine 1 (A): 192.168.42.200
- LAN machine 2 (B): 192.168.42.203
- Outside-LAN machine (C): 192.168.42.10

Lighthouse config:

C config:

Logs from C: