slackhq / nebula

A scalable overlay networking tool with a focus on performance, simplicity and security
MIT License

Lighthouse can connect to everything; other nodes can't connect cross-WAN #518

Closed: robpomeroy closed this issue 3 years ago

robpomeroy commented 3 years ago

My home network uses a pfSense router to "balance" broadband and mobile data circuits. Rules on the pfSense router force my work laptop to use the 4G circuit by default, with everything else using broadband.

image

Port forwarding allows internal connections to a lighthouse VM running within my LAN. I have three nebula nodes: two at home and one on a remote (work) network.

My home machines can connect to each other, but the remote node at work cannot connect to the home nodes (and vice versa).

image

I enabled punchy > punch and punchy > respond, just in case. I temporarily disabled firewalls on all Windows nodes and enabled "allow all" inbound and outbound in the Nebula configs.

The lighthouse can connect to everything, without issue. I believe that when my personal and work laptops connect to each other, they are effectively routing out of the LAN and back in again (sort of like hairpin NAT), but I could be wrong. I don't know whether Nebula is capable of local discovery, though.

I'm seeing lots of "Handshake timed out" messages in the Nebula output for the failing connections.

Is Nebula capable of routing over this network, despite the double-NAT situation (pfSense router, plus my WAN routers)? Have I missed an obvious step?

Could be a duplicate of #85?

PS Given my use of pfSense, I have tried this, to no avail: https://blog.ktz.me/punching-through-nat-with-nebula-mesh/

wildardoc commented 3 years ago

Someone else may know better, but I believe the problem is that your lighthouse is inside your LAN: when it hands out the "here is how to reach this host" info to your machine at work, it is probably giving out an IP and port that are only valid on the LAN. The work computers wouldn't be able to use those connection details. That said, even once that is cleared up, the firewall at work may still be an issue.

I'd move the lighthouse to an address outside your home and office LANs first and see if that works.
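To make the concern above concrete, here is a minimal Python sketch that flags advertised endpoints a peer on another network could not reach directly. It is not Nebula code: the `lan_only_endpoints` helper and the sample addresses are hypothetical, and it only checks IPv4 "ip:port" strings against the RFC 1918 private ranges using the standard `ipaddress` module.

```python
import ipaddress

# RFC 1918 private IPv4 ranges; addresses in these are not routable across the WAN.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def lan_only_endpoints(endpoints):
    """Return the IPv4 "ip:port" endpoints whose IP falls in a private range.

    Hypothetical helper for illustration only.
    """
    flagged = []
    for endpoint in endpoints:
        host, _, _port = endpoint.rpartition(":")
        addr = ipaddress.ip_address(host)
        if any(addr in net for net in RFC1918):
            flagged.append(endpoint)
    return flagged

# Hypothetical addresses that might be reported for a single host.
advertised = ["192.168.1.20:4242", "203.0.113.7:4242"]
print(lan_only_endpoints(advertised))
```

If every address reported for a host ends up flagged, a node on a different network has nothing it can dial directly, which would be consistent with the handshake timeouts described.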

(Sent by email in reply to Rob Pomeroy's post of Wed, Sep 1, 2021, 10:05 AM.)

robpomeroy commented 3 years ago

@wildardoc I moved the lighthouse node to AWS this morning. All nodes can connect to the lighthouse, but still not to each other, alas (handshakes time out). I presume my double-NAT at home continues to present an insurmountable problem for Nebula?

Thanks for the suggestion though - it was worth trying.

What's the significance of "state:dead"? This is the node at work, behind an ordinary single DrayTek router:

INFO[2341] Tunnel status                                 certName=SEC-1-Win10 tunnelCheck="map[method:active state:dead]" vpnIp=192.168.120.4
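When comparing many such lines across nodes, it can help to pull the fields out programmatically. The following is a minimal Python sketch, assuming the `key=value` layout seen in the line above; the `parse_fields` helper is hypothetical, and it only extracts values without interpreting what `state:dead` means.

```python
import re

# key=value pairs, where the value is either double-quoted or a bare token.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
# Values of the form map[k1:v1 k2:v2], as in the tunnelCheck field.
MAP_RE = re.compile(r'^map\[(.*)\]$')

def parse_fields(line):
    """Extract key=value fields from one log line into a dict.

    Hypothetical helper; map[...] values are expanded into nested dicts.
    """
    fields = {}
    for key, value in FIELD_RE.findall(line):
        value = value.strip('"')
        m = MAP_RE.match(value)
        if m:
            value = dict(pair.split(":", 1) for pair in m.group(1).split())
        fields[key] = value
    return fields

line = ('INFO[2341] Tunnel status certName=SEC-1-Win10 '
        'tunnelCheck="map[method:active state:dead]" vpnIp=192.168.120.4')
fields = parse_fields(line)
print(fields["vpnIp"], fields["tunnelCheck"]["state"])  # prints: 192.168.120.4 dead
```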

If I bring up my work-home VPN, I can then connect between home and work nodes. But that's presumably because I'm then using the work router as my default gateway for my nodes at home. That VPN is exactly what I'm trying to deprecate. 🙂

robpomeroy commented 3 years ago

Huh. I brought up a second node on my home LAN and now everything can connect to everything else (without the VPN). I swear this wasn't working before (when my personal laptop was switched off).

wildardoc commented 3 years ago

I've noticed in the past that I've had to reset once in a while.

(Sent by email in reply to Rob Pomeroy's comment of Thu, Sep 2, 2021, 3:58 AM.)

robpomeroy commented 3 years ago

@wildardoc - that was a good shout. I added an Android node over the weekend and it couldn't see any other node until I restarted the lighthouse. Weird.

wildardoc commented 3 years ago

I like it for home. Everything seems good, but once in a while one machine has a problem; I reset the lighthouse and all is good again. It would likely redistribute the keys at that point. If you reuse an IP with new keys, I would expect there to be issues as well. I'm not a developer on this; I've just been using it for networking between co-workers while teleworking since March last year. Actually, we had to leave Nebula for work and switch to a different project that handled the firewalls at the office better.

(Sent by email in reply to Rob Pomeroy's comment of Mon, Sep 6, 2021, 9:17 AM.)

robpomeroy commented 3 years ago

@wildardoc Agreed - I had a rock-solid day yesterday of fast RDP to a remote Win10 VM, thanks to Nebula. Can't recall achieving this speed and stability with any VPN.

Since Slack is using this at (massive) scale, I assume there must be a way of fettling it such that it doesn't display these symptoms. It's a really promising start for me.

Thanks for your helpful input!