ntop / n2n

Peer-to-peer VPN
GNU General Public License v3.0

Connected to supernode, cannot ping other edge nodes #975

Closed felix-git-hub closed 2 years ago

felix-git-hub commented 2 years ago

With version 3.1, I run 1 supernode with nearly 7 edge nodes. Only 1 node sometimes cannot connect to the other nodes, while the other nodes can ping each other successfully.

Sometimes only a reboot solves this problem.

N2N_KEY="abc"  /usr/sbin/edge -c community  -a IP   -l supernode -A3 -H  -d edge1 -m  mac  -i 5 -I nikename -t 5662 -p 50001

 ### | TAP             | MAC               | EDGE                  | HINT            | LAST SEEN |     UPTIME
=============================================================================================================
SUPERNODE FORWARD
   1 | XXXXXXX   | XXXXXXX | XXXXXXX   | XXXXXXX    |        58 |
   2 | XXXXXXX   | XXXXXXX | XXXXXXX   | XXXXXXX    |        58 |
   3 | XXXXXXX   | XXXXXXX | XXXXXXX   | XXXXXXX    |        58 |
-------------------------------------------------------------------------------------------------------------
PEER TO PEER
-------------------------------------------------------------------------------------------------------------
SUPERNODES
3.1.0.r1073.03ce1e2 l* | XXXXXXXXXXX | XXXXXXXXXXXX   | load =       16 |        30 |     695185
=============================================================================================================
uptime 135 | pend_peers 3 | known_peers 0 | transop 2,1
super 2,1 | p2p 0,0
last_super 3 sec ago | last_p2p 1648735943 sec ago

Logan007 commented 2 years ago

One thing that stands out a bit to me is the -i 5. Could you try -S2 (TCP, not working on Windows) instead?

Furthermore, I assume that all other nodes also have header encryption -H enabled.

Also, is the supernode reached via a local or a public address?

felix-git-hub commented 2 years ago

I wrote a script to monitor the connection; if the ping fails, it triggers a restart. After nearly 1 hour of retries, the connection was rebuilt, but this situation bothers me a lot. -H is enabled on all the nodes. The supernode has a public address with DDNS.

Logan007 commented 2 years ago

I suspect a connection problem then. Have you tried a TCP connection (-S2, and omitting -i 5) with that edge? Does it change anything?

Is there anything special about this edge's network situation? Is there a NAT between this particular edge and the supernode? Or even more levels of NAT? The edge and the supernode do not share the same network, right?

felix-git-hub commented 2 years ago

I tried -S2 yesterday; so far so good. The supernode is in a NAT network and is forwarded to the public side by OpenWrt; however, other nodes outside the NAT can connect to the supernode.
I think the network is OK, since I also built a supernode server on the special node.

N2N network 1: A (supernode) - B (OpenWrt) - C (node) / other public nodes. N2N network 2: A (node) - C (supernode) - other public nodes.

(A is behind B; B and C have public addresses.) A and B both have supernode and edge servers. N2N network 2 is stable, while N2N network 1 sometimes fails for the C server.

Logan007 commented 2 years ago

Is this one edge in the same network as the supernode?

Do I understand correctly that you use 2 supernodes? Then you should definitely use latest dev (or release 3.1.1).

felix-git-hub commented 2 years ago

on the same server

On 4/1/2022 19:37,Logan oos @.***> wrote:

Is this one edge in the same network with the supernode?


Logan007 commented 2 years ago

on the same server

Ah, I remember issues with this situation being reported earlier. The only additional advice (apart from -S or -S2) I can give is to make sure that -l supernode does not point to the loopback address 127.0.0.1 but to the LAN address (or even the public one). But maybe -S (UDP) or -S2 (TCP) is already working well for you.
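A quick pre-flight check for the loopback case could look like the sketch below. It is only an illustration: the function name `is_loopback_supernode` is hypothetical, the port in the usage line is just an example value, and the check only recognizes literal `127.x.x.x`, `localhost`, and `[::1]` strings. It does not resolve DNS or DDNS names.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) if the host part of a
# "host:port" supernode argument is a literal loopback address.
# Names are not resolved, so a DDNS name that happens to point at
# 127.0.0.1 would not be detected.
is_loopback_supernode() {
    host=${1%:*}            # strip a trailing ":port", if any
    case "$host" in
        127.*|localhost|"[::1]") return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_loopback_supernode "127.0.0.1:7654" && echo "use the LAN or public address for -l"` would print the hint, while a LAN address such as `192.168.1.10:7654` would not.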

felix-git-hub commented 2 years ago

I tried pinging from the node to the supernode and saw about 3% - 4% packet loss. Does this affect the connection? I tried -S2, and it seems to be stable.

Logan007 commented 2 years ago

I tried to ping from the node to the supernode, about 3% - 4% loss rate, does this affect the connection?

n2n uses UDP for underlying transmission by default. On some routes around the world, I experience UDP packet loss, too. UDP packets can "legally" be dropped, e.g. when other traffic loads are too high. So yes, this can happen. And it will especially affect UDP connections over n2n, e.g. some VoIP (UDP over UDP then), or ping (ICMP over UDP), because neither UDP nor ICMP requires re-sending packets in case of loss.

TCP, however, takes care of lost packets by requiring re-transmission. So, if you mainly have TCP connections over n2n, you will be fine (TCP over UDP). The same holds true for all protocols if you use TCP as the underlying protocol (-S2, everything over TCP). Note that n2n currently uses TCP only in conjunction with supernode forwarding, so there is no peer-to-peer for that edge anymore (which should be very acceptable if supernode and edge reside on the same server).

As a funny side note, in some cases you could see real ICMP pings between the physical network interfaces working fine with no loss at all, while n2n's UDP-encapsulated pings show some drops.
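To compare loss on the physical path with loss inside the n2n network, the percentage can be pulled out of ping's summary line. The sketch below assumes the summary format printed by Linux iputils ping (the same format as the output quoted elsewhere in this thread); the function name `loss_percent` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from a ping
# summary line such as
#   "50 packets transmitted, 48 received, 4% packet loss, time 49091ms"
# Prints nothing if the line does not contain "% packet loss".
loss_percent() {
    printf '%s\n' "$1" | sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Feeding it the summary line of one ping run against the supernode's physical address and one against an address inside the n2n network would make the difference described above visible as two numbers.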

felix-git-hub commented 2 years ago

With -S2, the log shows "buffer overflow detected : terminated". Why does this happen?

N2N_KEY="XXX" /usr/sbin/edge -c community -a IPADDRESS -l supernode -A3 -H -d edge1 -m macaddress -I name -t 5662 -S2

Logan007 commented 2 years ago

buffer overflow detected : terminated

why this happen?

The only explanation is that the TCP buffer overflows, which is absolutely not supposed to happen.

TCP transmission prepends the number of following bytes before sending them. We have included numerous checks on the size so that the case you observed does not happen... We will need to investigate further then.
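The general idea of length-prefixed framing with a size check on the receiving side can be sketched in a few lines. This is not n2n's code and not its wire format: the sketch uses a decimal text prefix followed by a colon, and the function names and the `MAX_LEN` value are made up for illustration.

```shell
#!/bin/sh
MAX_LEN=2048   # arbitrary limit chosen for this sketch

# frame MESSAGE: print "<length>:<message>".
frame() {
    printf '%d:%s' "${#1}" "$1"
}

# unframe FRAME: print the payload. Fails if the announced length is
# missing, not a number, larger than MAX_LEN, or larger than the data
# that actually arrived.
unframe() {
    case "$1" in *:*) ;; *) return 1 ;; esac
    len=${1%%:*}
    rest=${1#*:}
    case "$len" in ''|*[!0-9]*) return 1 ;; esac
    [ "$len" -le "$MAX_LEN" ] 2>/dev/null || return 1   # announced size too large
    [ "${#rest}" -ge "$len" ] || return 1               # fewer characters than announced
    printf "%.${len}s" "$rest"
}
```

The two size checks in `unframe` are the kind of guard meant above: a receiver that trusted the announced length without them could copy more data than its buffer holds.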

Does it happen on a regular basis? Are you able to trigger this by some special situation? Does it also happen at the supernode?

felix-git-hub commented 2 years ago

I started this command from crontab (with sudo crontab). However, if I run the command directly with sudo bash, this does not happen. Does this have anything to do with the situation?

Logan007 commented 2 years ago

crontab does this have anything to do with this situation?

Hard to say... and definitely an interesting observation!! But if that's the only difference you make, it probably is the reason for it, although I am not able to explain it... 🙂
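One well-known difference between an interactive shell and cron is the environment: cron usually starts jobs with only a few variables and a short PATH. Whether that is the cause here is not known. To test the hypothesis, a command can be run by hand under a similarly minimal environment using `env -i`. The function name `run_like_cron` and the chosen variable values are assumptions, not something cron guarantees on every system.

```shell
#!/bin/sh
# Hypothetical helper: run a command string under a minimal, cron-like
# environment. "env -i" drops every variable that is not listed here.
run_like_cron() {
    env -i HOME="${HOME:-/}" SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c "$1"
}
```

Passing the exact command line from the crontab entry to `run_like_cron` from a root shell would show whether the crash can be reproduced outside cron. If it can, the environment is a likely suspect; if not, the difference lies elsewhere.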

Instead of a cron job, maybe use a VPN-ping-and-restart-if-network-not-reachable-anymore script, see here for example (no ping but node count)?

felix-git-hub commented 2 years ago

OK, I will try your script.

Logan007 commented 2 years ago

Note that I use a systemd service (n2nEdge) to fire up the edge (the service itself does nothing more than starting the edge and setting the routes). This makes it easier to restart the service if required (as done by the ping script).

Also, this script is run by another service (vpnPing). It checks if at least 2 nodes can be found (-le 2 part, twice). There are simpler solutions just using a ping out there...

Note that it also takes care of the same route as set in my n2nEdge script because I sometimes found the device still being up but the route gone... If you do not work with additional routes, you can of course ignore that part.
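A simpler ping-based variant of the restart-the-service idea might look like the sketch below. It is not the script referred to above: it uses a plain ping instead of a node count and does not touch routes. The function name and the `PING_CMD` and `RESTART_CMD` variables are hypothetical; only the service name n2nEdge comes from this comment.

```shell
#!/bin/sh
# The commands are kept in variables so the logic can be dry-run,
# e.g. with RESTART_CMD="echo would restart".
PING_CMD=${PING_CMD:-"ping -c 2 -W 2"}
RESTART_CMD=${RESTART_CMD:-"systemctl restart"}

# watchdog_once ADDRESS SERVICE
# Ping ADDRESS inside the VPN once; restart SERVICE if there is no reply.
watchdog_once() {
    if $PING_CMD "$1" >/dev/null 2>&1; then
        echo "$(date) $1 reachable"
    else
        echo "$(date) $1 unreachable, restarting $2"
        $RESTART_CMD "$2"
    fi
}
```

Called periodically as `watchdog_once <peer address> n2nEdge`, this would restart the edge service whenever the chosen peer stops answering.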

felix-git-hub commented 2 years ago

It seems the cron service cannot start the edge normally; I cannot identify the reason.

felix-git-hub commented 2 years ago

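I use a script to monitor the connection; so far so good:

```bash
#!/bin/bash
# Restart the edge if the peer at 192.168.1.2 stops answering pings.
if ping -c2 192.168.1.2 >/dev/null 2>&1; then
        echo "$(date) ping ws ok"
else
        # Stop any running edge whose command line matches "edge.*network".
        pkill -f "edge.*network"
        sleep 2
        # start edge here
        echo "$(date) start  edge"
fi
```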

felix-git-hub commented 2 years ago

50 packets transmitted, 48 received, 4% packet loss, time 49091ms

I tried to ping the server and got a 4% packet loss rate. Does this affect the connection?

On 4/1/2022 20:36, Logan oos @.***> wrote:

on the same server

Ah, I remember issues with this situation reported earlier. The only additional advice (apart from -S or -S2) I can give is to make sure that -l supernode does not point to local loop 127.0.0.1 but LAN address (or even public). But maybe -S (UDP) or -S2 (TCP) are already working well for you.
