ntop / n2n

Peer-to-peer VPN
GNU General Public License v3.0

Improve the stability of n2n-v2 #93

Closed lucktu closed 5 years ago

lucktu commented 5 years ago

Run edge_v2 on machines A, B, C, D and E (Linux or Linux-based; for example, padavan x3, openwrt x1, ubuntu x1).

Ping each other: A --> E is OK and B --> E is OK, but A --> B is sometimes blocked; the probability is about 20%.

In the same setup, the failure rate of n2n_v1 (or meyerd's n2n) is less than 1%.

emanuele-f commented 5 years ago

With edge_v2 you mean the version of n2n from this repository, right? Please specify if the problem occurs with twofish, aes or both. Are the embedded systems you are using big endian or little endian? Can you experience the issue with two ubuntu hosts?

lucktu commented 5 years ago

Edge_v2 on A; the other machines are similar:

`edged -d A001 -a 172.0.1.11 -c test -k test -l n2n.lucktu.com:10086 -p 31381 -brEfA -t 30100 &` (n2n_v2_dev, the newest in this repository)

`edge2 -d A002 -a 172.0.2.11 -c test -k test -l n2n.lucktu.com:10086 -p 31382 -brEf -t 30100 &` (n2n_v2_master)

Their stability is similar (n2n_v2_dev / n2n_v2_master)

This is my system (padavan): BusyBox v1.24.2 Linux Youku-L1 3.4.113 #1 Sat Apr 13 09:36:55 CST 2019 mips GNU/Linux

I am not able to run the experiment between two ubuntu hosts.

Thank you !

ghost commented 5 years ago

Try the meyerd edition, which works much better on my mixed Windows/Debian/VMware setup:

https://github.com/meyerd/n2n

emanuele-f commented 5 years ago

@lucktu so you are using the AES version (-A option). We've done some changes to the implementation to provide better security (see #72), so please ensure that all the nodes (including the windows/ubuntu nodes) are updated to the latest version from this repository. In order to understand where the issue is, I need to be able to reproduce it. I can try to setup n2n on openwrt but it will take some time. In the meantime, please provide the following information:

In order to improve this software we need your help with the details above; please take your time to perform the tests and document them in detail.

lucktu commented 5 years ago

@Pummelchen Yes, but n2n_v2_dev provides better speed, the official maintainers are updating it, and we have an obligation to work with them to make it better.

emanuele-f commented 5 years ago

Saying that one piece of software is faster than another without providing details, while a dev (me in this case) is available for improvements and feedback, is not a good way to help in open source. Some time ago we opened https://github.com/meyerd/n2n/issues/40 to join forces, since it's nonsense to maintain two different repositories with the same underlying codebase. You can see from the commit logs that development is actually going further in this repository.

lucktu commented 5 years ago

Yes, my method is not professional and not accurate; it is only a feeling.

Please give me an email address. I can provide a padavan machine for the experiment, and I can switch it to openwrt/lede if you need me to.

lucktu commented 5 years ago

Yes, all the nodes are updated to the latest version from this repository (n2n_dev_v2.5).

/opt/home/admin # lscpu | grep -i endian
Byte Order: Little Endian

Remove the -A option? I provided the second command above, which is the master version, and the result is the same.

/opt/home/admin # opkg update
Downloading http://bin.entware.net/mipselsf-k3.4/Packages.gz
Updated list of available packages in /opt/var/opkg-lists/entware
/opt/home/admin # opkg install timeout
Unknown package 'timeout'.
Collected errors:

lucktu commented 5 years ago

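My ping.sh, run every 15 minutes.

#!/bin/sh
# Probe four hosts (.51 to .54) on each of the ten edge networks (10.0.x to 10.9.x).
# Only failures are written to /tmp/net.txt.
for j in 51 52 53 54; do
    date "+%H:%M:%S Ping $j host" >> /tmp/net.txt
    for i in 0 1 2 3 4 5 6 7 8 9; do
        # One probe per address; a failed probe is logged as "XX".
        if ! ping -c 1 "10.$i.0.$j" >/dev/null 2>&1; then
            echo "  $i ... XX" >> /tmp/net.txt
        fi
    done
done
date "+%H:%M:%S --> over" >> /tmp/net.txt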

10 edges on every machine (temporary):

edge1 is v1: networks 0+1 (10.0.0.100 + 10.1.0.100)
edged is v2_dev: networks 2+3+4
edge2 is v2_master: networks 6+7
edges is meyerd's edge: networks 5+8+9

edge1 -d N0 -a 10.0.0.100 -c ntop -k test1 -l c0.smsq.ml:10082 -p 37580 -br &
edge1 -d N1 -a 10.1.0.100 -c ntop -k test1 -l c1.smsq.ml:10082 -p 37581 -br &
edged -d N2 -a 10.2.0.100 -c ntop -k test2 -l c0.smsq.ml:10086 -p 37582 -brEfA -t 37500 &
edged -d N3 -a 10.3.0.100 -c ntop -k test2 -l c1.smsq.ml:10086 -p 37583 -brEfA -t 37500 &
edged -d N4 -a 10.4.0.100 -c ntop          -l c1.smsq.ml:10086 -p 37584 -brEf -t 37500 &
edges -d N5 -a 10.5.0.100 -c ntop          -l c1.smsq.ml:10088 -p 37585 -brEf -t 37501 &
edge2 -d N6 -a 10.6.0.100 -c ntop -k test3 -l c0.smsq.ml:10086 -p 37586 -brEf -t 37500 &
edge2 -d N7 -a 10.7.0.100 -c ntop -k test3 -l c1.smsq.ml:10086 -p 37587 -brEf -t 37500 &
edges -d N8 -a 10.8.0.100 -c ntop -k test4 -l c0.smsq.ml:10088 -p 37588 -brEf -t 37501 &
edges -d N9 -a 10.9.0.100 -c ntop -k test4 -l c1.smsq.ml:10088 -p 37589 -brEf -t 37501 &

emanuele-f commented 5 years ago

Thank you for your feedback, please contact me at black.silver@hotmail.it for remote access information. The commands above show the test setup but not the results, so please also post /tmp/net.txt so that I can read some statistics.
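To read statistics out of a log such as /tmp/net.txt, a small helper could tally the failures per edge network. This is only a sketch: `summarize_failures` is a hypothetical name, and it assumes the line format written by the ping.sh posted above (failure lines of the form `  3 ... XX`, everything else being time-stamped headers).

```shell
#!/bin/sh
# Hypothetical helper: count failed probes per network index (0 to 9)
# in a log written by the ping.sh shown earlier in this thread.
summarize_failures() {
    # Failure lines look like "  3 ... XX"; header lines start with a time stamp
    # and are ignored because their second field is not "...".
    awk '$2 == "..." && $3 == "XX" { fails[$1]++ }
         END { for (n = 0; n <= 9; n++) printf "net %d: %d failed\n", n, fails[n] }' "$1"
}

# Usage (not run here): summarize_failures /tmp/net.txt
```

Because each network index maps to one edge build in the setup described above, the per-network counts can be compared directly between v1, v2_dev, v2_master and meyerd's edge.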

lucktu commented 5 years ago

The email has been sent, please check

emanuele-f commented 5 years ago

The connection to the device is really slow and I get disconnected many times, so I cannot perform the test from here. Please perform the following tests and report the ping statistics. For each test, run ping for 30 seconds, then press CTRL+C, read the stats and copy them here:

  1. No encryption:

    • edge A: edged -d n2ntest -c n2ntest00 -a 192.168.194.1 -f -l dns.ntop.org:7777
    • edge B: edged -d n2ntest -c n2ntest00 -a 192.168.194.4 -f -l dns.ntop.org:7777
    • ping 192.168.194.1
  2. Twofish encryption:

    • edge A: edged -d n2ntest -c n2ntest00 -k n2ntest01 -a 192.168.194.1 -f -l dns.ntop.org:7777
    • edge B: edged -d n2ntest -c n2ntest00 -k n2ntest01 -a 192.168.194.4 -f -l dns.ntop.org:7777
    • ping 192.168.194.1
  3. AES encryption:

    • edge A: edged -d n2ntest -c n2ntest00 -k n2ntest01 -A -a 192.168.194.1 -f -l dns.ntop.org:7777
    • edge B: edged -d n2ntest -c n2ntest00 -k n2ntest01 -A -a 192.168.194.4 -f -l dns.ntop.org:7777
    • ping 192.168.194.1

Thank you

lucktu commented 5 years ago

Their stability is similar (no encryption / Twofish encryption / AES encryption); I already tested that above.

emanuele-f commented 5 years ago

Please perform the same exact test with edges (without the AES test of course since meyerd does not have it) and post here both the ping results.

lucktu commented 5 years ago

I am sorry: the Internet environment in China is poor, so it is normal that connecting is not easy. You can only keep retrying with the domain name or an edge IP (the ones I posted above, e.g. 10.0.0.100); otherwise, there is no better way.

No encryption:

PING 192.168.194.1 (192.168.194.1): 56 data bytes
64 bytes from 192.168.194.1: seq=0 ttl=64 time=259.235 ms
64 bytes from 192.168.194.1: seq=1 ttl=64 time=259.978 ms
... ...
64 bytes from 192.168.194.1: seq=29 ttl=64 time=261.355 ms

--- 192.168.194.1 ping statistics ---
30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max = 259.159/260.126/261.679 ms

Twofish encryption:

PING 192.168.194.1 (192.168.194.1): 56 data bytes
64 bytes from 192.168.194.1: seq=0 ttl=64 time=1011.185 ms
64 bytes from 192.168.194.1: seq=1 ttl=64 time=260.171 ms
... ...
64 bytes from 192.168.194.1: seq=29 ttl=64 time=260.244 ms

--- 192.168.194.1 ping statistics ---
30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max = 260.171/285.551/1011.185 ms

AES encryption:

PING 192.168.194.1 (192.168.194.1): 56 data bytes
64 bytes from 192.168.194.1: seq=0 ttl=64 time=1011.402 ms
64 bytes from 192.168.194.1: seq=1 ttl=64 time=261.284 ms
... ...
64 bytes from 192.168.194.1: seq=29 ttl=64 time=262.225 ms

--- 192.168.194.1 ping statistics ---
30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max = 260.288/286.555/1011.402 ms
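To compare the three runs side by side, the summary lines can be reduced to a loss percentage and an average. The helper below is a sketch, not part of n2n; `ping_summary` is a hypothetical name, and it assumes the summary format shown in the outputs above (a `packet loss` line and a `min/avg/max` line).

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print "<loss> <avg_ms>".
ping_summary() {
    awk '/packet loss/   { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
         /min\/avg\/max/ { split($(NF - 1), t, "/"); avg = t[2] }
         END             { print loss, avg }'
}

# Usage (not run here): ping -c 30 192.168.194.1 | ping_summary
```

In the results above, the higher averages for Twofish and AES (about 285 ms against 260 ms) appear to come almost entirely from the first packet, which took about 1011 ms in both encrypted runs; the remaining packets are close to the unencrypted timings.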

lucktu commented 5 years ago

I don't know programming, I'm just a user. Let me talk about my feelings:

+++++++++++++++++++++

  1. Meyerd's n2n is easy to connect directly while the official n2n is difficult.

  2. After direct connection, the flow through the central node of meyerd is almost zero, while the official v2 still has a considerable part. If the center node is far away, then the ping of meyerd is small and the official one is roughly equal to an edge ping center node. the channel established through n2n, using rsync to synchronize, v1 and meyerd's v2 can, but the official v2 can not.

  3. After a period of time, the connection of meyerd and v1 can still be ping, while the connection of v2 can easily be "disconnected". However, when a new edge run, it is likely to be connected with it.

  4. meyerd: Information of edge is more frequent, while information of supernode is less (-v or -vv)

  5. Pros: the official speed is the faster

Thank you!

emanuele-f commented 5 years ago

> I don't know programming, I'm just a user. Let me talk about my feelings: ( In the finishing)

I'm not asking you to write code, just to perform precise tests :) We need precise data; otherwise we are talking about nothing.

> +++++++++++++++++++++
>
> 1. Meyerd's n2n is easy to connect directly while the official n2n is difficult.
>
> 2. After direct connection, the flow through the central node of meyerd is almost zero, while the official v2 still has a considerable part. If the center node is far away, then the ping of meyerd is small and the official one is roughly equal to an edge ping center node. the channel established through n2n, using rsync to synchronize, v1 and meyerd's v2 can, but the official v2 can not.

I have asked meyerd whether there have been patches to improve p2p communications on his branch; let's wait for his response. However, the fact that rsync is not working at all is a bug; please report the OS versions of the two edge nodes.

> 3. After a period of time, the connection of meyerd and v1 can still be ping, while the connection of v2 can easily be "disconnected". However, when a new edge run, it is likely to be connected with it.

Can you verify whether the issue occurs only while the communication stays idle, or even with packets in transit? E.g. if you leave a ping running in the background, does the connection stay up?

> 4. meyerd: Information of edge is more frequent, while information of supernode is less (-v or -vv)

I have recently reduced the default log verbosity, because the supernode registration information was too verbose and printed too often. If you have other suggestions, please open a separate issue.

> 5. Pros: the official speed is the faster

And another pro: we've implemented standard AES encryption, which is much better than homemade twofish ;)

> Thank you!

Thank you for your effort, please provide more information on the points above so that we can improve!

Thank you for your effort, please provide more information on the points above so that we can improve!

lucktu commented 5 years ago

Thank you very much for your attention. I can't answer all these questions at once, so I will first answer the ones I can.

  1. The rsync server is the padavan system (BusyBox v1.24.2 /// Linux Youku-L1A 3.4.113 #1 Sat Apr 13 09:36:55 CST 2019 mips GNU/Linux), while the clients are synology (BusyBox v1.16.1 /// Linux bak 2.6.32.12 #5967 Fri Nov 3 17:20:31 CST 2017 armv5tel GNU/Linux synology_88f6281_212j) and ubuntu (16.04x64). I'll leave rsync aside for now, because I've been testing on ubuntu and it's working fine.

  2. It seems that the problem of not being able to connect after a while has been around for a long time. Some of my friends use a constant ping to ensure connectivity. I will provide my test results later, maybe here. It also takes a long time to verify.

2019-4-29: It has been proven that maintaining a ping in the background provides stability to the network.

  1. It's OK to reduce the default log verbosity. However, perhaps meyerd's edge increases the frequency of connections and this is what guarantees better performance (direct connection and stability)? It's not something I can understand.

  2. Even without the -A parameter, the official v2 is faster than meyerd's v2 under the same conditions. Before meyerd answers, take a look at the picture in my article (in Chinese) and guess what it means ^_^. Annotation: v2-A: aes; v2-w & v2s-w: no encryption; v2 & v2s: twofish; su: supernode. http://www.lucktu.com/archives/771-3.html
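The background ping mentioned above (2019-4-29) could look roughly like the following. It is a workaround sketch only: the peer addresses, the 20-second interval and the `PING_CMD` variable are assumptions made for illustration, not values taken from n2n.

```shell
#!/bin/sh
# Sketch of a keepalive: PING_CMD can be overridden, e.g. for a dry run.
PING_CMD=${PING_CMD:-ping}

# Ping each peer once and print the peers that did not answer.
keepalive_once() {
    for peer in "$@"; do
        "$PING_CMD" -c 1 -W 2 "$peer" >/dev/null 2>&1 || echo "$peer unreachable"
    done
}

# Example loop, left commented out so that sourcing this file has no side effects:
# while :; do keepalive_once 10.2.0.51 10.2.0.52; sleep 20; done
```

Keeping the probe in a function makes it easy to log unreachable peers with a time stamp, which would also give the statistics asked for earlier in the thread.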

emanuele-f commented 5 years ago

I will backport the changes made in the Meyerd repository via https://github.com/emanuele-f/n2n/blob/backport/meyerd_diff.diff. This will hopefully improve n2n stability.

lucktu commented 5 years ago

Well, it's worth looking forward to!

In addition, I hope that these will not affect your work (Work First!), because only in this way can it be sustainable.

Thanks a lot !!

emanuele-f commented 5 years ago

I'm proceeding with small backports, as there are issues when mixing the code of the two repos together. I've found issue #103, which could be related to your use case; please check out the new n2n version and see if this fixes your problem too.

lucktu commented 5 years ago

edge built on May 6 2019 07:57:41

  1. It doesn't work properly without encryption key.

    06/May/2019 08:08:04 [edge_utils.c:835] ERROR: invalid transop ID: 1, expected 2
    
  2. Is P2P much harder to establish now? A connection that starts as P2P easily changes to forwarding. Maybe it is like this: the official V2 always forwards, meyerd's v2 is always P2P, ... ...

  3. The latest changes have not improved my network.

Thank you for your hard work!

emanuele-f commented 5 years ago

@lucktu please check out https://github.com/ntop/n2n/commit/52d33ed880d6594976de77592d9a247db5e0b131 as with this P2P works properly for me. Regarding the bug without encryption key, I'm making a fix, thank you for reporting it.

Update: please also check https://github.com/ntop/n2n/commit/3aec02d3e64be01c6e3ec5caeed7e61df31cec46

lucktu commented 5 years ago

@emanuele-f

  1. It works very well without encryption key now.
  2. In my environment, it's also easy to p2p.

    This is very big progress, thank you very much!

    In addition, are there some minor bugs?

When you ping 10.2.0.X, you get the message below repeatedly. (Does it say that P2P with "218.89.10.138:37582" can be established, rather than that P2P has been established?) Example:

edged -d N2 -a 10.2.0.X -c ntop -k test2 -l c0.smsq.ml:10086 -p 37582 -brEfA -t 37500 > /opt/log/1.txt
... ...
23/May/2019 13:52:21 [edge_utils.c:442] P2P TX connection enstablished: AE:C3:34:12:2D:A6 [218.89.10.138:37582]
23/May/2019 13:52:21 [edge_utils.c:442] P2P TX connection enstablished: AE:C3:34:12:2D:A6 [218.89.10.138:37582]

23/May/2019 13:52:51 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:51 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:51 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:51 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:51 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)

23/May/2019 13:52:53 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:53 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:53 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:53 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
23/May/2019 13:52:53 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)

When you run multiple edged instances, you get this message (more and more often):

23/May/2019 13:02:32 [edge_utils.c:876] ERROR: invalid transop ID: expected null(1), got twofish(2)
23/May/2019 13:02:32 [edge_utils.c:876] ERROR: invalid transop ID: expected null(1), got twofish(2)
23/May/2019 13:02:32 [edge_utils.c:876] ERROR: invalid transop ID: expected null(1), got twofish(2)
23/May/2019 13:02:32 [edge_utils.c:876] ERROR: invalid transop ID: expected null(1), got twofish(2)

My question is coming to an end and I am very satisfied. Thanks again!

lucktu commented 5 years ago

Ping is faster to establish p2p, while FTP is not.

emanuele-f commented 5 years ago

You are welcome. We need to thank @realjiangms for his contribution on this!

> 23/May/2019 13:52:51 [edge_utils.c:876] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
> 23/May/2019 13:02:32 [edge_utils.c:876] ERROR: invalid transop ID: expected null(1), got twofish(2)

This happens if you have edge nodes in the same community but using different encryptions. To fix this, you need to ensure that all the edge nodes in a community use the same encryption mode (twofish, aes, or plaintext).

> Ping is faster to establish p2p, while FTP is not.

Do you have some statistics for this?
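The encryption-mode mismatch described above can be checked from the command lines before starting the edges. The sketch below classifies one edge command line; `edge_mode` is a hypothetical helper, and the mapping it uses is an assumption read off the test commands in this thread (no `-k` means plaintext, `-k` alone means twofish, `-k` together with `-A` means AES), not something taken from the n2n sources.

```shell
#!/bin/sh
# Hypothetical helper: print the encryption mode implied by an edge command line.
# Assumed mapping (from the commands in this thread): no -k => null,
# -k only => twofish, -k plus -A => aes.
edge_mode() {
    has_key=0
    has_aes=0
    for word in $1; do          # intentional word splitting of the command line
        case $word in
            -k) has_key=1 ;;
            -A*|-[!-]*A*) has_aes=1 ;;   # -A alone or bundled, e.g. -brEfA
        esac
    done
    if [ "$has_key" -eq 0 ]; then
        echo null
    elif [ "$has_aes" -eq 1 ]; then
        echo aes
    else
        echo twofish
    fi
}
```

Applied to the ten commands posted earlier, which all use `-c ntop`, this would report a mix of null, twofish and aes, which matches the kind of "invalid transop ID" errors quoted above.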

lucktu commented 5 years ago

I agree with your first two points.

I learned from the following test that ping establishes P2P faster than FTP does.

A ping C : 200ms ( supernode: C, edge: A & B )
B ping C : 200ms
A ping B : 10ms

When you transfer data over FTP with the latest n2n_V2_DEV, the ping soon drops to 10ms, but at the same time almost all the data is still forwarded via the supernode (judging by the traffic on the supernode). After about 2 minutes, or sometimes more than 10 minutes, the data no longer goes through the supernode and the FTP speed goes up. If the edge has been unused for more than an hour, it may never use P2P.

This is the test result within 2 minutes, when v2-dev has not established p2p yet (-W: No encryption) image

Here's how I look at the traffic on supernode (Padavan Router C, traffic-A also comes from padavan A) image

lucktu commented 5 years ago

This is my router ( padavan ): image

This is how edge is used in the router ( padavan ).

edge -d k2 -a 10.2.1.21 -c ntop -k 123 -l c0.smsq.ml:10086 -p 37599 -brEfA -t 37500 > 1.txt

This is the information in 1.txt

03/Jun/2019 16:42:37 [edge.c:602] Starting n2n edge 2.5.0 Jun  3 2019 14:55:35
03/Jun/2019 16:42:37 [edge_utils.c:1758] Adding supernode[0] = c0.smsq.ml:10086
03/Jun/2019 16:42:37 [edge.c:621] ip_mode='static'
03/Jun/2019 16:42:37 [tuntap_linux.c:43] Interface k2 has MAC 0A:0A:D8:B8:CC:86
03/Jun/2019 16:42:37 [edge_utils.c:180] supernode 0 => c0.smsq.ml:10086
03/Jun/2019 16:42:37 [edge_utils.c:1684] Binding to local port 37599
03/Jun/2019 16:42:37 [edge.c:672] edge started
03/Jun/2019 16:45:03 [edge_utils.c:880] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
03/Jun/2019 16:45:03 [edge_utils.c:446] P2P TX connection enstablished: 02:F8:19:41:E6:30 [192.168.22.4:37586]
03/Jun/2019 16:45:03 [edge_utils.c:446] P2P TX connection enstablished: 02:F8:19:41:E6:30 [192.168.22.4:37586]
03/Jun/2019 16:45:18 [edge_utils.c:880] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
03/Jun/2019 16:45:42 [transform_aes.c:248] WARNING: UDP payload decryption failed.
03/Jun/2019 16:45:44 [transform_aes.c:248] WARNING: UDP payload decryption failed.
03/Jun/2019 16:45:46 [transform_aes.c:248] WARNING: UDP payload decryption failed.
03/Jun/2019 16:45:48 [transform_aes.c:248] WARNING: UDP payload decryption failed.
03/Jun/2019 16:46:07 [edge_utils.c:1625] Peer removed: pending=2, operational=0
03/Jun/2019 16:46:22 [edge_utils.c:880] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
03/Jun/2019 16:46:24 [edge_utils.c:880] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
03/Jun/2019 16:46:26 [edge_utils.c:880] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
03/Jun/2019 16:46:28 [edge_utils.c:880] ERROR: invalid transop ID: expected AES-CBC(3), got twofish(2)
03/Jun/2019 16:46:37 [edge_utils.c:1625] Peer removed: pending=1, operational=0
03/Jun/2019 16:48:26 [edge.c:563] Shutting down...
03/Jun/2019 16:48:26 [edge_utils.c:1515] **********************************
03/Jun/2019 16:48:26 [edge_utils.c:1516] Packet stats:
03/Jun/2019 16:48:26 [edge_utils.c:1517]     TX P2P: 0 pkts
03/Jun/2019 16:48:26 [edge_utils.c:1518]     RX P2P: 0 pkts
03/Jun/2019 16:48:26 [edge_utils.c:1519]     TX Supernode: 0 pkts
03/Jun/2019 16:48:26 [edge_utils.c:1520]     RX Supernode: 12 pkts
03/Jun/2019 16:48:26 [edge_utils.c:1521] **********************************

You can test the latest edge program on the router I gave you under /opt/new. Thanks!
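A log like 1.txt above is easier to read once identical messages are counted. The following is a rough sketch (`log_counts` is a hypothetical name) that assumes the line layout shown above: date, time, `[file:line]`, then the message.

```shell
#!/bin/sh
# Hypothetical helper: count identical ERROR/WARNING messages in an edge log.
log_counts() {
    # Strip everything up to and including the "[file:line] " prefix,
    # then count each remaining message; most frequent first.
    awk '/ERROR:|WARNING:/ { sub(/^[^]]*\] /, ""); count[$0]++ }
         END { for (m in count) printf "%d %s\n", count[m], m }' "$1" | sort -rn
}

# Usage (not run here): log_counts 1.txt
```

For the excerpt above this would give 6 "invalid transop ID" errors and 4 "UDP payload decryption failed" warnings.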

emanuele-f commented 5 years ago

Thank you for the information, I will try to reproduce this

lucktu commented 5 years ago

Tests for newly compiled edge (Compile time: Jun 9 2019 07:56:00), but use the old supernode_v2 : image

It took too long to establish P2P: 1 to 15 minutes in my tests. After P2P was established, the transmission speed became slower than before. (During this test, other people were making light use of routers A, B and C.)

Here are screenshots of the test: (9 images)
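The time needed to reach P2P could also be read from an edge log instead of screenshots. This is a sketch under assumptions: `p2p_delay` is a hypothetical helper, it relies on the `edge started` and `P2P TX connection` messages seen in the logs posted in this thread, and it assumes both lines fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper: seconds from "edge started" to the first
# "P2P TX connection" line in an edge log (same-day timestamps assumed).
p2p_delay() {
    awk 'function secs(t,   a) { split(t, a, ":"); return a[1] * 3600 + a[2] * 60 + a[3] }
         /edge started/ && start == ""      { start = secs($2) }
         /P2P TX connection/ && start != "" { print secs($2) - start; exit }' "$1"
}

# Usage (not run here): p2p_delay 1.txt
```

If no P2P line is present the function prints nothing, which is itself a useful signal that the edge never left supernode forwarding during the run.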

emanuele-f commented 5 years ago

You need to update the supernode, please post the results after the supernode update, thank you.

lucktu commented 5 years ago

Tests for newly compiled n2n (Compile time: Jun 9 2019 07:56:00): image

Feeling: Twofish encryption and AES encryption have the same speed. Great progress, very good, thank you!!

Here are screenshots of the test: (8 images)

emanuele-f commented 5 years ago

Great! Your tests are very important since you have a good test environment, thank you! When you are satisfied, please close this and issue #126.

lucktu commented 5 years ago

Ok, but don't forget that Twofish encryption and AES encryption have the same speed.

emanuele-f commented 5 years ago

Do you mean that aes is now slower? Please look at the CPU: if you see n2n reaching 100% load on a single core when you send at high speed, then you have reached the limit. You can also run the benchmark program to verify; on my PC AES is still way faster than twofish:

Run enc[transop_null] for 3s (512 bytes):            1875288 packets       625.1 Kpps      320.0 MB/s
Run enc[transop_twofish] for 3s (512 bytes):           64909 packets        21.6 Kpps       11.1 MB/s
Run enc[transop_aes] for 3s (512 bytes):              639017 packets       213.0 Kpps      109.1 MB/s
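
The Kpps and MB/s columns above follow from the packet count, the 512-byte packet size and the 3-second run. A small sketch to redo that arithmetic (the `throughput` helper is hypothetical, and MB is taken as 10^6 bytes, which reproduces the figures shown):

```shell
#!/bin/sh
# Hypothetical helper: throughput PACKETS BYTES_PER_PACKET SECONDS
# prints "<Kpps> <MB/s>", with MB = 10^6 bytes.
throughput() {
    awk -v p="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f %.1f\n", p / s / 1000, p * b / s / 1000000 }'
}

# Usage: throughput 639017 512 3   (the transop_aes row above)
```

By these numbers AES moves roughly ten times as many bytes per second as twofish on that machine (109.1 MB/s against 11.1 MB/s).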

lucktu commented 5 years ago

Yes, Maybe that's what happened? The difference is small when CPU is limited: image

Some other comparisons: (3 images)

emanuele-f commented 5 years ago

Yes, this is normal, as you won't notice a difference when the CPU is limited.

lucktu commented 5 years ago

Ok, I see. Thank you