Logan007 commented 3 years ago

This pull request implements an experimental idea to improve the p2p experience.

Basically, it extends the idea of port prediction as found with -L to a much bigger extent, namely by increasing the number of listening sockets and by replacing the prediction algorithm with random ports and a lot more REGISTER attempts.

Work in progress (thus a draft) and being built in increments, so it will definitely take a while. For testing, the current code can always be downloaded here as provided by GitHub's autopacker.

The first commit only adds the capability to pass a specially flagged QUERY_PEER through to the queried edge itself (so, no effect on p2p yet). As QUERY_PEER is sent out only in case of non-working p2p, this can signal the other edge to take some action. The PEER_INFO answer, which is flagged as well, lets the originally sending edge know when to start taking some action too.
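To illustrate the pass-through decision described above, here is a minimal sketch of how a supernode might branch on the flag. It is not code from this branch: `SKETCH_AFLAGS_PASS_THROUGH`, its value, `query_peer_sketch_t`, and `handle_query_peer` are hypothetical names, and the sketch assumes that an unflagged QUERY_PEER is answered by the supernode itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real flag value in the branch may differ. */
#define SKETCH_AFLAGS_PASS_THROUGH 0x01u

/* Hypothetical stand-in for a decoded QUERY_PEER; only the flags matter here. */
typedef struct {
    uint8_t aflags;
} query_peer_sketch_t;

typedef enum {
    SN_ANSWER_DIRECTLY,   /* supernode answers with PEER_INFO itself (assumed default) */
    SN_FORWARD_TO_EDGE    /* supernode passes the QUERY_PEER on to the queried edge    */
} sn_action_t;

/* Decide what the supernode does with an incoming QUERY_PEER. */
sn_action_t handle_query_peer(const query_peer_sketch_t *msg) {
    assert(msg != NULL);
    if (msg->aflags & SKETCH_AFLAGS_PASS_THROUGH)
        return SN_FORWARD_TO_EDGE;
    return SN_ANSWER_DIRECTLY;
}
```

The queried edge would then answer with a flagged PEER_INFO, which is not modelled here.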

codecov-commenter commented 3 years ago

Codecov Report

Merging #839 (f55b408) into dev (94e6f4a) will decrease coverage by 0.36%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##              dev     #839      +/-   ##
==========================================
- Coverage   18.83%   18.46%   -0.37%     
==========================================
  Files          39       39              
  Lines        8305     8470     +165     
==========================================
  Hits         1564     1564              
- Misses       6741     6906     +165

Impacted Files	Coverage Δ
src/edge_utils.c	`0.96% <0.00%> (-0.08%)`	:arrow_down:
src/sn_utils.c	`0.00% <0.00%> (ø)`
src/wire.c	`35.71% <0.00%> (-0.44%)`	:arrow_down:

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 94e6f4a...f55b408.

Logan007 commented 3 years ago

12246f8 adds receptor sockets. They are temporarily opened while the other end tries to send REGISTERs. If we are lucky, we get a match. This is indicated by the "!!! incoming at receptor socket !!!" message. No further handling is implemented yet.
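As a rough illustration of temporarily opened receptor sockets, the following POSIX-only sketch opens a batch of UDP sockets on OS-chosen ports and closes them again. The function names are hypothetical and the code is not taken from the commit; it assumes plain IPv4 UDP sockets and leaves out receiving, timeouts, and Windows support.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open up to `count` UDP sockets bound to OS-chosen ports.
 * Returns how many were opened; fds[0..n-1] receive the descriptors. */
int open_receptor_sockets(int *fds, int count) {
    assert(fds != NULL && count >= 0);
    int opened = 0;
    for (int i = 0; i < count; i++) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            break;                       /* e.g. out of descriptors: stop here */
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(0);        /* port 0 lets the OS pick a free port */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            break;
        }
        fds[opened++] = fd;
    }
    return opened;
}

/* Close the temporary sockets once the attempt window is over. */
void close_receptor_sockets(int *fds, int opened) {
    assert(fds != NULL);
    for (int i = 0; i < opened; i++) {
        close(fds[i]);
        fds[i] = -1;                     /* mark the slot as unused */
    }
}
```

A caller would open the batch when the flagged message arrives, watch the descriptors for incoming REGISTERs for a limited time, and then close them all.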

If you want to test, you could try some not-so-p2p edges and ping between them for a few minutes (also replace the supernode with the version from this branch). If you see that message at one of the edges, chances are that p2p can be established later, once this pull request is complete.

Let me know what you find.

lucktu commented 3 years ago

Thank you for trying!

I got it, but it’s hard, and it didn’t turn into p2p.

[Image: 001]

The following images were taken over 10 minutes, and those "!!!" were picked up within the first 2 minutes of the ping.

[Image: 002-10m]

[Image: 003-10m]

The following image was taken over 60 minutes, and those "!!!" were picked up within the first minute of the ping. It seems so easy at first (after restarting both edges and pinging) but so hard later (never getting the chance? It is worth studying). It is always psp.

[Image: 003-60m]

Edge has only a 10% chance of seeing it, whereas zerotier always sees it, as checked with "netstat -ap|grep xxx".

[Image: 004]

Logan007 commented 3 years ago

Thank you for testing!

I got it

If you get the message at least at one end, it means that you will likely get p2p once this pull request is done.

and it didn’t turn into p2p

Not yet, but hopefully when this pull request is done.

The following image was taken over 60 minutes, and those "!!!" were picked up within the first minute of the ping. It seems so easy at first (after restarting both edges and pinging) but so hard later (never getting the chance? It is worth studying). It is always psp.

However, while testing, I found that it sometimes turns into p2p "by accident", that is, when randomly trying to send REGISTERs to the receptor sockets happens to hit the original peer connection port (once newly opened by the NAT for connecting to the peer – instead of using the main socket). But this is very rare. If it happens, no more QUERY_PEERs are sent (as we have p2p) and thus no more messages are output to the screen later. You might already have gotten p2p in this case!

This receptor socket method does not kick in immediately because it uses some network resources. It starts with every other QUERY_PEER request when p2p can't be established, that is, after 40 to 60 seconds or so. This makes it more suitable for longer-lasting connections; shorter connections are forwarded by the supernode if the STUN-like method does not work, and they are not worth the effort of the receptor socket method. And not every try is successful. Depending on the NATs, it might never be successful.
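The "every other QUERY_PEER" trigger can be pictured with a small counter. This is only a sketch under assumptions: the struct, the function name, and the choice of the second (rather than the first) of each pair are hypothetical and not taken from the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer state: QUERY_PEERs sent while p2p is missing. */
typedef struct {
    unsigned query_count;
} peer_query_state_t;

/* Call once per outgoing QUERY_PEER. Returns true on every second call,
 * which is when this sketch would flag the message for pass-through. */
bool next_query_uses_receptors(peer_query_state_t *st) {
    assert(st != NULL);
    st->query_count++;
    return (st->query_count % 2) == 0;
}
```

Once p2p is established, no further QUERY_PEERs are sent, so the counter simply stops advancing.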

lucktu commented 3 years ago

Yes, it’s still to be explored.

When I test in the local area network (in LAN; edge ... -e xxx -E), even if it is p2p, it will not show "!!!", but at this time the ping value is larger: it is 4 ms, where normal should be 1.x ms. With psp, it is 10 ms.

I think such efforts should be made when there is capacity to spare, and could be more violent, when it’s not p2p.

Logan007 commented 3 years ago

if it is p2p, it will not show "!!!"

This is expected because the receptor sockets will only be opened upon (every other) QUERY_PEER which will only be sent when there is no p2p.

but at this time the ping value is larger: it is 4 ms, where normal should be 1.x ms. With psp, it is 10 ms.

I have no explanation for this. Does the management port show the local addresses for the peers or the router's external addresses as p2p-address? Though, it should not as you use -e.

such efforts should be made when there is capacity to spare

How to determine if there is enough capacity available?

could be more violent

Opening 400 sockets and sending 2000 packets to random ports, I already consider it extremely violent... :wink:

Just imagine this happening in parallel for several remote edges... local sockets might become scarce, and remote firewalls might shut down because they are under the impression that they are being attacked.
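For a feeling of what 400 sockets and 2000 packets can achieve, here is a back-of-the-envelope model, not a measurement. It assumes that each receptor socket shows up on a distinct external port, that every packet targets an independent, uniformly random port, and that any hit is accepted by the NAT. Real NATs filter and map differently, so this is an optimistic upper bound. The function name and the port-space size are illustrative assumptions.

```c
#include <assert.h>

/* Probability that at least one of `packets` uniformly random ports hits
 * one of `open_ports` distinct open ports out of `port_space` candidates:
 * 1 - (1 - open_ports / port_space) ^ packets. */
double hit_probability(unsigned open_ports, unsigned packets, unsigned port_space) {
    assert(port_space > 0 && open_ports <= port_space);
    double miss_one = 1.0 - (double)open_ports / (double)port_space;
    double miss_all = 1.0;
    for (unsigned i = 0; i < packets; i++)
        miss_all *= miss_one;            /* every packet misses independently */
    return 1.0 - miss_all;
}
```

Under these assumptions and a port space of 64512 (ports 1024 to 65535), `hit_probability(400, 2000, 64512)` comes out very close to 1, while a single open port with the same 2000 packets gives only about 3%. The gap between this idealized figure and what is seen in practice is a reminder of how much the NAT behavior matters.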

lucktu commented 3 years ago

After "cedbdce", I can’t get p2p or "!!! incoming at receptor socket !!!".

I fell back to "12246f8" to test.

Can you add a timestamp before “!!! incoming at receptor socket !!!” for observation? I found that "!!! incoming at receptor socket !!!" sometimes appeared in edge_A, sometimes appeared in edge_B (and the success rate is high, not as hard as before; still to be verified).

Can you print a message (p2p/psp) to the edge's output (not the edge's management port output) when the state changes?

./edget -a 10.30.10.02 -c n2n -k test -l a.b.c:10090 -d 123 -Ef -t 7775 -e auto -vv|grep '!!!\|==' &

Logan007 commented 3 years ago

Thanks for the feedback, I will check later.

Logan007 commented 3 years ago

After "cedbdce", I can’t get p2p or "!!! incoming at receptor socket !!!".

Not sure why this happens. I have had it working here. Make sure to use make clean...

Latest commit removes this message anyway. Better use -v and grep for "through receptor" which also contains local time.

Can you print a message (p2p/psp) to the edge's output (not the edge's management port output) when the state changes?

I will think about that one later because as of now, switching from pSp to p2p is more like a side-effect. The really intended effect still requires some more coding effort, see the !!!-remarks in the code.

sometimes appeared in edge_A, sometimes appeared in edge_B

That's a good sign indicating that you might get those to p2p later – no matter at what side it appears.

lucktu commented 3 years ago

The latest test results are here:

[Image: 002]

Test method:

Apart from the edge version, nothing was changed; both tests used the supernode from commit 2;

edge_A -a 10.30.10.2 -c n2n -k test -d 2342 -E -f -t 7775 -e auto -l a.b.c:10090 -vv|grep '==\|through\|!!!'
edge_B -a 10.30.10.3 -c n2n -k test -d 2342 -E -f -t 7775 -e auto -l a.b.c:10090 -vv|grep '==\|through\|!!!'

Keep pinging continuously;
Keep the edges stopped for 3 minutes between tests;
Restart edge_A and edge_B before each test;
The maximum duration of each test is 3 minutes; if it is p2p (according to the edge's "-t 7775") or "through" appears (in the edge's terminal output), interrupt edge_A and edge_B;
t2 refers to commit 2 (12246f8), t4 to commit 4 (90f08f5);
"===" is the start time of edge_A or edge_B, "through" is the time of its first occurrence;
67% is the rate of p2p for t2, 43% is the rate of p2p for t4 (assuming that "through" means p2p).

Here are the conclusions:

From commit 2 to 4, the rate of p2p dropped;
When the program first connected, p2p was easy to get, and then it got hard;
Your 400 virtual ports don’t seem to be working very well after a while.

Suggestions:

Reorientation optimization;
After 25 s (in a loop or by other methods), if there isn't p2p, change the (initiating) edge's communication port and try again. As it turns out, restarting an edge doesn't seem to work. Or a new port, a new ARP cache?

but at this time the ping value is larger: it is 4 ms, where normal should be 1.x ms. With psp, it is 10 ms.

That’s because I used "edge ... -vv"; it is caused by -vv.

A new experiment: I kept edge_A and edge_B (both from commit 2) running all the time, while interrupting the ping.

[Image: 001]

When I stop pinging for 3 minutes (at the arrowhead) and then ping again, I get "connection through receptor socket" messages, but if I ping continuously, I never get them (maybe; there is no "p2p" on the edge's "-t 7775" all the way through). I think this is the center of the problem! Please study it!

Logan007 commented 3 years ago

The latest test results are here

Thank you for testing.

In its current state, this pull request does not increase the p2p rate yet, although there are some very, very rare cases when this happens. There will be more once this is finished.

When the program first connected, p2p was easy to get, and then it got hard;

The indicator message does not indicate successful p2p yet. It only indicates that p2p could be possible (when the rest of the implementation is done).

The later missing message does not mean that it got harder, it only means that you probably have hit one of the p2p-by-accident cases.

After 25 s (in a loop or by other methods), if there isn't p2p, change the (initiating) edge's communication port and try again. As it turns out, restarting an edge doesn't seem to work.

This already happens; right now, it happens with every other QUERY_PEER message that has N2N_AFLAGS_PASS_THROUGH set.

67% is the rate of p2p for t2, 43% is the rate of p2p for t4

One of the reasons might be that I had to stretch the REGISTERs over time. Instead of sending 2000 at once, I spread them over 15 seconds. So, the matching REGISTER could arrive somewhat later, which seemingly decreases the rate if watched over a short total period of time. But it is still the same number of REGISTERs and thus should give the same rate of success; it just happens a few seconds later.
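Stretching the 2000 REGISTERs over 15 seconds could be paced as in the sketch below. It assumes an even spread over time; the branch may well pace differently, and the names are hypothetical.

```c
#include <assert.h>

#define TOTAL_REGISTERS 2000u    /* number mentioned in this thread */
#define SPREAD_MS       15000u   /* 15 seconds, as mentioned above  */

/* Number of REGISTERs that should have gone out after elapsed_ms,
 * assuming an even spread over the whole window. */
unsigned registers_due(unsigned elapsed_ms) {
    if (elapsed_ms >= SPREAD_MS)
        return TOTAL_REGISTERS;
    return (unsigned)(((unsigned long long)elapsed_ms * TOTAL_REGISTERS) / SPREAD_MS);
}

/* How many to send right now, given how many were already sent. */
unsigned registers_to_send_now(unsigned elapsed_ms, unsigned already_sent) {
    assert(already_sent <= TOTAL_REGISTERS);
    unsigned due = registers_due(elapsed_ms);
    return due > already_sent ? due - already_sent : 0;
}
```

With this pacing, half of the REGISTERs are out after 7.5 seconds, so a test that stops early sees fewer attempts than one that waits for the full window.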

lucktu commented 3 years ago

Thank you for your detailed explanation!

I think I’ve done enough with this experiment, I may not have much time to do the experiment like old times in the future.

I’m sure you’ll do fine and join it in n2n_v3.0.

Thank you for your hard work and dedication! Good luck to you!

Logan007 commented 3 years ago

I may not have much time to do the experiment like old times in the future.

Thank you very much for testing! It is a good indicator of what will be possible.

I’m sure you’ll do fine and join it in n2n_v3.0.

Further implementation will take a while as internal connection handling gets a bit tricky, it might end up in 3.2.

lucktu commented 3 years ago

Further implementation will take a while as internal connection handling gets a bit tricky, it might end up in 3.2.

Oh, I see. It must have been made with TCP. Looking forward to it.

Logan007 commented 3 years ago

The point is that we have several ways for an edge to reach another edge:

via the supernode (through standard socket eee->sock)
STUNned p2p (through standard socket eee->sock)
the locally preferred address (-e, through the standard socket eee->sock)
TCP via the supernode (through TCP socket filled into eee->sock, replacing the standard UDP socket)
the eventually successfully found receptor port (through the corresponding local receptor socket)
local multicast (at least for edge detection)

These are different connections with partly different starting points (sockets) and endpoints (addresses, ports) and even protocols. And their number would double with IPv6 support. So, yes, I guess we need to think about a connection layer that checks and handles all these possible connections and switches between the different possibilities. Not sure if there is a quick (and dirty) approach to make use of the receptor socket instead of the standard sock for the corresponding edge and, especially, to fall back when it is not working anymore.
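One way to picture such a connection layer is a per-peer table of candidate paths with a preference order and a fallback. The sketch below is purely illustrative: the enum names, the preference order, and the liveness flags are assumptions, and multicast is left out because it is listed only for edge detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Candidate paths from the list above; names and order are hypothetical.
 * A lower value means "preferred" in this sketch. */
typedef enum {
    PATH_LOCAL_PREFERRED,   /* locally preferred address (-e)      */
    PATH_STUN_P2P,          /* STUNned p2p                         */
    PATH_RECEPTOR_SOCKET,   /* successfully found receptor port    */
    PATH_SUPERNODE_UDP,     /* via the supernode                   */
    PATH_SUPERNODE_TCP,     /* TCP via the supernode               */
    PATH_COUNT
} path_kind_t;

typedef struct {
    bool alive[PATH_COUNT]; /* set by some liveness check, not shown here */
} peer_paths_t;

/* Pick the most preferred path that is currently alive; -1 if none is. */
int select_path(const peer_paths_t *p) {
    assert(p != NULL);
    for (int k = 0; k < PATH_COUNT; k++)
        if (p->alive[k])
            return k;
    return -1;
}
```

Falling back then amounts to clearing a path's `alive` flag when it stops working and calling `select_path` again.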

lucktu commented 3 years ago

You are too ambitious!

Logan007 commented 3 years ago

Just thinking long term... :wink:

Logan007 commented 3 years ago

Up to this point, I have secretly been hoping to get the receptor sockets to work even without explicit connection handling. I have tried to find all the points where the general eee->sock might be overridden by the more specific one, the additionally opened peer->socket_fd.
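The override described above boils down to a choice like the one sketched here. The two structs are minimal stand-ins that model only the fields named in the comment, and treating a negative `socket_fd` as "no receptor socket" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real n2n structures; only the named fields are modelled. */
typedef struct { int sock; } edge_sketch_t;        /* corresponds to eee->sock       */
typedef struct { int socket_fd; } peer_sketch_t;   /* corresponds to peer->socket_fd */

/* Use the peer-specific receptor socket if one is set (>= 0, by assumption),
 * otherwise fall back to the edge's general socket. */
int socket_for_peer(const edge_sketch_t *eee, const peer_sketch_t *peer) {
    assert(eee != NULL);
    if (peer != NULL && peer->socket_fd >= 0)
        return peer->socket_fd;
    return eee->sock;
}
```

Every send path that currently uses the general socket would have to go through such a choice, which is where a missed spot could lead to the kind of hang described in this comment.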

In my test scenario, the connection repeatedly hangs, after seemingly establishing a p2p-connection. I was not able to hunt it down to the (last?) missing piece.

So, I will stop here and shelve this idea until we have more general connection handling. I will leave it open for a while, just in case a sharp-eyed person is able to jump in.

ntop / n2n

experimental #839
