lucktu closed this issue 2 years ago
A shot in the dark: are the supernodes' clocks somewhat in sync?
The reason for this is that edge1 exists only in sn2's management_port output and not in sn1's, while edge2/3/4 exist in sn1's management_port output and not in sn2's.
I think there’s some logic here that needs to be properly sorted out.
Discuss
When edges begin trying to connect, I think they should always use sn1 as their center unless sn1 has a problem; only then should they turn to the federation for help (for example, for greater bandwidth).
When B joins A's federation, A can see B (and its communities and edges, and can take advantage of its resources), but B can only see A as pending. Only when A joins B's federation as well does B get full authority. At present, if A leaks its federation_name (or uses the default), A may end up out of control (bandwidth allocation, too much useless information). Alternatively: hide parts of the federation_name, and do not show, much less contact, supernodes other than A and B that happen to share the same federation name.
Do not display the IPs and ports of other supernodes in the edge's management_port output.
The reason for this is that edge1 exists only in sn2's management_port output and not in sn1's, while edge2/3/4 exist in sn1's management_port output and not in sn2's.
With a view to the management port output, this is expected behavior. Edges shall only connect to one supernode at a time.
The supernodes know about each other's edges anyway and forward packets to the corresponding supernode (#640) in case p2p is not working.
Concerning the federation name, it actually is the private federation key. So, it needs to be handled confidentially, as key material always deserves.
We assume the management port, being available locally only, is accessible to legit users only. As the management port is not password-protected, one would need to disable the port in the code if you fear leaking any information through this channel. Alternatively, we could just remove the federation name from the output.
Do not display the IPs and ports of other supernodes in the edge's management_port output.
How else shall an administrator identify the other supernodes and check the federation for legit connections?
When edges begin trying to connect, I think they should always use sn1 as their center unless sn1 has a problem; only then should they turn to the federation for help (for example, for greater bandwidth)
The federation feature was developed aiming at load balancing. However, an alternative is already available.
The selection criterion can easily be adapted to your needs, and further schemes can be added as required. The supernode selection code is contained in just one place (src/sn_selection.c). Imagine you wanted one fixed supernode with the others acting in hot-standby mode: it would only require using the supernode's MAC address as the selection criterion, i.e. connecting to the supernode with the lowest MAC address. If that supernode were no longer available, edges would connect to the one with the next-higher MAC address. Any volunteers?
1. Alternatively, we just remove the federation name from output.
2. How else shall an administrator identify the other supernodes and check the federation for legit connections?
You are right, I thought we could just show parts.
The federation feature was developed aiming at load balancing.
I think it’s an important backup supernode too.
Consider this: after merely adding B to A's federation while using the default federation_name, I can now see "supernode.ntop.org:7777" in the sn's management_port output.
With a view to the management port output, this is expected behavior. Edges shall only connect to one supernode at a time.
So what’s the solution to my problem?
I think it’s an important backup supernode too.
Actually, it serves as a backup at the same time, because load balancing also means that if one supernode is no longer available, the edges look for another one.
So what’s the solution to my problem?
Not sure; I would expect your scenario to work easily. If you shared your logs with markers at the specific events, plus your exact command lines and IP / MAC addresses, maybe someone could volunteer to dig deeper into it.
Did you check the system clocks at edges and supernodes?
I think it’s better if you experiment, and also look at the output of edge1, sn1 and sn2.
edge3 -d xrg -a 172.16.0.5 -c xtdg -k fhfghfg -l n2n.lucktu.com:10090 -t 31570 -Efr -e auto #
I can offer you a sn1: n2n.lucktu.com:10090
Did you check the system clocks at edges and supernodes?
They’re identical on pc1 / pc2 / pc5. I hope it’s okay if they differ. I don't use -H.
Did you check the system clocks at edges and supernodes?
They’re identical on pc1 / pc2 / pc5. I hope it’s okay if they differ.
Supernodes communicate with header encryption enabled, so they should be somewhat in sync, +/- 16 seconds. If your edges also use header encryption (-H), they should also be in sync with the other edges and all supernodes within the same range.
I think it’s better if you experiment, ...
I can offer you a sn1: n2n.lucktu.com:10090
I am sorry that I currently am not able to offer this level of support.
... and also look at the output of edge1, sn1 and sn2.
Do you want to share those? Someone might jump into it.
... and also look at the output of edge1, sn1 and sn2
One more thought: if you inspect the management port output of edges and supernodes, please make sure that edges / supernodes do not connect using local addresses (which could happen due to name resolution configuration) but the public ones. You can ensure this by providing all -l ... options with public IP addresses, at the edges as well as at the supernodes, at least for testing.
I didn’t plan for you to jump in. ^_^
Yes, all -l ... use public IP addresses.
Might it have something to do with the fact that sn3 was already running on pc2, even though it had nothing to do with sn1 & sn2?
Let’s put this question aside; you’d better test it yourself. The main problem is that edge1 (the initiator) easily becomes someone else’s child and does not easily come back.
I’m more interested in #839 now!
I think that if DHT technology is introduced in the future, the "federation" is only a transitional feature, and it is not worth putting excessive effort into it.
the "federation" is only a transitional function
Indeed. Moving further towards p2p does not need the federation concept anymore... well, it will look different and will not be limited to the supernode side. We might need a master key signing every peer's key (consider it private, just like today's federation name) and a public key to verify, just like the current -P.
We obviously are on the same sheet of paper... :wink:
Given the extremely long list of (random) supernodes seen on your screenshot, I need to ask if all supernodes (and edges) are from the same and current dev code.
I need to ask if all supernodes (and edges) are from the same and current dev code.
I don’t recognize any of them except the first supernode (223.133.68.170:10090); this is from sn2's management_port output. Those supernodes must be different versions from all over the world.
those supernodes must be different versions from all over the world
With a view to their IP addresses like 0.0.0.0, 8.0.0.0, or 0.0.1.0, it is very unlikely that these are actual supernodes. I still assume that the supernodes are not built from the same source version and thus are incompatible, hence this erroneous behavior. Can you check the supernode versions?
Can you check the supernode versions?
v.2.9.0.r999.b735ad6.
Let’s put this question aside please.
Is this still an issue?
I do not have the conditions to test at present, so I will shut it down for now; if the issue still exists in the future, I’ll come back to it.
Currently, edge1 loses contact with edge2/3/4 unless edge1 restarts; if we kill sn2, edge1 can contact edge2/3/4, but when we restart sn2, contact is lost again.