Open SionoiS opened 2 years ago
That's a great point! Using this to our benefit is probably a long shot, as we'd need to implement some sort of metadata exchange between the nodes. But it's worth finding out nevertheless.
Nodes are organised based on XOR distance, but every node keeps several k-buckets, so it's not unreasonable to think that a node might stay connected to a particular set of peers across the hash space.
@dennis-tra is it possible to derive that from the crawler logs? 🤔 The script would have to go through multiple crawls and cross-check connectivity between nodes over time/snapshots of the network.
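A very rough sketch of what such a cross-check could look like, assuming each crawl snapshot can be dumped as a JSON mapping from a peer ID to the set of peers found in its routing table (the file layout and the 20-of-24 threshold below are made up, not the actual crawler output):

```python
import json
from collections import Counter

def load_snapshot(path):
    """Load one crawl snapshot as {peer_id: set(neighbour_ids)}.
    The JSON layout here is a placeholder, not the real crawler output."""
    with open(path) as f:
        raw = json.load(f)
    return {peer: set(neighbours) for peer, neighbours in raw.items()}

def persistent_pairs(snapshot_paths, min_snapshots):
    """Count in how many snapshots each (peer, neighbour) pair co-occurs and
    keep the pairs seen at least `min_snapshots` times."""
    pair_counts = Counter()
    for path in snapshot_paths:
        for peer, neighbours in load_snapshot(path).items():
            for neighbour in neighbours:
                # normalise so (a, b) and (b, a) count as the same pair
                pair_counts[tuple(sorted((peer, neighbour)))] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_snapshots}

# e.g. pairs that showed up together in at least 20 of 24 hourly crawls
stable = persistent_pairs([f"crawl_{i}.json" for i in range(24)], min_snapshots=20)
print(f"{len(stable)} peer pairs appeared together in >= 20 of 24 crawls")
```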
Nodes are organised based on XOR distance, but every node keeps several k-buckets, so it's not unreasonable to think that a node might stay connected to a particular set of peers across the hash space.
Ah! I was wrong, I thought that nodes connected to geographically close nodes. I need to read the specs again.
edit: from the paper
The longer a node has been up, the more likely it is to remain up another hour. By keeping the oldest live contacts around, k buckets maximize the probability that the nodes they contain will remain online. A second benefit of k-buckets is that they provide resistance to certain DoS attacks. One cannot flush nodes’ routing state by flooding the system with new nodes. Kademlia nodes will only insert the new nodes in the k-buckets when old nodes leave the system.
Hi @SionoiS,
I think it's a great idea to look deeper into the dynamics of how the DHT overlay organises itself!
When it comes to adding remote peers to one's routing table, the relevant logic is documented here. I could imagine that stable peers form highly connected sub-graphs because, IIRC, Kademlia prioritises stable, long-running nodes. However, there is also a maxLatency check in the code, so latency is not completely irrelevant here.
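Not the actual go-libp2p-kad-dht logic (the linked docs are authoritative), but a toy sketch of the gist described above; the constant names and the `try_add` helper are hypothetical: slow peers are rejected outright, and a full bucket keeps its long-lived entries instead of admitting newcomers.

```python
import time
from dataclasses import dataclass, field

K = 20              # bucket size
MAX_LATENCY = 1.0   # seconds; stand-in for the maxLatency check

@dataclass
class Bucket:
    peers: dict = field(default_factory=dict)  # peer_id -> time first added

    def try_add(self, peer_id: str, rtt: float) -> bool:
        """Toy admission rule: reject slow peers, prefer old peers over new ones."""
        if rtt > MAX_LATENCY:
            return False          # too slow to be a useful routing table entry
        if peer_id in self.peers:
            return True           # already known, nothing to do
        if len(self.peers) < K:
            self.peers[peer_id] = time.time()
            return True
        # Bucket full: long-lived entries stay, the newcomer is dropped
        # (Kademlia's "keep the oldest live contacts" policy).
        return False

bucket = Bucket()
print(bucket.try_add("QmNewPeer", rtt=0.05))  # True while the bucket has room
```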
The crawl data can certainly be used to correlate the uptime of peers with their presence in other peers' routing tables. However, when it comes to latency, the crawler has no vantage point on the latency between two peers, so the crawl data won't give insights into that :/
Is it not possible to "map" the nodes using latency from known sources?
With 3 nodes pinging each other, one could start "triangulating" all the nodes and build a map of the network.
Not super reliable as latency can change, but maybe good enough for long-running nodes?
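For what it's worth, a minimal sketch of the triangulation idea: it pretends RTTs from three vantage points with known (made-up) planar coordinates translate linearly into distances, which is a strong simplification since latency only loosely tracks geography. All numbers below are invented.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical vantage points with known planar coordinates (arbitrary units)
# and one-way latency estimates (RTT/2) to the target node.
vantage_points = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
latency_ms = np.array([0.35, 0.40, 0.45])
km_per_ms = 200.0  # crude "speed of light in fibre" conversion factor
distances = latency_ms * km_per_ms

def residuals(position):
    """Mismatch between measured distances and those implied by `position`."""
    return np.linalg.norm(vantage_points - position, axis=1) - distances

# Least-squares multilateration, starting from the vantage points' centroid.
result = least_squares(residuals, x0=vantage_points.mean(axis=0))
print("estimated position:", result.x)
```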
Concerning the DHT:
By design, we expect nodes to form small locality clusters. The routing table of a peer P is organized in k-buckets: in bucket[i], all peers share a common prefix of length i with P. The buckets sharing the longest common prefix length (CPL), expressing closeness, are not expected to be full, so the peers inside these buckets will never get flushed. These buckets will most likely be identical on all hosts sharing this long CPL, because the XOR distance is symmetric (XOR is commutative): if at peer p0, bucket[CPL] = [p1, p2, p3], then it is probable that at peer p1, bucket[CPL] = [p0, p2, p3]. This implies that nodes will organize in small (<= 20, the bucket size), disjoint clusters with other close nodes.
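To make the bucket[i]/CPL notation above concrete, a tiny sketch (assuming, roughly as the IPFS DHT does, that identifiers are hashed into a 256-bit keyspace with SHA-256; the peer names are placeholders):

```python
import hashlib

def kad_key(peer_id: str) -> int:
    """Map an identifier into the 256-bit keyspace (SHA-256 as a stand-in)."""
    return int.from_bytes(hashlib.sha256(peer_id.encode()).digest(), "big")

def common_prefix_length(a: int, b: int, bits: int = 256) -> int:
    """Number of leading bits a and b share; this is the bucket index."""
    if a == b:
        return bits
    return bits - (a ^ b).bit_length()

k0, k1 = kad_key("peer-0"), kad_key("peer-1")

# XOR distance is symmetric, so the CPL (and hence the bucket index) is too:
# peer-1 sits in peer-0's bucket[CPL] exactly when the reverse also holds.
assert (k0 ^ k1) == (k1 ^ k0)
assert common_prefix_length(k0, k1) == common_prefix_length(k1, k0)
print("CPL(peer-0, peer-1) =", common_prefix_length(k0, k1))
```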
Thanks everyone for the input!
This implies that nodes will organize in small (<= 20, the bucket size), disjoint clusters with other close nodes.
I believe this holds for every k-bucket, so each node is part of several disjoint clusters. Do you agree @guillaumemichel ?
Trying to think what useful information we could get out of that, the following two come up:
1. Approximate the dynamicity of the network, by monitoring how frequently clusters change by 1, 2 or more peers. This is closely related to (and influenced by) the DHT Routing Table refreshing.
2. Develop a cluster coefficient, or some similar parameter that correlates cluster participants with their geographic position in terms of network latency, or AS number. The longer the average latency within a cluster, the slower the performance of lookups will be.
Any other ideas?
I think this is worth an RFM, but not sure what priority it will take on our list of items. @SionoiS if you're keen to work towards a solution on this using the crawler data, I'd be definitely interested to discuss.
This implies that nodes will organize in small (<= 20, the bucket size), disjoint clusters with other close nodes.
I believe this holds for every k-bucket, so each node is part of several disjoint clusters. Do you agree @guillaumemichel ?
No, I don't believe this would be the case. For buckets with a low CPL, each node has plenty of choices to fill its bucket, as there are many nodes that would fall into it. Thus, according to the bucket replacement policy, it is very unlikely that multiple nodes share the same low-CPL bucket (i.e. the set [peers in this bucket + self] is identical for multiple nodes). If you have a node far away in your bucket[X], you won't necessarily be in its bucket[X].
When it comes to adding remote peers to one's routing table, the relevant logic is documented here
However, it is probable that a peer shares many of its 20 closest peers with each of those 20 closest peers, as they all have only the few nodes matching the high CPL to choose from, thus forming a Common Prefix Cluster.
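One way to sanity-check the Common Prefix Cluster intuition without any crawl data is a quick simulation with random IDs: pick a peer, take its 20 XOR-closest peers, and measure how many of those also appear among the 20 closest peers of each of them. A sketch (the 10,000-node network size is arbitrary):

```python
import hashlib
import random

N, K = 10_000, 20

def key(i: int) -> int:
    return int.from_bytes(hashlib.sha256(f"peer-{i}".encode()).digest(), "big")

keys = [key(i) for i in range(N)]

def closest(target: int, k: int = K) -> set:
    """Indices of the k peers with the smallest XOR distance to `target`,
    excluding the peer whose key is `target` itself."""
    order = sorted((i for i in range(N) if keys[i] != target),
                   key=lambda i: keys[i] ^ target)
    return set(order[:k])

p = random.randrange(N)
cluster = closest(keys[p])
overlaps = [len(cluster & closest(keys[q])) for q in cluster]
print(f"peer {p} shares on average {sum(overlaps) / len(overlaps):.1f} of its "
      f"{K} closest peers with each of those peers")
```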
Is it not possible to "map" the nodes using latency from known sources?
With 3 nodes pinging each other, one could start "triangulating" all the nodes and build a map of the network.
Not super reliable as latency can change, but maybe good enough for long-running nodes?
@SionoiS Absolutely! :) We are just sitting on a bunch of data waiting to be analysed and I wanted to point out that more work is needed to include the latency somehow 👍
@guillaumemichel I also think that peers with high CPLs will likely be in each other's routing table and hence form a highly connected sub-graph - and as you said, this is by design.
This implies that nodes will organize in small (<= 20, the bucket size), disjoint clusters with other close nodes.
I'm probably missing something here: why strictly "<= 20"? Let's say I had three consecutive high-CPL buckets filled with 12, 8, and 4 peers. Because the buckets are not full, it's likely that each peer in these buckets is also present in the routing tables of the other peers, hence forming a mini-cluster of more than 20 peers (12 + 8 + 4 = 24, plus the peer itself), right?
I think larger cluster topologies would be more interesting - something that's unrelated to the deliberate design choices and just happens due to other dynamics.
- Approximate the dynamicity of the network, by monitoring how frequently clusters change by 1, 2 or more peers. This is closely related to (and influenced by) the DHT Routing Table refreshing.
- Develop a cluster coefficient, or some similar parameter that correlates cluster participants with their geographic position in terms of network latency, or AS number. The longer the average latency within a cluster, the slower the performance of lookups will be.
@yiannisbot Sounds good, I'll try to come up with more. For now, I think as a first step we would need a condition that defines when we consider a set of peers to be a cluster.
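One possible (certainly debatable) condition, just to make the discussion concrete: call a set of peers a cluster if, in a given crawl, every member holds every other member in its routing table, i.e. a maximal clique of the mutual-presence graph, and then track how stable those sets are across crawls. A sketch with networkx, again assuming routing tables come as {peer: set(neighbours)}:

```python
import networkx as nx

def mutual_presence_graph(routing_tables: dict) -> nx.Graph:
    """Undirected graph with an edge (a, b) iff a is in b's routing table AND
    b is in a's routing table, according to one crawl snapshot."""
    g = nx.Graph()
    g.add_nodes_from(routing_tables)
    for peer, neighbours in routing_tables.items():
        for other in neighbours:
            if peer in routing_tables.get(other, set()):
                g.add_edge(peer, other)
    return g

def clusters(routing_tables: dict, min_size: int = 3) -> list:
    """Candidate clusters: maximal cliques of the mutual-presence graph."""
    g = mutual_presence_graph(routing_tables)
    return [set(c) for c in nx.find_cliques(g) if len(c) >= min_size]

def stability(cluster_a: set, cluster_b: set) -> float:
    """Jaccard similarity of the same cluster observed in two crawls; 1.0
    means no churn, values near 0 mean the cluster dissolved."""
    return len(cluster_a & cluster_b) / len(cluster_a | cluster_b)
```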
I think this is worth an RFM, but not sure what priority it will take on our list of items. @SionoiS if you're keen to work towards a solution on this using the crawler data, I'd be definitely interested to discuss.
Sorry, I already have too much to do. I just wanted to share my random thought. I guess it sparked some more thoughts from you guys :)
@guillaumemichel I also think that peers with high CPLs will likely be in each other's routing table and hence form a highly connected sub-graph - and as you said, this is by design.
This implies that nodes will organize in small (<= 20, the bucket size), disjoint clusters with other close nodes.
I'm probably missing something here: why strictly "<= 20"? Let's say I had three consecutive high-CPL buckets filled with 12, 8, and 4 peers. Because the buckets are not full, it's likely that each peer in these buckets is also present in the routing tables of the other peers, hence forming a mini-cluster of more than 20 peers (12 + 8 + 4 = 24, plus the peer itself), right?
@dennis-tra yes, you are right! I think the size of the largest high-CPL bucket cluster/full-mesh graph that we can expect is 40 (2x the bucket size). The existence of these small buckets could be a metric for DHT routing table health.
I think larger cluster topologies would be more interesting - something that's unrelated to the deliberate design choices and just happens due to other dynamics.
I agree! Maybe we could even identify better bucket replacement policies using this information.
Trying to think what useful information we could get out of that, the following two come up:
1. Approximate the dynamicity of the network, by monitoring how frequently clusters change by 1, 2 or more peers. This is closely related to (and influenced by) the DHT Routing Table refreshing.
2. Develop a cluster coefficient, or some similar parameter that correlates cluster participants with their geographic position in terms of network latency, or AS number. The longer the average latency within a cluster, the slower the performance of lookups will be.
Any other ideas?
@yiannisbot we could evaluate the effects of having clusters in low CPL buckets, in order to propose bucket replacement policy improvements.
Hi,
I had a thought that IPFS nodes may self-organize into higher-level clusters because of the way connections are formed and maintained.
More specifically, knowing the high level of node churn, do longer-running nodes tend to connect to each other? I would think not, because of the way latency is prioritised, which results in nodes organising based on distance. Is this good?
More generally, how do we measure self-organization, and could we not use this to our benefit too?
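On "how do we measure self-organization": standard graph metrics over a connectivity snapshot would be a natural starting point, e.g. the average clustering coefficient and the modularity of a community partition, compared against a random graph of similar size. A sketch with networkx; the edge-list file name is hypothetical:

```python
import networkx as nx
from networkx.algorithms import community

# Hypothetical edge list exported from a crawl: one "peerA peerB" pair per line.
g = nx.read_edgelist("overlay_edges.txt")

# Average clustering coefficient: how often two neighbours of a peer are also
# connected to each other. A value well above that of a random graph with the
# same degree distribution would be one sign of self-organization.
print("average clustering coefficient:", nx.average_clustering(g))

# Community structure: a high-modularity partition means the overlay splits
# into densely connected groups with relatively few links between them.
parts = community.greedy_modularity_communities(g)
print("communities found:", len(parts))
print("modularity:", community.modularity(g, parts))
```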