probe-lab / network-measurements


Impact of peers that rotate their PeerIDs #31

Open yiannisbot opened 1 year ago

yiannisbot commented 1 year ago

I'm wondering what the impact is of peers that join the IPFS DHT and rotate their PeerIDs excessively. Recent reports, e.g., the Week 5 Nebula Report, show 5 peers that each rotated their PeerID about 5000 times within the space of a week. That comes down to each of these peers using a new PeerID every couple of minutes. The number of rotating PeerIDs seen is roughly as large as the number of relatively stable nodes in the network (aka network size). The routing table of DHT peers is refreshed every 10 minutes, so the impact likely doesn't stick around for longer than that, but given the excessive number of rotations, I feel this deserves a second look.
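As a quick sanity check on the numbers above, here is a back-of-the-envelope sketch. It assumes the rotations are spread evenly over the week, which the report does not necessarily confirm; the variable names are mine and purely illustrative.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def minutes_per_rotation(rotations: int, window_minutes: int = MINUTES_PER_WEEK) -> float:
    """Average lifetime of one PeerID, assuming evenly spaced rotations."""
    return window_minutes / rotations


peers = 5                   # rotating peers mentioned above
rotations_per_peer = 5000   # rotations per peer within one week
refresh_interval = 10       # minutes, routing table refresh mentioned above

lifetime = minutes_per_rotation(rotations_per_peer)
total_peer_ids = peers * rotations_per_peer
ids_per_refresh = refresh_interval / lifetime  # PeerIDs one peer burns per refresh window

print(f"{lifetime:.2f} minutes per PeerID")
print(f"{total_peer_ids} PeerIDs in total")
print(f"{ids_per_refresh:.2f} PeerIDs per peer per refresh window")
```

Under that assumption each PeerID lives for about two minutes, so a single rotating peer goes through roughly five PeerIDs within one 10-minute refresh window.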

I can see three cases where this might have an impact (although there might be more):

  1. In the GET process, when looking for closer peers and hitting a peer that has disappeared from the network (rotated their PeerID)
  2. In Provider Record availability, when looking for a record that has been stored with a peer that rotated their PeerID
  3. In content availability, when a peer has advertised some content under its PeerID and is then no longer reachable under that PeerID.

The first case should be covered by the concurrency factor, although the large number of rotations might be causing issues. We could check the second case through the CID Hoarder - @cortze it's worth spinning up an experiment to cross-check what happens with previous results. Not sure what can be done for the third case :)
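To make the intuition about the concurrency factor a bit more concrete, here is a rough sketch. It assumes that each contacted peer is stale independently with the same probability, which is a simplification, and the concurrency values of 3 and 10 are illustrative rather than taken from any particular implementation. It only models a lookup round in which every parallel query hits a dead PeerID; it says nothing about the extra latency caused by individual timeouts.

```python
def p_all_stale(stale_fraction: float, concurrency: int) -> float:
    """Probability that every one of `concurrency` parallel queries hits a stale peer.

    Assumes each contacted peer is stale independently with probability
    `stale_fraction` (a simplifying assumption, not a measured property).
    """
    return stale_fraction ** concurrency


for alpha in (3, 10):              # illustrative concurrency factors
    for stale in (0.1, 0.3, 0.5):  # hypothetical share of stale routing table entries
        print(f"alpha={alpha:2d} stale={stale:.1f} -> {p_all_stale(stale, alpha):.6f}")
```

Even with half of the entries stale, a round where all queries fail becomes unlikely as the concurrency grows, but each stale entry still costs a dial attempt.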

Thoughts on whether this is actually a problem or not:

It's worth checking whether those PeerIDs co-exist in parallel in the network, or whether when we see a new PeerID from the same IP address, the previous one(s) we've seen from the same IP address have disappeared. @dennis-tra do we know that already? Is there a way to check that from the Nebula logs?
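One way to approach the co-existence question is sketched below. It assumes the crawl data can be reduced to one record per (IP, PeerID) with first-seen and last-seen timestamps; the `Sighting` type and its field names are hypothetical and not Nebula's actual schema. A sweep over the intervals gives the peak number of PeerIDs alive at the same time behind each IP: a peak of 1 is consistent with rotation, a higher peak suggests several peers co-existing.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Sighting(NamedTuple):
    # Hypothetical record shape, not Nebula's real schema.
    ip: str
    peer_id: str
    first_seen: float  # seconds since some epoch
    last_seen: float


def max_concurrent_peer_ids(sightings: Iterable[Sighting]) -> dict[str, int]:
    """Return, per IP, the peak number of PeerIDs observed alive at the same time."""
    events = defaultdict(list)
    for s in sightings:
        events[s.ip].append((s.first_seen, +1))
        events[s.ip].append((s.last_seen, -1))

    peaks = {}
    for ip, evs in events.items():
        # Closed intervals: at equal timestamps, count arrivals before departures.
        evs.sort(key=lambda e: (e[0], -e[1]))
        alive = peak = 0
        for _, delta in evs:
            alive += delta
            peak = max(peak, alive)
        peaks[ip] = peak
    return peaks


sightings = [
    # Looks like rotation: one PeerID at a time.
    Sighting("203.0.113.7", "A", 0, 100),
    Sighting("203.0.113.7", "B", 120, 220),
    Sighting("203.0.113.7", "C", 240, 300),
    # Looks like several peers behind one address.
    Sighting("198.51.100.9", "X", 0, 300),
    Sighting("198.51.100.9", "Y", 50, 250),
    Sighting("198.51.100.9", "Z", 100, 200),
]
peaks = max_concurrent_peer_ids(sightings)
print(peaks)
```

Note that the result is only as precise as the crawl interval: two PeerIDs that alternate faster than the crawler revisits the address could still look sequential or concurrent by accident.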

Also, from @mcamou:

re: thousands of PeerIDs with the same IP, I don't think that we can completely rule out that they are different peers mainly due to NAT. On the one hand, some ISPs implement CG-NAT, where they do use a single IP for multiple customers. On the other hand, you might have large companies who have a single Internet PoP for their whole network.

Depending on how many IPs we have in this state, we might want to do a study of the above two cases (and any others we can think of). One thing to look at would be whether the same PeerID shows up consistently or whether it's a one-off.
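The one-off versus recurring distinction could be checked along these lines. This is a sketch that assumes the crawl results are available as (crawl_id, ip, peer_id) tuples; that shape is an assumption for illustration, not the actual log format. A share of one-off PeerIDs close to 1.0 behind an IP would point towards rotation, while a low share would be more consistent with stable peers sharing an address, e.g. behind CG-NAT.

```python
from collections import defaultdict
from typing import Iterable


def one_off_fraction(observations: Iterable[tuple[int, str, str]]) -> dict[str, float]:
    """Per IP, the share of PeerIDs that were seen in exactly one crawl.

    `observations` holds (crawl_id, ip, peer_id) tuples; this input shape
    is a hypothetical simplification of the crawl data.
    """
    crawls_seen = defaultdict(set)  # (ip, peer_id) -> set of crawl ids
    for crawl_id, ip, peer_id in observations:
        crawls_seen[(ip, peer_id)].add(crawl_id)

    total = defaultdict(int)
    one_off = defaultdict(int)
    for (ip, _peer_id), crawls in crawls_seen.items():
        total[ip] += 1
        if len(crawls) == 1:
            one_off[ip] += 1
    return {ip: one_off[ip] / total[ip] for ip in total}


observations = [
    # Every PeerID appears once: looks like rotation.
    (1, "203.0.113.7", "A"), (2, "203.0.113.7", "B"), (3, "203.0.113.7", "C"),
    # Mostly recurring PeerIDs: looks like stable peers sharing an address.
    (1, "198.51.100.9", "X"), (1, "198.51.100.9", "Y"),
    (2, "198.51.100.9", "X"), (2, "198.51.100.9", "Y"),
    (3, "198.51.100.9", "X"), (3, "198.51.100.9", "Z"),
]
fractions = one_off_fraction(observations)
print(fractions)
```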

Extra thoughts more than welcome.

guillaumemichel commented 1 year ago

It's worth checking whether those PeerIDs co-exist in parallel in the network, or whether when we see a new PeerID from the same IP address, the previous one(s) we've seen from the same IP address have disappeared. @dennis-tra do we know that already? Is there a way to check that from the Nebula logs?

+1 It would be interesting to know whether they are rotating their PeerID in the first place, or if it is just many IPFS entities running on the same IP address.

Concerning (3), IMO it isn't our problem, if a user decides to rotate its PeerID (assuming it does), the user cannot expect the Content it provides to be reachable.

IMO (1) and (2) are legit concerns if peers are actually rotating their PeerID AND are acting as DHT Servers.

yiannisbot commented 1 year ago

Concerning (3), IMO it isn't our problem, if a user decides to rotate its PeerID (assuming it does), the user cannot expect the Content it provides to be reachable.

Agreed. I'm just thinking that if this is due to a bug/misconfiguration of the hosts and they intend to publish their content on IPFS, but after a few minutes it's not findable, then they get a terrible experience from using IPFS. I.e., I'm thinking of a non-malicious case in (3) ;-)

mcamou commented 1 year ago

I'm just wondering whether (3) is obvious to someone who is currently just running IPFS without a fairly good understanding of how the DHT works. A user might think that changing the node's ID (and restarting the node) should not stop providing the content. Do we have anything in the docs to that effect?

yiannisbot commented 1 year ago

I really doubt we have anything along those lines in the docs. But I would guess that if someone has the technical knowledge to change the node's ID, they would also have made an effort to understand how things work a little deeper :)

cortze commented 1 year ago

We could check the second case through the CID Hoarder - @cortze it's worth spinning up an experiment to cross-check what happens with previous results.

Sure @yiannisbot, I just spawned a new Hoard to see how it affects the PRL.