Rfm 17.1 - Sharing Provider Records with Multiaddress

cortze commented 1 year ago

This is the first draft of the report that extends RFM17 to measure whether the Multiaddresses of a content provider are shared during the retrieval process of a CID.

It includes the study's motivation, the methodology we followed, the discussion of the results we got out of the study, and a conclusion.

All kinds of feedback are appreciated, so please go ahead and point out improvements!

Also, should I be running a more extensive set of CIDs for extended periods?

cc: @yiannisbot @guillaumemichel @dennis-tra

yiannisbot commented 1 year ago

@cortze I've just done a thorough review of this - great work! My main worry is that the claim of: "if a multiaddress is returned together with the PeerID for the TTL period (10mins or 30mins), then we can extend the TTL to the PR expiry interval" doesn't really hold. Why would we arrive at this conclusion?

The main argument in order to increase the multiaddress TTL to the PR expiry interval would be to show that the multiaddress of the PR holder doesn't usually change. It would be great to have some experiments along the lines of the comment I inserted above: https://github.com/protocol/network-measurements/pull/22#discussion_r1022489894

I'd love to hear your thoughts on this. Basically, similar to the CID Hoarder, what we need here is a PeerID Hoarder :-D This tool would get a lot of PeerIDs, record the multiaddress by which we first saw the peer and then periodically ping the peer to figure out if it changed its Multiaddress within the PR Expiry Interval. I'm not sure if this functionality can easily be included in Nebula @dennis-tra ? This is what would give us a solid justification to argue for the extension of the TTL.
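A minimal sketch of what such a PeerID Hoarder could look like, assuming a pluggable `lookup_addrs` callback that returns a peer's current multiaddresses (or `None` when the peer is unreachable). The class, its method names, and the `window_s` parameter are hypothetical and are not part of the CID Hoarder or Nebula; the comparison treats any difference in the advertised address set as a change, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Optional


@dataclass
class PeerTrack:
    first_seen: float                   # time (seconds) the peer was first observed
    first_addrs: FrozenSet[str]         # multiaddresses recorded at first sight
    changed_at: Optional[float] = None  # first ping time at which the addresses differed


class PeerIDHoarder:
    """Hypothetical tracker: records first-seen addresses and flags later changes."""

    def __init__(self, window_s: float):
        # Observation window per peer, e.g. the PR expiry interval (assumed value).
        self.window_s = window_s
        self.peers: Dict[str, PeerTrack] = {}

    def observe(self, peer_id: str, addrs: Iterable[str], now: float) -> None:
        # Only the first observation of a peer is kept as the baseline.
        self.peers.setdefault(peer_id, PeerTrack(now, frozenset(addrs)))

    def ping_round(
        self,
        lookup_addrs: Callable[[str], Optional[Iterable[str]]],
        now: float,
    ) -> None:
        for peer_id, track in self.peers.items():
            # Skip peers already flagged or whose window has elapsed.
            if track.changed_at is not None or now - track.first_seen > self.window_s:
                continue
            addrs = lookup_addrs(peer_id)
            if addrs is None:
                # Unreachable peer: no evidence of an address change.
                continue
            if frozenset(addrs) != track.first_addrs:
                track.changed_at = now

    def changed_fraction(self) -> float:
        # Share of tracked peers whose addresses changed within their window.
        if not self.peers:
            return 0.0
        changed = sum(1 for t in self.peers.values() if t.changed_at is not None)
        return changed / len(self.peers)
```

Calling `ping_round` on a fixed schedule and reading `changed_fraction()` at the end of the run would give the share of peers whose addresses changed within the window, under the assumptions above.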

Other thoughts?

cortze commented 1 year ago

Thanks for the feedback @yiannisbot , I really appreciate it!

My main worry is that the claim of: "if a multiaddress is returned together with the PeerID for the TTL period (10mins or 30mins), then we can extend the TTL to the PR expiry interval" doesn't really hold. Why would we arrive at this conclusion?

I will try to make it a bit more explicit in the conclusion (my bad). It's not an "it won't hold" statement. It is an "It won't have as much impact as we are expecting" statement.

As long as the network has different TTL values for Multiaddresses (as in the current network), the smallest TTL will be the one that limits the final result of the DHT lookup process (at least the go-libp2p-kad-dht one). So unless the largest part of the network updates to the new TTL, we will still face the same problem, and there will still be sporadic problems originating from the remaining "old" clients. (The double-hashing implementation would be a nice incentive to force a total network update.)
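To make the mixed-TTL argument concrete, here is a rough sketch under a simplified model: a PR holder includes the provider's Multiaddress in its reply only while its own address TTL has not expired. The function names, the 20-holder split, and the extended TTL value are illustrative assumptions, not measurements from the report.

```python
from typing import Sequence

OLD_TTL_MIN = 30        # one of the TTL values mentioned in this thread
NEW_TTL_MIN = 24 * 60   # assumed extended TTL, for illustration only


def reply_includes_addrs(record_age_min: float, holder_ttl_min: float) -> bool:
    # Simplified model: addresses are returned only while the holder's TTL is valid.
    return record_age_min <= holder_ttl_min


def fraction_with_addrs(record_age_min: float, holder_ttls_min: Sequence[float]) -> float:
    # Share of PR holders that would still return the provider's addresses.
    hits = sum(reply_includes_addrs(record_age_min, t) for t in holder_ttls_min)
    return hits / len(holder_ttls_min)


# Hypothetical network: 15 upgraded PR holders, 5 "old" clients.
ttls = [NEW_TTL_MIN] * 15 + [OLD_TTL_MIN] * 5
fresh = fraction_with_addrs(5, ttls)    # record is 5 minutes old
aged = fraction_with_addrs(60, ttls)    # record is 60 minutes old
```

In this toy setup every holder returns addresses for a fresh record, while a quarter of the replies for the older record come back with a bare PeerID, so a lookup that happens to hit one of the old clients still needs the extra resolution step.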

Basically, similar to the CID Hoarder, what we need here is a PeerID Hoarder :-D This tool would get a lot of PeerIDs, record the multiaddress by which we first saw the peer and then periodically ping the peer to figure out if it changed its Multiaddress within the PR Expiry Interval.

I left you a comment as well in the #22 discussion. I think that we have a few options here. The hoarder already does this indirectly (it contacts the PR Holders at the Multiaddress that we stored while storing the PRs). Also, I think that Nebula already tracks IP rotation. We could have a deeper chat about this :)

I'll iterate again over your comments and suggestions, and will ping you back whenever I make a commit!

dennis-tra commented 1 year ago

I'm not sure if this functionality can easily be included in Nebula @dennis-tra ?

Sorry for the late reply! The information is already recorded by Nebula and would just need to be analyzed :)

I'll iterate again over your comments and suggestions, and will ping you back whenever I make a commit!

Just ping here or in Discord and I'll also have a proper read. I just skimmed it in the past 🙈

cortze commented 1 year ago

I already added some explanations and most of the changes that @yiannisbot suggested. I set up another Hoarder run with 20k CIDs for 60 hours, so the plots and some numbers might change.

If you can go through and give me some thoughts @dennis-tra , I would appreciate your feedback as well 😄

yiannisbot commented 1 year ago

The hoarder already does this indirectly (it contacts the PR Holders at the Multiaddress that we stored while storing the PRs). Also, I think that Nebula already tracks IP rotation.

Great that the Hoarder contacts the original Multiaddress! That's what we need. So if we run the experiment for long enough and monitor that, then we have what we're looking for.

This ^ together with an analysis of logs from Nebula will tell us the rate of PR Holders that switch IP addresses over the republish interval. I think with those two, this will be complete and ready for merging.

yiannisbot commented 1 year ago

I set up another Hoarder run with 20k CIDs for 60 hours, so the plots and some numbers might change.

@cortze do we have any results from this experiment? I think with these results and addressing Guillaume's question, this should be ready to be merged, right?

cortze commented 1 year ago

@cortze do we have any results from this experiment? I think with these results and addressing Guillaume's question, this should be ready to be merged, right?

@yiannisbot The results of this run were not as good as I expected. To track such a large set of CIDs, I had to increase the concurrency parameters of the hoarder, and as we spotted in our last meeting (link to the Issue describing the bottleneck), the code is not well prepared to support such a high degree of concurrency.

However, I think that even with such a low number of CIDs and a shorter interval between pings (3 minutes), we can conclude that increasing the provider Multiaddress TTL would improve content fetching times. And the impact would be much higher if we combine it with go-libp2p-kad-dht#802.

RFM17 already proved that IP rotation of PR Holders barely happens.

cortze commented 1 year ago

@yiannisbot I've updated the document with your suggestions and with two extra paragraphs describing:

The observed IP churn of DHT servers in IPFS (from RFM17)
A Contribution section, where I aggregated all the pull requests related to this RFM

I've also updated the figures. The new ones have the DHT lookup limited to 2 mins - which shows a reasonable number of peers that return the PRs as pointed out by @guillaumemichel .

The new data still shows a lower number of online PR Holders due to a problem storing the records in a part of the network. However, I consider the results more than good enough to conclude that increasing the TTL of the Provider's Multiaddress would avoid the second DHT lookup that maps the PeerID of the Provider to its Multiaddress.
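The "second DHT lookup" can be sketched from the client side as follows. This is only an illustration of the control flow, with hypothetical `find_providers` and `find_peer` callbacks standing in for the real DHT operations; it is not the go-libp2p-kad-dht API.

```python
from typing import Callable, List, Tuple

# (provider PeerID, multiaddresses returned with the provider record, possibly empty)
ProviderReply = Tuple[str, List[str]]


def resolve_provider(
    cid: str,
    find_providers: Callable[[str], ProviderReply],
    find_peer: Callable[[str], List[str]],
) -> Tuple[List[str], int]:
    # First DHT lookup: CID -> provider record.
    peer_id, addrs = find_providers(cid)
    lookups = 1
    if not addrs:
        # The reply carried only the PeerID, so a second DHT lookup is
        # needed to map the PeerID to its multiaddresses.
        addrs = find_peer(peer_id)
        lookups += 1
    return addrs, lookups
```

When the PR holder still returns the Multiaddress with the record, the function finishes after one lookup; otherwise the client pays for a second one before it can dial the provider.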

Let me know what you think about the update :) Cheers!

probe-lab / network-measurements

Rfm 17.1 - Sharing Provider Records with Multiaddress #22