Allow DHT crawler to be swappable

libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p

https://github.com/libp2p/specs/tree/master/kad-dht

MIT License

524 stars 224 forks source link

Allow DHT crawler to be swappable #823

Closed masih closed 1 year ago

masih commented 1 year ago

The accelerated DHT client uses a fixed implementation of DHT crawler with a parallelism hardcoded to 200. This offers a reasonable general case default behaviour but limits experimentation with alternative crawling strategies with varied or dynamic degree of parallelism.

To avoid this the changes here:

refactor the existing Crawler to DefaultCrawler
introduce Crawler as interface type, and
add option to set the Crawler in the accelerated DHT client with fallback to DefaultCrawler at parallelism of 200.

As a result the net functionality of the code remains unchanged while it allows the crawling logic to be swapped entirely with an alternative implementation.

Additionally, the changes here parameterize the dial addr extend duration with fallback to the previously hardcoded value of 30 minutes.

aschmahmann commented 1 year ago

Haven't looked closely, but LGTM making the crawler more configurable/swappable seems good.

Note that unfortunately there's not much in the way of testing here, so if you're going to deploy changes on the live network you will want to do some monitoring to make sure you're doing a good job of covering the network space as it evolves over time (e.g. by comparing against something more known like the default crawler, nebula, etc.).

@guillaumemichel @dennis-tra do you have any tooling advice here for if @masih wants to test out alternatives here.

masih commented 1 year ago

Thank you both for the speedy reviews; much appreciated.

In the first iteration I aim to aggregate crawling across multiple nodes: the crawling logic remains the same, but will only run on a single node and all other instances would use the same rt for lookup.

This is a specific case where the DHT client is run as a service replicated across multiple instances (much like caskadht).

Later, hopefully I'll get to experiment with alternative crawling strategies, at which point i'd love the help of you folks on lookup performance comparison, probably in a simulation of sorts where ground truth is known.

dennis-tra commented 1 year ago

A checksum would be the total network size which you could compare with our Nebula crawls. If this number differed too much, there's something wrong on either side. Judging from the code changes I don't think there is too much risk here.