lbryio / lbry-sdk

The LBRY SDK for building decentralized, censorship resistant, monetized, digital content apps.
https://lbry.com
MIT License
7.19k stars 483 forks source link

More efficient blob announcement using bloom filters #3181

Open lyoshenka opened 3 years ago

lyoshenka commented 3 years ago

tldr

When announcing a lot of blobs, you'll often be announcing many blobs to the same peer. Instead of doing many iterative-find-and-store cycles (one for each hash), do a single cycle and store a bloom filter of all the hashes together.

When answering a findNode call, if k peers aren't found by checking exact hash matches, the node will check it's bloom filters for matches.

details

bloom filter idea: split all hashes in k shards find k peers for each shard announce a bloom filter for every 200 items for k peers in each set

split all hashes i

to announce many hashes:

  1. determine k closest peers for all hashes, build set of hashes per peer
  2. build bloom filter for hashes to announce to each peer
  3. send filters for each peer

to find a peer for a hash:

  1. find the k closest to the hash in the rt
  2. send findvalue requests to them a. peer checks datastore for the hash, returns k peers if hit b. peer checks stored bloom filters for if they contain the hash, returns up to k peers that have hits
cristi-zz commented 3 years ago

Is lbry-sdk the only app that interacts with DHT network? As far as I read, we need to adapt the protocol (store bloom filters).

Also, bloom filters must be re-propagated once significant changes are made (peer drops many blobs)

shyba commented 3 years ago

Hi! :wave:

Yes, LBRY has its own DHT and you are correct that this task is to add support for bloom filters announcements and discovery.

Propagation as you describe is currently done by re-announcing every stream every 24 hours. Since bloom filters are going to group many announcements into k shards, we need to update the shards that changed and probably more often. We probably need to think more about that...