lbryio / lbry-sdk

The LBRY SDK for building decentralized, censorship resistant, monetized, digital content apps.
https://lbry.com
MIT License

DHT: improve data hosting metric #3633

Closed: shyba closed this issue 1 year ago

shyba commented 2 years ago

This issue is considered complete when the dashboard has an automatically updating metric showing how many TB of data are available for download.

Today, we listen for queries under a shard (a node ID prefix) and calculate data availability by comparing what was announced against what was claimed. This is efficient but inaccurate, because:

This issue proposes a new approach that uses the script from #3625, as follows:

Ideas not covered by this PR:

fixes #3633
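For illustration, the current shard-based metric can be sketched roughly as below. This is a minimal sketch, not the SDK's code: `ShardObservation`, `availability_ratio`, and `extrapolate_announced` are hypothetical names, and the sketch assumes sd_hashes are uniformly distributed, so a shard defined by a `prefix_bits`-bit prefix covers about 1 / 2**prefix_bits of the keyspace.

```python
from dataclasses import dataclass, field


@dataclass
class ShardObservation:
    """Hypothetical summary of what one listening node saw for its shard."""
    prefix_bits: int  # number of leading node ID bits that define the shard
    announced: set = field(default_factory=set)  # sd_hashes announced in the shard
    claimed: set = field(default_factory=set)    # sd_hashes referenced by claims in the shard


def availability_ratio(obs: ShardObservation) -> float:
    """Fraction of the shard's claimed streams that were also announced."""
    if not obs.claimed:
        return 0.0
    return len(obs.claimed & obs.announced) / len(obs.claimed)


def extrapolate_announced(obs: ShardObservation) -> int:
    """Scale the shard's announcement count up to the whole keyspace.

    Assumes uniform sd_hashes: the shard holds ~1 / 2**prefix_bits of all keys.
    """
    return len(obs.announced) * 2 ** obs.prefix_bits


obs = ShardObservation(prefix_bits=8, announced={"a", "b", "x"}, claimed={"a", "b", "c", "d"})
print(availability_ratio(obs))     # 0.5
print(extrapolate_announced(obs))  # 768
```

The scaling step makes the trade-off visible: only one shard's traffic is needed, but every announcement missed inside that shard is multiplied by 2**prefix_bits in the final estimate.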

moodyjon commented 1 year ago

> pick 2 random bytes
> query hub for all streams starting with those 2 random bytes (should be 270-350 claims)

Are you talking about searching by stream name or stream ID?

Claim names are human-meaningful, and the distribution of characters will not be uniform. The claim IDs would be uniformly random (IIUC) hex characters.

I worry that searching by name would produce widely varying numbers of claims (or claims that are correlated in some way).
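The concern about uniformity can be made concrete. If the identifiers being sampled are uniformly random, the number that start with a fixed 2-byte prefix follows a binomial distribution with p = 1/65536, so the sample size is predictable and can be scaled back up to a total. The sketch below assumes that uniformity; the function names and the 20,000,000 total are hypothetical values chosen for illustration, not figures from this thread.

```python
import math


def expected_prefix_count(total_items: int, prefix_bytes: int = 2):
    """Mean and standard deviation of how many uniform hashes share one prefix."""
    p = 1 / 256 ** prefix_bytes  # chance a uniform hash starts with a given prefix
    mean = total_items * p
    stddev = math.sqrt(total_items * p * (1 - p))
    return mean, stddev


def implied_total(observed_count: int, prefix_bytes: int = 2) -> int:
    """Scale a per-prefix count back up to the whole keyspace."""
    return observed_count * 256 ** prefix_bytes


# The "270-350 claims" range would imply roughly 17.7M to 22.9M streams in total.
print(implied_total(270), implied_total(350))

# For a hypothetical 20M streams: about 305 per prefix, give or take about 17.
mean, sd = expected_prefix_count(20_000_000)
print(round(mean, 1), round(sd, 1))
```

Human-chosen claim names break the uniformity assumption, so neither the mean nor the standard deviation above would hold for a name-based prefix search.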

shyba commented 1 year ago

Hello there,

The LBRY DHT is based on Kademlia with SHA-384 hashing. Items can only be looked up by content hash (the sd_hash in a claim), so this step queries the hub for a sample of sd_hash values. See https://github.com/lbryio/lbry-sdk/blob/cc6cdc07f5067aa3a8e40b5421e0fd50fffbe0e7/scripts/sd_hash_sampler.py
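A rough sketch of the pieces involved in the sampling step is shown below. It does not reproduce the linked sampler script; `random_prefix`, `matches_prefix`, `xor_distance`, and `estimate_available_tb` are hypothetical helpers. It assumes each sd_hash is the 96-character hex form of a SHA-384 digest, and that the `(size_bytes, is_available)` pairs come from a hub query followed by a DHT lookup for each sampled stream.

```python
import hashlib
import os


def random_prefix(nbytes: int = 2) -> bytes:
    """Pick a random key prefix to sample under."""
    return os.urandom(nbytes)


def matches_prefix(sd_hash_hex: str, prefix: bytes) -> bool:
    """True if the hex-encoded sd_hash starts with the given raw prefix bytes."""
    return bytes.fromhex(sd_hash_hex).startswith(prefix)


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance between two equal-length IDs or keys."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def estimate_available_tb(sample, prefix_bytes: int = 2) -> float:
    """Extrapolate available bytes from one prefix's sample to decimal TB.

    `sample` is an iterable of (size_bytes, is_available) pairs.
    """
    available = sum(size for size, ok in sample if ok)
    return available * 256 ** prefix_bytes / 1e12


sd_hash = hashlib.sha384(b"example stream descriptor").hexdigest()
prefix = bytes.fromhex(sd_hash[:4])
print(matches_prefix(sd_hash, prefix))  # True
print(estimate_available_tb([(10**6, True), (2 * 10**6, False)]))
```

Because the DHT only indexes by content hash, the sample has to be drawn by sd_hash prefix, which also gives it the uniform distribution that makes the final scale-up by 256**prefix_bytes reasonable.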