ipni / storetheindex

A directory of CIDs

Support a new prefix-map entries format for ingest #420

Closed: willscott closed this issue 2 years ago

willscott commented 2 years ago

Currently, the entries CID of an advertisement being ingested is expected to resolve to a list of the form [[mh, mh, ...], next].

In addition, it would be useful to support entries structured as a map of the form {'prefix': next, 'prefix': next, ...}.

This would allow for efficient distribution of subsets of advertisement data to different index shards handling different prefix-spaces.
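As a rough illustration of the proposed shape, the sketch below groups multihashes into a prefix map and lets a shard pick out only the buckets it owns. The function names, the hex-encoded keys, the one-byte prefix length, and the sample byte strings are all assumptions made for the example; the issue does not specify any of them. In the proposal each value would be a `next` link, which is inlined here as a plain list.

```python
from collections import defaultdict

# Current shape: a chain of chunks, each [[mh, mh, ...], next].
# Proposed shape: {'prefix': next, ...}. Here `next` is inlined as a plain
# list instead of a link, purely to keep the sketch self-contained.

def build_prefix_map(multihashes, prefix_len=1):
    """Group multihashes by their first `prefix_len` bytes (hex keys)."""
    buckets = defaultdict(list)
    for mh in multihashes:
        buckets[mh[:prefix_len].hex()].append(mh)
    return dict(buckets)

def entries_for_shard(prefix_map, owned_prefixes):
    """Return only the buckets a shard is responsible for."""
    return {p: mhs for p, mhs in prefix_map.items() if p in owned_prefixes}

# Hypothetical, shortened byte strings standing in for multihashes.
mhs = [bytes.fromhex(h) for h in ("a1b2c3", "a1ff00", "07aa10", "0700ff")]
prefix_map = build_prefix_map(mhs, prefix_len=1)
shard_view = entries_for_shard(prefix_map, {"a1"})
```

A shard that owns prefix `a1` only needs to follow that one key, rather than walking every chunk of the list form and discarding what it does not index.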

gammazero commented 2 years ago

Different indexers (indexer federations) will make different sharding choices based on the number of nodes, and the sharding may change as nodes are added or removed. This means that no single prefix length will be best for all indexers or for all time. So, how should the prefix length be chosen? Or are the advertisement entries communicated in something more like a trie (prefix tree), letting the indexers decide how far down the tree to walk to determine which shard a multihash is part of?
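The trie alternative can be sketched as follows: the provider publishes one tree, and each indexer walks it only as deep as its own shard count requires. This is a minimal in-memory sketch, not the ingest format. The nested-dict layout, the one-byte-per-level fan-out, `max_depth`, and all function names are assumptions, and it assumes every multihash is at least `max_depth` bytes long.

```python
def insert(trie, mh, max_depth=2):
    """Insert `mh` into a trie keyed by one byte per level."""
    node = trie
    for b in mh[:max_depth]:
        node = node.setdefault("children", {}).setdefault(b, {})
    node.setdefault("items", []).append(mh)

def collect(node):
    """Return every multihash stored at or below `node`."""
    out = list(node.get("items", []))
    for child in node.get("children", {}).values():
        out.extend(collect(child))
    return out

def subtrees_at_depth(node, depth, prefix=b""):
    """Yield (prefix, subtree) pairs `depth` levels down, stopping at leaves."""
    children = node.get("children", {})
    if depth == 0 or not children:
        yield prefix, node
        return
    for b, child in sorted(children.items()):
        yield from subtrees_at_depth(child, depth - 1, prefix + bytes([b]))

# Hypothetical, shortened byte strings standing in for multihashes.
mhs = [bytes.fromhex(h) for h in ("a1b2c3", "a1ff00", "07aa10", "0700ff")]
trie = {}
for mh in mhs:
    insert(trie, mh)

# A small federation stops one level down; a larger one walks two levels.
coarse = [p.hex() for p, _ in subtrees_at_depth(trie, 1)]
fine = [p.hex() for p, _ in subtrees_at_depth(trie, 2)]
```

Both indexers read the same published structure; only the walk depth differs, so a federation that adds nodes can walk deeper without the provider republishing anything.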

willscott commented 2 years ago

I think I was imagining a trie, or at least that even if the HAMT-ness of a provider's entries doesn't quite match the layering of a given index instance, it'll still be better than no sharding.
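To make the "still better than no sharding" point concrete, here is a small sketch of a shard reading a provider's prefix map when the two disagree on prefix length. Everything in it is hypothetical: the hex-string prefixes, the `fetch_for_shard` helper, and the sample data. A coarser shard reads several whole buckets, and a finer shard reads one bucket and filters it locally. Either way it skips buckets that cannot contain its multihashes.

```python
def fetch_for_shard(prefix_map, shard_prefix):
    """Return (multihashes for this shard, number of buckets read).

    Prefixes are hex strings, so they can differ in length between the
    provider's map and the indexer's shard layout.
    """
    wanted, buckets_read = [], 0
    for bucket_prefix, mhs in prefix_map.items():
        inside_shard = bucket_prefix.startswith(shard_prefix)    # coarser shard
        contains_shard = shard_prefix.startswith(bucket_prefix)  # finer shard
        if inside_shard or contains_shard:
            buckets_read += 1
            # No-op for a coarser shard; trims the bucket for a finer one.
            wanted.extend(mh for mh in mhs if mh.hex().startswith(shard_prefix))
    return wanted, buckets_read

# Hypothetical provider map with one-byte prefixes and shortened multihashes.
prefix_map = {
    "a1": [bytes.fromhex("a1b2c3"), bytes.fromhex("a1ff00")],
    "07": [bytes.fromhex("07aa10"), bytes.fromhex("0700ff")],
    "a9": [bytes.fromhex("a90000")],
}

coarse_mhs, coarse_reads = fetch_for_shard(prefix_map, "a")    # shard on first nibble
fine_mhs, fine_reads = fetch_for_shard(prefix_map, "a1b2")     # shard on two bytes
```

In this toy data the nibble-level shard reads two of three buckets and the two-byte shard reads one, whereas the list form would require each to read all entries.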