wnfs-wg / spec

Other
39 stars 3 forks source link

Directory Sharding #8

Open matheus23 opened 2 years ago

matheus23 commented 2 years ago

In WNFS 1 in the public partition we re-used UnixFS for the directory listing, thus "borrowed" their mechanisms for sharding directories. However, that leads to more indirections and small nodes (the "userland" lived as its own UnixFS directory instead of within the WNFS directory header node), so we now inline the directory entry table in WNFS 2 directories.

In both WNFS 1 and WNFS 2 we're missing a specification for how to deal with large private directories. Ideally they should be sharded as well.

We should specify

  1. which data structure to use for sharding directories in general and
  2. in the private partition we should decide how the pieces of a sharded directory get addressed inside the HAMT and which keys are used for encrypting them.

UnixFS has decided to use HAMTs for (1). However, the downside of using HAMTs is you either balance the HAMT by hashing each directory name and thus lose range queries or you have a badly performing non-balanced tree.

We should consider search-tree based constructions like Merkle-B-Trees or Merkle Search Trees.

matheus23 commented 2 years ago

Hitchhiker tree to be considered: https://github.com/datacrypt-project/hitchhiker-tree

matheus23 commented 2 years ago

Prolly Trees may be an improvement over Merkle Search Trees, but with the same goal, via @mikeal (also see https://github.com/mikeal/prolly-trees).