ipfs / notes

IPFS Collaborative Notebook for Research

Sharding IPLD objects #76

Open jbenet opened 8 years ago

jbenet commented 8 years ago

This note gathers the constraints + will drive toward a design of object sharding in IPFS and IPLD. Object sharding is the algorithms and formats used to represent a single (virtual) large object out of many smaller ones. Think of this like the way large directories are represented in modern filesystems (RB-Trees, B-Trees, HTrees, etc).

Sharding IPLD objects in general is a useful thing. Instead of implementing it for unixfs and other datastructs each time, we could implement it once. It could be a datastruct the others employ, or maybe -- if it is simple enough -- it belongs as part of IPLD itself.

Constraints to support:

For large fanouts, look at


case for supporting it on-top of IPLD


cc @whyrusleeping @lgierth @diasdavid @cryptix @ion1 @mildred @tv42 @wking

ion1 commented 8 years ago

It might be useful to take advantage of rolling-hash chunking for large flat data structures that receive small arbitrary changes, such as a directory.
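
As a rough illustration of content-defined chunking (a sketch only; the window size, base, and boundary mask below are arbitrary values for illustration, not the parameters of any actual IPFS chunker):

```go
package main

import (
	"fmt"
	"math/rand"
)

// chunk splits data at content-defined boundaries using a simple
// Rabin-Karp style rolling hash over a fixed window. Window, base,
// and mask are arbitrary illustration values.
func chunk(data []byte) [][]byte {
	const (
		window = 16
		base   = 257
		mask   = 0x0FFF // ~1/4096 cut probability, so ~4 KiB average chunks
	)

	// base^window, for removing the byte that falls out of the window
	var pow uint64 = 1
	for i := 0; i < window; i++ {
		pow *= base
	}

	var chunks [][]byte
	var h uint64
	start := 0
	for i, b := range data {
		h = h*base + uint64(b)
		if i >= window {
			h -= uint64(data[i-window]) * pow
		}
		// cut where the low bits of the rolling hash match a fixed pattern;
		// boundaries depend only on local content, so a small edit
		// only changes the chunks around it
		if i >= window && h&mask == mask {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	data := make([]byte, 1<<16)
	rand.Read(data) // random stand-in for directory/file bytes
	for _, c := range chunk(data) {
		fmt.Println(len(c))
	}
}
```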

davidar commented 8 years ago

Radix trees have the benefit of being deterministic (invariant to the order of insertions), which helps with deduplication.

Edit: also see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452

jbenet commented 8 years ago

cc @whyrusleeping comment here.

jbenet commented 8 years ago

From the IPLD spec, one very trivial but extensible thing to do all-IPLD-object-wide is something like this:

{
  "@extends": [
    "/ipfs/<hash1>",
    "/ipfs/<hash2>",
    "/ipfs/<hash3>",
  ],
  ... // more properties
}

which would merge the objects like so:

o1 = get(/ipfs/<hash1>)
o2 = get(/ipfs/<hash2>)
o3 = get(/ipfs/<hash3>)
o4 = get(/ipfs/<hash4>) // our object above

merged = _.extends(o1, o2, o3, o4)

Meaning that we start with the first and patch in order, so a property in o4 replaces those preceding it. This is very simple and can likely be used to support the desires mentioned earlier in this issue (more complicated structures).
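
A minimal sketch of those merge semantics in Go (plain last-writer-wins over top-level keys; resolving real `@extends` links would of course fetch the referenced IPLD objects first):

```go
package main

import "fmt"

// extend merges objects left to right: a key in a later object
// replaces the same key from any earlier one, mirroring the
// "start with the first and patch in order" semantics above.
func extend(objects ...map[string]interface{}) map[string]interface{} {
	merged := map[string]interface{}{}
	for _, obj := range objects {
		for k, v := range obj {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	o1 := map[string]interface{}{"name": "a", "size": 1}
	o4 := map[string]interface{}{"size": 42} // our object; overrides o1's "size"
	fmt.Println(extend(o1, o4))              // map[name:a size:42]
}
```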


It would be useful to have examples of directory sharding with:

expressed as the corresponding JSON or YML syntax for IPLD. (I use JSON above because it tends to be clearer, but I prefer YML.) That way we can see what it takes to support it natively in IPLD.

davidar commented 8 years ago

Personally I'd like it if this was a core part of IPLD, so that large objects were transparently sharded without the user having to explicitly do anything (much like the transparent merkle-linking). Here's an example of a manual radix tree though:

r:
  om:
    an:
      e: 1
      us: 2
    ulus: 3
  ub:
    e:
      ns: 4
      r: 5
    ic:
      on: 6
      undus: 7

Sub-trees can be split into separate (merkle-linked) objects, analogously to what we already do with chunking linear streams.

jbenet commented 8 years ago
mildred commented 8 years ago

I think this should not be implemented below the basic IPLD data model: you want, depending on the application, to control where to split objects. For example, metadata should be easily accessible while the complete file data could take a little more time to load.

I think this should be an extension of IPLD. And you could have the choice to:

It might be necessary to think about exactly how we are going to reassemble the objects, and which algorithms in particular, especially considering that it's not just about tree merging but also data-string merging.

For example, an object might be composed of a huge binary data string (for example the object representing a huge file). It should be possible to split this data string among different physical IPLD objects.

The way I would implement it is to have index objects that link to sub-parts, and specify with the link the way they are merged. Something like what @jbenet suggested.

@davidar I disagree with you when you say that it should be completely transparent. The application should have control over the splitting of files. We could have a library that does a good job of splitting when the application author doesn't want to bother, but we should be able to have full control.

jbenet commented 8 years ago

Agree in full with @mildred

jbenet commented 8 years ago

Some links for reference.

A tricky problem here is tuning the trade-off between "good datastructures that create very small nodes" and "good-sized objects that minimize hashing costs, cache misses, and network ops". Basically, there's sharding, which represents one object with many small ones, and then there's aggregation, which represents many small objects with a big one. Most good approaches to sharding from the data-structure literature yield excellent datastructs with many small objects, which we'd want to aggregate. Aggregation is tricky, because making it stable (convergent, for hashing) and efficient (minimizing update costs) is challenging. Aggregation is not straightforward, because "bundling many objects in the underlying transport instead of in the data model" is not enough, as that does not minimize hashing and random-access advertisements.

jbenet commented 8 years ago

More on HAMTs

whyrusleeping commented 8 years ago

@jbenet Patricia trees would be cool. But figuring out how to aggregate them in a way that is stable across random insertion orders is going to be very tricky.

The HAMT sounds a lot like what you were hacking on in SF, except for the part about counting bits for non-nil pointers. With that trick, we avoid having hugely bloated nodes which was my primary complaint with your previous idea.
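
A sketch of the bitmap/popcount trick being discussed, assuming a 32-way fanout and illustrative field names (not the eventual unixfs/IPLD layout): each node keeps a bitmap of which slots are occupied and a dense slice of only the occupied children, and the popcount of the bits below a slot gives that child's index in the slice.

```go
package main

import "fmt"

// node is a toy HAMT node: a 32-bit bitmap marks which of the 32
// logical slots are occupied, and children holds only the occupied
// ones, in slot order. Field names and 5-bit fanout are illustrative.
type node struct {
	bitmap   uint32
	children []interface{}
}

// popcount counts the set bits in x (what math/bits.OnesCount32 does).
func popcount(x uint32) int {
	n := 0
	for ; x != 0; x &= x - 1 {
		n++
	}
	return n
}

// lookup finds the child for a 5-bit hash fragment, if present.
func (n *node) lookup(slot uint) (interface{}, bool) {
	bit := uint32(1) << slot
	if n.bitmap&bit == 0 {
		return nil, false // slot empty; no space wasted on it
	}
	// index into the dense children slice = number of occupied slots below this one
	idx := popcount(n.bitmap & (bit - 1))
	return n.children[idx], true
}

func main() {
	n := &node{bitmap: 0b10110, children: []interface{}{"a", "b", "c"}} // slots 1, 2, 4
	fmt.Println(n.lookup(2)) // b true
	fmt.Println(n.lookup(3)) // <nil> false
}
```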

whyrusleeping commented 8 years ago

Looking at implementation details for the HAMT, I found this absurd 'population count' implementation: https://play.golang.org/p/U7SogJ7psJ

jbenet commented 8 years ago

Yeah, HAMT is the way to go. I love it when we find the trails of other researchers and realize they've been paved pretty well, and that they were after exactly the same thing.

jbenet commented 8 years ago

BTW, I pushed the JS thing I was making: https://github.com/jbenet/ipld-ht-experiment

Here were some interesting questions raised:

Parameters

Paging

Writes

Hashing

use non-cryptographic hash functions

use seeds in children/intermediate HT nodes, deterministic to their position in the tree/branch (see the sketch after this list).

Differentiating Leaves
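
A minimal sketch of what position-deterministic seeds might look like, assuming a non-cryptographic hash (FNV-32a here) and a seed derived from the node's path from the root; the hash choice and seed derivation are illustrative assumptions, not anything decided in this thread.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// slotFor returns the child slot for a key at a given node, using a
// non-cryptographic hash seeded by that node's position in the tree.
// The seed derivation (prefixing the path from the root) only
// illustrates "deterministic to their position"; it is not a spec.
func slotFor(key string, pathFromRoot []byte, fanout uint32) uint32 {
	h := fnv.New32a()
	h.Write(pathFromRoot) // the seed: every writer at this position hashes identically
	h.Write([]byte(key))
	return h.Sum32() % fanout
}

func main() {
	// the same key lands in different slots at different levels,
	// because each level re-hashes with its own positional seed
	fmt.Println(slotFor("ActualDirName", []byte{}, 32))
	fmt.Println(slotFor("ActualDirName", []byte{0x12}, 32))
}
```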

whyrusleeping commented 8 years ago

@jbenet should the names on links pointing to the intermediate nodes just be their index? I think that makes the most sense.

jbenet commented 8 years ago

@whyrusleeping Not sure what the index buys you, but sure, that works for the protobuf version, though not the IPLD version.

I think actually that we should make ONE version (IPLD, naturally) and project it into protobuf, so that things are interoperable with IPLD from the get-go and we don't have two different HAMT implementations.

jbenet commented 8 years ago

Great article on map data structures for external memory — i.e. ways of storing indexed data that are optimized for storage on disk (or networks :) ).

whyrusleeping commented 8 years ago

@jbenet How does a sharded directory work with paths? Should paths be resolved in the context of unixfs? or strictly with merkledag links?

i.e. a sharded node has links:

{
  Links: {
    "AB" : { 'child shard' },
    "FE" : { 'child shard' },
    "12" : { // this is a child shard too, just showing it for the example
      "Links" : {
         "D4ActualDirName" : "dirhash",
      }
    }
  }
}

This would be a sharded directory that contains (among other things) a directory named "ActualDirName". To access this, I'd expect to be able to do /ipfs/<shardhash>/ActualDirName, but /ipfs/<shardhash>/12/D4ActualDirName is the actual path through the raw DAG.
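
A rough sketch of that lookup, under the purely illustrative assumptions that link names are two-hex-character hash fragments (bare for child shards, fragment-plus-name for actual entries) and that FNV-32a stands in for whatever hash function unixfs ends up using:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard is a toy sharded directory node: link names are either a bare
// two-hex-char hash fragment pointing at a child shard, or
// fragment+name pointing at the actual entry, mirroring the example
// above. The real layout and hash function are not decided here.
type shard struct {
	links map[string]interface{} // value: *shard or a hash string
}

// fragment returns the two-hex-char hash fragment used at the given
// depth (illustrative: FNV-32a digest, one byte per level, four levels max).
func fragment(name string, depth int) string {
	h := fnv.New32a()
	h.Write([]byte(name))
	return fmt.Sprintf("%02X", byte(h.Sum32()>>(24-8*uint(depth))))
}

// resolve walks the virtual directory: the caller asks for the plain
// entry name; the hash fragments pick the path through the shards.
func (s *shard) resolve(name string, depth int) (string, bool) {
	frag := fragment(name, depth)
	if child, ok := s.links[frag]; ok { // bare fragment: descend into a child shard
		return child.(*shard).resolve(name, depth+1)
	}
	for link, v := range s.links { // fragment + real name: the entry itself
		if hash, ok := v.(string); ok && link == frag+name {
			return hash, true
		}
	}
	return "", false
}

func main() {
	root := &shard{links: map[string]interface{}{
		fragment("ActualDirName", 0): &shard{links: map[string]interface{}{
			fragment("ActualDirName", 1) + "ActualDirName": "dirhash",
		}},
	}}
	// the user path /ipfs/<shardhash>/ActualDirName resolves through the shards
	fmt.Println(root.resolve("ActualDirName", 0)) // dirhash true
}
```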

jbenet commented 8 years ago

unixfs should allow resolving the virtual dag transparently.

jbenet commented 8 years ago

https://github.com/krl/ipfs-crdts

daviddias commented 7 years ago

An update on efficient immutable collections that takes on HAMT and CHAMT:

https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf

HN discussion

Kubuxu commented 7 years ago

Getting HAMT into IPLD would be a major step in increasing its usability. It can be done in two ways: by introducing it at the resolver layer (it would be transparent), or by writing out a clear spec and a few reference implementations at the application layer.

I was helping @Magik6k work on Wikipedia Search using IPLD; he had to build his own hashmap structure (which was static, needed a full rebuild of everything, and had parameters that weren't autotuned). This significantly increased the time needed for the project.

Kubuxu commented 7 years ago

Also, AFAIK we already partially use the principles of CHAMT, as we don't allocate the full array, and we use bitmap masking and hashing.

daviddias commented 7 years ago

@Kubuxu Agreed. I think the direction everyone wants to go in is to have two layers of IPLD: the first being the one we implemented (the resolver), and the second one that understands things like sharding and other data structs for custom path handling. I know that a few people put a lot of thought into this and we ended up deciding to implement the base layer first; now seems the right time to start figuring out that second layer.