ipld / go-ipld-prime

Golang interfaces for the IPLD Data Model, with core Codecs included, IPLD Schemas support, and some handy functional transforms tools.
MIT License

RFC: DagPB / UnixFS V1 Support #33

Closed hannahhoward closed 4 years ago

hannahhoward commented 5 years ago

What

This issue describes potential support for DagPB as an encoding format in go-ipld-prime.

Why

Filecoin will work with IPLD data structures encoded in DagPB, and it needs to Graphsync them. That means it needs to apply selectors to them, which in turn means it needs go-ipld-prime to be able to read them. It is not yet clear whether it needs to be able to select against paths within the Data portion of DagPB used by UnixFS v1.

How

It seems there are at least a couple of things needed to successfully do a selector traversal against a DagPB-encoded IPLD data structure.

  1. We need to build an encoder/decoder for DagPB and possibly a custom NodeBuilder/Node

  2. We need to register said encoder/decoder for DagPB with the encoder/decoder registry defined in https://github.com/ipld/go-ipld-prime/blob/master/linking/cid/multicodecRegistry.go

Encoder/Decoder vs Node/NodeBuilder

It's possible to build a very limited implementation of DagPB encoding/decoding with a generic NodeBuilder/Node. We could decode the protobufs directly and essentially build a node of type:

type PBLink struct {
    Hash Cid
    Name String
    Tsize Integer
} representation map

type PBNode struct {
    Links [PBLink]
    Data Bytes
} representation map

However, this would mean we could not path with link names, only with indexes. Pathing by name is described in https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-pb.md#pathing and is supported by the current Go and JS implementations.
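As an illustration of that limitation, the sketch below mirrors the two types above as plain Go structs and resolves a path segment the way a generic list would. The Go struct layout, the string stand-in for a CID, and the LinkAt helper are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"strconv"
)

// PBLink mirrors the PBLink type above; Hash is a string stand-in for a CID.
type PBLink struct {
	Hash  string
	Name  string
	Tsize uint64
}

// PBNode mirrors the PBNode type above.
type PBNode struct {
	Links []PBLink
	Data  []byte
}

// LinkAt (hypothetical) resolves one path segment against Links as a
// generic list would: the segment must be a numeric index.
func (n PBNode) LinkAt(segment string) (PBLink, error) {
	i, err := strconv.Atoi(segment)
	if err != nil {
		return PBLink{}, fmt.Errorf("segment %q is not a list index", segment)
	}
	if i < 0 || i >= len(n.Links) {
		return PBLink{}, fmt.Errorf("index %d out of range", i)
	}
	return n.Links[i], nil
}

func main() {
	n := PBNode{Links: []PBLink{{Hash: "cid-a", Name: "readme.md", Tsize: 10}}}

	// Pathing by index works with the generic representation.
	l, err := n.LinkAt("0")
	fmt.Println(l.Name, err)

	// Pathing by link name does not: the name is just a field of a list entry.
	_, err = n.LinkAt("readme.md")
	fmt.Println(err)
}
```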

To get more functionality, specifically full support for typical path schemes and potentially access to UnixFS data, we'd also need a custom Node/NodeBuilder combo, ideally with some kind of fast path for encode/decode.

Since it's hard to implement the full DagPB spec (specifically pathing) with a generic NodeBuilder/Node plus Encoder/Decoder, perhaps the DagPB encoder/decoder should have no code of its own and simply return an error if the Node/NodeBuilder does not offer a fast path?

Inside/outside go-ipld-prime

Because the codec registry and the Node/NodeBuilder interfaces are public, all of this could be implemented outside of go-ipld-prime, in a client library that uses it. At the same time, DagPB is part of the IPLD spec, so it makes sense to keep it in the library, even if DagPB is, in some sense, a legacy encoding format (the spec does not identify it as such, though it acknowledges that DagPB does not support the full IPLD Data Model).

My initial take is:

hannahhoward commented 5 years ago

To be clear, I'm seeking feedback on the final section (my take). None of those opinions are strongly held, though I think it would be nice to do this in the library.

warpfork commented 4 years ago

I'm inclined to rather try to do DagPB in a separate repo. JSON and CBOR have gotten a special pass because they're general purpose useful "batteries included" options and relatively low transitive dependency cost; and at least one such thing is useful to have in the main repo. 3 or beyond not so much. (I'd probably even have picked one and not two for this if the second one hadn't been 'free' so to speak, both coming via refmt token interface.)


UnixFS V1 should be assumed. There are very few other uses of DagPB.

:+1:

I'd also think it neat to have more reusable protobuf codecs in the future... but I think that's a different and bigger topic requiring many trickier bits of engineering. (And for starters, it doesn't actually jibe with multicodec varints at all, since one needs the whole protobuf schema as a parameter to fully create the codec...) So doing this at smaller scope and for specifically unixfsv1 is a reasonable proposition.


Using a custom Node implementation to manage the translation of the name-link pairs into a map of links sounds about right. And IIUC we would indeed want that so pathing by name works as expected for DagPB.
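A rough sketch of that translation, under stated assumptions: the linkMapNode type and its methods are hypothetical, Hash is a string stand-in for a CID, and the choice to keep the first link when names repeat is made up for this example rather than taken from the spec.

```go
package main

import "fmt"

// PBLink is a simplified name-link pair; Hash is a string stand-in for a CID.
type PBLink struct {
	Hash  string
	Name  string
	Tsize uint64
}

// linkMapNode (hypothetical) presents decoded name-link pairs as a map
// of links while remembering the original order.
type linkMapNode struct {
	order  []string
	byName map[string]PBLink
}

// newLinkMapNode indexes links by name.
// Assumption: on duplicate names the first link wins.
func newLinkMapNode(links []PBLink) *linkMapNode {
	n := &linkMapNode{byName: make(map[string]PBLink, len(links))}
	for _, l := range links {
		if _, seen := n.byName[l.Name]; seen {
			continue
		}
		n.order = append(n.order, l.Name)
		n.byName[l.Name] = l
	}
	return n
}

// LookupByName resolves a path segment by link name.
func (n *linkMapNode) LookupByName(name string) (PBLink, bool) {
	l, ok := n.byName[name]
	return l, ok
}

// Keys returns the link names in their original order.
func (n *linkMapNode) Keys() []string {
	return append([]string(nil), n.order...)
}

func main() {
	n := newLinkMapNode([]PBLink{
		{Hash: "cid-a", Name: "readme.md", Tsize: 10},
		{Hash: "cid-b", Name: "src", Tsize: 200},
	})
	l, ok := n.LookupByName("src")
	fmt.Println(l.Hash, ok, n.Keys())
}
```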

I guess the interesting question is if we want to do both...? And if that means two different kinds of Node implementation? (Can they share the same memory? I dunno!)

hannahhoward commented 4 years ago

Closing, as the first implementation will be in a separate repo.

vmx commented 4 years ago

In case someone is looking for that separate repo, it's here: https://github.com/ipld/go-ipld-prime-proto/