multiformats / js-multiformats

Multiformats interface (multihash, multicodec, multibase and CID)

Proposal: CID+Block=Multiblock #37

Open Gozala opened 3 years ago

Gozala commented 3 years ago

We already have multibase and multihash, which in a nutshell are both metadata+data. We do not, however, have a similar thing for blocks, so given only a block's bytes it is impossible to derive which codec to use to decode them.
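To make the asymmetry concrete, here is a minimal sketch in plain JavaScript. It uses no library calls; `readVarint` and `parseMultihash` are hypothetical helper names, and the digest is dummy filler rather than a real hash. A multihash can be parsed from its own bytes because it leads with a hash code and a length, while encoded block bytes carry no such prefix.

```javascript
// Decode an unsigned LEB128 varint starting at `offset`.
// Returns [value, nextOffset].
function readVarint(bytes, offset = 0) {
  let value = 0
  let shift = 0
  for (let i = offset; i < bytes.length; i++) {
    const byte = bytes[i]
    value += (byte & 0x7f) * 2 ** shift
    if ((byte & 0x80) === 0) return [value, i + 1]
    shift += 7
  }
  throw new Error('truncated varint')
}

// A multihash is self-describing: <hash code><digest size><digest>.
function parseMultihash(bytes) {
  const [code, afterCode] = readVarint(bytes, 0)
  const [size, afterSize] = readVarint(bytes, afterCode)
  const digest = bytes.subarray(afterSize, afterSize + size)
  if (digest.length !== size) throw new Error('truncated digest')
  return { code, size, digest }
}

// 0x12 is sha2-256 in the multicodec table, 0x20 is a 32-byte digest size.
const multihash = new Uint8Array([0x12, 0x20, ...new Uint8Array(32).fill(0xab)])
const parsed = parseMultihash(multihash)
console.log(parsed.code.toString(16), parsed.size) // 12 32

// Encoded block bytes have no such prefix. Nothing in these bytes says
// whether they should be decoded as JSON, as raw bytes, or as something else.
const blockBytes = new TextEncoder().encode('{"hello":"world"}')
console.log(blockBytes[0]) // 123, just the "{" character, not a codec code
```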

In the past, when I was working on https://github.com/gozala/ipdf/, I came up with a CID+block construct that I called inline blocks, so that graphs could contain encrypted and concealed sub-graphs that would only reveal themselves to the key holder.

CAR format seems to also pair CID+blocks.

I think this thread, https://github.com/multiformats/js-multiformats/pull/36#issuecomment-694619835, also illustrates the lack of such an abstraction.

Ironically, a JS Block instance also contains CID+Block, but once you have encoded it you can no longer decode it back without additionally providing 'codec' information.

I think if we do formalize such a building block, it would allow for nice, composable libraries around it.

mikeal commented 3 years ago

So, conceptually we have this, which is that a block is a pair of [CID, Data].

As far as a standardized binary representation, the reason we haven’t needed it yet is because the block store abstractions are already key/value storage engines. We have something that sort of matches this description in the CAR file format because we needed a binary representation of a block.

Since CIDs can already be linearly parsed, the most compact binary representation would be [ CID, VARINT(DIGEST_LENGTH), DIGEST ].
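A minimal sketch of a layout of this shape, in plain JavaScript with no library calls. It rests on stated assumptions: the trailing length-prefixed field is read here as the block's bytes, only binary CIDv1 is handled, and every function name (`encodeMultiblock`, `decodeMultiblock`, `cidEnd`) is hypothetical rather than part of js-multiformats. The point it illustrates is that no length prefix is needed in front of the CID, because walking its four varint fields tells the reader where it ends.

```javascript
// Encode a non-negative integer as an unsigned LEB128 varint.
function writeVarint(value) {
  const out = []
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80)
    value = Math.floor(value / 0x80)
  }
  out.push(value)
  return out
}

// Decode a varint at `offset`, returning [value, nextOffset].
function readVarint(bytes, offset) {
  let value = 0
  let shift = 0
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * 2 ** shift
    if ((bytes[i] & 0x80) === 0) return [value, i + 1]
    shift += 7
  }
  throw new Error('truncated varint')
}

// Walk a binary CIDv1 (<version><codec><hash code><digest size><digest>)
// and return the offset just past it.
function cidEnd(bytes, offset = 0) {
  const [version, afterVersion] = readVarint(bytes, offset)
  if (version !== 1) throw new Error('this sketch handles CIDv1 only')
  const [, afterCodec] = readVarint(bytes, afterVersion)
  const [, afterHashCode] = readVarint(bytes, afterCodec)
  const [size, digestStart] = readVarint(bytes, afterHashCode)
  if (digestStart + size > bytes.length) throw new Error('truncated CID')
  return digestStart + size
}

// Assumed layout: [ CID, VARINT(data.length), data ]
function encodeMultiblock(cid, data) {
  const length = writeVarint(data.length)
  const out = new Uint8Array(cid.length + length.length + data.length)
  out.set(cid, 0)
  out.set(length, cid.length)
  out.set(data, cid.length + length.length)
  return out
}

function decodeMultiblock(bytes) {
  const end = cidEnd(bytes)
  const [length, start] = readVarint(bytes, end)
  if (start + length > bytes.length) throw new Error('truncated data')
  return { cid: bytes.subarray(0, end), data: bytes.subarray(start, start + length) }
}

// CIDv1, raw codec (0x55), sha2-256 (0x12), 32-byte dummy digest (not a real hash).
const cid = new Uint8Array([0x01, 0x55, 0x12, 0x20, ...new Uint8Array(32).fill(0xab)])
const data = new TextEncoder().encode('hello')

const decoded = decodeMultiblock(encodeMultiblock(cid, data))
console.log(decoded.cid.length, new TextDecoder().decode(decoded.data)) // 36 hello
```

Because the codec code travels inside the CID, a consumer of the decoded pair has what it needs to pick a decoder for `data`.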

If we’re going to standardize this, one thing you’re going to want to include is the proper representation of an inline CID w/ identity multihash.
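For reference, a small sketch of what an inline CID with an identity multihash looks like at the byte level, in plain JavaScript. The helper names `identityCid` and `inlineData` are hypothetical; the codes used (0x00 for identity, 0x55 for raw) come from the multicodec table. With the identity "hash" the digest is the data itself, so the CID alone already carries the block's bytes. How a combined CID+block encoding should treat that case is the open question and is not decided here.

```javascript
// Encode a non-negative integer as an unsigned LEB128 varint.
function writeVarint(value) {
  const out = []
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80)
    value = Math.floor(value / 0x80)
  }
  out.push(value)
  return out
}

// Decode a varint at `offset`, returning [value, nextOffset].
function readVarint(bytes, offset) {
  let value = 0
  let shift = 0
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * 2 ** shift
    if ((bytes[i] & 0x80) === 0) return [value, i + 1]
    shift += 7
  }
  throw new Error('truncated varint')
}

// Build a CIDv1 whose multihash is identity (0x00): the digest is the data.
function identityCid(codec, data) {
  return Uint8Array.from([
    0x01,                          // CID version 1
    ...writeVarint(codec),         // content codec
    0x00,                          // identity multihash code
    ...writeVarint(data.length),   // "digest" size
    ...data                        // "digest" = the data itself
  ])
}

// Recover the inlined bytes, or null if the CID is not an identity CIDv1.
function inlineData(cid) {
  const [version, afterVersion] = readVarint(cid, 0)
  const [codec, afterCodec] = readVarint(cid, afterVersion)
  const [hashCode, afterHashCode] = readVarint(cid, afterCodec)
  const [size, start] = readVarint(cid, afterHashCode)
  if (version !== 1 || hashCode !== 0x00) return null
  return { codec, data: cid.subarray(start, start + size) }
}

const inlineCid = identityCid(0x55, new TextEncoder().encode('hi'))
console.log(Array.from(inlineCid).join(' ')) // 1 85 0 2 104 105

const inlined = inlineData(inlineCid)
console.log(new TextDecoder().decode(inlined.data)) // hi
```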

rvagg commented 3 years ago

I think it was Steven that pointed out that the CAR format is almost a complete pattern if it weren't for the header not having a prefix. I suspect if we had a formality around a CID+Block representation then it may have been more natural to make it a simple stream of CID+Binary patterns, including the header, with no particular special handling of the header at the encode/decode layer.

I'm not sure how you would create such a specification without having something to apply it to, though. Unless we pick up the CAR version and say "this thing is now a spec on its own, go over here to see how [ varint | CID | binary ] is a fully specified beast that we call Bob". Beyond that though, can you outline some tools that would make use of this thing that would make it worth specifying?
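As a point of comparison, here is a rough sketch of reading a stream of [ varint | CID | binary ] sections in plain JavaScript, following the CAR convention that the leading varint is the combined length of the CID and the binary payload. It is a sketch under assumptions: only CIDv1 is handled, the CAR header is not modelled, and the names `encodeSection` and `readSections` are hypothetical, not taken from any CAR library.

```javascript
function writeVarint(value) {
  const out = []
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80)
    value = Math.floor(value / 0x80)
  }
  out.push(value)
  return out
}

function readVarint(bytes, offset) {
  let value = 0
  let shift = 0
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * 2 ** shift
    if ((bytes[i] & 0x80) === 0) return [value, i + 1]
    shift += 7
  }
  throw new Error('truncated varint')
}

// Return the offset just past a binary CIDv1 that starts at `offset`.
function cidEnd(bytes, offset) {
  const [version, afterVersion] = readVarint(bytes, offset)
  if (version !== 1) throw new Error('this sketch handles CIDv1 only')
  const [, afterCodec] = readVarint(bytes, afterVersion)
  const [, afterHashCode] = readVarint(bytes, afterCodec)
  const [size, digestStart] = readVarint(bytes, afterHashCode)
  return digestStart + size
}

// One section: varint(cid.length + data.length) | CID | data
function encodeSection(cid, data) {
  return [...writeVarint(cid.length + data.length), ...cid, ...data]
}

// Yield { cid, data } for each section in a concatenated stream.
function* readSections(bytes) {
  let offset = 0
  while (offset < bytes.length) {
    const [length, bodyStart] = readVarint(bytes, offset)
    const bodyEnd = bodyStart + length
    if (bodyEnd > bytes.length) throw new Error('truncated section')
    const end = cidEnd(bytes, bodyStart)
    if (end > bodyEnd) throw new Error('CID overruns its section')
    yield { cid: bytes.subarray(bodyStart, end), data: bytes.subarray(end, bodyEnd) }
    offset = bodyEnd
  }
}

// Dummy CID: v1, raw codec, sha2-256 code, 32 filler bytes (not a real hash).
const dummyCid = [0x01, 0x55, 0x12, 0x20, ...new Uint8Array(32).fill(0xab)]
const text = new TextEncoder()
const stream = Uint8Array.from([
  ...encodeSection(dummyCid, text.encode('one')),
  ...encodeSection(dummyCid, text.encode('three'))
])

const payloads = [...readSections(stream)].map(s => new TextDecoder().decode(s.data))
console.log(payloads.join(',')) // one,three
```

Note the design difference from a bare CID-then-length layout: with the length in front, a reader can skip a whole section without parsing the CID, at the cost of having to parse the CID to find where the payload starts.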