Open Gozala opened 5 years ago
This seems to be roughly the same as @mikeal's proposal except for naming changes and having everything as a static method exposed by the codec/format.
I'd like to see some use-cases from @mikeal but I'm suspecting that the ergonimics of passing around a Reader
instance might be much nicer than having to have both the opaque BinaryEncoded<a>
as well as the format/codec itself. Maybe a better name than Reader
could be found though because it's essentially making a BinaryEncodedWrapper
.
function doSomethingWithBlock(blockReader)
vs function doSomethingWithBlock(codec, block)
- the two items are (presumably) inseparable, one doesn't make sense without the other, so tying them together might be both safer and make it less clunky to use.
Regarding naming: I agree on codec
(although the distinction seems less obvious than you suggest, they're all referring to formats still) and on encode
/ decode
but mainly because I'm a non-American English speaker and the z
's always irk me. I think @mikeal's remaining
makes more sense than path
in your IPLDLinkEntry<b>
, however. It needs little or no clarification.
This seems to be roughly the same as @mikeal's proposal except for naming changes and having everything as a static method exposed by the codec/format.
Yes, except this does not operates on IPLDBlock
which was one of the major changes in there
I'd like to see some use-cases from @mikeal but I'm suspecting that the ergonimics of passing around a Reader instance might be much nicer than having to have both the opaque BinaryEncoded as well as the format/codec itself. Maybe a better name than Reader could be found though because it's essentially making a BinaryEncodedWrapper.
I am sure there are cases where passing something like Reader
is both convenient and necessary, that however doesn’t mean it should be part of core API. In fact making that separate layer would allow generelaize Reader
freeing codec implemetera from dealing with it and potential bugs
I think @mikeal's remaining makes more sense than path in your
IPLDLinkEntry<b>
, however. It needs little or no clarification.Depends on the context IMO. If you think about link target node + path makes a lot of sense. Just like in URL host + path makes sense.
If you think about trying to resolve something but not quite making it and there for having remaining piece left probably remaining
makes more sense. I don’t like that analogy however, I think it’s awkward way to think about it
I should have made that more clear though
I prefer "remaining", as "path" to me is to ambiguous it's not clear to me if it's the path to the CID or the path to the value after following the CID ("path to the CID" doesn't make sense when you think about it, I would find it non-intuitive though).
I'm fine with "encoding/decoding".
On my TODO list is still a write-up about the different IPLD layers. I need to put more thought into this, but here's the rough idea. Currently we have IPLD Formats and JS-IPLD, but we need more. I like this proposal, this would be about the lowest level where there's only binary data and the IPLD Data Model. One layer on top could then be more about what @mikeal's proposal (https://github.com/ipld/interface-ipld-format/issues/52) is about, where it is about Blocks, hence also about CIDs. The next layer would then be about IPLD Format agnostic APIs, you could e.g. call decode()
on a Block and it will automatically use the right format etc. The top layer would then also be about storing and retrieving things.
I prefer "remaining", as "path" to me is to ambiguous it's not clear to me if it's the path to the CID or the path to the value after following the CID ("path to the CID" doesn't make sense when you think about it, I would find it non-intuitive though).
But it’s not the path to CID it’s path to whatever link is so if you have a link with shape {target, path}
it’s pointer to a values at the given target and path.
I also would like to extend that further with inline blocks (if we get there) where link target could also be an inline block rather thatn CID. It also would be very composable with general access. Right now resolver needs to manually traverse decoded object, with all the proposed pieces you could instead return link with in the data structure and let other resolver take over {target: new Block(decodedData, "dag-json"), path: remainingPath) }
I think “remaining” makes sence if you’re thinking about in terms of reaching a destination. But if you think of link to something, or a poiner then “remaining” starts to sound little confusing.
Maybe {target, at}
would be a more clear than either of two regardless of the way you think about it ?
serialize is awfully long and easy to misspell, encode is shorter and lot harder to get wrong. deserialize is even longer, decode isn't.
+1, I’ll probably go and update my proposal with this as well.
format - Is the way in which something is arranged or set out, in other words it corresponds to how and not what. codec on the other hands is what that knows how to encode / decode to a specific format. There for I would argue that codec is much better term to use as it represents implementation that can encode data into binary of specific format and decode binary in that format back to the data. It also make more sense for codec to provide operations named encode / decode than any other term.
Not too long ago, we had a variety of terms being used to describe the codec, format being one of them. We also had format and codec being used to describe other things which only piled onto the confusion.
I ended up writing up our terminology in the spec repo to try and standardize the language we use.
I went with codec
for consistency, as the multicodec
project already had their naming locked in and I wasn’t going to be able to change it, and our “codecs” are in-fact references to registered codecs in multicodec.
I then specified the term “format” in our terminology to refer to the existing format we’re leveraging, as that was the thing we were most often getting confused in the terminology.
dag-json
is a “codec” that uses the json
“format.”
I’m sympathetic to your rational for wanting to see this somewhat reversed but I think it would end up causing confusion given the dependency we have on multicodec 😢
One last thing to note, the encode
interface should be optional. We have codecs that we need to read but will probably never need to write with this interface (git, eth, and bitcoin blocks for instance). In fact, there’s a whole world of content addressed data that we want to be able to address but never modify or author. This is why we wrote the “Data Model” spec, to distinguish between the codecs we’re using for building all of these higher level data-structures and those that we only want to be able to read and link into.
One last thing to note, the
encode
interface should be optional. We have codecs that we need to read but will probably never need to write with this interface (git, eth, and bitcoin blocks for instance). In fact, there’s a whole world of content addressed data that we want to be able to address but never modify or author. This is why we wrote the “Data Model” spec, to distinguish between the codecs we’re using for building all of these higher level data-structures and those that we only want to be able to read and link into.
That's fare, but should not such codec just throw exception from encode
?
dag-json
is a “codec” that uses thejson
“format.”
Wait is this a desired terminology or undesired ? I feel like this exactly the right terminology which I see as aligned with proposed naming no ?
I went with
codec
for consistency, as themulticodec
project already had their naming locked in and I wasn’t going to be able to change it, and our “codecs” are in-fact references to registered codecs in multicodec.
I'm failing to see conflict here.
Codec encodes data into specific format. Multicodec tags encoded data with corresponding codec identifier to describe it's format.
I think you register codecs with corresponding codec identifier.
Could just terminology be refined to use "codec identifier" where codec is being used now and "codec" used in place where format is today ?
Wait is this a desired terminology or undesired ?
It’s the current terminology :)
For me a Format implements a Codec.
Several of these ideas ended up in https://github.com/ipld/js-ipld-stack.
I've being asked to provide feedback on API changes / proposals & sadly that usually means I go and challenge peoples efforts. I think it might be better / only fare to propose API myself and let others challenge it instead. So here is what IMO would be a best API:
I would also like to use this opportunity to challenge naming conventions as I have following issues with it:
serialize
is awfully long and easy to misspell,encode
is shorter and lot harder to get wrong.deserialize
is even longer,decode
isn't.format
- Is the way in which something is arranged or set out, in other words it corresponds to how and not what. codec on the other hands is what that knows how to encode / decode to a specific format. There for I would argue that codec is much better term to use as it represents implementation that canencode
data into binary of specific format and decode binary in that format back to the data. It also make more sense for codec to provide operations namedencode
/decode
than any other term.We'll use opaque type alias
BinaryEncoded<a>
to refer to binary encoded data of typea
.We'll also use
CID<a>
interface to refer to CIDs for data of typea
.With that out of the way I would propose that valid IPLD Codec be an implementation that complies with a
IPLDCodec<a>
interface as defined below:Additional notes:
I have removed
cid
function, I am convinced that it's a historical artifact. All the implementations I've seen use providedhashAlgoritm
(or default if not provided) to create multihash of the encoded data, and then use passedversion
& multicodecformat
to instantiateCID
.In other words I do not see how codec could do anything else, maybe use different serialization method ? Seems unlikely, but if so maybe instead implementation should be required to provide that instead. Another thing I can imagine is a different way to hash ? If so, again should probably codec should just be required to provide optional method for that otherwise
cid
is method requirement is nothing but a way to introduce bugs.Anything returning Promise could also return value that it would resolve to. Consumer still can
await
without any impact & is also free to avoidawait
when relevant. From implementer perspective there is no need to wrap anything in promise.This incorporates
links
API proposed #52tree
is replaced bypaths
and is scoped to immediate paths, meaning it's not codec implement's job to follow links etc... Generalized tree traversal can be implemented outside and isn't codec implementer concern.get(blob, path)
returns (eventual) "entry" that is either some inline data that was encoded into the blob, or a link with-in the other IPLD node. Note that link haspath
that corresponds to path to the data with-in that node link is targeting.Different value for entry
type
is used ("link", "inline") that enables type checkers (flow, typescript) to refine type with in the union and in general makes it obvious how one figures what kind of entry is back (what ifdata
isnull
? what if for some reason returned entry also has being passed atarget
feild ?)@mikeal was suggesting in #52 to embrace IPLD Block API which is I'm all up for, however I don't think it should be made codec implementer concern. API proposed in #52 can trivially be layered on top of proposed API and be agnostic of codec implementation.
Concerns
I have brought up in the other thread https://github.com/ipld/ipld/issues/64#issuecomment-473442520 the fact that support for passing parameters should be a fundamental design concern to ensure that things like IPLD Explorer can be just as useful with parametric IPLD Nodes. That however (at least as proposed) conflicts with
links(a)
which assumes links can be revealed without any parameters.For that reason I would prefer to omit(Edit: After thinking bit more I don't thinklinks
and replacepaths
with something likelist
list
as proposed below is even going help much, I'll get back on this once I'll have more time to think through):