Open Gozala opened 6 years ago
@Gozala Sorry for the late reply (from now on I'll get notified if a new issue is opened).
Your interpretation of the dagNode
and the resolver value
is correct. It's on my todo list to make this clearer in the spec.
To the unknowns:
The result
of a resolve()
depends indeed on the implementation. If it's a link it is {"/": <CID>}
. So if you want to know whether the result is a link you check do a check like result.value['/'] !== undefined
.
The link is always a single CID. I'm not sure if I understand your question correctly. Would you like to have a link resolved to the actual values automatically? If yes, I think that's something that should be part of a higher level API and live e.g. at https://github.com/ipld/js-ipld.
Your point about resolver.tree()
is a good one. I don't know why it is an array of objects and not just a single object. I'll try to find out why resp. change it. You are correct that the value
in the tuple is the same that resolve()
returns.
There is some correlation and it's indeed implementation specific. If you implement a format, e.g. Bitcoing you'll use any JavaScript library you want. Then the node
(I prefer dagNode
as "node" is such an ambiguous word) is the object how the JavaScript library you picked wants to represent it. When you then resolve a path, it also depends on how this library represents it. Though I also want to add some guidelines. E.g. that a CID is always returned the same way (as CID object) and not sometimes as a string or a Buffer.
Please keep those questions/comments coming. They are very valuable to enable me to make things clearer/easier.
@vmx thanks for getting back on this, I have some followups below. But first let me start with that I really want to make ipfs-js stack type-checker friendly as having clear interface would make it far more accessible IMO, at least I did struggle figuring out what things passed around and returned where, I tried to push this effort https://github.com/ipfs/community/issues/278 but sadly it does not seems like is going anywhere. Most of my remarks make sense from that perspective
The result of a resolve() depends indeed on the implementation.
Thanks, so making a Resolver<value>
generic parameterized with value
type makes sense then.
If it's a link it is {"/":
}. So if you want to know whether the result is a link you check do a check like result.value['/'] !== undefined.
That is somewhat unfortunate, it is very type-checker unfriendly API because some of the invariants can't be en-typed. So if were encode result in type system that's what it would look like:
type Path = string
type Link = { "/":CID }
interface Result<value> {
remainderPath:Path;
value:Link|value;
}
interface Resolver<value> {
resolve(ArrayBuffer, Path, Callback<Error, Result<value>>):void;
//...
}
Problem here is that result.value
is either {"/":CID}
or generic value
, but given that value
is generic it very well could have shape form of {"/": b}
which makes it impossible to refine type of result.value
. Alternative more type-checker friendly way would be to have an alternate API where there is no ambiguity:
type Result<value> =
| { remainderPath:Path, value:value }
| { remainderPath:Path, link:CID }
In above case type checker can infer that you either have result.link
of type CID
or that you have result.value
of generic type value
. There are many alternative ways in which result
type can be encoded such that it would not require disabling type checker & I would really love if we could choose one such encoding.
The link is always a single CID. I'm not sure if I understand your question correctly. Would you like to have a link resolved to the actual values automatically? If yes, I think that's something that should be part of a higher level API and live e.g. at https://github.com/ipld/js-ipld.
No. It's just my understanding of a link
was incorrect I did not knew it was always {'/':CID}
I though it could have being something like this {'/some/other/path':CID}
in which case I was suggesting that having it be more like {path:'/some/other/path', cid:CID}
would've being a better structure.
Your point about resolver.tree() is a good one. I don't know why it is an array of objects and not just a single object. I'll try to find out why resp. change it.
Thanks. Here my main concern is the same invariant of having a single path
to value
mapping is impossible to entype, type checker will force you to iterate and will not complain if you put multiple mappings either. Doing something like {path: '/foo', value:value}
or [path, value]
would be free of that. If each entry can contain multiple mappings than using dictionary makes sense but raises other questions like could different entries contain conflicting values for the same path and if so which one takes precedence & if so it would probably make to return single dictionary vs array of dictionaries as that would eliminate that problem as well.
There is some correlation and it's indeed implementation specific. If you implement a format, e.g. Bitcoing you'll use any JavaScript library you want. Then the node (I prefer dagNode as "node" is such an ambiguous word) is the object how the JavaScript library you picked wants to represent it. When you then resolve a path, it also depends on how this library represents it.
Hmm.. What I meant is that according to my interface definition you have Format<node>
and Resolver<value>
and from what observed two work together which implies there is correlation that is missing from type signatures, or more specifically this interface definition is incorrect I suspect that Format.serialize
does not passes just Buffer
into callback but rather a value
as in Resolver<value>
which in all existing implementations maybe be a Buffer
but still. If so it, would make sense as there will be Resolver<value>
and corresponding Format<node, value>
where value
type parameter is encoding of that correlation and there for types will be like:
interface Format<node, value> {
serialize(node, Callback<EncodeError, value>): void;
deserialize(Buffer, Callback<DecodeError, node>): void;
cid(node, Callback<CIDError, CID>): void;
}
interface Resolver<value> {
multicodec: CodecName;
resolve(CID, Path, Callback<ResolveError, Resolve<value>>): void;
tree(CID, Callback<TreeError, Entry<value>[]>): void;
tree(
CID,
{ level: number, values?: boolean },
Callback<Error, Entry<value>[]>
): void;
isLink(Buffer, Path, Callback<LinkError, Link>): void;
}
interface IPLDResolver<value> {
support: {
add <node>(CodecName, Resolver<value>, Format<node, value>):void
}
}
Though I also want to add some guidelines. E.g. that a CID is always returned the same way (as CID object) and not sometimes as a string or a Buffer.
Oh yes please!! I also would love if methods were also consistent in what they are taking it seems that all over the place they take some combination of CID
object, string
or a Buffer
and then trying to guess which one is it. If they were just accepting CID
objects than type checker would be able to enforce and eliminate any runtime errors associated with invalid CID encodings as strings
or buffers
. If it is really convenient to accept all three at least having different method would be my recommendation as it would also make interface more clear and at least one of them would be guaranteed to have no runtime errors.
Please keep those questions/comments coming. They are very valuable to enable me to make things clearer/easier.
I had ton of questions as I read through the code base to create interface / type definitions across all of the IPFS libraries (see https://github.com/Gozala/ipfs-drive/tree/master/flow-typed/npm). I was hoping to eventually get these into the actual libraries, but as mentioned https://github.com/ipfs/community/issues/278 seems stall and few of the pulls I submitted to add type annotations also did went nowhere so I moved to doing other things. As mentioned in that thread type checker tends to guide you toward cleaner and less ambiguous API design and I think there's room for improvement as things stand now. In those type interface defs there are also some // TODO:
comments in areas where I was unable to make sense of the API.
If you or someone on the core team has time to review my interface definitions that would be great & in fact we could go ahead and push them into https://github.com/flowtype/flow-typed to help ppl use ipfs stack and arguably those interface definitions are less ambiguous than markdown version for interface-*
packages so maybe those could be included there as well.
My experience with IPFS has being, given the number of layers involved it was unclear how those pieced fit together. And reading code was exhausting as it was hard to guess what were things passed around across those layers in fact I end up looking up same things over & over until I decided to write types for them so I could make sense of it all. While I still can't claim I do understand it all, at least I get some vague picture and having those types really help especially since code editor can provide insight right as I type.
Sorry it came out this long & thank for elaboration.
First of all, thanks for your effort. I felt similar 4 weeks ago when I tried to make sense of the code base. My immediate thought when reading this spec was also "with types in the source, it would be much simpler". I then also looked into TypeScritpt and flow (I haven't followed them too closely) and from my quick look I also liked flow for the typing purpose more. So I think we are on the same page here.
I had a quick read over https://github.com/ipfs/community/issues/278. I agree with most thing you say. I'll also will talk with @diasdavid about it again next week.
Now to the things related to this repository where I have more say :)
Problem here is that result.value is either {"/":CID} or generic value, but given that value is generic it very well could have shape form of {"/": b} which makes it impossible to refine type of result.value.
From a type checker perpective that's correct. Though implementers will/need to make sure that {"/": ...}
is always a link. Additionally remainderPath
only makes sense for links. This isn't apparent from the spec (yet), but that's a change I'd like to make. So the type is something like (please note that I haven't looked into flow, this is just so you get the idea):
type Result<value> =
| { value:value }
| { remainderPath:Path, value: {"/": CID} }
The advantage is that you always get a value
property without any need for further checking. An open question is how often that is desired. I guess time will tell.
Here my main concern is the same invariant of having a single path to value mapping is impossible to entype, type checker will force you to iterate and will not complain if you put multiple mappings either.
I don't think a type checker can help here. The returned object will have a path as key and any possible JS Object as value. For one specific implementation I expect it to return a specific type for a specific path. But of course different implementations will contain different paths (or paths with the same name, but implementation specific values).
I suspect that Format.serialize does not passes just
Buffer
into callback but rather avalue
as inResolver<value>
serialize()
will always return a Buffer
as it is some unspecified binary blob. It could be some CBOR encoded JSON or a Bitcoin block. value
can be any JS object.
So I'm a fan of types and I agree that they make APIs cleaner. Though it's also important to keep an API practical and sadly this often only visible when people actually use it. For this repo we have https://github.com/ipld/js-ipld as a consumer which is also likely to be the only entry point for external users. You won't use these formats directly.
Lot's of really valid points here ππ½ππ½ I'm loving these conversations and review of the APIs and proposals on how to make dev simpler, both by users and by the internal team.
We sort of "abuse" JavaScript lack of a proper TypeSystem to make the Resolver generic enough that enable us to traverse through multiple types, especially types we don't know in advance.
I followed up on https://github.com/ipfs/community/issues/278#issuecomment-361002950 with a practical approach for us to figure out what we want as a project and level up our understanding at the same time.
So here is my interpretation of the interface described in this repo encoded in flow types
Some notes on my interpretation:
Format utils API spec refers to
dagNode
but it is unclear what is the type or a shape, or whether it has specific one. My understanding is thatdagNode
representation is specific to the resolver implementation which is whyFormat<node>
has type parameternode
to representingdagNode
specific implementation would probably do something likeWhere
GitNode
will have a specefic type / shape.Resolver API refers to resolved
value
but to me it's unclear what the type / shape of the value is. I assumed here as well that it is specefic to the resolver implementation which is whyResolver<value>
uses generic type parametervalue
here I also assume that specific implementation would define the type / shape of the value.Unknowns
resolver.resolve(binaryBlob, path, callback)
mentions thatresult
passed to callback hasvalue
that is "The value resolved or an IPLD link". How do I figure which on is it ? Presumably I know what thevalue
type is and there for can figure out if it's that or a link, but then again I'm worried about the ambiguity here. I would much rather have result with type{ value:value, remainderPath:Path } | { link:Link, remainderPath:Path }
so it's clear which one I get back. Also is a link in this context a tuple with path and CID as described inisLink
{'/': <CID> }
resolver.tree
passes back array of tuples like[ { '/foo': 'bar' } ...]
does that mean each item in that array contains object with single property where it's name represents a path ? If so why not just use a data structure that would not require iteration to access data ? Also my guess was thatvalue
in such a tuple is of the same type as one returned inresolve
isn't it ?Resolver<value>
andFormat<node>
there is some correlation betweenvalue
andnode
type parameters but I can't figure it out. Maybe that relation is implementation detail of the actual resolver ?