ipld / interface-ipld-format

A interface you can follow to implement a valid IPLD format, resolvable through the IPLD Resolver (available in IPFS)
Other
29 stars 5 forks source link

Trying to figure out actual interface described in this repo #15

Open Gozala opened 6 years ago

Gozala commented 6 years ago

So here is my interpretation of the interface described in this repo encoded in flow types

// https://github.com/ipld/interface-ipld-format#ipld-format-utils
interface Format <node> {
  serialize(node, Callback<Error, ArrayBuffer>):void;
  deserialize(ArrayBuffer, Callback<Error, node>):void;
  cid(node, Callback<Error, CID>):void;
}

// https://github.com/ipld/interface-ipld-format#local-resolver-methods
interface Resolver<value> {
  resolve(ArrayBuffer, Path, Callback<Error, Result<value>>):void;
  tree(ArrayBuffer, Callback<Error, Entry<value>[]>):void;
  tree(ArrayBuffer, {level:number, values?:boolean}, Callback<Error, Entry<value>[]>):void;
  isLink(ArrayBuffer, Path, Callback<Error, Link>):void;
}

type Path = string
type Link = { [Path]:CID }

interface Result<value> {
  remainderPath:Path;
  value:Link|value;
}

type Entry <value> = {[Path]:value}

// Details of CID aren't actually important in this context
interface CID {
}

// Callbacks is a function that is either passed an error as first argument,
// or passed just a second `value` argument if successful.  
interface Callback <error, value> {
  (error):void;
  (null|void, value):void;
}

Some notes on my interpretation:

Unknowns

vmx commented 6 years ago

@Gozala Sorry for the late reply (from now on I'll get notified if a new issue is opened).

Your interpretation of the dagNode and the resolver value is correct. It's on my todo list to make this clearer in the spec.

To the unknowns:

Please keep those questions/comments coming. They are very valuable to enable me to make things clearer/easier.

Gozala commented 6 years ago

@vmx thanks for getting back on this, I have some followups below. But first let me start with that I really want to make ipfs-js stack type-checker friendly as having clear interface would make it far more accessible IMO, at least I did struggle figuring out what things passed around and returned where, I tried to push this effort https://github.com/ipfs/community/issues/278 but sadly it does not seems like is going anywhere. Most of my remarks make sense from that perspective

The result of a resolve() depends indeed on the implementation.

Thanks, so making a Resolver<value> generic parameterized with valuetype makes sense then.

If it's a link it is {"/": }. So if you want to know whether the result is a link you check do a check like result.value['/'] !== undefined.

That is somewhat unfortunate, it is very type-checker unfriendly API because some of the invariants can't be en-typed. So if were encode result in type system that's what it would look like:

type Path = string
type Link = { "/":CID }

interface Result<value> {
  remainderPath:Path;
  value:Link|value;
}

interface Resolver<value> {
  resolve(ArrayBuffer, Path, Callback<Error, Result<value>>):void;
  //...
}

Problem here is that result.value is either {"/":CID} or generic value, but given that value is generic it very well could have shape form of {"/": b} which makes it impossible to refine type of result.value. Alternative more type-checker friendly way would be to have an alternate API where there is no ambiguity:

type Result<value> =
  | { remainderPath:Path, value:value }
  | { remainderPath:Path, link:CID }

In above case type checker can infer that you either have result.link of type CID or that you have result.value of generic type value. There are many alternative ways in which result type can be encoded such that it would not require disabling type checker & I would really love if we could choose one such encoding.

The link is always a single CID. I'm not sure if I understand your question correctly. Would you like to have a link resolved to the actual values automatically? If yes, I think that's something that should be part of a higher level API and live e.g. at https://github.com/ipld/js-ipld.

No. It's just my understanding of a link was incorrect I did not knew it was always {'/':CID} I though it could have being something like this {'/some/other/path':CID} in which case I was suggesting that having it be more like {path:'/some/other/path', cid:CID} would've being a better structure.

Your point about resolver.tree() is a good one. I don't know why it is an array of objects and not just a single object. I'll try to find out why resp. change it.

Thanks. Here my main concern is the same invariant of having a single path to value mapping is impossible to entype, type checker will force you to iterate and will not complain if you put multiple mappings either. Doing something like {path: '/foo', value:value} or [path, value] would be free of that. If each entry can contain multiple mappings than using dictionary makes sense but raises other questions like could different entries contain conflicting values for the same path and if so which one takes precedence & if so it would probably make to return single dictionary vs array of dictionaries as that would eliminate that problem as well.

There is some correlation and it's indeed implementation specific. If you implement a format, e.g. Bitcoing you'll use any JavaScript library you want. Then the node (I prefer dagNode as "node" is such an ambiguous word) is the object how the JavaScript library you picked wants to represent it. When you then resolve a path, it also depends on how this library represents it.

Hmm.. What I meant is that according to my interface definition you have Format<node> and Resolver<value> and from what observed two work together which implies there is correlation that is missing from type signatures, or more specifically this interface definition is incorrect I suspect that Format.serialize does not passes just Buffer into callback but rather a value as in Resolver<value> which in all existing implementations maybe be a Buffer but still. If so it, would make sense as there will be Resolver<value> and corresponding Format<node, value> where value type parameter is encoding of that correlation and there for types will be like:

  interface Format<node, value> {
    serialize(node, Callback<EncodeError, value>): void;
    deserialize(Buffer, Callback<DecodeError, node>): void;
    cid(node, Callback<CIDError, CID>): void;
  }

  interface Resolver<value> {
    multicodec: CodecName;

    resolve(CID, Path, Callback<ResolveError, Resolve<value>>): void;
    tree(CID, Callback<TreeError, Entry<value>[]>): void;
    tree(
      CID,
      { level: number, values?: boolean },
      Callback<Error, Entry<value>[]>
    ): void;
    isLink(Buffer, Path, Callback<LinkError, Link>): void;
  }

  interface IPLDResolver<value> {
    support: {
      add <node>(CodecName, Resolver<value>, Format<node, value>):void
    }
  }

Though I also want to add some guidelines. E.g. that a CID is always returned the same way (as CID object) and not sometimes as a string or a Buffer.

Oh yes please!! I also would love if methods were also consistent in what they are taking it seems that all over the place they take some combination of CID object, string or a Buffer and then trying to guess which one is it. If they were just accepting CID objects than type checker would be able to enforce and eliminate any runtime errors associated with invalid CID encodings as strings or buffers. If it is really convenient to accept all three at least having different method would be my recommendation as it would also make interface more clear and at least one of them would be guaranteed to have no runtime errors.

Please keep those questions/comments coming. They are very valuable to enable me to make things clearer/easier.

I had ton of questions as I read through the code base to create interface / type definitions across all of the IPFS libraries (see https://github.com/Gozala/ipfs-drive/tree/master/flow-typed/npm). I was hoping to eventually get these into the actual libraries, but as mentioned https://github.com/ipfs/community/issues/278 seems stall and few of the pulls I submitted to add type annotations also did went nowhere so I moved to doing other things. As mentioned in that thread type checker tends to guide you toward cleaner and less ambiguous API design and I think there's room for improvement as things stand now. In those type interface defs there are also some // TODO: comments in areas where I was unable to make sense of the API.

If you or someone on the core team has time to review my interface definitions that would be great & in fact we could go ahead and push them into https://github.com/flowtype/flow-typed to help ppl use ipfs stack and arguably those interface definitions are less ambiguous than markdown version for interface-* packages so maybe those could be included there as well.

My experience with IPFS has being, given the number of layers involved it was unclear how those pieced fit together. And reading code was exhausting as it was hard to guess what were things passed around across those layers in fact I end up looking up same things over & over until I decided to write types for them so I could make sense of it all. While I still can't claim I do understand it all, at least I get some vague picture and having those types really help especially since code editor can provide insight right as I type.

Sorry it came out this long & thank for elaboration.

vmx commented 6 years ago

First of all, thanks for your effort. I felt similar 4 weeks ago when I tried to make sense of the code base. My immediate thought when reading this spec was also "with types in the source, it would be much simpler". I then also looked into TypeScritpt and flow (I haven't followed them too closely) and from my quick look I also liked flow for the typing purpose more. So I think we are on the same page here.

I had a quick read over https://github.com/ipfs/community/issues/278. I agree with most thing you say. I'll also will talk with @diasdavid about it again next week.

Now to the things related to this repository where I have more say :)

Problem here is that result.value is either {"/":CID} or generic value, but given that value is generic it very well could have shape form of {"/": b} which makes it impossible to refine type of result.value.

From a type checker perpective that's correct. Though implementers will/need to make sure that {"/": ...} is always a link. Additionally remainderPath only makes sense for links. This isn't apparent from the spec (yet), but that's a change I'd like to make. So the type is something like (please note that I haven't looked into flow, this is just so you get the idea):

type Result<value> =
  | { value:value }
  | { remainderPath:Path, value: {"/": CID} }

The advantage is that you always get a value property without any need for further checking. An open question is how often that is desired. I guess time will tell.

Here my main concern is the same invariant of having a single path to value mapping is impossible to entype, type checker will force you to iterate and will not complain if you put multiple mappings either.

I don't think a type checker can help here. The returned object will have a path as key and any possible JS Object as value. For one specific implementation I expect it to return a specific type for a specific path. But of course different implementations will contain different paths (or paths with the same name, but implementation specific values).

I suspect that Format.serialize does not passes just Buffer into callback but rather a value as in Resolver<value>

serialize() will always return a Buffer as it is some unspecified binary blob. It could be some CBOR encoded JSON or a Bitcoin block. value can be any JS object.

So I'm a fan of types and I agree that they make APIs cleaner. Though it's also important to keep an API practical and sadly this often only visible when people actually use it. For this repo we have https://github.com/ipld/js-ipld as a consumer which is also likely to be the only entry point for external users. You won't use these formats directly.

daviddias commented 6 years ago

Lot's of really valid points here πŸ‘πŸ½πŸ‘πŸ½ I'm loving these conversations and review of the APIs and proposals on how to make dev simpler, both by users and by the internal team.

We sort of "abuse" JavaScript lack of a proper TypeSystem to make the Resolver generic enough that enable us to traverse through multiple types, especially types we don't know in advance.

I followed up on https://github.com/ipfs/community/issues/278#issuecomment-361002950 with a practical approach for us to figure out what we want as a project and level up our understanding at the same time.