Open alanshaw opened 5 years ago
I like this approach. The added API surface area is entirely reasonable. Overloading .add, .put, etc., as you mentioned, just leads to confusion. We currently have a lot of methods that try to overload for simplicity, but it just makes the code harder to follow and harder for us to maintain. Having clearly defined function contracts makes it easier for users to make informed decisions about which one they need to use. Since js doesn't have proper overloading, I'm a big 👍 to this.
This is the kind of proposal that, once it is in front of us, makes one go "why haven't we realized this sooner!" :D You have my 👍
I copied the original API, where the methods always return an Iterable, for the new IPLD API. I was concerned about the same issue as mentioned here. After porting a lot of code, using .first() didn't feel as bad as I thought, which was surprising to me. The consistency is a big win.
For me it is easy to remember that those functions always return an Iterable. If it's a single item only, I have the convenient .first() call. I only need to learn one API primitive; I can even forget about .first() if I want to, as long as I remember that the return values are Iterables. With this change I need to remember that there's also an *each() version of the functions.
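As a hedged sketch of the always-return-an-Iterable style described above (the helper names `first` and `putAll` are illustrative, not the real js-ipld API), the single-item case layers neatly on top of the one primitive:

```javascript
// Sketch only: `first` and `putAll` are hypothetical names, not js-ipld APIs.

// Take the first item from any async iterable.
async function first (iterable) {
  for await (const item of iterable) return item
}

// An add/put-like method that always returns an async iterable of results.
async function * putAll (items) {
  for (const item of items) {
    yield { path: item.path, hash: 'QmPlaceholder' } // placeholder hash
  }
}

async function main () {
  // Single-item use: still the one primitive, with `first()` on top.
  const single = await first(putAll([{ path: 'file1' }]))
  console.log(single.path) // file1
}
main()
```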
Though I can also see that the "single item version" is probably used more often, so it does make sense to make that as easy as possible. As you can see from my reply, I'm a bit torn.
I share @vmx's hesitation. We're trading off calling .first() on the return value for having 2 API entries for every insert-ish operation. API surface area has a cost in docs and cognitive load, even if it may simplify the implementation. IPFSx greatly improves that, so perhaps we have some credit in the api-surface-area bank, but I'd like to see the minimal version of the ipfsx API get released into the wild before we explore adding these.
Anecdotally, the code I'm writing has to deal with the requirement "let the user add 1 or more things", so it ends up being more convenient to always pass in an array rather than handling the special case of a single item. I'm sure there are cases where having a streamlined "add one" API would be nice, but I'm not sure it warrants the extra surface area yet, as I'm not running into that need.
Perhaps this should be decided just on "what is easiest to teach?". For the purposes of deciding on this issue, I should clarify that I am essentially abstaining, as I think there are pros and cons to either.
I personally like the idea, although I agree with @vmx and @olizilla. I think that it isn't necessarily bad to always return an iterable from .add. If the API for .add is to receive one or more things to add, I wouldn't expect it to return different types depending on the arguments.
I'm +1 on de-generalizing APIs, as in practice it's a lot easier to reason about such code when reading it, and type checkers (if you use one) are generally more helpful than if you accept arbitrary inputs and produce output depending on the input passed.
I would even go as far as to suggest ditching add-many, unless it does some kind of optimization I fail to see. IMO it falls into the realm of utility functions that are agnostic of IPFS, and it's fairly useful to have such a combinator library independently.
To further elaborate my point: I think different strategies may make sense in different instances, sometimes parallel puts, other times sequential, and everything in between. Baking in put-many will likely end up choosing one strategy and, in consequence, make any other strategy second class.
I would even go as far as to suggest ditching add-many, unless it does some kind of optimization I fail to see. IMO it falls into the realm of utility functions that are agnostic of IPFS, and it's fairly useful to have such a combinator library independently.
I like this approach and think it keeps the api simple. If there are methods where we'd recommend parallelization or serialization for arrays, I think that is something that could be added to the docs for that method. jsdoc examples could be a really useful here.
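For instance, a hypothetical jsdoc sketch (not actual js-ipfs docs; the stub `add` and its placeholder hashes are illustrative) of how the recommended strategy could live right on the method:

```javascript
/**
 * Hypothetical doc-comment sketch: the parallel-vs-sequential recommendation
 * lives in the method's docs instead of a second API entry.
 *
 * @param {Array<{path: string}>} files - entries to add
 * @returns {AsyncIterable<{path: string, hash: string}>}
 * @example
 * // Entries are processed sequentially; issue several calls for parallelism.
 * for await (const res of add(files)) {
 *   console.log(res.path, res.hash)
 * }
 */
async function * add (files) {
  for (const file of files) {
    yield { path: file.path, hash: 'QmPlaceholder' } // placeholder hash
  }
}
```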
I would even go as far as suggest to ditch add many, unless it does some kind of optimization I fail to see
Ok so, adding multiple/many allows you to create a directory structure, i.e. unixfs directory nodes will be created automatically with links pointing to directory contents. So, for example adding something like this:
ipfs.add([
{ path: 'root/file1', content: fs.createReadStream(/*...*/) },
{ path: 'root/dir/nestedfile', content: fs.createReadStream(/*...*/) },
{ path: 'root/file2', content: fs.createReadStream(/*...*/) }
])
Would yield something like:
[
{ path: 'root/file1', hash: 'QmHashFile1' },
{ path: 'root/dir/nestedfile', hash: 'QmHashNestedfile' },
{ path: 'root/dir', hash: 'QmHashDir' },
{ path: 'root/file2', hash: 'QmHashFile2' },
{ path: 'root', hash: 'QmHashRoot' }
]
And then you can use QmHashRoot
to address content like:
/ipfs/QmHashRoot/file1
/ipfs/QmHashRoot/dir/nestedfile
/ipfs/QmHashRoot/file2
The only way to achieve this same structure (without the ability to add multiple) would be to manually create the protobuf DAG nodes that will act as the directories and add them to IPFS using the DAG API.
Let me first state that I totally understand that some of the API decisions might be irreversible at this point, or too costly to be worth changing. That being said, as I was asked for an opinion, let me elaborate on all the concerns I have with the current API and propose a way they could be addressed, even though I understand doing some of that may not be feasible for various reasons.
ipfs.add([
{ path: 'root/file1', content: fs.createReadStream(/*...*/) },
{ path: 'root/dir/nestedfile', content: fs.createReadStream(/*...*/) },
{ path: 'root/file2', content: fs.createReadStream(/*...*/) }
])
//=>
AsyncIterable.of([
{ path: 'root/file1', hash: 'QmHashFile1' },
{ path: 'root/dir/nestedfile', hash: 'QmHashNestedfile' },
{ path: 'root/dir', hash: 'QmHashDir' },
{ path: 'root/file2', hash: 'QmHashFile2' },
{ path: 'root', hash: 'QmHashRoot' }
])
foo(stuff, ...other) => { .... ipfs.add(stuff) ... }
I'm forced to do a bunch of runtime checks on stuff / the return value to figure out which hash I need. Separate APIs for batch add and single add do help for a single file, but not necessarily with batch: what happens when adding fails on the n-th item? Does the whole operation fail, or do the prior ones still succeed? I hope it's atomic; otherwise the user needs to deal with the fact that it partially succeeded, which means figuring out what succeeded and what failed, etc...
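To make that concern concrete, here is a hedged sketch (the `addLike` stub and its placeholder hashes are hypothetical, not the real ipfs.add) of the runtime checks an input-dependent return shape forces on callers:

```javascript
// Hypothetical stub mimicking an overloaded add: the return shape depends on
// whether the input was a single item or an array of items.
function addLike (stuff) {
  return Array.isArray(stuff)
    ? stuff.map(s => ({ path: s.path, hash: 'Qm' + s.path })) // placeholder hashes
    : { path: stuff.path, hash: 'Qm' + stuff.path }
}

// A caller that passes `stuff` through must branch on the result's shape
// to recover the hash it actually wants (here: the last entry for batches).
function wantedHash (stuff) {
  const out = addLike(stuff)
  return Array.isArray(out) ? out[out.length - 1].hash : out.hash
}
```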
Ok so, adding multiple/many allows you to create a directory structure, i.e. unixfs directory nodes will be created automatically with links pointing to directory contents. So, for example adding something like this:
I would argue an alternative API might be able to accomplish the same goal without raising all of those concerns, e.g.:
const batch = ipfs.batch()
await batch.write("root/file1", stream1);
await batch.write("root/dir/nestedfile", stream2);
await batch.write("root/file2", stream3);
await ipfs.add(batch) // => 'QmHashRoot'
If you really do want to capture all the CIDs, it might be better to return an instance like the following instead:
class CID {
constructor(cid, links) {
this.cid = cid
this.links = links
}
toString() {
return this.cid
}
get(path) {
for (const link of this.links) {
if (link.path === path) {
return link.hash
}
}
return null
}
}
Note that the above is clearly an atomic operation that either succeeds or fails. The user does not need to do any mapping between inputs and outputs and most of the time can use the return value as-is. However, there is still a chance that the return value will not contain all entries.
There is still one use case that I think the above API does not address, which is that it does not allow tracking progress. If allowing that is a goal, there is yet another alternative that can be considered to address that as well: instead of returning Promise<CID> you could return something like this:
interface AddResult {
result:Promise<CID>
progress:AsyncIterable<{ path:string, cid:CID }>
}
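A hedged sketch of how a caller might consume that { result, progress } shape (`addWithProgress` is a made-up stand-in with placeholder CIDs, not a real API):

```javascript
// Hypothetical stand-in producing the { result, progress } shape above.
function addWithProgress (files) {
  let resolveRoot
  const result = new Promise(resolve => { resolveRoot = resolve })
  const progress = (async function * () {
    for (const file of files) {
      yield { path: file.path, cid: 'Qm' + file.path } // placeholder CIDs
    }
    resolveRoot('QmHashRoot') // placeholder root CID
  })()
  return { result, progress }
}

async function main () {
  const { result, progress } = addWithProgress([{ path: 'file1' }, { path: 'file2' }])
  for await (const update of progress) {
    console.log('added', update.path)
  }
  console.log('root:', await result) // root: QmHashRoot
}
main()
```

Note that in this sketch `result` only resolves once `progress` has been drained; a real implementation would drive the work independently of the consumer.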
I think it's also worth considering that there are web APIs with batch semantics, like the FileList and FormData APIs. So technically you don't even need ipfs.batch(); you could instead do the following:
const bundle = new FormData()
bundle.append("root/text", "HelloWorld")
const response = await fetch(myURL)
const blob = await response.blob()
bundle.append("root/dir/nested", blob)
ipfs.add(bundle)
Unfortunately the FormData, File and Blob APIs don't yet support ReadableStreams, but I'm confident they will soon enough; furthermore, I'm not sure it really matters here either way.
Consolidated my thoughts on API design here https://gozala.hashbase.io/posts/Constraints%20of%20an%20API%20design/
Thanks @Gozala - let's continue the IPLD API design convo here though! https://github.com/ipld/js-ipld/pull/191
Here in IPFSX adding content always returns an iterator, which is a nice consistent API, but when adding a single file, calling .first() or .last() on the iterator feels a little clunky. It seems to me that the more usual case is actually adding a single file, so I'd like to keep that API as simple and clean as possible.
I've ruled out:
A Promise if we detect a single item is being added, and an Iterable otherwise. This is too weird, and I think people would get confused.
A Promise that is monkey patched to also be Iterable 😱. This makes me feel weird, it's not obvious what the promise would resolve to when adding multiple, and I think people will accidentally await when they should be iterating, i.e. it's also confusing.
A Promise that resolves to an Iterable if adding multiple - similar confusion ranking. There's also no need to force execution onto the next tick if we're going to resolve immediately to an async iterable.
So, what about this instead: ipfs.add (single, returns a Promise) & ipfs.addEach (multiple, returns an async iterator).
ipfs.add accepts a Buffer or an Iterable that yields buffers, and returns a Promise that resolves to a CID.
ipfs.addEach accepts an iterable (optionally async) where each yielded item is a file to be added. Each item can be a Buffer, an Iterable that yields buffers, or a { path, content } object. It returns an async iterable of { cid, path }.
...and pushing this out to other API methods that write DAG nodes:
ipfs.block.put & ipfs.block.putEach
ipfs.dag.put & ipfs.dag.putEach
I want to keep the API surface area as small as possible, so this is a step away from that. In general I believe that functions should be permissive in what they accept and consistent in what they return. However, we're not really talking about that here; what we're talking about is splitting on a difference in operation - ipfs.add behaves very differently when you're adding multiple files than when adding a single file, and combining the two makes it very difficult to return a consistent value with a "nice" API for both cases.
Interested to hear opinions and alternatives, and to settle on the most convenient and best API. Adding stuff programmatically to IPFS needs to be way easier. I think what we have here in IPFSX is a good start, but I'd like to make it even better!
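To make the proposed add/addEach contracts concrete, here is a hedged sketch with stub implementations (placeholder CIDs; the real ipfsx methods would chunk and hash the content):

```javascript
// Stub shapes only: real implementations would chunk and hash the content.

// ipfs.add-style: single item in, Promise of a CID out.
async function add (content) {
  return 'QmPlaceholderCid' // placeholder CID
}

// ipfs.addEach-style: an (optionally async) iterable of items in,
// an async iterable of { cid, path } out, yielded as each item completes.
async function * addEach (items) {
  for await (const item of items) {
    yield { cid: 'QmPlaceholderCid', path: item.path } // placeholder CID
  }
}

async function main () {
  const cid = await add(Buffer.from('hello'))
  console.log(cid)
  for await (const { cid, path } of addEach([{ path: 'a' }, { path: 'b' }])) {
    console.log(path, cid)
  }
}
main()
```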
cc @olizilla @lidel @hacdias @achingbrain @hugomrdias @daviddias @jacobheun @vasco-santos @mikeal @vmx @gozala (plz mention others if I've missed anyone that might be interested...)