Closed wking closed 9 years ago
Thinking this over some more, we probably want a pre-chunk/trickle callback as well as the current post-chunk/trickle NodeCallback
. That would let us handle functionality like ignored files (#1232) without needing to alter the adder API or chunk/trickle files that we know we're going to ignore ahead of time. The signature for the new callback would be something like:
type PreNodeCallback func(p *path.Path, f *os.File) (ignore bool, err error)
I considered sticking with nOut *dag.Node
instead of ignore bool
to give folks more flexibility (e.g. “use nOut
instead of chunking/trickling that path yourself”), but I can't think of an easy way to distinguish “please ignore this path” from “please handle this path using your usual chunking/trickling” if we're returning just nOut
and err
. So this callback just returns a Boolean distinguishing “please ignore” from “please chunk/trickle as usual”, and we can add an additional nOut
return if we find a use case for a callback-supplied substitute node.
Hrm, it seems the comment i thought i left didnt make it.
a few things, the functionality you are referring to as 'trickle' is the layout. the trickledag is a specific layout implementation that lays out blocks in a pattern reminiscent of a recursive binomial heap.
I'm also very curious what the reason for taking an os.File
is, what advantages does that have over an io.ReadCloser
?
On Wed, May 27, 2015 at 10:44:32AM -0700, Jeromy Johnson wrote:
the functionality you are referring to as 'trickle' is the layout. the trickledag is a specific layout implementation that lays out blocks in a pattern reminiscent of a recursive binomial heap.
Ok, so trickle → layout.
I'm also very curious what the reason for taking an
os.File
is, what advantages does that have over anio.ReadCloser
?
As I tried to explain in the comment for NodeCallback:
… you can use it to extract metadata about the visited file including its name, permissions, etc.
Can you suggest more obvious wording?
@wking
f
is not an os.File, but stdin? or another reader? i think this should use io.Reader
instead of os.File
. or, what use case are you envisioning? (we could have a way to cast)What might be some example calls that use NodeCallback
in interesting ways? they may motivate its design to the reader.
Add
should take in an io.Reader
, and os.File
should be separated to AddFile(...)
. io.Reader/Writers
are waaay more common in golang than os.Files. think about network io, and all sorts of other things. the most common thing to do is read from an io.Reader
, not an os.File
.We need a way to get information about progress of a running addition back to other goroutines. Choices for this include the channel messages proposed in #1121 or additional arguments to a per-chunk callback like that proposed in #1274. The main difference between a callback and a channel is whether or not we want synchronous collaboration between the adder and the hook. Since I think we want the option for synchronous collaboration (e.g. optionally inserting a metadata node on top of each file node). For situations where asynchronous communication makes more sense, the user can provide a synchronous callback that just pushes a message to a channel (so the callback-based API supports the channel-based workflow).
Maybe the NodeCallback can do the channel stuff:
type nodeElem struct {
node *dag.Node
err error
}
sendchan := make(chan nodeElem)
recvchan := make(chan nodeElem)
unixfs.Add(ctx, n, r, func(nIn *dag.Node, p *path.Path, f *os.File, top bool) (nOut *dag.Node, err error) {
sendchan <- nodeElem{n}
nOutElem <- recvchan
return nOutElem.node, nOutElem.err
})
Actions like wrapping an added file in another Merkle object to hold a filename is left to the caller and the callback API.
SGTM! :+1:
These should look just like the high-level API, except instead of passing in an IpfsNode and using that node's default DAG service, trickler, and splitter, we pass each of those in explicitly:
There's an important observation to make about the Shell, Core, and other APIs. Whatever we do, we need:
So far, we've been calling both 1. and 3. "Shell". We should make a distinction, and also account for how some of this will be implemented, given that the complexity of implementing 3. may impact the implementation of 1. and 2. Meaning, that however we decide to do 3., may make us move from:
// pass in a node
func Add(n *Ipfs.Node, ...) ...
to
// methods on the node
type Node interface {
Add(...) ...
}
I prefer the way above, because it allows more commands to be added on top of a very basic thing (the Node
) without requiring changes to the node. ultimately, the operations must happen to the node itself, and that may happen in another process entirely from wherever the user is calling a function like:
unixfs.Add(ctx, n, r, ...) // this is going to happen over RPC.
We don't currently have a public Trickler interface, but I think we should add one so folks can easily plug in alternative trickler implementations.
I think we should call this "indexing". "TrickleDag" is the one specific implementation @whyrusleeping came up with. "indexing" is more general and describes various ways to index data/objects. this lends itself well to talking about databases, and database internal datastructures for better access to data at the leaves.
Thinking this over some more, we probably want a pre-chunk/trickle callback as well as the current post-chunk/trickle NodeCallback. That would let us handle functionality like ignored files (#1232) without needing to alter the adder API or chunk/trickle files that we know we're going to ignore ahead of time. The signature for the new callback would be something like:
We need a chunking interface. I started on this a few weeks ago. Take a look at this: https://github.com/jbenet/go-chunk
it's very WIP.
I'd like to do this independent of IPFS as it could be useful to other projects, and other languages. (e.g. implement rabin fingerprint chunking everywhere).
@whyrusleeping
a few things, the functionality you are referring to as 'trickle' is the layout. the trickledag is a specific layout implementation that lays out blocks in a pattern reminiscent of a recursive binomial heap.
Oh layout works too! (see comment about indexing above.). Also, this is actually a graph topology. ("the way in which constituent parts are interrelated or arranged.")
… you can use it to extract metadata about the visited file including its name, permissions, etc.
I think this is special casing. this is not 90% of cases. 90% of cases just want an io.Reader. (see my comments in this post above). We should have the os.File
case, but it's the special case, not the common op.
Possible that all this could be fed to the callback anyway though... :
f := os.Open(...)
// suppose Add takes an io.Reader
unixfs.Add(ctx, n, r, func(nIn *dag.Node, r io.Reader, top bool) (nOut *dag.Node, err error) {
// have f!
})
For large files and directories that are chunked and trickled,
Ah, ok. i see what's up. So-- I think that an Add
porcelain call which can take a path to a directory, is really a whole different thing than an Add
plumbing call that takes an io.Reader
only.
We could have a "dealing with os.Files" set of calls that do the magic of dealing with directories and files and so on, but that's very much filesystem specific porcelain on top of the simple idea of "take a stream of data, chunk it, and add it to IPFS".
i think:
// basic unixfs add. uses io. idiomatic. no dirty files.
Add(core.Node, io.Reader)
// fs stuff
AddFile(core.Node, *os.File, ...)
AddDir(core.Node, *os.File, ...)
AddPath(core.Node, path.Path, ...)
// or maybe they're really "import" calls
ImportFile(core.Node, *os.File, ...)
ImportDir(core.Node, *os.File, ...)
ImportPath(core.Node, path.Path, ...)
// or maybe there's "importers"
i := Importer(chunk.Rabin, topology.Trickle)
AddPath(n, s, i)
not sure. @whyrusleeping ?
On Mon, Jun 01, 2015 at 07:29:16PM -0700, Juan Batiz-Benet wrote:
- what if
f
is not an os.File, but stdin? or another reader? i think this should useio.Reader
instead ofos.File
. or, what use case are you envisioning? (we could have a way to cast)
For AddFromReader, NodeCallback's f will be nil. The docs for ‘f’ say:
This will be nil for io.Reader-based adders.
If you have clearer wording in mind, just let me know what to say.
What might be some example calls that use
NodeCallback
in interesting ways? they may motivate its design to the reader.
Ok. The uses I see for it at the moment are pinning (like #1274), wrapping objects in metadata (access modes, permissions, …), and notifying the caller about progress. I don't think it's worth writing those into the NodeCallback docs though. I'd rather just write the pinning and metadata-wrapping callbacks, and then reference those callbacks from the docs.
Along this vein, I think it's probably worth passing an array of NodeCallback (and PreNodeCallback) references to the adders, to make it easy to hook in multiple, independent actions (with some logic to chain the callbacks together, e.g. skip later PreNodeCallback if ignore is true, and pass earlier NodeCallback nOuts into the next's nIn, bailing out if nOut is nil).
- i think the default
Add
should take in anio.Reader
, andos.File
should be separated toAddFile(...)
.io.Reader/Writers
are waaay more common in golang than os.Files. think about network io, and all sorts of other things. the most common thing to do is read from anio.Reader
, not anos.File
.
But AddFile is also about directories, and while those are os.File references too, I don't want to trip folks into thinking AddFile isn't about recursive addition. If that's not convincing enough, I'll go along with this rename though.
For situations where asynchronous communication makes more sense, the user can provide a synchronous callback that just pushes a message to a channel (so the callback-based API supports the channel-based workflow).
Maybe the NodeCallback can do the channel stuff:
type nodeElem struct { node *dag.Node err error } sendchan := make(chan nodeElem) recvchan := make(chan nodeElem) unixfs.Add(ctx, n, r, func(nIn *dag.Node, p *path.Path, f *os.File, top bool) (nOut *dag.Node, err error) { sendchan <- nodeElem{n} nOutElem <- recvchan return nOutElem.node, nOutElem.err })
That doesn't look like it's using NodeCallback for the channel stuff. In fact, I don'tsee you using the callback at all. And you have two global channels? Can you sketch out this idea in a bit more detail?
These should look just like the high-level API, except instead of passing in an IpfsNode and using that node's default DAG service, trickler, and splitter, we pass each of those in explicitly:
There's an important observation to make about the Shell, Core, and other APIs. Whatever we do, we need:
- a thing that has all the knobs and implements all the logic for things, in terms of a generic Node interface. (core)
- a think that offers a very high level API for end user consumers (shell)
- things that implement different ipfs nodes in different execution contexts (e.g. one could make the commands happen over RPC to a node elsewhere, one could do it over a REST API, one could just call functions in-process).
So far, we've been calling both 1. and 3. "Shell".
I think you mean 2 and 3. And having command-line and HTTP APIs (3) that mirror the shell API (2) makes sense to me. Can you give an example of where you'd like to put space between 2 and 3?
Meaning, that however we decide to do 3., may make us move from:
// pass in a node func Add(n *Ipfs.Node, ...) ...
to
// methods on the node type Node interface { Add(...) ... }
I prefer the way above, because it allows more commands to be added on top of a very basic thing (the
Node
) without requiring changes to the node. ultimately, the operations must happen to the node itself, and that may happen in another process entirely from wherever the user is calling a function like:
I don't mind either way, but Ipfs.Nodes are complicated things, so I'd rather not have the core interface be an Ipfs.Node method. Keeping it a function allows you to easily manage the components independently (DAGService, Layout, BlockSplitter). I don't see much functional difference for the shell interface though, so we'll just use whichever feels better for you.
I'd like to do this independent of IPFS as it could be useful to other projects, and other languages. (e.g. implement rabin fingerprint chunking everywhere).
I think lots of stuff in go-ipfs is independent of IPFS and I'd like to see it all split out into separate repositories ;). E.g. DHT stuff and the records implementation shouldn't depend on IPFS, or it should only depend weakly on IPFS. The Merkle DAG implementation is completely independent of any storage considerations and networking protocols, so it could be split out into it's own repository. With Go packages being fairly independent, you get package splitting without needing multiple repositories, but I think having more of a one-to-one mapping between packages and repositories makes it easier to document the APIs and provide targetted testing. Then the central go-ipfs repository just needs to pull everything together in the core and shell APIs and add some integration tests.
Possible that all this could be fed to the callback anyway though... :
f := os.Open(...) // suppose Add takes an io.Reader unixfs.Add(ctx, n, r, func(nIn *dag.Node, r io.Reader, top bool) (nOut *dag.Node, err error) { // have f! })
Again, I'm having difficulty understanding what you intend here. Are you suggesting folks build their callback and scope in a global (or caller-local) *File? If you're worried about a wire protocol for managing the metadata, I'd consider keeping the filesystem → Merkle DAG operations in the local client, and just pushing the serialized Merkle DAG to the local IPFS daemon. I imagine most folks using unixfs nodes will have a filesystem on the creation end and not be building their Merkle filesystem with patch calls (see #1299), but maybe I'm guessing wrong.
For large files and directories that are chunked and trickled,
Ah, ok. i see what's up. So-- I think that an
Add
porcelain call which can take a path to a directory, is really a whole different thing than anAdd
plumbing call that takes anio.Reader
only.
Sure, my Add wrapper is going to be calling AddReader internally (or however we name those functions). The main benefit to using Add when you have a local filesystem is that you get recursion and *File metadata for free. Still, I'd consider everything that gives you explicit control over the chunker and layout ‘plumbing’ ;).
We could have a "dealing with os.Files" set of calls that do the magic of dealing with directories and files and so on, but that's very much filesystem specific porcelain on top of the simple idea of "take a stream of data, chunk it, and add it to IPFS".
I think we agree here, and just have different criteria for the ‘porcelain’ label ;).
i think:
// basic unixfs add. uses io. idiomatic. no dirty files. Add(core.Node, io.Reader) // fs stuff AddFile(core.Node, *os.File, ...) AddDir(core.Node, *os.File, ...)
Why make a distinction between AddFile and AddDir? They have the same signature, so just use:
fileInfo, err := file.Stat() … if fileInfo.Mode().IsDir() { … } else { … }
inside your implementation.
AddPath(core.Node, path.Path, ...)
I don't like this, because:
// or maybe they're really "import" calls ImportFile(core.Node, os.File, ...) ImportDir(core.Node, os.File, ...) ImportPath(core.Node, path.Path, ...)
// or maybe there's "importers" i := Importer(chunk.Rabin, topology.Trickle) AddPath(n, s, i)
not sure. @whyrusleeping ?
I don't see the point to grouping the chunker and layout together. They're already grouped in Ipfs.Node and they're not too complicated on their own, so I prefer an API with them as independent arguments, and then another API that extracts them from the Ipfs.Node.
To keep track of the spec's evolution and make it easier to make inline comments, I've spun this off into an orphan branch with #1321.
it's probably worth passing an array of NodeCallback (and PreNodeCallback) references to the adders
i think this is way overkill. one can chain the calls with one function themselves. you might even provide a utility function that turns N callbacks into one which applies them sequentially, but this function should just accept one callback.
If that's not convincing enough, I'll go along with this rename though.
use io.Reader
.
Can you sketch out this idea in a bit more detail?
just meaning to show how you can feed a callback into .Add that happens to use channels to do all the communication. (i.e. you dont need something chan specific / special)
I think you mean 2 and 3.
ah right, yes, 2 and 3.
Can you give an example of where you'd like to put space between 2 and 3?
not sure what space to put, but these feel like different things.
Keeping it a function allows you to easily manage the components independently
yeah. ok sounds good.
I'd like to see it all split out into separate repositories ;).
amen. seriously. we're close-- need our vendor tool.
With Go packages being fairly independent, you get package splitting without needing multiple repositories, but I think having more of a one-to-one mapping between packages and repositories makes it easier to document the APIs and provide targetted testing.
Yes, I completely agree. This monolithic repo is hard to manage and less productive overall. our stuff becomes hard to reuse, etc. We'll break it up, just need our ipfs-based packaging tool.
cc @whyrusleeping
Then the central go-ipfs repository just needs to pull everything together in the core and shell APIs and add some integration tests.
yeah exactly, and we can have intense per-module tests and run tests more smartly.
I'm having difficulty understanding what you intend here. Are you suggesting folks build their callback and scope in a global
i dont mean a global. it's not func-scoped for simplicity of writing out the example (like the stdlib examples).
and-- what the example expresses is using closure scopes instead of adding a ton of parameters into the function call for everyone. closure scopes allows us to pass params into the callback without needing to register them directly with intermediate calls that dont care about that.
I don't like this, because:
- path.Path is usually used for IPFS paths, not filesystem paths, and
- os.Open is familiar enough that the caller can call it locally.
ok sounds goood. alright, let's use AddFile
for all of it.
On Wed, Jun 03, 2015 at 01:54:03AM -0700, Juan Batiz-Benet wrote:
it's probably worth passing an array of NodeCallback (and PreNodeCallback) references to the adders
i think this is way overkill. one can chain the calls with one function themselves. you might even provide a utility function that turns N callbacks into one which applies them sequentially, but this function should just accept one callback.
Done in e6b7dea8dc.
If that's not convincing enough, I'll go along with this rename though.
use
io.Reader
.
Done in 820b3d80c8.
Can you sketch out this idea in a bit more detail?
just meaning to show how you can feed a callback into .Add that happens to use channels to do all the communication. (i.e. you dont need something chan specific / special)
Do you think we need an explicit example of that in the spec docs? Or can we leave it to the reader's imagination? Or do we need an explicit example, but instead of putting it in the spec itself we can put it in some auxiliary documentation/tutorial?
I'm having difficulty understanding what you intend here. Are you suggesting folks build their callback and scope in a global
… what the example expresses is using closure scopes instead of adding a ton of parameters into the function call for everyone. closure scopes allows us to pass params into the callback without needing to register them directly with intermediate calls that dont care about that.
Ah, that makes sense.
Do you think we need an explicit example of that in the spec docs? Or can we leave it to the reader's imagination? Or do we need an explicit example, but instead of putting it in the spec itself we can put it in some auxiliary documentation/tutorial?
I think an example should exist somewhere. maybe we can use a godoc example for it? but dont care much either way.
@wking sorry maybe i missed it-- the motivation for pre-
and post-
callbacks?
thanks for the e6b7dea and 820b3d8 -- this is looking good
On Wed, Jun 03, 2015 at 02:44:46PM -0700, Juan Batiz-Benet wrote:
sorry maybe i missed it-- the motivation for
pre-
andpost-
callbacks?
Motivation for pre-node callback in 1 and 999cda913c (Add a PreNodeCallback, 2015-06-02). It's basically the same text in both places.
Motivation for the post-node callback is discussed in [2](starting with “The uses I see for it at the moment are…”). I'll write docs for the callbacks I'm imagining and stub in references to them in the API.
On Wed, Jun 03, 2015 at 02:44:00PM -0700, Juan Batiz-Benet wrote:
Do you think we need an explicit example of that in the spec docs? Or can we leave it to the reader's imagination? Or do we need an explicit example, but instead of putting it in the spec itself we can put it in some auxiliary documentation/tutorial?
I think an example should exist somewhere. maybe we can use a godoc example for it?
Ah, interesting. The connection with godoc is documented in 1. 'm not sure how much work it's going to be to write useful examples (we probably don't want to be spinning up IpfsNodes on godoc.org), but I'll poke around and see what we can do.
yeah, likely not worth spending the time mocking the node out
A clear picture of the intended adder API is blocking some of the later commits in #1136. This issue floats my first pass at an adder API that I think strikes a good balance between power and simplicity (mostly by offloading the complicated bits to a per-DAG-node callback).
Where it lives
I'd suggest moving
core/coreunix
tocore/unixfs
to matchunixfs/
andshell/unixfs/
.High-level UI (shell/unixfs/)
Most additions will be recursive and load data from a *File (which can be a directory or a file). Alternatively, the
*FromReader
variants accept a [Reader][2].We need a way to get information about progress of a running addition back to other goroutines. Choices for this include the channel messages proposed in #1121 or additional arguments to a per-chunk callback like that proposed in #1274. The main difference between a callback and a channel is whether or not we want synchronous collaboration between the adder and the hook. Since I think we want the option for synchronous collaboration (e.g. optionally inserting a metadata node on top of each file node). For situations where asynchronous communication makes more sense, the user can provide a synchronous callback that just pushes a message to a channel (so the callback-based API supports the channel-based workflow).
Actions like wrapping an added file in another Merkle object to hold a filename is left to the caller and the callback API.
Low-level UI (core/unixfs/)
These should look just like the high-level API, except instead of passing in an IpfsNode and using that node's default DAG service, trickler, and splitter, we pass each of those in explicitly:
We don't currently have a public
Trickler
interface, but I think we should add one so folks can easily plug in alternative trickler implementations.I'm not familiar enough with Go at the moment to know which arguments are best passed by reference and which should be passed by value. If I was writing this in C, everything except the Boolean ‘top’ would be passed by reference, but I'll have to read around to figure out what's idiomatic in Go.