Open warpfork opened 4 years ago
I'm not sure I'm bothered by this.
Selectors are written to work on the logical Node level, not the physical Block level. (I'm fairly confident this is correct and what Selectors are supposed to be. :))
What the logic in this PR we're reviewing is concerned with is... mostly on the physical Block / plain-ol'-bytes level. It makes sense, then, that it's doing a lot of work in the Loader function.
But being primarily concerned with the physical Block layer is "odd", in the point of view of selectors overall. Correct; but not in the path of the main thing selectors address.
(And part of how I mentally define "the main thing selectors address" is by considering a CLI tool that lets you apply a selector to some data: it's very easy for that CLI tool to yield you a bunch of JSON (or whatever) or a Path for each thing it selects; it's very less trivial to give a CLI tool a loader function that does as much stuff (and certainly not side-effect-accumulating stuff!) as this example does.)
So, using the link loader function to do a significant amount of side-effecting(!) work is... clever. It definitely works and it's definitely not wrong. But to bring this train of thought back to the question of this issue -- is this empty visit func a design smell -- a clever and effectful Loader is also the only way that having a completely empty visit func like this would ever not be a bug.
I'm not sure if it would make more sense to design a new API that takes a callback of similar signature, but promises to be only called on block edges. Maybe that would result in slightly clearer code, by drawing the eye more to that callback. But I'm not sure the intent would be realized: if one wants to do something with the raw byte stream too (this technique of using the loader function like this emerged in graphsync, where that drill across abstractions is actually Excellent), one still does the Loader trick. Then we have two ways to do the same thing, one of which is more powerful than the other and the other is just window dressing?
And further examination of the logic in the Loader function in this example shows: yep, there's some concrete references to size
and counting of bytes, and a bunch of other stuff that's very concretely tied to the storage medium, so... yeah, I think this example is one where a different callback API wouldn't make much difference: I really don't think there's any way this could possibly be written except with code coming in via this kind of Loader interface.
One way this could've played out differently is if we had a WalkDummy
method that just didn't take a visitor function argument at all, and had louder documentation that "hey, you should probably look at the loader logic, because unless that's got magic, this function is pretty useless". Is that worth it, though?
Are there any other interfaces I'm not thinking of that would've made this clearer to read?
One other interesting thing to note here is that we might be doing some work to set up the traversal.Progress
info to hand to the visitor function, and if the visitor function is a no-op dummy thunk, then that is a waste of time.
But I'm not sure that's anything we fix by getting rid of the callback; the biggest cost of assembling that info is (pprof has told me recently, in another investigation) setting up the Path
object... which we also do for the LinkContext
that's handed to the Loader
. So we're not getting away from that any time soon.
(I am interested in some possibly aggressively different APIs in the future which could address this; it's just... it seems unlikely to me that that will be an incremental step. Seealso: https://github.com/ipld/go-ipld-prime/issues/45)
If we do stick with anything like the current design, more docs and an example of using Loader like this would be a good addition to the repo, that's for sure. This pattern does seem to come up often enough.
side note on this:
much of the work I’ve done (Graphsync, CAR) etc operates at the block level, which means I’ve done a bunch of overloading of link loaders.
also, because IPLD prime is not spread throughout most of the code base in say, go-filecoin, there’s not much we can do with the visit functions.
I don’t think it’s a long term problem neccessarily, because it will probably change over time as IPLD prime becomes a default, but it is a bit odd
I do strongly agree that if we don't change any of this, we should make clear that modifying link loaders is THE way to do block level work.
I don't really have an opinion on this but appreciate seeing thoughts from both of you, very helpful!
Is the Selector API... good? Let's do a quick case study.
This PR has an interesting example: here, we see a visitor callback that does... nothing:
https://github.com/ipld/go-car/pull/18/files#diff-f155015dbcffd14f48426f0cc3ddab92R245
... because all the real work is going on in the Loader:
https://github.com/ipld/go-car/pull/18/files#diff-f155015dbcffd14f48426f0cc3ddab92R194-R220
... which is quietly wired in over here:
https://github.com/ipld/go-car/pull/18/files#diff-f155015dbcffd14f48426f0cc3ddab92R242
The question is:
Is this do-nothing visitor concerning, and indicative of an API design UX flaw?
If so, what are options for improving the situation?
(Brought up by @rvagg , I just wrote it up for github.)