The naming of Node's single step "traversal" methods

ipld / go-ipld-prime

Golang interfaces for the IPLD Data Model, with core Codecs included, IPLD Schemas support, and some handy functional transforms tools.

MIT License

132 stars 50 forks source link

The naming of Node's single step "traversal" methods #22

Closed warpfork closed 5 years ago

warpfork commented 5 years ago

The ipld.Node interface currently has a method TraverseField(key string) (ipld.Node, error). This has been sufficient to get going, but there are several issues with this: it's verbose; it always takes a bare string (which is sometimes not going to be ideal); and the name "field" is usually only used in the context of structs, but we're also using it here in describing maps, which is thus awkward. Should we rename this? Furthermore, should we have several methods here?

It would seem we want all of these methods at some point, and since Go eschews parametric polymorphism, they all have to have distinct names:

TraverseByString(fieldname string) -- for use with maps
TraverseByIndex(idx int) -- for use with lists
TraverseBySegment(segment IntOrString) -- for use when you've got pathsegments (selectors do this rather than the above two)
TraverseByNode(node Node) -- for generic use that's not segment-powered (retains typed.Node reference if applicable: preferable if you've got a struct-represented-as-string as a map key, or anything that has a Validate() method on it that would need to be reapplied if we forgot about associated type info).
Traverse(x ConcreteGeneratedType) -- for use when codegen is in flight (similar advantages to TraverseByNode).

If that above fivesome goes through, the first four would on the Node interface now (contrast currently only two) (the fifth is only for codegen).

Questions:

Is that fivesome correct?
Can we come up with shorter names? "Traverse" is a bit wordy when using it so much (and, I tend to think of "traverse" as being a multi-step process (though I figure on renaming any multi-step functions to be "walk", also, to avoid ambiguity)).
Alternative phrasing for the "ByFoo" suffix would also be welcome.

I've been thinking about this (but pushing it off) for a while now, but if something is to be done here, it should probably be a clean break soon, before we start getting too many additional things depending on this API.

warpfork commented 5 years ago

By the same logic, MapBuilder.Insert(k, v ipld.Node) error also should have a variant taking primitive args for convenience: MapBuilder.Insert2(k string, v ipld.Node) error. Similar naming questions apply there, then.

rvagg commented 5 years ago

Is it really necessary to have TraverseBySegment(segment IntOrString)? Or conversely, doesn't that one serve the same purpose as TraverseByString and TraverseByIndex combined? Seems like redundancy, or at the least, an API documentation smell.

Can you explain TraverseByNode and Traverse a bit more? They seem like different beasts to the other 3, perhaps inverting the control somehow? Maybe they deserve a separate name?

Re "traverse", it's the word I've been using for my single-node traversal algorithms, it makes sense to me. Although I do tend to have things called "traversals" that I operate on so that might be a difference for you - a traversal is made up of individual steps, each executed separately. Unless you're interested on coming up with a concrete Traversal object that performs the traversal by interactive steps, then I don't see a major problem with the language here, it's just that your multi-step traversal operations are made up of of atomised traversals.

warpfork commented 5 years ago

TraverseBySegment does seem awfully redundant.

The main use of it is that Selectors return PathSegment (aka IntOrString) from some of their methods.

If we didn't have a PathSegment type over there, then Selectors would have a bunch of very redundant looking methods instead. So I'm afraid it's "six of one, half a dozen of the other" -- we have something that looks a bit ugly in one place or the other and we just get to choose which. Given such a choice, I figure I'd rather put it in the place where it offers more reuse: which means making PathSegment available in the main interfaces, rather than sequestering it off in the traversal/selector packages.

TraverseByNode and Traverse are different beasts but they aren't inversions at least :)

Imagine you have some types like:

type SnazzyKey struct {
    frob String
    noz String
} representation stringjoin {
    join ":"
}
type SomeValue struct {
    # ...
}

type SnazzyMap {SnazzyKey:SomeValue}

If someone uses TraverseByString on a SnazzyMap, the implementation has to parse that string into a SnazzyKey. That's fine (it's certainly well-defined!) but if you already have a SnazzyKey instance in your hands, converting it into a string just to feed it into TraverseByString, which just parses it again... well!

Instead, by handing your SnazzyKey (which of course implements Node) to a TraverseByNode method, we can do a lot less unnecessary work. The TraverseByNode method just does a quick cast to see that it's handling a typed.Node (cheap), asks it for its schema.Type (very cheap), checks that its equal to its own expected key type (very cheap, literally a pointer equality check), and then goes "yep" and does the lookup.

If working with codegenerated implementations, then Traverse with the $GeneratedTypeNameHere -- it would come out func (*SnazzyMap) Traverse(SnazzyKey) SomeValue in this example -- even gets to skip some of the casts and schema.Type equality checks, since its effectively hoisted those into being the Go compiler's problem, so it's faster and cheaper yet. (Also, since it's codegen, it can return the concrete generated type too, instead of being limited by interfaces which must return a bland ipld.Node -- which makes using it in application logic significantly more friendly to the programmer.)

warpfork commented 5 years ago

re: the naming prefix overall: I wonder if just plain Get wouldn't be better.

Terse is good. It's a common operation.

'Get' is familiar and common in many communities when describing handling maps.

This seems to look okay:

GetByString(string)
GetByIndex(int)
GetBySegment(PathSegment)
GetByNode(Node)
Get(ConcreteGeneratedType)

warpfork commented 5 years ago

Alternatives to *By* might be *Via*, or *With*:

GetViaString(string)
GetViaIndex(int)
GetViaSegment(PathSegment)
GetViaNode(Node)
Get(ConcreteGeneratedType)

rvagg commented 5 years ago

:thumbsup: it's reminiscent of a standard map operation in most languages which is basically what it is in an abstract sense. Also I'd prefer By. I suspect that a newcomer would grok Get more than Traverse.

At the moment TraverseIndex and TraverseString exist but these others don't, they are just TODO items or am I looking in the wrong place?

warpfork commented 5 years ago

Indeed. TODO items.

warpfork commented 5 years ago

I'll probably act on this soon -- in the next couple of days. These are pretty foundational shakeups, and although fortunately they'll all be "sed refactors" in complexity, they nonetheless show up everywhere, so it's desirable to get them done before there's too many users.

In particular, it will be really good to get these done before graphsync and its transitive users start using these interfaces heavily... however, (ironically?) I'm waiting for some of the other Selectors branches to land before going ahead one way or the other.

warpfork commented 5 years ago

LookupBy* (or even just Lookup*) might be another alternative to GetBy* to consider.

(I started actually trying these, and when I started seeing examples... n.GetByIndex(0).GetByIndex(0).AsString() really stutters, in my mind. Whereas -- though I can't put my finger on why -- n.LookupIndex(0).LookupIndex(0).AsString() seems to read just fine.)

Fully worked:

LookupStr(string) (Node, error)
LookupIndex(int) (Node, error)
LookupSegment(PathSegment) (Node, error)
LookupNode(Node) (Node, error)
Lookup($GeneratedTypeKey) ($GeneratedTypeValue, error)

There's one more awkward bit in this set, though. LookupString and LookupIndex and LookupSegment all sound fine; but LookupNode sounds odd. I think this is because "Node" is also of course the return type, which adds some awkwardness to the vibe. LookupByNode doesn't have the same weird vibe, but a combo-breaker of the pattern would be also bad. So, a different idea entirely for name of LookupNode might make this set nicer. (LookupTyped? Not entirely accurate, because it certainly wouldn't require typed.Node as a param type; but also, it's the reason it exists and one would probably prefer LookupString in any other usecase. Other ideas, maybe? Perhaps changing tack and giving the generated code a longer name, and saving Lookup for the method working on Node?)

warpfork commented 5 years ago

Okay, one more take:

LookupString(string) (Node, error)
LookupIndex(int) (Node, error)
LookupSegment(PathSegment) (Node, error)
Lookup(Node) (Node, error)
Get($GeneratedTypeKey) $GeneratedTypeValue

This solves the weird vibe by replacing LookupNode with just plain Lookup (abandoning the earlier thought that "we should reserve the shortest name for the codegen'd path, which is where all the happiest UX will happen anyway")...

And then it also solves the codegen'd name question by taking it a different direction. Notice the return tuple also is different for that one: it doesn't need to return a distinct error value. There's no need for errors like ErrInvalidKind on a codegen'd method like that; and we can always represent "not found" unambiguously (either as a native nil (because even if the map value has the nullable modifier, those return a null Node, not a native nil), or as ipld.Undefined (though I learn towards the earlier option; Undef seems best if reserved for describing struct fields)). So, this structural difference in the method signature gives us justification for a different name pattern entirely (like jumping over to Get or something). I've put Get here; but codegen details are questions for "later" anyway, so we can schedule this for a revisit when it comes up again.

Retracted LookupStr shortening in favor of LookupString. It seems un-gopherly to try to shave three letters in an exported interface.

I think I'm actually completely satisfied with this one.

(Some of the "By" phrasing might still show up in other places -- e.g. MapBuilder.InsertByString -- since in those cases, well, you can see how eliding the "By" might confuse things. That's a shortcut method we haven't actually added yet, though.)

Thanks to @dmarby for some conversation outside of the githubs which helped refine this!

warpfork commented 4 years ago

(... much later ...)

One more reneg on this has now occurred. :see_no_evil:

Eliding the "By" from the name turned out to be confusing, and I now regard that as having been a mistake. Also, using "Get" for codegen and "Lookup" for everything else turned out to be confusing. Both mistakes are being rectified.

The final result is:

LookupByString(string) (Node, error)
LookupByIndex(int) (Node, error)
LookupBySegment(PathSegment) (Node, error)
LookupByNode(Node) (Node, error)
Lookup($GeneratedTypeKey) $GeneratedTypeValue

These will become the exported interfaces in v0.5, which should be tagged presently.

(Happily, all the changes are just renames. Semantics stay the same!)