enable plugins to declare pooled/deferred traversals

abathur commented 6 years ago

Laying this out over a few focused comments for digestibility.

I'm curious about enabling plugins to declare traversals they're interested in during a setup phase, pool those traversals across plugins, and execute a single walk to satisfy them in a later phase.

abathur commented 6 years ago

Background/Motivation

Through a few long-term projects, I've been looking at pain points in editing, revising, and maintaining large documentation and technical texts, and working on integrating (non-unified) lint/processing tools to automate as much of that work as possible. I'm used to waiting for builds, but I've found feedback loops stretched out by tool run/build times to be disruptive, and I imagine less technical users will find them even more so.

This idea is primarily motivated by the potential to improve performance by cutting another kind of duplicated work across tools (unified already trims some): separate tools or plugins each walking the same document. This is admittedly speculative; the tools I'm already using do a fair amount of overlapping work, but I may be overestimating how much runtime that overlap represents. I think demonstrated improvement is the key measuring stick for any proof of concept (I'm willing to pull one together, though I'm not sure about the timeline).

Beyond the primary motive, I suspect a declarative traversal pattern would help separate what a plugin does from what it does it to, which might make it easier to reuse plugins in contexts the plugin author didn't anticipate.

abathur commented 6 years ago

FWIW, I'm interested in doing the same (or something similar) with tokenizing, but that's probably off topic here (I think traversal makes a cleaner test case, and tokenizing might end up as one or more separate plugins anyway).

abathur commented 6 years ago

Implementation

I haven't considered it thoroughly, but (with the caveat that my knowledge of unified is a little limited) I wanted to document a few points, some from my own notes and some from discussion with @wooorm on gitter:

If I were to solve this, I’d add a way for attachers to return such a visitor definition instead of a transformer function. But I see several unresolved problems:

a) Can those visitors be async?
b) How are they pooled? If two attachers return definitions and a third returns a normal transformer, are the first two pooled and then the third run?
c) I’d rather not add more weight to unified if it isn’t useful for everyone.

Maybe something like this could be solved in userland, through a plugin that creates a pool; other plugins can opt in, and the pool plugin “releases” the whole pool at the end of the process?
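One possible answer to question (b), sketched in TypeScript under stated assumptions: treat each attacher's output as either a declarative visitor definition or a plain transformer, pool consecutive definitions, and flush them as a single walk just before the next transformer runs (and once more at the end). The step shapes (`VisitorDefinition`, `TransformerStep`) and `runPipeline` are hypothetical, not part of unified's API; visitors are keyed by plain node type, and async visitors (question a) are ignored.

```typescript
// Minimal unist-like node (assumption: only `type` and optional `children` matter here).
interface UnistNode {
  type: string;
  children?: UnistNode[];
  [key: string]: unknown;
}

type Visitor = (node: UnistNode) => void;

// Hypothetical step shapes: a declarative visitor definition or a plain transformer.
interface VisitorDefinition {
  kind: "visitors";
  visitors: Record<string, Visitor>;
}
interface TransformerStep {
  kind: "transformer";
  run: (tree: UnistNode) => void;
}
type Step = VisitorDefinition | TransformerStep;

// Walk the tree once (pre-order), calling every pooled visitor registered for each node type.
function walkOnce(tree: UnistNode, pooled: Map<string, Visitor[]>): void {
  const stack: UnistNode[] = [tree];
  while (stack.length > 0) {
    const node = stack.pop()!;
    for (const visit of pooled.get(node.type) ?? []) visit(node);
    const children = node.children ?? [];
    // Push in reverse so children are visited in document order.
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
}

// Pool consecutive visitor definitions; flush them as one walk before the next
// transformer runs, and once more at the end. Returns how many walks happened.
function runPipeline(tree: UnistNode, steps: Step[]): number {
  let pooled = new Map<string, Visitor[]>();
  let walks = 0;

  const flush = (): void => {
    if (pooled.size === 0) return;
    walkOnce(tree, pooled);
    walks++;
    pooled = new Map();
  };

  for (const step of steps) {
    if (step.kind === "visitors") {
      for (const [type, visit] of Object.entries(step.visitors)) {
        const list = pooled.get(type) ?? [];
        list.push(visit);
        pooled.set(type, list);
      }
    } else {
      flush();
      step.run(tree);
    }
  }
  flush();
  return walks;
}
```

With two definitions, a transformer, and one more definition, this policy walks the tree twice instead of three times, and no pooled visitor runs after a transformer that was listed after it.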

- I agree with doing most or all of the work in a plugin for now, and letting utility/interest drive any decisions from there.
- Not certain it's feasible, but I've wondered about working out an attacher pattern that supports both imperative and declarative traversals with the same code (maybe by passing either the imperative or the declarative traverser to the plugin). I think this would encourage plugins that support both modes and let the consumer pick. If the consumer, not the plugin author, makes the final call, they can add an explicit call to run whatever is in the pool at whatever point in the chain makes sense.
- @wooorm has previously pointed out that babel-traverse seems to support collecting a set of traversals. I found an example that seems to confirm this, so I was planning to start there for any proof-of-concept work.
- I'm not sure what the async implications are yet. In the short run it may hinge on what babel-traverse supports; so far I've found docs/discussions on it a little thin.
- I think textlint's rules/plugins use a declarative visit specification format, so they may at least have issues/commits/documents that touch on design decisions, pitfalls, etc.
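The idea of passing either an imperative or a declarative traverser to the plugin might look roughly like this in TypeScript: the plugin registers handlers through one small interface and never decides when the walk happens. `Traverser`, `imperativeTraverser`, `pooledTraverser`, and `countHeadings` are hypothetical names for illustration, and plain node types stand in for real selectors.

```typescript
interface UnistNode {
  type: string;
  children?: UnistNode[];
  [key: string]: unknown;
}

type Handler = (node: UnistNode) => void;

// Hypothetical interface plugins are written against: register interest, nothing more.
interface Traverser {
  on(type: string, handler: Handler): void;
}

// Plain recursive pre-order walk dispatching on node type.
function walk(node: UnistNode, handlers: Map<string, Handler[]>): void {
  for (const handler of handlers.get(node.type) ?? []) handler(node);
  for (const child of node.children ?? []) walk(child, handlers);
}

// Imperative mode: every registration walks the tree right away.
function imperativeTraverser(tree: UnistNode): Traverser {
  return {
    on(type: string, handler: Handler): void {
      walk(tree, new Map<string, Handler[]>([[type, [handler]]]));
    },
  };
}

// Declarative mode: registrations are only recorded; the consumer decides when
// (and at which point in the chain) to run the pooled walk.
function pooledTraverser(): Traverser & { run(tree: UnistNode): void } {
  const handlers = new Map<string, Handler[]>();
  return {
    on(type: string, handler: Handler): void {
      handlers.set(type, [...(handlers.get(type) ?? []), handler]);
    },
    run(tree: UnistNode): void {
      walk(tree, handlers);
    },
  };
}

// A hypothetical plugin body, written once and usable in either mode.
function countHeadings(traverser: Traverser, counts: { headings: number }): void {
  traverser.on("heading", () => {
    counts.headings++;
  });
}
```

The same `countHeadings` body works in both modes; only the consumer's choice of traverser, and where it calls `run`, changes.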

wooorm commented 6 years ago

Well, for starters, we could create a new plugin that adds this to remark-lint, and test it out there.

Traverse

I’m thinking of a unist-util-traverse that’s like unist-util-visit, but whose visitors are passed a stack of parents (like unist-util-visit-parents).

I’d like a signature like so: traverse(tree, schema[, visitor])
If schema is a string and visitor a function, it’s treated as a schema of {[schema]: visitor}
If schema is an object, its keys are paths mapped to traversers.
Paths are strings: selectors supported by (an updated version of) unist-util-select
If traversers is an object with no length key, it’s treated as an array of traversers ([traversers])
For every traverser in traversers...
- If traverser is a function, it’s treated as a traverser of {enter: traverser}
- If traverser is an object, it can have enter and exit keys, mapped to stops (functions)
Before traversing, all paths are compiled into some intermediate spec (just compiled functions)
If a path partially matches, the spec is updated to dig deeper
If no partial paths match, the branch is not further traversed
If a path completely matches, its corresponding enter stops are invoked with useful info: the node that matches, the stack of parents?
(?) Enter stops could return an EXIT (stop traversing for this traverser altogether)
(?) Enter stops could return a SKIP (stop traversing this branch for this traverser)
...
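A minimal TypeScript sketch of the proposed traverse(tree, schema[, visitor]) normalization and matching, assuming paths are plain node types joined by ">" and anchored at the root (e.g. "root > heading > text") as a stand-in for unist-util-select selectors. It compiles paths up front, digs deeper on partial matches, prunes branches where nothing partially matches, calls enter/exit stops with the node and its parent stack, and supports a hypothetical `EXIT` signal. SKIP is left out because with anchored, fixed-length paths a completed match never descends further anyway, and array detection uses `Array.isArray` rather than checking for a `length` key.

```typescript
// Minimal unist-like node shape (assumption).
interface UnistNode {
  type: string;
  children?: UnistNode[];
  [key: string]: unknown;
}

// Hypothetical signal modeled on the EXIT idea: stop this traverser altogether.
const EXIT = Symbol("EXIT");

type Stop = (node: UnistNode, parents: UnistNode[]) => typeof EXIT | void;
interface Traverser {
  enter?: Stop;
  exit?: Stop;
}
type Traversers = Stop | Traverser | Array<Stop | Traverser>;
type Schema = Record<string, Traversers>;

// Compiled form of one (path, traverser) pair.
interface CompiledEntry {
  types: string[];
  traverser: Traverser;
  exited: boolean;
}
interface Active {
  entry: CompiledEntry;
  depth: number; // index into entry.types that the current node must match
}

// Assumption: a path is node types joined by ">" and anchored at the root.
function compile(schema: Schema): CompiledEntry[] {
  const entries: CompiledEntry[] = [];
  for (const [path, value] of Object.entries(schema)) {
    // Non-array values are treated as a one-item list.
    const list = Array.isArray(value) ? value : [value];
    for (const item of list) {
      // A bare function is shorthand for {enter: fn}.
      const traverser: Traverser = typeof item === "function" ? { enter: item } : item;
      entries.push({ types: path.split(">").map((t) => t.trim()), traverser, exited: false });
    }
  }
  return entries;
}

function visitNode(node: UnistNode, parents: UnistNode[], active: Active[]): void {
  const deeper: Active[] = [];
  const matched: CompiledEntry[] = [];
  for (const { entry, depth } of active) {
    if (entry.exited || entry.types[depth] !== node.type) continue; // no match: drop
    if (depth === entry.types.length - 1) {
      // Complete match: call enter with the node and its stack of parents.
      if (entry.traverser.enter?.(node, parents) === EXIT) entry.exited = true;
      else matched.push(entry);
    } else {
      deeper.push({ entry, depth: depth + 1 }); // partial match: dig deeper
    }
  }
  // Prune: only descend while at least one path still partially matches.
  if (deeper.length > 0) {
    const stack = [...parents, node];
    for (const child of node.children ?? []) visitNode(child, stack, deeper);
  }
  // Exit stops run after the matched node's subtree has been handled.
  for (const entry of matched) {
    if (!entry.exited && entry.traverser.exit?.(node, parents) === EXIT) entry.exited = true;
  }
}

function traverse(tree: UnistNode, schema: Schema | string, visitor?: Traversers): void {
  // A string schema plus a visitor is shorthand for {[schema]: visitor}.
  const normalized: Schema = typeof schema === "string" ? { [schema]: visitor ?? [] } : schema;
  const entries = compile(normalized);
  visitNode(tree, [], entries.map((entry) => ({ entry, depth: 0 })));
}
```

Anchoring paths at the root is what makes pruning possible here: once a branch's node type diverges from every compiled path, nothing below it can match.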

Pool

Not many ideas on this yet, other than that it builds one combined schema from pool(tree, schema[, visitor]) calls,
then finally flushes the pool and calls traverse(tree, combinedSchema)?
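The pool idea could be sketched in TypeScript as a thin layer: each pool(tree, schema[, visitor]) call merges into a combined schema kept per tree, and a flush runs one walk for everything pooled. To keep the sketch self-contained, schema keys are plain node types and `traverseCombined` is a simple stand-in for traverse(tree, combinedSchema); `pool`, `flush`, and the `WeakMap` bookkeeping are hypothetical, and mutation during the walk is not handled.

```typescript
interface UnistNode {
  type: string;
  children?: UnistNode[];
  [key: string]: unknown;
}

type Visit = (node: UnistNode) => void;
// Assumption: schema keys are plain node types rather than full selectors.
type PoolSchema = Record<string, Visit | Visit[]>;
type CombinedSchema = Record<string, Visit[]>;

// Pending combined schema per tree (hypothetical bookkeeping); a WeakMap lets
// trees that are never flushed be garbage-collected.
const pending = new WeakMap<UnistNode, CombinedSchema>();

// Hypothetical pool(tree, schema[, visitor]): merge this call into the tree's combined schema.
function pool(tree: UnistNode, schema: PoolSchema | string, visitor?: Visit): void {
  const normalized: PoolSchema =
    typeof schema === "string" ? (visitor ? { [schema]: visitor } : {}) : schema;
  const combined: CombinedSchema = pending.get(tree) ?? {};
  for (const [key, value] of Object.entries(normalized)) {
    combined[key] = [...(combined[key] ?? []), ...(Array.isArray(value) ? value : [value])];
  }
  pending.set(tree, combined);
}

// Stand-in for traverse(tree, combinedSchema): one pre-order walk for all pooled visitors.
function traverseCombined(node: UnistNode, combined: CombinedSchema): void {
  for (const visit of combined[node.type] ?? []) visit(node);
  for (const child of node.children ?? []) traverseCombined(child, combined);
}

// Flush: take everything pooled against this tree, clear it, and walk once.
function flush(tree: UnistNode): void {
  const combined = pending.get(tree);
  if (combined === undefined) return;
  pending.delete(tree);
  traverseCombined(tree, combined);
}
```

Visitors pooled for the same key run in the order they were pooled, and flushing twice is a no-op the second time because the pending schema is cleared first.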

Further thoughts

How would mutating the tree work?! Omg. That’s hard.

ChristianMurphy commented 3 years ago

Thanks for starting the discussion, @abathur! We're in the process of moving ideas into discussions (https://github.com/unifiedjs/collective/issues/44). If you'd like to continue this thread, or start a new one, https://github.com/unifiedjs/unified/discussions will be the home for ideas going forward.
