pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.31k stars 636 forks source link

Explore replacing monomorphization with implicit conversion #11269

Closed stuhood closed 1 year ago

stuhood commented 3 years ago

We currently "monomorphize" @rules in the RuleGraph by creating a distinct copy of a @rule per set of Param types that it might be used with. This is quite complicated, and a lot of time has been invested in making it "mostly work"... but the implementation is not perfect, and we've needed to apply constraints that can be difficult to reason about.

A motivation for monomorphization is that it avoids needing eq or hash implementations for positional arguments to @rules which are computed from Params, rather than being Params themselves. That is of questionable value relative to the complexity it incurs. And additionally, because monomorphization creates additional copies of @rules, it's likely that we are memoizing less than we could be.


An alternative to monomorphization would be to instead treat the preparation of the positional arguments to a @rule as similar to "implicit conversion" (a native concept in Scala, and possible via the Deref trait in Rust). This would involve optionally recursing to do "type conversion" via other rules (such as from Addresses to Targets) before invoking a @rule. For the purposes of these conversions, it's likely possible to keep a global lookup table of conversions that are possible, rather than actually encoding the conversion as a dependency of the RuleGraph entry.

Eric-Arellano commented 3 years ago

A motivation for monomorphization is that it avoids needing eq or hash implementations for positional arguments to @rules which are computed from Params, rather than being Params themselves.

Certainly reasonable to require __hash__ and __eq__ for every argument to the @rule.

This sounds like an overall nice simplification. I have personally been confused by the idea of a Param (capitalized) in the past, and more unification would make things easier to reason about.

stuhood commented 3 years ago

I started working on breaking out a "runtime" component of this issue, and I realized that how you would do that ends up more directly guiding how "compiletime" might be implemented. See below.


The runtime representation that we expect to have after this change is that the identity of a Task node in the graph becomes:

pub struct Task {
  // The _precomputed_ positional arguments of the @rule.
  arguments: Vec<Keys>,
  // The filtered set of Params which are relevant to `Get`s in the body of the @rule.
  // NB: This will be a smaller set than today, because the Params used to compute the `arguments`
  // field will have been removed.
  params: Params,
  // As today: a reference to the @rule.
  task: tasks: Task,
  // A "smaller" computed `Entry`: see below.
  entry: ..,
}

To compute this smaller Task identity, you'd move some of the logic that's currently in the body of the Task into a factory function for the Task node (or into the Select node).

After computing the arguments, you could prune out any Params that had been used to compute the arguments, leaving behind only the Params that were required to compute Gets in the @rule body.

All of this is fine: but an interesting bit (and the connection to "compile time") is what to do about the rule_graph::Entry in the Task. Since we have already computed the arguments, that portion of the Entry can be pruned out as well, leaving only the edges needed for the Gets in the body. And that is interesting, because it points to a compile time strategy where we would essentially split the Entry into:

In short: it's possible that it's easier to reason about the "argument preparation" steps as a separate node in the compilation graph or specialized edge types perhaps.

stuhood commented 3 years ago

Have made some progress on an implicit-conversion based implementation of RuleGraph construction... lots of work still to do, but no (unexpected) roadblocks so far. 🤞

stuhood commented 2 years ago

Note to self: a "union find" datastructure seems perfect for what I had already been calling "unifying" the solutions to various nodes: https://gist.github.com/cfbolz/ba9b4a9a54e6620b9eb86a213cc6d272

stuhood commented 1 year ago

After revisiting this last month, I believe that fundamentally simplifying graph solving is the path forward. Closing this one in favor of https://github.com/pantsbuild/pants/discussions/18905.