pantsbuild / pants


Declare v2 products on v1 Tasks #4769

Closed stuhood closed 6 years ago

stuhood commented 7 years ago

aka "Remove requirement for tasks to fully hydrate a BuildGraph"


Currently, all pants Tasks (including things like list, filter, etc.) implicitly require a complete BuildGraph to be hydrated.

To avoid actually walking the entire filesystem in order to hydrate that graph for tasks that don't examine source globs, the v1 engine uses lazily expanded LazyFilesetWithSpec objects. The downside to this laziness is that it cannot be cached in the daemon, so the v2 engine eagerly expands globs and creates EagerFilesetWithSpec objects; this means that all file fingerprinting happens eagerly in the v2 engine.


The root cause is that Tasks are not specific about which products they need in order to run: instead, they implicitly require a BuildGraph. list, for example, only requires Addresses for the target spec roots, which are very cheap to compute. And filter only requires un-hydrated target objects (in the current v2 Rule datamodel, this means it only needs UnhydratedStructs).

Exposing a new API from Task that allows it to explicitly declare what v2 products it needs (perhaps with a default implementation that indicates that the Task requires a complete BuildGraph) would allow the v2 engine codepaths to avoid instantiating a BuildGraph for tasks that do not require it. It's also possible that Tasks should be able to expose v2 Rules that can assist in calculating its product requirements.
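For illustration, one possible shape for such an API (a sketch only: the method name and default are assumptions, not part of Pants; the stub classes just make it self-contained):

  # Hypothetical sketch: a Task declares which v2 products it needs, with a
  # default implementation preserving the status quo of a full BuildGraph.
  class BuildGraph: pass   # v1: the fully hydrated target graph
  class Addresses: pass    # v2: just the Addresses of the target spec roots

  class Task:
    @classmethod
    def required_v2_products(cls):
      # Default: require a complete BuildGraph, as all v1 Tasks do today.
      return [BuildGraph]

  class ListTask(Task):
    @classmethod
    def required_v2_products(cls):
      # `list` only needs Addresses for the spec roots: cheap to compute,
      # with no glob expansion or fingerprinting.
      return [Addresses]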


[Attached image: img_20170803_142713, from the design discussion]

stuhood commented 7 years ago

I'm placing this in the 1.4.x milestone, although it's possible that it will only be feasible once the 1.5.0.dev0 deprecations have landed.

stuhood commented 7 years ago

Attached an image from the design discussion.

stuhood commented 6 years ago

#4973 is related, because this new design for tasks to request products should incorporate --[no-]transitive at a fundamental level.

stuhood commented 6 years ago

Another aspect that we should consider incorporating here is the ability for a frontend task to run in a loop... possibly by turning the execute method into a context manager?

illicitonion commented 6 years ago

@stuhood Can you describe the motivation for running in a loop?

stuhood commented 6 years ago

@illicitonion : I was thinking in particular of hot-reload usecases such as jrebel, play-framework, or mobile simulators: not needing to fork/daemonize the JVM would make a task that wanted to continuously reload a child much easier to implement... on each iteration you'd have a (partially) new classpath to feed to the hot reload framework.
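A rough sketch of how that might look (hedged: HotReloadTask, start_child_jvm, and the engine-driven loop are all hypothetical names for illustration):

  from contextlib import contextmanager

  class HotReloadTask(Task):
    @contextmanager
    def execute(self):
      # Start the long-lived child once; the task itself stays resident,
      # so no fork/daemonize of the JVM is needed.
      child = self.start_child_jvm()
      try:
        def iteration(new_classpath):
          # Called by the engine on each loop iteration with the
          # (partially) new classpath for the hot-reload framework.
          child.reload(new_classpath)
        yield iteration
      finally:
        child.terminate()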

stuhood commented 6 years ago

I was thinking in particular of hot-reload usecases such as jrebel, play-framework, or mobile simulators:

...and LSP servers.

stuhood commented 6 years ago

Alright, I think that this looks eminently feasible.

The proposed initial API (can add more selector types in future) on a Task is:

  @classmethod
  def address_selectors(cls):
    """Selectors for the root `Spec` subjects of a run.

    Only one of either `def prepare` or `def address_selectors` may be declared.

    :API: experimental
    """

The restriction that allows us to avoid walking the task graph to determine whether we need to compute a BuildGraph (which would trigger a bunch of subsystem creation, etc.) is that only one of def prepare or def address_selectors may be declared (we'll enforce this).

Pre-fork, we will warm the address_selectors for all root tasks/goals. If all root goals declare def address_selectors rather than def prepare, then we know that there are no v1 task dependencies, and we will not warm a BuildGraph.

Post-fork, we will re-compute the above "no v1 tasks" property to skip actually constructing the BuildGraph. The build_graph, target_roots, and maybe other properties of pants.goal.Context will become optional, with properties/exceptions to help guide people toward an API that will give them the results of their address_selectors matches.
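A sketch of what a BuildGraph-free task could look like under this proposal (the selector type and the context property for retrieving matches are invented names, in the spirit of the "properties to help guide people" note above):

  # Hypothetical usage sketch; SelectAddresses and context.v2_products are
  # placeholder names, not real API.
  class ListGoal(Task):
    @classmethod
    def address_selectors(cls):
      # Only Addresses for the spec roots are requested: no `def prepare`,
      # so no BuildGraph will be warmed or constructed.
      return [SelectAddresses()]

    def execute(self):
      for address in self.context.v2_products(Address):
        print(address.spec)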


So, this actually seems relatively straightforward. The only thing is: I don't feel good about diving in here until #4401 is resolved. I'm going to dive into that over the next few days.

cc @kwlzn, @illicitonion, @benjyw, @baroquebobcat

illicitonion commented 6 years ago

As long as this is experimental, sounds good :) I'm sure we'll want to change this as we start working on more migration, but I don't have any suggestions, so let's push ahead.

stuhood commented 6 years ago

I have something working here, although it needs a bit of polish. Will post once #4401 is (more) solidly landed.

baroquebobcat commented 6 years ago

Sounds good to me too.

stuhood commented 6 years ago

From #5003 (and other discussion): there is a fundamental question of how root selection will need to change in order for us to succeed in a meaningful way for a command like ./pants test :: (for example), which will match many targets for which we cannot compute a test result (etc.). It's also related to how skipping/filtering of roots might work.

I've been thinking for a while that it might be the case that the filter task should be lifted into global options, and that might have something to do with how this is accomplished.

stuhood commented 6 years ago

On #5639, @illicitonion had the interesting suggestion of making frontend tasks a "special" @rule type, and the more I think about it, the more it appeals to me.

The big, big sticking point is that in order for any of the requests the task makes to be memoized, it would need to be running pre-fork. And that's a very high bar... all usage of Subsystems would have to be via #5788, and no other mutable APIs would be safe for usage.

But, jumping over this high bar would also allow us to kill forking for these frontend tasks... and that would be super exciting. And in terms of doing clean-room implementations of tasks (as opposed to ports of existing v1 tasks), this would be a pretty killer API to use: it totally solves the "how do I write a declaration of what some imperative code will need in order to execute?" problem. Getting more excited about it.

stuhood commented 6 years ago

Hm... one unfortunate downside of frontend tasks operating that way would be that in order to avoid interleaving outputs, test.pytest would not be able to operate concurrently with test.junit (for example). If inputs were declared upfront (as described earlier in this issue), all of the dependency @rules for each language would already have run by the time their synchronous portions began.

Certainly for something like ./pants test ::, serializing the various frontend tasks would be a non-starter. Unless I'm still thinking about frontend tasks incorrectly, and per-language support was entirely in @rules, so there was exactly one task that defines test? But ./pants lint test :: would still be a challenge.

illicitonion commented 6 years ago

It feels like there are two distinct modes that tasks should be run in:

  1. I'm batch running a bunch of stuff, and want to know at the end what succeeded and failed. If you can tell me when each task finishes, that's probably sufficient. Anything you can tell me before then is handy, but not required.
  2. I'm interactively running a small number of targets (probably 1).

and separating the design of the output models here could be useful, and also not terribly difficult, given we already model ConsoleTasks as generators of lines rather than things which write directly to stdout/stderr.

By default, we could batch up the output of any particular task, and atomically output it when it completes. This means things execute in parallel and don't interleave their output, but we don't get great responsiveness. That's only really a problem for things which you're interactively debugging (e.g. test output, which should be handled by mode 2), or things which are showing progress (e.g. artifact upload, which should be handled somehow from the new UX design).

But in some situations (perhaps if there's one goal? Or one goal+target? Or if a flag is specified?) we should only run one frontend task at a time, and print lines as they're yielded.

This allows us to not worry too much about interleaving, but still provide the power to debug where needed.
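A minimal sketch of the default "batch and emit atomically on completion" behavior (assuming nothing about the real implementation; this class is illustrative only):

  import io
  import threading

  _emit_lock = threading.Lock()

  class BufferedTaskConsole:
    """Accumulates one task's output and flushes it atomically on completion."""

    def __init__(self, task_name):
      self.task_name = task_name
      self._buf = io.StringIO()

    def write_line(self, line):
      # Buffered while the task runs, so parallel tasks never interleave.
      self._buf.write(line + '\n')

    def flush_on_completion(self):
      # One atomic emission per completed task, at the cost of responsiveness.
      with _emit_lock:
        print('== {} =='.format(self.task_name))
        print(self._buf.getvalue(), end='')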

stuhood commented 6 years ago

By default, we could batch up the output of any particular task, and atomically output it when it completes.

Unless there is exactly one running maybe? Yea, I think that makes sense. Because certainly we're going to have some other UX for the progress of @rule execution, so not rendering a frontend task's output until it completes would be plenty reasonable.

stuhood commented 6 years ago

A few open questions around the proposed "frontend rules" model:

  1. how are frontend rules selected/installed, and what subject types are available to them to kick off execution?
  2. a frontend rule would not be "just python code": what does access to the filesystem look like?
  3. what does access to stderr/stdout/ttys look like?
  4. what would the UX for the @rules powering a frontend rule look like, given that we'd be giving frontend rules IO access while the @rules powering them might still be running?

The last one is the big sticking point for me right now, so it would be interesting to brainstorm that a bit with @kwlzn and anyone else who might be interested.

kwlzn commented 6 years ago

regarding parallelism, I'm leaning towards this rough sketch for the case of all existing ConsoleTask type impls:

  1. each @console_rule gets a Console/Terminal type object instance injected.
  2. those rules execute and can perhaps console.std{out,err}.write() to a unique virtual console (and maybe those are full file-like objects; unclear how they're backed exactly). (yield could work here too, but would not give us multiple streams - maybe that doesn't matter tho. or maybe we could combine them into a yield console.stdout.write('blah') to yield structured output with stream info, etc.)
  3. if there's exactly 1, that becomes the main console.
  4. if there are multiple, we batch and emit output as they complete - and:
     A) if we're running under a tty, we can provide a curses virtual console UI feature to be able to jump to any one of the given virtual consoles to see realtime rendering per console. we could also somehow highlight which consoles were getting active output via the curses interface, or auto-jump to them.
     B) otherwise, if we're piped somewhere or running in CI, we emit buffered output as each completes (with some kind of periodic status display showing what's outstanding, ala iron/scoot).
     C) an option to emit interleaved lines as they're rendered might also be useful, but off by default probably, or maybe a special VC.
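A sketch of points 1 and 2 above (the decorator signature, Console type, and stream objects are assumptions for illustration, not an existing API):

  # Hypothetical: a @console_rule receives a Console whose stdout/stderr
  # are virtual, engine-backed file-like streams rather than the real tty.
  @console_rule('list', [Select(Console), Select(TargetRoots)])
  def list_targets(console, target_roots):
    addresses = yield Get(Addresses, Specs, target_roots.specs)
    for address in addresses:
      console.stdout.write('{}\n'.format(address.spec))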

if something needs interactivity (e.g. repl or interactive cases of run), I think that's a slightly different variant (@interactive_console_rule?) with different behavior - namely, serialization/locking for tty device access in the default mode such that only one interactive thing is executing at once. we could also potentially address parallel interactivity via the virtual console impl via pseudo-terminals like screen does, but that could get tricky. running only one interactive rule at once seems like a good first start tho.

another interesting idea might be to port ./pants server to the daemon and then hotlink/auto-open a view from the console to a web interface that could provide a web view of the virtual consoles (think travis/jenkins live output etc).

one possible further idea for #2 might involve live streaming @console_rule output via the nailgun protocol as numbered streams beyond 0, 1, 2 and then let the client-side handle rendering those to VCs, interspersing or whatever other configurable display modes we can come up with.

other stray thoughts:

The big, big sticking point is that in order for any of the requests the task makes to be memoized, it would need to be running pre-fork.

or to be able to compute those requests pre-fork somehow, such that we could warm them pre-fork and then re-request post-fork cheaply? It's not immediately clear to me what sort of dynamic product requests one of these sorts of rules would be making that couldn't be computed pre-fork or via a regular, underlying @rule?

Unless there is exactly one running maybe? Yea, I think that makes sense. Because certainly we're going to have some other UX for the progress of @rule execution, so not rendering a frontend task's output until it completes would be plenty reasonable.

this makes perfect sense to me and I think it'll all come down to the UI/UX. I think there's also a question of what sort of logic the @console_rule would fully encapsulate. would it be a full blown e.g. Task impl, or just a display driver that's accumulating finer grained products and simply displaying those? I'm sort of thinking the latter, but it'd be interesting to sketch this out end to end for the python tasks. for example, we'd discussed the idea of executing py.test and collecting structured output for that in sub-@rules, then aggregating all of that in a @console_rule. if we think of @console_rule as more of a display driver here, then its logic could intrinsically handle optimal output interleaving of structured inputs. this could mean that a single "test executor" @console_rule could potentially drive parallel junit and pytest test execution as sub-@rules and then optimally interleave that even for disparate test runners.

how are frontend rules selected/installed, and what subject types are available to them to kick off execution?

they're probably installed like any other rule + some register.py-esque goal->rule mapping? likely with a v2 GoalRunner-esque driver that understands the mappings and would synchronously call into the engine to request/execute them?

from there, kickoff subject types are probably options, target roots, ...?

would not be "just python code": what does access to the filesystem look like?

trying to understand a concrete case where these would need real access to the filesystem vs the ability to operate in a sandboxed snapshot as subprocess - or just delegating that to a sub-@rule? but in general, unfettered filesystem access is probably a non-starter for any variant of rule? would be good to talk through this a bit tho.

what does access to stderr/stdout/ttys look like?

I think we virtualize stderr/stdout, for sure. ttys are trickier, particularly if we need to hand a real tty to a subprocess like in the case of v1 repls. thinking that initially it'd probably be easiest to just synchronize access to the main tty for any 'interactive' thing, then make everything else non-interactive by default.

we'd also talked about the idea of directly execv()'ing a built product that e.g. had a repl entrypoint post-run, which might ultimately be the cleanest way to handle any kind of task interactivity?

what would the UX for the @rules powering a frontend rule look like, given that we'd be giving frontend rules IO access while the @rules powering them might still be running

I think all of the console access here by rules is fully virtualized, so we'd ultimately have control over how to display progress in the UI/UX impl. there are many console UI techniques we could employ here, and thinking about the potential there is making me pretty antsy to begin work on that :)

stuhood commented 6 years ago

Thanks for thinking about this! Much appreciated. It would likely be good to transfer some of this into a doc for easier commenting, but to respond quickly to a few items.


if we think of @console_rule as more of a display driver here, then its logic could intrinsically handle optimal output interleaving of structured inputs.

Sort of. The issue is that unless an @console_rule generates no output until after it has yielded for the last time, it's still possible for its output to be interleaved with the running @rules that will be satisfying its Get requests. For example, this snippet would result in an interleaving of console and @rule output:

  console.stdout.write('stuff')
  # While this Get is outstanding, the @rules computing it may render output,
  # interleaving with the two writes here.
  more_stuff = yield Get(Thing, OtherThing, x)
  console.stdout.write('more stuff')

In the curses mode, interleaving is a non-issue. In the "single-output-file" mode (as in naive jenkins without tailing for multiple output files), this results in interleaving. I think I'm convinced that appropriately indented-and-labeled interleavings would be ok, but this is a case that we'll need to be careful of.


would it be a full blown e.g. Task impl, or just a display driver that's accumulating finer grained products and simply displaying those? I'm sort of thinking the latter, but it'd be interesting to sketch this out end to end for the python tasks. for example, we'd discussed the idea of executing py.test and collecting structured output for that in sub-@rules, then aggregating all of that in a @console_rule.

I think my feeling on this is that in order to get something that renders individual test runs as they complete, individual test runs would have to be rendered in the underlying @rule output, rather than in the @console_rule output. That would mean that in the case of a "test runner" @console_rule, its primary job would be to take all xml test result output and test logs and place them somewhere in dist, and then maybe print out a summary of pass/fail.

The reason I say this is because there is no API currently to "stream" results to an @console_rule (or @rule). Basically, the result of a yield is not an iterator: it's a concrete list. Changing that would be a pretty significant API change I think, and I'm not sure it would be positive.
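Concretely, a sketch of that point about yield (assuming the list-of-Gets form of the generator API; TestResult and HydratedTarget are stand-in names): the whole list materializes before the rule resumes, so nothing can be rendered incrementally by the @console_rule itself:

  # Sketch: requests for many targets are batched; `yield` on a list of
  # Gets resumes with a fully-computed list, not a stream.
  results = yield [Get(TestResult, HydratedTarget, t) for t in targets]
  for result in results:
    # Every result already exists by this point, so this can only be a
    # summary; per-test rendering would have to happen in the sub-@rules.
    console.stdout.write(result.summary_line + '\n')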

This is a new realization (for me): will comment more on it in the last paragraph.


or to be able to compute those requests pre-fork somehow, such that we could warm them pre-fork and then re-request post-fork cheaply? It's not immediately clear to me what sort of dynamic product requests one of these sorts of rules would be making that couldn't be computed pre-fork or via a regular, underlying @rule?

This is essentially what I had been thinking of doing on this ticket in the first place, but the big big challenge in that case is: what does the definition language look like for the queries that run pre-fork? Basically, it reduces to exactly the problem we had with SelectProjection/SelectDependencies/etc... it's much more challenging to provide flexibility with a declarative language than it is with the generator-based @rules.

In spite of what I've said above about @console_rules being relatively thin, and @rules doing most of the rendering, I still think it is the case that @console_rules will have complex requirements that involve filtering out target roots (and we never introduced an API to the complex selectors that allowed for filtering), locating tools and subsystems, etc. So I think that making them generators makes sense.


While writing this response, the fact that @rules will need to render things as well has become more clear to me (the test-running example in particular). I think that that means that however we do the virtual console selection, we'll need to provide it to both @console_rules and @rules. The distinctions between the two then currently seem to be:

  1. whether filesystem access is available
  2. whether more than one is allowed to render at a time
  3. the @interactive_console_rule bit (although if the answer to 2. is that all @console_rules are serialized, then we might not need to distinguish between interactive and non-interactive)

illicitonion commented 6 years ago

it's still possible for its output to be interleaved with the running @rules that will be satisfying its Get requests

Interesting - I had been assuming that @rules wouldn't themselves be allowed to output anything, and that it would be the responsibility of whatever's driving them to do the outputting... I can see how we've gotten here, though... I'm inclined to think that we should preserve @rules not having console access, and instead swap in an interactive frontend task for the "I'm running tests and want to stream their output" case, rather than trying to build a model where @rules may output to the console. Logging makes an interesting exception, though...

or to be able to compute those requests pre-fork somehow, such that we could warm them pre-fork and then re-request post-fork cheaply

I'm increasingly thinking that we're going to need to change the fork model here. I feel like we're having to dance around a lot of circles, which aren't good circles to be dancing around, to make sure that things happen at the right time relative to the fork. ("Either we can run the tests pre-fork and debugging them is hard, or we run them post-fork and lose our caching")

Either keeping the cache out of memory, making the post-fork cache populate the pre-fork cache, or making a rust process be the master process, and scheduling executions in threads rather than processes, would make this dancing go away.

@console_rules and @rules. The distinctions between the two then currently seem to be:

  1. whether filesystem access is available

Why is filesystem access required in either?

stuhood commented 6 years ago

@console_rules and @rules. The distinctions between the two then currently seem to be:

  1. whether filesystem access is available

Why is filesystem access required in either?

All publishing of files into dist currently happens via tasks, and formatters need to be able to manipulate the working copy.

illicitonion commented 6 years ago

@console_rules and @rules. The distinctions between the two then currently seem to be:

  1. whether filesystem access is available

Why is filesystem access required in either?

All publishing of files into dist currently happens via tasks

If we're going to have a concept of dist, I wouldn't feel too weird about a write_snapshot_to_dist intrinsic...
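For illustration, such an intrinsic might be requested like any other product (all names here are invented; no such intrinsic existed at the time of this discussion):

  # Hypothetical: a rule asks the engine to materialize a Snapshot under
  # dist/ rather than writing to the filesystem itself.
  dist_entry = yield Get(DistDir, WriteSnapshotToDist(report_snapshot, 'test-reports/'))
  console.stdout.write('wrote report to {}\n'.format(dist_entry.path))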

and formatters need to be able to manipulate the working copy.

Formatters definitely feel like things which fit into a "You only run one of these, and it takes a lock" kind of use-case - I can't see how anything else can be deterministic, as @rules and things changing the @rules' inputs would interleave differently...

stuhood commented 6 years ago

Yea, it's possible that both of those usecases could be explicit.

stuhood commented 6 years ago

@kwlzn will be splitting this ticket up into a bunch of UX related tickets, with "exposing an alternative to v1 Tasks" as the highest priority.

stuhood commented 6 years ago

@kwlzn split this up into tickets with the v2 python pipeline label:

While it's possible that we'll still need to add a "(cacheable) v2 products from v1 tasks" API in order to support remoting, I think that can be added as we go to unblock particular usecases (@illicitonion's work on codegen in #5994 seems like the most immediate beneficiary, as running codegen pre-fork to make it cacheable in memory would clean things up significantly).

stuhood commented 6 years ago

I've had a realization that the original goal of this ticket is actually trivially accomplishable using a mechanism similar to @console_rule: basically, have v1 Tasks declare a v2 product type that they need computed for TargetRoots (using precisely the same mechanism as @console_rule). You'd then install an @rule to handle computing that product type pre-fork.

The difference from what I originally suggested here is pretty slim: it's essentially just that rather than using a list of Selectors to define what it needs, a Task would defer the "querying" to an @rule that computes a particular type (which might be bespoke for that Task).
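A closing sketch of that slim difference (hypothetical names throughout: the point is only that the Task names a product type, and an installed @rule does the querying):

  # Hypothetical: the v1 Task declares a v2 product type, and a bespoke
  # @rule computes that product for the TargetRoots pre-fork.
  class CountLinesOfCode(Task):
    @classmethod
    def product_request(cls):
      # "Before executing, compute a LocReport for the TargetRoots."
      return LocReport

  @rule(LocReport, [Select(TargetRoots)])
  def compute_loc_report(target_roots):
    targets = yield Get(HydratedTargets, Specs, target_roots.specs)
    total = sum(len(t.sources) for t in targets.dependencies)
    yield LocReport(total_files=total)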