[rush-lib] Proposal: Rework interaction between Build Cache and TaskRunner

dmichon-msft commented 2 years ago

Current State

Currently reads from and writes to the build cache are treated as part of the execution of a single task in the project graph. This has the following undesirable effects: 1) Cache reads wait until all dependencies have been computed. Since the cache lookup logic does not depend on the outcome of dependency builds, this is a wasted opportunity for parallelism. 1) Since the standard task prioritization algorithm prioritizes tasks that have the longest chain of dependencies, if there are two distinct project subtrees:

A -> B -> C -> D -> E

F -> G
And all of A-E are cache hits, whereas F and G are not, then execution of the build for F and G will wait on replaying the cached outputs from A, B, and C (subject to available parallelism). The optimal build order, given the cache state, would be to do F and G first, then replay the cached builds. 1) Cache writes being part of execution causes consuming projects to wait for tar archive generation, which in the presence of Windows Defender can stall due to the outputs being virus scanned before they become available to the tar utility. For large projects this can sometimes introduce several minute stalls.

Proposed Solution

I propose we rework the way the build cache interacts with the task runner to do the following: 1) Check for cache hits during task generation, rather than task execution. For each build task that has a cache hit, replace the task with a "read from build cache" task that has no dependencies. 1) When constructing the task ordering, after accounting for critical path length add a further sort element that ranks "read from build cache" tasks as lower priority than regular "build" tasks (i.e. "read from build cache" task with 1 consumer will evaluate before a "build" with 0 consumers, which will in turn evaluate before a "read from build cache" task with 0 consumers). 1) If cache writes are enabled, queue "write to build cache" as a separate task that depends only on the corresponding "build" task, and which has the lowest possible priority.

Generalization

To allow the build cache layer to operate as a plugin, the following hooks will suffice:

A "afterCreateTasks" async waterfall hook that allows asynchronous modification of the set of tasks for the current rush execution pass
A "compareTasks" sync bail hook that allows customizing the sort algorithm for tasks

Limitations

The "afterCreateTasks" hook requires all queries to the build cache to execute before any tasks start executing, which may increase the startup time when interacting with a cloud build cache with no cache hits.

Interaction with Phased Builds (#2300)

Phased builds purely alter the scope of a single "build" task in the compilation, so all benefits of this proposal will carry over to the finer-grained scheduling of the phased builds proposal.

iclanton commented 2 years ago

I really like this idea. How difficult do you think this would be to implement? I'm guessing we should do this after we finish the phased build work.

dmichon-msft commented 2 years ago

PR #3043 moves some machinery around to provide a nice spot to call these hooks at the moment the AsyncTaskQueue object is constructed.

The TaskSelector provides a base implementation of the createTasks hook for non-phased builds. The compareTasks hook will be invoked by the AsyncTaskQueue initialization process as part of its sort function. }

microsoft / rushstack