Closed iclanton closed 5 months ago
Addendum: The above design inspires a number of additional features that we also want to implement, but which do NOT need to be part of the "MVP" (minimum viable product) initial implementation.
In our meeting, it was pointed out that two closely related features often get confused:
rush rebuild
, the package-deps_build.json
incremental state is ignored, forcing a full rebuild.heft clean
(or heft build --clean
) performs an operation akin to rm -Rf lib/
. Intuitively this should imply #1 above. But note that an incremental build could (and perhaps should) also perform rm -Rf lib/
before invoking the compiler. Today we skip that as a performance optimization, although a more correct optimization would be to ONLY skip deleting known compiler outputs.Today there is no rush rebuild
equivalent for custom commands. The suggested solution would be to deprecate rush rebuild
, replacing it with two solutions:
rushx clean
(and rush clean
which cleans all projects) will invalidate the incremental build state for ALL Rush commands.rush --full build
or rush --full my-bulk-command
that tells Rush to disregard the incremental build state.We could provide a standard way to define a custom parameter --no-docs
, which would skip the _phase:docs
phase for commands like rush test --no-docs
or rush build --no-docs
.
Because of phase dependencies, not EVERY phase should be automatically skippable. Rather, the person who is designing custom-commands.json would define specific CLI switches that make sense for users to invoke. This fits with Rush's goal of providing a CLI that is intuitive for newcomers, tailored around meaningful everyday operations, well documented, and validated. Our CLI is not trying to be an elaborate compositional algebra that allows users to creatively string random tasks together.
The above package.json files have a lot of boilerplate that would need to be copy+pasted into each project:
"scripts": {
"build": "heft build",
"test": "heft test",
"docs": "doc-tool",
"my-bulk-command": "./scripts/my-command.js",
"clean": "heft clean",
"_phase:compile": "heft build --lite",
"_phase:lint": "heft lint",
"_phase:test": "heft test --no-build",
"_phase:docs": "doc-tool"
},
The rush
and rushx
commands should provide a way to move these declarations into a rig package.
CC @ifeanyiecheruo
Makes sense to me. For what it's worth, two things immediately come to mind:
_phase:
seems odd. At that point, we already know it's a phase. I feel like outside of package.json files, prepending with _phase:
seems redundant. If we relegate that _phase:
syntax only to the package.json files, it makes it clear that the reason we chose to do it that way was for visiblity/conflicting with similarly named normal commands. However, this could present some issues with validating that all associatedCommands
are not associatedPhases
, since this would allow for name collisions (unless we explicitly block name collisions on commands vs. phases as well)* If we relegate that `_phase:` syntax only to the package.json files, it makes it clear that the reason we chose to do it that way was for visiblity/conflicting with similarly named normal commands.
@D4N14L But wouldn't it be confusing for the command to be called "_phase:test"
in package.json but "test"
in command-line.json?
As an analogy, when an ESLint configuration refers to a plugin package, they allow you to write @jquery/jquery
instead of @jquery/eslint-plugin-jquery
. I found this to be highly confusing as a newcomer to ESLint configurations. Since you have to write the "plugins"
list only once when you are setting up the config file, the extra characters (eslint-plugin-
) weren't any hassle. Whereas having to learn that two different names refer to the same thing, that was a hassle. :-)
- would we validate before running build that every phase in the "phase dependency graph" is specified in the phases property of the phased command?
There's an interesting tradeoff between making sure developers didn't forget to specify a command, versus requiring needless boilerplate in many places (which motivated ignoreMissingScript in command-line.json.)
If we implement support for rigs, it might change this tradeoff.
A couple of observations
a) The _phase:* scripts feel really out of place in the package.json.
"scripts": {
"build": "heft build",
"test": "heft test",
"docs": "doc-tool",
"my-bulk-command": "./scripts/my-command.js",
"clean": "heft clean",
"_phase:compile": "heft build --lite",
"_phase:lint": "heft lint",
"_phase:test": "heft test --no-build",
"_phase:docs": "doc-tool"
},
Consider declaring the phase commands in a per project (riggable) rush file instead - where you can have clear, rich, declarative semantics. Consider having rushx
and rush
drive off of that config and fallback on package.json
b) You seem to be slowly and adhoc working towards a constrained makefile. One that rush can reason over and orchestrate.
Why not embrace and formalize the part of makefiles that is most useful - Updating dependencies before running a specific command
Namely have rushx
run dependencies for a given phase across dependent projects before running the requested command.
What the world could look like by implementing the two observation above...
One would introduce a new my-per-project-rush-config.json
{
"commands": {
"clean": {
"script": "heft clean",
"phase": "clean"
},
"compile": {
"script": "heft build",
"phase": "compile",
"dependencies": "compile_dependent_projects"
},
"lint": {
"script": "heft lint",
"phase": "???",
"dependencies": "compile_this_project"
},
"test": {
"script": "heft test",
"phase": "test",
"dependencies": "compile_this_project"
},
"doc": {
"script": "doc-tool",
"phase": "???",
"dependencies": "compile_this_project"
},
"build": {
"script": "doc-tool",
"dependencies": "clean_compile_lint_test_doc_this_project"
}
}
}
rushx compile
Make sure my compile dependencies across all projects are up to date, then tsc
my project
rushx compile --no-dep
Only tsc
my project
rushx test
Make sure my compile is up to date, then Jest
my project
rushx test --no-dep
Just Jest
my project
rushx build
Make sure basically all my dependencies are up to date, then tsc
, es-lint
, Jest
, and doc-tool
my project
rush build
Evaluate out of date build
dependencies for all projects once, then schedule all command in phases to maximize throughput
package.json
can now become a convenience interface for tools that depend on pnpm run
"scripts": {
"build": "rushx lint",
"buildall": "rushx build",
"test": "rushx test",
"docs": "rushx doc",
"clean": "rushx clean"
}
Together both changes map far more closely the behavior I've come to expect from something like hitting F5 in visual studio
Or
package.json
can remain
"scripts": {
"build": "heft build",
"test": "heft test",
"docs": "doc-tool",
"my-bulk-command": "./scripts/my-command.js",
"clean": "heft clean",
}
In the absence of a rush config, rush
and rushx
fallback on pnpm run
but do not provide any scheduling intelligence. We get exactly the behavior we have today
P.S It's not clear to me that phases are needed for what is described above. P.P.S Never mind I figured out why they would still be needed.
One issue currently with our task scheduling that phased builds are likely to exacerbate is that rush currently runs tasks in a completely arbitrary order, with no prioritization within the set of currently available tasks to execute. I expect that we will likely benefit overall runtime significantly if we minimally give some consideration to either the total number of other tasks dependent on a given available task, or the maximum depth of the tree based on said task. That way we don't end up running a pile of tasks that have no dependents before progressing on the critical path and thereby leaving cores idle unnecessarily.
Update: I missed where that bit of code for ordering was defined, since I was only looking in TaskRunner. Only change would be if we want to add a second sort criteria after the first.
One issue currently with our task scheduling that phased builds are likely to exacerbate is that rush currently runs tasks in a completely arbitrary order, with no prioritization within the set of currently available tasks to execute. I expect that we will likely benefit overall runtime significantly if we minimally give some consideration to either the total number of other tasks dependent on a given available task, or the maximum depth of the tree based on said task. That way we don't end up running a pile of tasks that have no dependents before progressing on the critical path and thereby leaving cores idle unnecessarily.
This is a good idea. Could you open a separate GitHub issue proposing a specific improvement? The same task scheduler will likely be used for both phased and non-phased operations, so this improvement probably isn't specific to phased commands.
FYI today Rush already does some basic optimization:
...but it could be improved.
For reference, @iclanton 's draft PR is here: https://github.com/microsoft/rushstack/pull/2299
We'll want to investigate supporting pack
, "hydrate from build cache" and "write to build cache" as phases for better concurrency. Maybe have a concept of "built in phases" for some of these things?
Really excited by this feature and am looking for any ways I could help chip in.
A couple thoughts:
While enabling the build cache feature recently in our monorepo, something that made life easier was that you could turn it on and configure it without touching all the projects. Since most of our projects had no rush-project.json
(and/or did not define projectOutputFolders
), build cache was just disabled for most projects, and you could focus on defining projectOutputFolders
for the easy projects and/or rigs first, and follow up later with the projects that needed more configuration changes.
I'd like it if this feature worked the same way. For example -- you turn on phased commands, you define your phases, but projects that haven't defined their phases yet continue to operate as a single block. (To other projects with phases enabled, this affects nothing but the timing -- it just means that whether Project A is depending on _phase:compile
or _phase:docs
for Project B, if Project B hasn't opted-in to phased builds, it is really depending on a classic rushx build
to complete.)
You'd need to decide what "opting in" means here... Maybe it's "has defined at least one _phase task in package.json".
(Our monorepo is a good use case here... we have a big majority of projects that are classic heft
builds, and we could copy and paste a bunch of phases into all of them. But then there's some outliers with special build scripts that use other tools, where it might be complicated to split them up as desired. I suppose our fallback without this feature would be to define _phase:compile
as "rushx build"
, and define all other _phase
tasks as ""
-- not semantically correct, but would behave just like the paragraph above.)
If package.json
continues to be the primary API between rush and rushx, this feature definitely introduces a new muddy layer that is not intuitive for the "application owners" (i.e. they don't spend time managing the monorepo). Your example package.json
has explanatory comments, but in real life you can't put those comments there.
What if instead of _phase:blah
, phases were listed as part of the commands they represent? Taking your example, this might look like this (with no comments, the way a new developer in a project might see it):
{
"scripts": {
"build": "heft build",
"test": "heft test",
"docs": "doc-tool",
"my-bulk-command": "./scripts/my-command.js",
"_build:compile": "heft build --lite",
"_build:lint": "heft lint",
"_build:docs": "doc-tool",
"_test:compile": "heft build --lite",
"_test:lint": "heft lint",
"_test:test": "heft test --no-build",
"_test:docs": "doc-tool",
}
}
Now without even knowing about phases (much less where to find the configuration for them), the developer will immediately intuit that in some context, there's a series of steps for building and a series of steps for testing.
Note that in command-line.json
, you'd still define phases as constants in relation to each other, for example, _phase:test
depends on _phase:compile
, etc. But when you go to call the script, you don't call _phase:test
, you call _${taskName}:test
. (To be honest, I don't know if I'm even 100% sold on the "reusable global phases across all phased commands" idea -- but this is one way you could retain that idea and still make this change in package.json).
Expanding on @octogonz comment around "CLI syntax for skipping phases". We've recently found ourselves requiring the ability to selectively skip the Webpack Bundle Phase as the bundled output isn't required for us to run tests. I've added a suggestion for enhancement to how Heft manages this currently #2778
In this thread @josh-biddick asked for rush build
to skip Webpack for a command the runs unit tests only. But whether it is safe to skip Webpack depends on how it is used. Some projects use Webpack to generate a bundle that is consumed by tests, whereas other projects export CommonJS files for tests, and the Webpack bundle is only used by apps.
This suggests that the phase dependencies may sometimes depend on the project (or rig profile).
I would argue that if unit tests have a custom webpack bundle, said bundle should be generated as part of the test phase rather than the bundle phase.
I would argue that if unit tests have a custom webpack bundle, said bundle should be generated as part of the test phase rather than the bundle phase.
I was referring to a multi-project setup, like this:
library-a
:
library-a/dist/bundle-a.js
library-a/src/a.test.ts
library-b
:
library-b/lib-commonjs/index.js
for Jest consumerslibrary-b/dist/bundle-b.js
for Webpack consumersconsumer-c
:
consumer-c/src/c.test.ts
that consumes library-a/dist/bundle-a.js
and library-b/lib-commonjs/index.js
consumer-c/dist/bundle-c.js
which is the final appThus a rush only-run-the-tests
custom command could save time by skipping webpack for library-b
and consumer-c
. But it would NOT be safe to skip Webpack for library-a
.
Reading the original thread more closely, it sounds like @josh-biddick's case was more like he wants to run tests for consumer-c
without actually testing the app. So the --no-bundle
flag proposed in https://github.com/microsoft/rushstack/issues/2778 might be sufficient for that, if nothing else depends on the projects that use it.
We did a "Rush Hour breakout session" today to discuss the status of PR #2299, as well as its interaction with the Rush build cache feature.
The build cache brings some new requirements that weren't considered in the original design for #2300. The consensus was to handle it as follows:
./temp/compile-phase
(whereas today the rush-project.json insists that they are top-level folders). But none of these output paths can be under another output path. Thus the tar
command simply includes ./temp/compile-phase/**
without having to calculate exclusions.rush build
tarball to be restored for a rush build --lite
command, but decided to avoid that sort of complexity. Instead, we will try to represent --lite
as a macro for selecting phases. (This will be a separate feature in the future, not part of the initial requirements.)rush build
incremental feature.)This discussion also highlighted some minor related fixes for the build cache:
"build": ""
should get tarballs by default.The next steps are as follows:
master
branch, incorporating the above suggestions."build": ""
I forgot to mention during the meeting, but I think there's a worthy (7):
For the user output, we could continue to collate at the project level, perhaps with a "sub" =======
header inside each project level to denote when individual phases start. I think that's probably OK as long as the scheduler still aggressively prioritizes "finishing" projects that have started, so that the collated log isn't stuck 8 projects behind for minutes at a time.
Another option is to adjust the collation down to the phase level, but this is probably only interesting to people really into working on rush and not to people just building their projects.
Last, I would love to see some machine output at the end of a build -- maybe a JSON file -- that lists every phase that was run, the number of seconds each took, and the number of seconds into the build it was started. This would be enough data to reconstruct the timeline of a build in an external tool or program, to compare different changes to the task scheduler down the road, etc. (The format of this data, some day, could also be used as an input to the task scheduler... I imagine that being able to weight each node in the DAG with estimated run times could make for a much better prioritization than just number of child nodes.)
@elliot-nelson
For the user output, we could continue to collate at the project level, perhaps with a "sub"
=======
header inside each project level to denote when individual phases start. I think that's probably OK as long as the scheduler still aggressively prioritizes "finishing" projects that have started, so that the collated log isn't stuck 8 projects behind for minutes at a time.
We'll need to collate only within a single external process invocation, which in this case means a single phase in a single project. The primary goal, at least for my team, of phased builds is to allow the scheduler to deprioritize parts of individual projects that are not on critical paths, and to execute in parallel tasks that have no upstream dependencies. It is expected that the phases that comprise a single project will rarely if ever be executed back-to-back.
Regarding machine output, #2569 emits this information in the Trace Events Format used by Chromium (e.g. you can import it into chrome://tracing or the Performance profiler). It'll probably need adjustment to handle merge conflicts, though.
We'll need to collate only within a single external process invocation, which in this case means a single phase in a single project. The primary goal, at least for my team, of phased builds is to allow the scheduler to deprioritize parts of individual projects that are not on critical paths, and to execute in parallel tasks that have no upstream dependencies. It is expected that the phases that comprise a single project will rarely if ever be executed back-to-back.
This makes perfect sense the way I envision a rush build -v
in CI; maybe instead of "05/70" projects you're looking at "025/350" phases to build. But for a non-verbose rush build
on a local machine, it seems like a pretty big burden of output.
I'd much rather see something like this locally with normal phased builds:
==[ heft-node-rig-tutorial ]=====================================[ 60 of 77 ]==
✅ compile "heft-node-rig-tutorial" completed successfully in 10.50 seconds.
✅ lint "heft-node-rig-tutorial" completed successfully in 3.50 seconds.
✅ test "heft-node-rig-tutorial" completed successfully in 30.15 seconds.
==[ doc-plugin-rush-stack ]======================================[ 61 of 77 ]==
✅ compile "doc-plugin-rush-stack" completed successfully in 7.00 seconds.
✅ lint "doc-plugin-rush-stack" completed successfully in 1.10 seconds.
✅ test "doc-plugin-rush-stack" completed successfully in 15.93 seconds.
==[ generate-api-docs ]==========================================[ 62 of 77 ]==
I think for the average developer working on a project in the monorepo, they'd prefer to continue to see a project-focused output, but with the additional breakdown of phases as they complete. However, if the scheduler totally deprioritizes phases without any dependents, this output would be pretty broken.
If we do go down that road then maybe we can include some sort of progress indicator for "how much of" a project is done, like:
==[ compile heft-node-rig-tutorial ]=============================[ 60 of 77 ]==
compile "heft-node-rig-tutorial" completed in 10.50 seconds (phase 1/4)
==[ compile doc-plugin-rush-stack ]==============================[ 61 of 77 ]==
compile "heft-node-rig-tutorial" completed in 10.50 seconds (phase 1/4)
==[ test heft-node-rig-tutorial ]================================[ 60 of 77 ]==
test "heft-node-rig-tutorial" completed in 30.50 seconds (phase 2/4)
Edit: or embed it in the header:
==[ compile heft-node-rig-tutorial ]==============[ 60 of 77 / phase 2 of 4 ]==
For non-verbose builds, the main things we need to convey are: 1) That Rush is "doing something", and ideally some vague notion of what it is doing 2) When an error occurs, the entire error (truncation makes the output completely useless right now 90+% of the time)
It was suggested to me that we really ought to consider doing more webpack and other tools that simply display a current status summary and erase and update it with new information. Something like:
==[ Building \ ]======================[ 60 of 77 ]==
Phase status:
- "compile": 24 of 27 complete
- "lint": 22 of 27 complete
- "test": 12 of 23 complete
Then when we encounter errors, we surface the errors in as useful yet concise a form as possible. It may behoove us to create a standard syntax or JSON schema for something like the TypeScript compiler's "diagnostics" that have a message, error code, source file, line and column that we can aggregate and hand off to a customizable formatter.
👏 I love that idea, and wouldn't be unusual in the nodejs CLI world. Even rush install
(with pnpm
anyway) has a fancy in-progress output.
Breaking it out by phase first, like you have, also makes it really clear the progress of the different phases (just like pnpm with the resolved / downloaded / added status bar). Maybe we also could include a tally of the statuses, so for example if a unit test fails, we would spit out the collated error for that unit test phase and then continue with the progress below (and you'd be able to see in the WIP meter that you had one failure so far in the test phase).
==[ test "doc-plugin-rush-stack" ]==================
test doc-plugin-rush-stack failed with:
<< some unit test failure>>
==[ Building \ ]======================[ 60 of 77 ]==
Phase status:
- "compile": 24/27 (24 success, 0 with warnings, 0 failed)
- "lint": 22/27 (21 success, 1 with warnings, 0 failed)
- "test": 12/23 (11 success, 0 with warnings, 1 failed)
Last, I would love to see some machine output at the end of a build -- maybe a JSON file -- that lists every phase that was run, the number of seconds each took, and the number of seconds into the build it was started. This would be enough data to reconstruct the timeline of a build in an external tool or program, to compare different changes to the task scheduler down the road, etc. (The format of this data, some day, could also be used as an input to the task scheduler... I imagine that being able to weight each node in the DAG with estimated run times could make for a much better prioritization than just number of child nodes.)
I love the idea of this, and hooking it into telemetry.
For @octogonz's Step 1 above, I've pushed up the example repo https://github.com/elliot-nelson/rush-phased-builds-example.
(I think this hits most of @dmichon-msft's suggested test cases, but we can expand to include more if needed.)
@elliot-nelson, it doesn't look like the example contains an update to the rush-project.json
file schema to support describing which folders are outputs of each phase, and if a phase doesn't produce output. I'm also not seeing where the phases themselves are described, unless I'm missing something. The original design called for that to be in command-line.json
, although I'm starting to question if that's the right place for this, given that the definitions of individual phases is creeping out of the scope of what is really part of the "command line."
Sorry @iclanton, I probably should have linked directly to https://github.com/elliot-nelson/rush-phased-builds-example/pull/1
(I've put the "phased version" in the branch phased
rather than main, so that if we need to add more projects and phases the baseline remains buildable in current rush.)
Ahh there we go! Cool I'll take a look.
@elliot-nelson, @octogonz, I put together a PR for the command-line.json
portion of this feature: https://github.com/microsoft/rushstack/pull/2936. I'll start on rush-project.json
tomorrow.
I made some tweaks to @elliot-nelson's branch. See PR here: https://github.com/elliot-nelson/rush-phased-builds-example/pull/2. Feedback is welcome.
@elliot-nelson raised some interesting design questions. I listed some out in this comment: https://github.com/microsoft/rushstack/pull/2936#discussion_r721164434
@elliot-nelson, @octogonz, here's PR 2: https://github.com/microsoft/rushstack/pull/2969
Once we get into the actual behavior, it might be worth making a decision about https://github.com/microsoft/rushstack/pull/2840 ahead of time.
(Making the file structure -- which really boils down to keys -- include the command name would in hindsight have been a bad idea, or at least would have needed to be further fixed, to support phases. In theory a key like "packageName/phase/hash" would be safe though -- the issue with "build/rebuild", "test/retest" doesn't exist as long as the command is a phased command.)
Design update: In PR #3103, we've moved the build logs to a new folder:
<project-name>/<project-name>.build.log
(regardless of which Rush command is run)<project-name>/rush-logs/<project-name>.<command-name>.log
. If the name includes a :
character, it will be converted to a _
. (Naming collisions cannot occur, because Rush command names can only contain lower case letters, numbers, hyphens, and colons.) Rush will no longer write any *.log
files in your project root folder.For a phased command, multiple files will be created, for example:
my-project/rush-logs/my-project.build.log
my-project/rush-logs/my-project._phase_compile.log
my-project/rush-logs/my-project._phase_jest.log
This redesign applies to all commands (whether phased or not).
For now, the new folder location is only used if the _multiPhaseCommands
experiment is enabled.
CC @elliot-nelson @chengcyber
@octogonz what are the reasons to put logs into a new <project>/rush-logs
and not into <project>/.rush/logs
?
And I'm still hoping we will make some progress for this Option for storing projects autogenerated metadata in specified place. I'm preparing some info for that, it just took a bit more time than I expected 😅
@deflock - I thought about dropping logs in the .rush/logs
folder, but generally the dot-prefixed folders are used only by machine-written and machine-read files. I'd expect a lot of text editors actually have these folders hidden.
@octogonz, @elliot-nelson - point of clarification. For "flag"
parameters with the "addPhasesToCommand"
or "skipPhasesForCommand"
properties, should the flag still be passed on to the scripts that are run?
For example, in @elliot-nelson's example repo, there's a --lite
flag defined that skips _phase:lint
, _phase:test
, and _phase:push-notes
(leaving _phase:compile
and _phase:update-readme
). Should --lite
be passed to the scripts run by _phase:compile
and _phase:update-readme
?
Proposing a solution for the flag question here: https://github.com/microsoft/rushstack/pull/3124
@octogonz , @iclanton , @elliot-nelson , I'm working on updating the internals of the task scheduler right now and ran across a bit of a gap in the design with respect to the action of parameters that add or remove phases, particularly in the face of the statement in the comments that phases for commands are implicitly expanded (ala the "--to" CLI operator): In what order should the "skipPhasesForCommand" on the command definition, the implicit phase expansion to include dependencies, the "addPhasesToCommand" and "skipPhasesForCommand" on flag parameters be applied?
My current thoughts are something like: 1) Start with the command's "phases" list 1) Apply the "--to" operator in phase space to the selected phases (I would prefer to just have an exact list in the command instead of this operation or step (3) below) 1) Apply "skipPhasesForCommand" defined in the command 1) Apply "addPhasesToCommand" from any parameters 1) Apply "skipPhasesForCommand" from any parameters
In what order should the "skipPhasesForCommand" on the command definition, the implicit phase expansion to include dependencies, the "addPhasesToCommand" and "skipPhasesForCommand" on flag parameters be applied?
I think at Rush Hour the consensus was to postpone this feature until after initial release, since the design discussion will be easier after people have some firsthand experience using the feature.
Regarding the design questions in https://github.com/microsoft/rushstack/pull/3144#issuecomment-1008133601, @elliot-nelson and I had a big discussion about this today. We focused on the following requirements:
projectOutputFolderNames
(e.g. because it's in a far away config file), and then rush build
fails nondeterministically, only in the misconfigured library is loaded from cache but its consumer is not. (In other words, things are broken but it's difficult to repro because rush rebuild && rush build
always succeeds.)projectOutputFolderNames
differently for phased vs unphased commands (which was a consequence of calling the setting "phaseOptions"
even though it applies to unphased commands)The config file currently looks like this:
\<your project>/config/rush-project.json (riggable)
{
"buildCacheOptions": {
// Selectively disables the build cache for this project. The project will
// never be restored from cache. This is a useful workaround if that project's
// build scripts violate the assumptions of the cache, for example by writing
// files outside the project folder. Where possible, a better solution is
// to improve the build scripts to be compatible with caching.
"disableBuildCache": true,
// Allows for fine-grained control of cache for individual Rush commands.
"optionsForCommands": [
// The Rush command name, as defined in custom-commands.json
"name": "your-command-name",
// Selectively disables the build cache for this command. The project will never
// be restored from cache. This is a useful workaround if that project's
// build scripts violate the assumptions of the cache, for example by
// writing files outside the project folder. Where possible, a better solution
// is to improve the build scripts to be compatible with caching.
"disableBuildCache": true
]
},
// Specify the folders where your toolchain writes its output files.
// If enabled, the Rush build cache will restore these folders from the cache.
// The strings are folder names under the project root folder. These folders
// should not be tracked by Git. They must not contain symlinks.
"projectOutputFolderNames": [ "lib", "dist" ],
// The incremental analyzer can skip Rush commands for projects whose
// input files have not changed since the last build. Normally, every Git-tracked
// file under the project folder is assumed to be an input.
// Set incrementalBuildIgnoredGlobs to ignore specific files, specified as globs
// relative to the project folder. The list of file globs will be interpreted the
// same way your .gitignore file is.
"incrementalBuildIgnoredGlobs": [],
// Options for individual phases.
"phaseOptions": [
{
// The name of the phase. This is the name that appears in command-line.json.
"phaseName": "_phase:build",
// Specify the folders where this phase writes its output files. If enabled,
// the Rush build cache will restore these folders from the cache. The strings
// are folder names under the project root folder. These folders should not be
// tracked by Git. They must not contain symlinks.
"projectOutputFolderNames": [ "lib", "dist" ]
}
]
}
Issues encountered with this design:
"projectOutputFolderNames"
is a top-level setting, whereas it should be specified differently for each custom command (unless that command has phases). And the folders must not overlap. So the build cache really only works with rush build
right now. 👉 See recent discussion in Zulip #general - build-cache with custom commandsprojectOutputFolderNames
has a "project" prefix that sounds like it's for an entire project, but actually it affects an individual shell scripts (which might be a phase or a command)phaseName
is not defined by the monorepo -- PR #3144 is blocked behind this problem, since it's trying to define phases in a rig, but our rig shouldn't FORCE you to use those phasesExplanation is in the file comments:
\<your project>/config/rush-project.json (riggable)
{
"incrementalBuildIgnoredGlobs": [],
// Let's eliminate the "buildCacheOptions" section, since now we seem to
// only need one setting in that category
"disableBuildCacheForProject": true,
// INSIGHT: We have a bunch of settings that apply to the shell scripts
// defined in package.json, irrespective of whether they are used by phases
// or classic commands. So instead of inventing an abstract jargon that
// means "phase or command", let's just call them "projectScripts".
"projectScripts": [
{
// This is the key from the package.json "scripts" section.
// To support rigs, it is OKAY to provide configuration for scripts that
// do not actually exist in package.json or are not actually mapped to
// a Rush command.
"scriptName": "_phase:build",
// These are the folders to be cached. Their cache keys must not overlap,
// HOWEVER that validation can safely ignore: (1) scripts that aren't mapped
// to a Rush command in a given repo, (2) scripts that have opted out of
// caching, e.g. via disableBuildCacheForProject or disableBuildCacheForScript
"outputFolderNames: ["lib", "dist"],
// Allows you to selectively disable the build cache for just one script
"disableBuildCacheForScript": true,
// FUTURE FEATURE: If your shell command doesn't support a custom parameter
// such as "--lite" or "--production", you can filter it here. This avoids
// having to replace "rm -Rf lib && tsc" with "node build.js" simply to
// discard a parameter.
"unsupportedParameters": [ "--lite" ],
// FUTURE FEATURE: We could optionally allow rigs to define shell scripts in
// rush-project.json, eliminating the need for projects to copy+paste
// these commands into package.json.
"shellScript": "heft build --clean"
},
{
// Addressing the question from PR #3144, the Rush Stack rig could provide
// both "build" and "_phase:build" definitions, allowing the rig to be used
// in any monorepo regardless of whether they are using phased commands.
"scriptName": "build",
"outputFolderNames: ["lib", "dist"]
}
]
}
This redesign would be a breaking change, but both the build cache and phased commands features are still "experimental."
Love the new design, although a little worried about the transition of the fields in rush-project.json when people install a new Rush version.
(maybe we could include a customized error or honor the projectBuildOutputFolders etc as a stopgap)
A helpful error message seems sufficient. As long as you know how to update rush-project.json, the actual work of updating it seems pretty easy, and can be procrastinated until you are ready to upgrade Rush.
@octogonz , @elliot-nelson , I've been thinking over this a lot in the process of writing #3043, fiddling with the scheduler and the generation of the Rush task graph, as well as how it interacts with Heft, etc., and my current thoughts for North star design have changed a lot. I very much want rush-project.json
and its riggable nature to be the authority on build tasks, but I'd like to take it a bit further:
rush-project.json
{
// The set of build tasks (essentially subprojects) that exist for this project.
// "npm run build" becomes an alias for "run these in the correct order"
"tasks": {
// Convert SCSS to CSS and .scss.d.ts
"sass": {
// Filter to apply to source files when determining if the inputs have changed.
// Defaults to including all files in the project if not specified
"inputGlobs": ["src/**/*.scss"],
// What files this task produces. If specified, will be used for cleaning and writing the build cache
// Ideally these are just folder paths for ease of cleaning
"outputGlobs": ["lib/**/*.css", "temp/sass-ts"],
// What local (or external) tasks this task depends on.
// If not specified, has no depenencies.
// Not quite sure on structure here yet, but want to be able to specify in any combination:
// - Depends on one or more tasks in specific dependency projects (which must be listed in package.json)
// - Depends on one or more tasks in this project
// - Depends on one or more tasks in all dependencies in package.json
"dependencies": {
"sass-export": ["#upstream"]
},
// What Rush task runner to invoke to perform this task. Options:
// - "heft": Rush will send an IPC message to a Heft worker process to invoke the task with this name for this project.
// Said process will load a task-specific Heft configuration and perform a build with the relevant plugin(s).
// - "shell": Rush will start a subprocess using the specified command-line
"taskRunner": "heft"
},
// Copy .scss include files to dist/sass
"sass-export": {
"inputGlobs": ["src/sass-includes/**/*.scss"],
"outputGlobs": ["dist/sass"],
"taskRunner": "heft"
},
// Emit JS files from source
"compile": {
"inputGlobs": ["src/**/*.tsx?"],
"outputGlobs": ["lib/**/*.js", "lib/**/*.js.map"],
"dependencies": {
// This task will invoke this plugin, so make sure it has been built
"bundle": ["@rushstack/heft-transpile-only-plugin"]
},
"taskRunner": "heft"
},
// Type-Check, Lint
"analyze": {
"inputGlobs": ["src/**/*.tsx?"],
// Specified as having no outputs, meaning cacheable but doesn't write anything
"outputGlobs": [],
"dependencies": {
// Depends on .d.ts files in npm dependencies
"analyze": ["#upstream"],
// Depends on .scss.d.ts files to type-check
"sass": ["#self"]
},
"taskRunner": "heft"
},
// Webpack
"bundle": {
// Omit inputGlobs to assume everything is relevant
"outputGlobs": ["release"],
"dependencies": {
// Need the JS files from this project and all of its dependencies
"compile": ["#self", "#upstream"],
}
},
"taskRunner": "heft"
}
}
Essentially my main premise is that the concept of a "Build Task" should be the entity Rush is primarily concerned with (since that's what the scheduler operates on). The current hybrid of phases + projects results in a lot of unnecessary edges in the dependency graph and limits the ability for projects to take advantage of more or fewer build tasks as appropriate to the project.
Interesting - and very configurable!
@dmichon-msft In your comment you say that "npm run build" would be an alias for running these tasks in order, but where is that controlled?
(that is, what tells "npm run build" -- or "rush build" for that matter -- which of these phases are relevant to the build command, and which might be relevant to a totally different custom command?)
Hmm... As things stand today, rushx build
and rush build --only .
look very similar but behind the scenes are fairly different:
rush build
populates the cache, whereas rushx build
does not because it produces arbitrary outputs -- this is why the build cache deletes lib
and re-extracts the tarball every time (in case rushx
disrupted something)rush build
spawns separate Node.js processes for each phase. rushx build
ignores phase definitions and invokes a hopefully equivalent Heft command to do everything in a single process. The reason why: rushx
sacrifices caching to get flexibility and speed for local development.rush build
only accepts the generic CLI parameters that are centrally defined in command-line.json
, and meaningful for any project across the monorepo. Whereas rushx build
will pass through all sorts of specialized parameters that only work for a single project's toolchain.rush build
is orchestrated by Rush and produces output files on disk, whereas rushx build
is orchestrated by Heft (or whatever toolchain a project may choose) and can generate in-memory outputs that don't get saved anywhere.It's annoying to have two approaches to the same problem, but they are optimized for different needs. Your multi-project watch improvements are aligning the two worlds more closely, at least for watching. However it seems unclear whether really great watching will have the same topology as really great phased/cached building.
Maybe for now the IPC taskRunner
+ inputGlobs
concept should get its own rush-project.json section (and design discussion), and we can merge its settings with phased builds later down the road if they do turn out to be the same structure?
A couple specific points of feedback:
"sass-export"
and "bundle"
user-defined strings? If so, consider putting them in JSON values. Config files are more intuitive when key=question and value=answer.The current hybrid of phases + projects results in a lot of unnecessary edges in the dependency graph and limits the ability for projects to take advantage of more or fewer build tasks as appropriate to the project.
The taskRunner
+ IPC thing is the domain of a custom Rush plugin; I only really included it in the example above as a suggestion that we make it configurable in the same way the build cache config is, i.e. custom plugins may provide alternate taskRunner
implementations. The inputGlobs
option is mostly there to point out that incrementalBuildIgnoredGlobs
as a top-level field should be mutually exclusive with defining phases. Essentially if phases are in play, the only option that remains a top-level field is if we support filtering the changefile detection algorithm.
Regarding string keys, are you suggesting that the reason various rushstack schemas use a semantically incorrect data type (an array) to represent a dictionary (we literally convert it into one immediately after parsing and have to compensate for the language not enforcing uniqueness of keys, which it would if it were written as a dictionary to begin with) is to work around a text editor autocomplete bug? If so, scripts
and dependencies
in package.json
have something to say about that. As do moduleNameMapper
and transform
in Jest, etc.
The main issue I've been encountering is how I specify that a project phase depends on some tooling projects, but doesn't require any of the other npm dependencies to have been built. This is a situation that will occur as soon as we start trying to leverage isolatedModules, since the only relevant dependencies will be on Heft and the TypeScript plugin, not on any of the imported product code. The best I can do in the current system is to inject a bogus extra "tooling" phase at the beginning of the compilation that all other phases depend on, and have that be the only phase defined in the tooling project.
@dmichon-msft and I had a long discussion today, and he persuaded me that watch mode can be defined around phases. So we're amending the proposal to introduce an abstraction term after all:
An operation
is defined to be:
We avoided words like "task", "action", "stage", etc because those words already have special meanings for Heft.
Explanation is in the file comments:
\<your project>/config/rush-project.json (riggable)
{
"incrementalBuildIgnoredGlobs": [],
// Let's eliminate the "buildCacheOptions" section, since now we seem to
// only need one setting in that category
"disableBuildCacheForProject": true,
// Note: these "settings" have no effect unless command-line.json defines the operation
"operationSettings": [ // 👈👈👈 revised
{
// This is the key from the package.json "scripts" section.
// To support rigs, it is OKAY to provide configuration for scripts that
// do not actually exist in package.json or are not actually mapped to
// a Rush command.
"operationName": "_phase:build", // 👈👈👈 revised
// These are the folders to be cached. Their cache keys must not overlap,
// HOWEVER that validation can safely ignore: (1) scripts that aren't mapped
// to a Rush command in a given repo, (2) scripts that have opted out of
// caching, e.g. via disableBuildCacheForProject or disableBuildCacheForOperation
"outputFolderNames: ["lib", "dist"],
// Allows you to selectively disable the build cache for just one script
"disableBuildCacheForOperation": true, // 👈👈👈 revised
// FUTURE FEATURE: If your shell command doesn't support a custom parameter
// such as "--lite" or "--production", you can filter it here. This avoids
// having to replace "rm -Rf lib && tsc" with "node build.js" simply to
// discard a parameter.
// (We'll have a separate design discussion for this idea.)
"unsupportedParameters": [ "--lite" ],
// FUTURE FEATURE: We could optionally allow rigs to define shell scripts in
// rush-project.json, eliminating the need for projects to copy+paste
// these commands into package.json.
// (We'll have a separate design discussion for this idea.)
"shellScript": "heft build --clean"
},
{
// Addressing the question from PR #3144, the Rush Stack rig could provide
// both "build" and "_phase:build" definitions, allowing the rig to be used
// in any monorepo regardless of whether they are using phased commands.
"operationName": "build",
"outputFolderNames: ["lib", "dist"]
}
]
}
This redesign would be a breaking change, but both the build cache and phased commands features are still "experimental."
I kind of curious about the difference between builkScript and phaseCommand. I really like the idea to avoid xxx words since they have special meanings. So, the question is could we avoid the concept phase to rush user too? Like the internal idea for each phases is a part of script. Such as build script defines heft build, so phases might be lint, build, test… it has been already known by heft. While if i define a script names lint, ‘eslint —fix’ ,i can also defines lint script itself as phase command.
My idea is to avoid phase concept totally and treat ‘phase’ as script in rush and stage(task? or other special meaning word) in heft. i.e phase commands is a feature of bulkScript, or even other xxxScript. If we need make script uses heft phase-able, it can be implemented in heft side.
Things seems be easier. Correct me if my understanding is wrong.
I kind of curious about the difference between builkScript and phaseCommand.
If I remember right, the main idea was that "commands" are something a user invokes from the CLI, whereas "phases" are implementation details of a command.
Technically a bulk command is equivalent to a phased command with only one phase. We considered deprecating bulk commands entirely, however because phases are more complicated to set up, it seemed better to provide a simple "bulk command" to make it easier for developers.
My idea is to avoid phase concept totally and treat ‘phase’ as script in rush and stage(task? or other special meaning word) in heft.
We cannot assume that developers are using Heft. This is why a phase has:
rm -Rf lib && tsc
Heft can be optimized with additional behaviors (for example the proposed IPC protocol that enables a single Heft process to be reused for multiple operations, and special interactions for watch mode). But these will always be modeled as contracts that other toolchains could easily implement. We recommend Heft, but developers should not be forced to use Heft in order to take advantage of these features.
Thanks for your reply, Pete.
We recommend Heft, but developers should not be forced to use Heft in order to take advantage of these features.
This makes sense that a new concept in Rush.js.
Closing this issue out, as the feature has been out for several years at this point.
This is a proposal for "phased" Rush commands. It's based on two design meetings from March 2020 and later in October 2020.
Goal 1: Pipelining
Today
rush build
builds multiple projects essentially by runningnpm run build
separately in each project folder. Where possible (according to the dependency graph), projects can build in parallel. The operation performed on a given project is atomic from Rush's perspective.For example, suppose
npm run build
invokes the following tasks:If project
B
depends on projectA
, it's not necessary forB
to wait for all ofnpm run build
to complete. For example:B
can start compiling as soon asA
finishes compilingB
can start linting as soon asA
finishes compilingB
andA
can start testing as soon as their own compiling completesB
can start docs as soon asB
finishes compiling ANDA
finishes docsThis sort of pipelining will speed up the build, particularly in bottlenecks of the dependency graph, such as a "core library" that many other projects depend upon.
Goal 2: Fine-grained incremental builds
Today
rush build
supports an incremental feature, that will skip building projects if their source files have not changed (based on the package-deps-hash state file). Recently this feature was extended to work with Rush custom commands, with each command maintaining its own state file. This works correctly for purely orthogonal operations, but the incremental analysis does not consider relationships between commands.For example, suppose we define two commands:
rush build
does compiling and linting and docsrush test
does compiling and linting and testing and docsRush would maintain two state files
package-deps_build.json
andpackage-deps_test.json
, but the analysis would not understand thatrush build
has already performed 50% of the work. Even worse, it will not understand whenrush build
has invalidated the incremental state forrush test
.Goal 3: An intuitive CLI syntax for skipping tasks
Today you can define a parameter such as
rush build --no-docs
that would skip the docs task, but the meaning of this flag would be determined by code inside the build scripts. And Rush's incremental build logic has no way to understand that compiling and linting are up-to-date but docs is outdated.Design spec
The multiphase-rush repo sketches out example of the the proposed design for phased builds.
The phase scripts are defined in package.json like this:
project1/package.json
The
_phase:
scripts implement phases that will be globally defined in command-line.json for the monorepo. Unlike Makefile dependencies, our proposed phase graph is a standardized model that every project will conform to. When a new project is moved into the monorepo, its toolchain may need to be adapted to match the phase model. Doing so ensures that the multi-projectrush
CLI syntax will perform meaningful operations for each project. Example:common/config/rush/command-line.json
Example command line
The above configuration defines these multi-project commands:
rush build
invokes shell commands:heft build --lite
(_phase:compile)heft lint
(_phase:lint)doc-tool
(_phase:docs)rush build --production
invokes shell commands:heft build --lite --production
(_phase:compile)heft lint
(_phase:lint)heft test --no-build
(_phase:test)rush test
invokes shell commands:heft build --lite
(_phase:compile)heft lint
(_phase:lint)heft test --no-build
(_phase:test)doc-tool
(_phase:docs)rush test --production
invokes shell commands:heft build --lite --production
(_phase:compile)heft lint
(_phase:lint)heft test --no-build
(_phase:test)doc-tool --production
(_phase:docs)rush my-bulk-command
invokes shell commands:./scripts/my-command.js
(my-bulk-command)rush my-bulk-command --production
invokes shell commands:./scripts/my-command.js --production
(my-bulk-command)It also defines these commands:
rushx build
invokes:heft build
rushx build --some --arbitrary=stuff
invokes:heft build --some --arbitrary=stuff
(rushx
currently passes CLI parameters unfiltered to the underlying script)rushx test
invokes:heft test
rushx docs
invokes:doc-tool
my-bulk-command
invokes:./scripts/my-command.js
clean
invokes:heft clean
Note that
rush test
andrushx test
conceptually perform similar actions, but their implementation is totally different. It's up to the implementation to ensure consistent behavior. We considered conflating their implementation, but that runs into a couple problems: First, Heft's "watch mode" implementation is not generalizable to multiple projects; the planned multi-project watch mode will have a different architecture (that coordinates multiple Heft watch modes). And secondly,rushx
supports a lot of very useful CLI parameters (via Heft) that cannot meaningfully be generalized torush
. Thus for now at least, it seems like the right design to keeprush
andrushx
similar in spirit only, not implementation.