[rush] Proposal: --affected for building only projects affected by a PR diff

mikecann commented 3 years ago

Summary

This was discussed on the following Zulip thread: https://rushstack.zulipchat.com/#narrow/stream/262513-general/topic/partial.20builds.20based.20on.20changed.20files.20in.20CI

It would be great if the rush commands could support "partial builds" depending on what was changed between commits. Nx solves this with its "affected" range of CLI commands, for example: https://nx.dev/latest/react/cli/affected-build

Its touted as one of the biggest selling points of Nx so I think it would be great to get it in Rush too :)

Suggestion

I suggest for rush that a new flag --affected is offered that can be added to a number of commands:

rush build --affected

Will only build projects that have been affected by the current set of changes.

rush mycommand --affected

Will only run "mycommand" against the projects that have been affected by the current set of changes.

This particular one would be really nice because it will allow us to only deploy projects that were affected. So for example if your workspace has a "server" project and a "web" project and you only made changes to "server" then only that should get built and deployed.

Notes

This will have to intelligently work out what "affected" means. Is it changes from the current commit, changes from the last successful CI run? Nx might be worth looking at for inspiration here.
William Bernting has an example implementation for a script that he runs on his CI https://gist.github.com/wbern/2ebbdf3b7df330a9b0743438dfeb6cbc

wbern commented 3 years ago

Thanks for the issue, happy to discuss this more! Here are my thoughts on how to figure out what "--affected" could mean.

My latest approach in the now updated gist you reference, is to figure out added/removed files in Git history compared to: 1.) Commit hash of last successful build in that branch, if exists 2.) .. Or if not available, last ancestor commit from master branch (ie. the latest commit that reflects master branch)

The second one may raise false positives, and "master" is an assumption I'm making here of what branch should be compared with from the current one.

If it's the master branch, I will have to either: 1.) Check if commit hash of last successful build exists (or if master is brand new, or has always been failing since its creation) 2.) Run for all projects.

What these two options have in common is that they take in a commit hash (or no hash at all for the master branch option 2 scenario), so I suggest that you could do, also instead of --affected,

rush build --since [git-commit-sha].

It came to me just now that --since might make more sense, since we're building everything that has changed since that commit.

Edge cases

What if there are files outside the rush projects that should trigger some rush projects to run?
- This could perhaps be handled by the user, for starters.

Note that I haven't looked at Nx!

wbern commented 3 years ago

I'd also like to add that this solution won't work if the deploy step requires an atomic upload of all dist files in the monorepo, then the build cache would help. #2393

This solution is also not performant if you change libraries that all other packages in the monorepo depends on. For that, you'd need the build cache feature for it to work well.

So while on the surface this feature looks like it could replace the build cache feature, I don't think it does!

mikecann commented 3 years ago

@wbern oh I agree, I think both of them can work in tandem. For example read the docs on Nx where it talks about both caching and "affected": https://nx.dev/latest/react/guides/ci/monorepo-affected#caching-and-affected

Affected and caching are used to solve the same problem: minimize the computation. But they do it differently, and the combination provides better results than one or the other.

I dont think the two features would be mutally exclusive.

dmichon-msft commented 3 years ago

PackageChangeAnalyzer can do this if we add the ability to forward a revision specifier via PackageChangeAnalyzer into getPackageDeps. Presumably we skip the git status check if the revision is not HEAD, but then we can use the same machinery as any other incremental build to determine "changed since revision". It'd have to follow the expansion logic of --from to ensure we build a valid set, but that's easy enough.

I think it'd be cleanest to do --since REVISION, where REVISION is any revision specifier that Git can parse in the git ls-tree command. Then determining the target is up to your build pipeline.

elliot-nelson commented 1 year ago

🎉 Closing this issue -- released as scope selector git.

In a PR Build, for example, you select the set of projects modified by the PR:

rush list --only --git:origin/main --json

Or, you could ask it to build and test only the projects affected by the projects touched in this PR:

rush test --from --git origin/main

(etc.)

dmichon-msft commented 1 year ago

🎉 Closing this issue -- released as scope selector git.

Clarification on the syntax: rush list --only git:REVISION rush test --from git:REVISION

where REVISION is a Git revision, e.g. origin/main, main, HEAD~1 (parent commit).

For the common case where PRs generate a merge commit parented off of the target branch, HEAD~1 is usually the correct revision.

e.g. rush list --only git:HEAD~1 will list all projects that were directly modified.

microsoft / rushstack