microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/

[rush] Support "incrementalBuildAdditionalGlobs" in rush-project.json #3253

Open elliot-nelson opened 2 years ago

elliot-nelson commented 2 years ago

Summary

It would be powerful and convenient to be able to specify additional globs, outside a project's folder, that the project depends on. For example, perhaps a particular project's build is highly dependent on the OS it runs on, and modifications to common/config/azure-pipelines/_variables.yaml should trigger a full rebuild and retest of that project, even if it didn't have any modifications otherwise.

To accomplish this today, you'd need to wrap each such file in a dummy project (perhaps pipelines/os-specific-projects/_variables.yaml), refer to that file from the rest of your pipelines in common/config, and have each relevant project depend on the dummy project.

Details

In theory, it would be relatively simple to add incrementalBuildAdditionalGlobs right next to incrementalBuildIgnoredGlobs in the rush-project.json.
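To make the proposed setting concrete, here is a minimal sketch of how it might sit beside the existing one, written as a TypeScript type plus an example value. The `incrementalBuildAdditionalGlobs` property is hypothetical and does not exist in Rush; the type name `RushProjectConfigSketch`, the example glob values, and the assumption that the new globs would be relative to the repo root are all illustrative.

```typescript
// Hypothetical shape only: "incrementalBuildAdditionalGlobs" is the proposal in this
// issue, not an existing rush-project.json setting.
interface RushProjectConfigSketch {
  // Existing setting: globs, relative to the project folder, that the incremental
  // build analysis should ignore.
  incrementalBuildIgnoredGlobs?: string[];
  // Proposed setting (assumed here to be relative to the repo root): files outside
  // the project folder whose changes should also mark the project as changed.
  incrementalBuildAdditionalGlobs?: string[];
}

const exampleConfig: RushProjectConfigSketch = {
  incrementalBuildIgnoredGlobs: ['temp/**'],
  incrementalBuildAdditionalGlobs: ['common/config/azure-pipelines/_variables.yaml']
};

console.log(JSON.stringify(exampleConfig, null, 2));
```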

The tricky part is that ProjectChangeAnalyzer#getChangedProjectsAsync, today, first filters changed files into projects by their file path, and then it loads the potentially affected projects' rush-project.json files (if they exist) to do additional filtering. If we truly allowed any project in the repo to depend on any individual file, we'd have no choice but to load every project's rush-project.json and check for this new property, even if there were 500 projects to loop through and only one changed file.
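The cost difference can be illustrated with a rough sketch. This is not the actual `ProjectChangeAnalyzer` code; the `Project` type and both function names are hypothetical, and the sketch only counts how many rush-project.json files would have to be loaded in each model.

```typescript
interface Project {
  name: string;
  folder: string; // repo-relative project folder
}

// Today, per the description above: a changed file is first assigned to a project by
// path, so only projects that own a changed file need their rush-project.json loaded.
function configsToLoadToday(changedFiles: string[], projects: Project[]): Project[] {
  return projects.filter((project) =>
    changedFiles.some((file) => file.startsWith(project.folder + '/'))
  );
}

// If any project could list arbitrary globs in its own rush-project.json, a changed
// file could matter to any of them, so every config must be loaded before filtering.
function configsToLoadWithPerProjectGlobs(changedFiles: string[], projects: Project[]): Project[] {
  return changedFiles.length > 0 ? projects.slice() : [];
}

const projects: Project[] = [];
for (let i = 0; i < 500; i++) {
  projects.push({ name: `@acme/project-${i}`, folder: `apps/project-${i}` });
}
const changedFiles = ['apps/project-7/src/index.ts'];

console.log(configsToLoadToday(changedFiles, projects).length); // 1
console.log(configsToLoadWithPerProjectGlobs(changedFiles, projects).length); // 500
```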

I'm proposing the feature anyway; perhaps there's a more efficient way we could support it. (Could the additionalGlobs live right inside the project definition in rush.json, so it's already loaded? Maybe it could be file-based, where the file common/config/azure-pipelines/_variables.yaml could be listed in common/config/rush/additional-globs.json, with each affected project in an array? Etc.)

Standard questions

Please answer these questions to help us investigate your issue more quickly:

| Question | Answer |
| --- | --- |
| @microsoft/rush globally installed version? | latest |
| rushVersion from rush.json? | latest |
| useWorkspaces from rush.json? | yes |
| Operating system? | mac |
| Would you consider contributing a PR? | yes |
| Node.js version (node -v)? | 14.18.0 |

dmichon-msft commented 2 years ago

@elliot-nelson, I've been considering the idea of having a cache file that contains the full resolved contents of every rush-project.json in the entire repo (and potentially the same for package.json), alongside the Git hashes of the source files from when the cache was created, so that we can streamline the loading of these files during boot. If they haven't changed, we need only read the one cache file, and even if some have changed, we only need to read those with different Git hashes. Unfortunately, reading in the Git hash state still takes on the order of 800 ms (300 for the index, 500+ for working tree state), which, if we don't otherwise need that state, might be slower than just reading all of the config files.

If we allowed projects to depend on arbitrary files, we'd need to break the 1:1 mapping currently performed for file -> project and instead do a 1:many mapping, which is feasible, but the performance of that assignment algorithm has pronounced effects on the initialization time of rush build. The optimized 1:1 mapping was something I added a while back (#2455), and it reduced initialization time by ~28 seconds in my team's monorepo.
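As an illustration of why a 1:1 file -> project mapping can be fast, here is a minimal sketch of a longest-prefix lookup over path segments. This is one common way to do such a mapping and is not necessarily what #2455 implements; `TrieNode`, `buildProjectTrie`, and `findOwningProject` are hypothetical names.

```typescript
interface ProjectFolder {
  name: string;
  folder: string; // repo-relative project folder
}

interface TrieNode {
  project?: string;
  children: Map<string, TrieNode>;
}

// Build a trie keyed by path segment; a node carries a project name when a project
// folder ends at that node.
function buildProjectTrie(projects: ProjectFolder[]): TrieNode {
  const root: TrieNode = { children: new Map() };
  for (const { name, folder } of projects) {
    let node = root;
    for (const segment of folder.split('/')) {
      let child = node.children.get(segment);
      if (!child) {
        child = { children: new Map() };
        node.children.set(segment, child);
      }
      node = child;
    }
    node.project = name;
  }
  return root;
}

// Walk the file's segments once; the deepest project folder on the way is the owner.
// The cost depends on path depth, not on the number of projects.
function findOwningProject(root: TrieNode, filePath: string): string | undefined {
  let node = root;
  let owner = node.project;
  for (const segment of filePath.split('/')) {
    const child = node.children.get(segment);
    if (!child) {
      break;
    }
    node = child;
    if (node.project) {
      owner = node.project;
    }
  }
  return owner;
}

const trie = buildProjectTrie([
  { name: '@acme/shared', folder: 'libraries/shared' },
  { name: '@acme/my-macos-app', folder: 'apps/my-macos-app' }
]);
console.log(findOwningProject(trie, 'libraries/shared/assets/index.html')); // @acme/shared
console.log(findOwningProject(trie, 'common/config/azure-pipelines/_variables.yaml')); // undefined
```

A 1:many mapping could not stop at a single owner: each changed file would also have to be tested against every project's extra globs, which is where the additional initialization cost would come from.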

elliot-nelson commented 2 years ago

@dmichon-msft The more I think about it, the more I dislike that 1:many mapping. In theory such a feature could be used to avoid depending on @acme/shared and instead "depend" on libraries/shared/assets/index.html, which breaks the whole point of having a package-based monorepo.

Maybe instead, the feature could be something like this:

// common/config/rush/ghost-dependencies.json
[
    {
        "projects": [
            "@acme/my-windows-app",
            "@acme/my-macos-app"
        ],
        "fileGlobs": [
            "common/config/azure-pipelines/_variables.yaml",
            "common/config/azure-pipelines/macos/*.yaml"
        ]
    }
]

The idea would be that a list of file globs gets grouped up as a "ghost project" and added as a dependency of each listed project, and that no given file can match more than one ghost project (or a real project).
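The rule described above could be sketched roughly as follows. Everything here is hypothetical: ghost-dependencies.json is only a proposal, `GhostDependencyGroup` and `resolveGhostDependents` are illustrative names, and the glob matcher is a deliberately tiny stand-in that only understands `*` within a single path segment (a real implementation would presumably use a proper glob library).

```typescript
interface GhostDependencyGroup {
  projects: string[];
  fileGlobs: string[];
}

// Simplified glob support: "*" matches any run of characters except "/".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '[^/]*') + '$');
}

// Returns the projects affected through ghost dependencies, enforcing that a file
// matches at most one ghost project and never lives inside a real project folder.
function resolveGhostDependents(
  changedFiles: string[],
  groups: GhostDependencyGroup[],
  realProjectFolders: string[] = []
): string[] {
  const affected = new Set<string>();
  for (const file of changedFiles) {
    const matching = groups.filter((group) =>
      group.fileGlobs.some((glob) => globToRegExp(glob).test(file))
    );
    if (matching.length > 1) {
      throw new Error(`"${file}" matches more than one ghost project`);
    }
    if (matching.length === 1 && realProjectFolders.some((f) => file.startsWith(f + '/'))) {
      throw new Error(`"${file}" matches a ghost project and a real project`);
    }
    for (const group of matching) {
      for (const project of group.projects) {
        affected.add(project);
      }
    }
  }
  return Array.from(affected).sort();
}

const ghostGroups: GhostDependencyGroup[] = [
  {
    projects: ['@acme/my-windows-app', '@acme/my-macos-app'],
    fileGlobs: [
      'common/config/azure-pipelines/_variables.yaml',
      'common/config/azure-pipelines/macos/*.yaml'
    ]
  }
];

console.log(resolveGhostDependents(['common/config/azure-pipelines/macos/build.yaml'], ghostGroups));
```

Because the groups live in one file, this check needs only a single config read regardless of how many projects the repo contains.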

This would give you the dependency structure of my initial proposal above, where I broke the files up into packages, without having to actually go and do that.

elliot-nelson commented 2 years ago

That said, maybe I'm just spending too much effort avoiding breaking up pipelines into packages instead of just biting the bullet and exploring what that might look like, perhaps by moving common/config/azure-pipelines into a top-level azure-pipelines section, with a bunch of package folders underneath it.