microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/

[rush] multi-project watch feature #1202

Open wbern opened 5 years ago

wbern commented 5 years ago

FEATURE: Repo & Project watch support

Let Rush monitor file system changes and run incremental builds as a result of the changes, either by cold booting project build scripts or, if specified and supported, by sending ipc messages to forever running build:watch project script instances to recompile project files.

For illustrations regarding feature implementation, see here: https://docs.google.com/presentation/d/1z81JcBk_OiPhT3YHQDCbfTLUPuKdQtfIAWIB4M8zsTg/edit?usp=drivesdk

This new watch functionality is intended for fast-paced, scalable, local development inside a monorepo.

Feature 1: repo/rush watch support - Allow re-running incremental rush builds when the file system changes (rush build --watch)

Relevant issue: https://github.com/Microsoft/web-build-tools/issues/1122

“Rush offers incremental builds via rush build. If needed, users can provide parameter rush build --watch.

This will start a rush build instance, but after initial build, it will never stop. Instead, it monitors the file system for changes, and reruns itself when that happens. To take into account upstream and downstream building and other options, the triggered builds are just like a regular rush build.”
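A watch loop of this kind usually needs to coalesce bursts of file events so that one save does not trigger several rebuilds. The sketch below is only an illustration of that idea, not Rush's implementation: the `ChangeEvent` shape, the `batchChanges` name, and the 200 ms quiet period are all assumptions made for the example.

```typescript
// Hypothetical shape of a file-change notification; not a Rush type.
interface ChangeEvent {
  project: string; // project whose folder contains the changed file
  timeMs: number;  // when the change was observed
}

// Group events into batches. A batch closes once no further change arrives
// within `quietMs` of the previous one; each batch would trigger one rebuild.
function batchChanges(events: ChangeEvent[], quietMs: number): string[][] {
  const sorted = events.slice().sort((a, b) => a.timeMs - b.timeMs);
  const batches: string[][] = [];
  let current = new Set<string>();
  let lastTime = 0;
  for (const event of sorted) {
    if (current.size > 0 && event.timeMs - lastTime > quietMs) {
      batches.push(Array.from(current).sort());
      current = new Set<string>();
    }
    current.add(event.project);
    lastTime = event.timeMs;
  }
  if (current.size > 0) {
    batches.push(Array.from(current).sort());
  }
  return batches;
}

const batches = batchChanges(
  [
    { project: "lib-c", timeMs: 0 },
    { project: "lib-c", timeMs: 40 },
    { project: "lib-b", timeMs: 90 },
    { project: "app-a", timeMs: 1000 },
  ],
  200
);
console.log(batches); // [["lib-b", "lib-c"], ["app-a"]]
```

Each batch would then be handed to a regular incremental build, so upstream and downstream ordering stays the build's responsibility rather than the watcher's.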

Feature 2: project watch support - Allow watch instances of build scripts to execute under Rush, while not blocking Rush’s execution, using IPC. (rush build --use-ipc [project-name])

Relevant issue (sending processes to background): https://github.com/Microsoft/web-build-tools/issues/1151

“Rush offering support for project build scripts to continue running, essentially not ending the rush build script unless terminated manually”

Incremental builds are expected to exit when they are done. To keep a project running after it is done, a separate parameter is needed: rush build --use-ipc [project-a, project-b | all (implicit)]. When supplied, a separate build script will need to be defined in rush.json that says "this is the script to run if I'm executed as a never-ending instance". Some requirements will apply before project scripts can run with ipc enabled.

An additional parameter that can be useful in this case is rush build (--ipc-instances myproject) --build-first myproject

Regarding both features

What's important to mention about the previously described rush watch (file system) functionality and the project watch support is that enabling one of them does not implicitly enable the other. You can use rush watch without having a single project ipc instance, and you can run rush build once and have an ipc instance keep running via project watch support.

This is important, because some ipc instances of projects may not be directly relevant to build outputs. Someone may want to start a web server via this method, for example. If you want to run a web server first, before building a single project, you can then utilize the --build-first parameter to put it at the top of the execution order. If the package that is being built first relies on another package, that package will implicitly be built before it. It is unclear to me whether this will collide with other, existing rush functionality.

If the user would like the project watch instances to be blind to file changes to save performance, and instead rely on rush watch to inform the project watch instance when changes have been made, then the rush watch functionality would need to use the ipc channel to report file system changes, provided that the user has configured the project properly to utilize it.

For careful consideration

This needs careful consideration when combining the two features to configure build tools like webpack, etc. If we have a rush build --watch process running (aka rush watch support), it would make sense to want to utilize that file system watching process to notify underlying project build instances to build based on its findings, allowing for a more scalable solution.

This would imply that some configuration will be needed alongside the project build script’s tool (eg. webpack, browserify, parcel, etc).

Ideally, at least in the case of webpack, you’d want to configure the build tool to run in watch mode (keep state in memory and don’t exit) but not monitor for file changes. Instead, you’d inject a plugin that notifies the build tool when to begin compiling and that can also notify rush when the build tool has entered a compiled state.
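To make the build-tool side of this concrete, here is a minimal sketch of what such a plugin's message handling could look like. The message shapes (`RushToTool`, `ToolToRush`) and the `createRebuildHandler` name are hypothetical, and `compile` merely stands in for the tool's in-memory recompile step (kept synchronous for brevity). In a Node.js child process started with an IPC channel, the handler could be attached with `process.on("message", ...)` and `send` could wrap `process.send`.

```typescript
// Hypothetical message shapes; this thread does not define a wire format.
type RushToTool = { kind: "rebuild"; changedFiles: string[] };
type ToolToRush = { kind: "compiling" } | { kind: "compiled"; ok: boolean };

// Build a handler that recompiles on request and reports state back to Rush.
// `compile` and `send` are injected so the sketch does not depend on any
// particular build tool or transport.
function createRebuildHandler(
  compile: (changedFiles: string[]) => boolean,
  send: (message: ToolToRush) => void
): (message: RushToTool) => void {
  return (message: RushToTool): void => {
    send({ kind: "compiling" });
    let ok: boolean;
    try {
      ok = compile(message.changedFiles);
    } catch {
      ok = false; // a thrown error is reported as a failed compile
    }
    send({ kind: "compiled", ok });
  };
}

// Usage with stand-in functions instead of a real build tool and IPC channel.
const sentToRush: ToolToRush[] = [];
const onRushMessage = createRebuildHandler(
  (changedFiles) => changedFiles.length > 0,
  (message) => {
    sentToRush.push(message);
  }
);
onRushMessage({ kind: "rebuild", changedFiles: ["src/index.ts"] });
console.log(sentToRush);
```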

Future possible enhancements

Basically this will be a game changer for monorepo management, and when the initial implementation is put in place, various improvements could be made.

• For example, a rush build --ipc parameter could trigger an inquirer-esque prompt (inquirer being the npm package) asking which projects to run via ipc. That way, users can conveniently control their build process for the feature they'll be working on today.

• It could also be easy to create a small rush ipc library that helps users to interact with rush instead of using raw nodejs code, which may intimidate some.

• If we want to support sharded builds in this manner, it would be possible to replace the ipc solution offered by nodejs with another one that communicates over, say, TCP and is better at dealing with network traffic.

poelstra commented 5 years ago

This looks nice!

I like the IPC mechanism too, where we can leave the dependency watching to rush. This would solve needless rebuilding of dependents while the 'source' is still updating its watch.

I wonder whether this could be an automatic mechanism, though:

I.e. when rush watch or rush build --watch is executed, it will look for an 'IPC version' of the build task (e.g. a build-watch script in package.json, can be configurable) and execute that. If not, it will fall back to running the normal build script.

It would be extra nice if e.g. @microsoft/rush-stack-compiler would natively support such a task.

Furthermore, I was wondering whether an alternative IPC mechanism could simply be stdin/stdout triggering. I.e. rush sends a line on stdin to the watch process to initiate a rebuild, and looks for a certain marker on stdout/stderr to determine whether the watch cycle completed.
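As a rough illustration of the stdout half of that idea, the sketch below scans streamed output for a completion marker. The marker text and the `MarkerScanner` class are made up for the example; the only real constraint it models is that stdout chunks can split lines at arbitrary points.

```typescript
// Hypothetical marker; the thread does not specify one.
const DONE_MARKER = "[watch] build complete";

// Accumulates stdout chunks (which may split lines arbitrarily) and reports
// how many complete lines in each chunk matched the marker.
class MarkerScanner {
  private partial = "";

  public feed(chunk: string): number {
    const lines = (this.partial + chunk).split("\n");
    // The last element is an unfinished line; keep it for the next chunk.
    this.partial = lines.pop() ?? "";
    return lines.filter((line) => line.trim() === DONE_MARKER).length;
  }
}

const scanner = new MarkerScanner();
const first = scanner.feed("compiling...\n[watch] build ");
const second = scanner.feed("complete\nwaiting for changes");
console.log(first, second); // 0 1
```

The stdin half would be the mirror image: Rush writes a single line to the watch process to request a rebuild, then waits until `feed` reports a marker.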

wbern commented 5 years ago

@poelstra

I.e. when rush watch or rush build --watch is executed, it will look for an 'IPC version' of the build task (e.g. a build-watch script in package.json, can be configurable) and execute that. If not, it will fall back to running the normal build script.

I like the idea of less setup. What I want to make sure of is that the method of ipc instances and rush build --watch don't overshadow other ways of usage. Someone might have configuration in place for ipc instances for development builds for example, but wants to utilize rush watch for their production builds (maybe hotfixing some production issue with minification). The ipc instances also undeniably consume more ram, so if you've configured ipc for all your projects, running them all by default may not be ideal.

However reflecting on convenience, perhaps this syntax could work?

rush build --watch --ipc?

  1. --ipc is shorter than --use-ipc, and kind of the same implication anyway.
  2. Not specifying any project implicitly means, run all projects. I had planned for this to spawn a prompt asking which projects to run as ipc, but perhaps this is a better standard use case? Alternatively, we could have --ipc * (I like globs) , or --ipc all.

poelstra commented 5 years ago

The ipc instances also undeniably consume more ram, so if you've configured ipc for all your projects, running them all by default may not be ideal.

Good point.

I was thinking a bit more about this, and it seems that there are basically three possible kinds of build for each project:

Intuitively, it appears that rush rebuild corresponds to the first: we want to rebuild all packages, and most likely first clean the existing output for each. This corresponds with the current 'standard' of having "build": "gulp --clean".

However, we don't really have commands for incremental or watch yet.

rush build is somewhat of a hybrid, because it does sort of an incremental build of the whole monorepo (by skipping unchanged packages), but it does so by doing a 'clean + build' step on each package.

Therefore, my proposal would be to have a command like rush incremental or rush build --incremental or something that would look for a package.json script called incremental if it exists, and otherwise fall back to build. A simple script would be "incremental": "gulp" (i.e. without the --clean), or "incremental": "gulp incremental", which could call e.g. TS3.4 tsc --incremental.

We could then have rush build --watch and rush incremental --watch (or rush build --incremental --watch) to basically have these approaches being called in a loop.
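The fallback rule is small enough to sketch directly. The `pickScript` helper below is hypothetical (it is not a Rush API); it only shows the "use the specialised script if the project defines one, otherwise use build" lookup against a package.json `scripts` object.

```typescript
type Scripts = { [name: string]: string };

// Return the name of the script to run: the preferred one if the project
// defines it, otherwise the fallback, otherwise undefined.
function pickScript(
  scripts: Scripts,
  preferred: string,
  fallback: string = "build"
): string | undefined {
  const has = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(scripts, name);
  if (has(preferred)) {
    return preferred;
  }
  return has(fallback) ? fallback : undefined;
}

const withIncremental: Scripts = { build: "gulp --clean", incremental: "gulp" };
const buildOnly: Scripts = { build: "gulp --clean" };

console.log(pickScript(withIncremental, "incremental")); // "incremental"
console.log(pickScript(buildOnly, "incremental"));       // "build"
```

The same lookup would cover the earlier build-watch suggestion by passing a different preferred name.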

Optionally indeed allowing to specify a glob to specify which projects to watch for file changes, as you mentioned. (Or possibly, this could automatically be inferred from using --to and/or --from.)

Then for the 'real' watch mode, that would indeed require an IPC mechanism. I'm not sure we'd want to expose that 'implementation detail' to end-users, so I'm not a big fan of calling it --ipc, but yeah, there probably needs to be a way to specify you'd want to have these projects in 'real' watch mode, not just 'incremental' mode. I'd also prefer it to be a switch, not an interactive question, as that's easier to e.g. launch stuff from a VSCode build task.

wbern commented 5 years ago

Rush checks for incremental builds using the package-deps.json file. Perhaps the incremental flag doesn't need to execute a separate script, instead keep it with "build"?

Also, ipc is an intimidating name, but perhaps rightly so. We don't know all the real use cases for this feature yet, and we can't kid ourselves that this is not an advanced piece of functionality coming from a normally pretty straightforward tool.

In the future rush may have offered enough abstraction aimed towards 98% of the most common use cases (a webpack plugin that talks with rush for example together with other things). When such a day comes, ipc could be either deprecated or simply referred to as advanced usage not normally recommended.

octogonz commented 5 years ago

rush build --watch --ipc?

It seems to me that the IPC capability should be detected from a config file, rather than specified on a command line. When a developer runs rush build --watch in a given repo, they probably don't care about whether it is implemented using IPC or not. They expect it to "just work".

wbern commented 5 years ago

Fair point @octogonz, as long as there's a way to choose not to run ipc for some projects. If I have one project that starts a web server, for example, that I don't need at the moment, I should be able to avoid running that project via ipc while still using watch.

timini commented 4 years ago

Any updates on this? What is a good solution in the meantime?

octogonz commented 4 years ago

We had a big discussion about this yesterday. I will post some notes with the takeaway and next steps.

It took me some time to fully understand @wbern's proposal here and to realize that #1122 and #1151 were subtasks of this one -- I need to read more carefully. :-) His design is fairly close to what we came up with, which is a good sign. The core ideas are captured in slide 5 from his deck, pasted here for convenience:

[Image: slide 5 from the deck]

wbern commented 4 years ago

I understand that it was quite a lot of text. We should perhaps have had a discussion at the time, but at least it came in handy for your design discussions. 🙂

octogonz commented 3 years ago

Earlier this year the Rush maintainers had a couple meetings about this problem. The solution we came up with is pretty close to what @wbern described above. (Apparently we didn't reply to this issue with an update, so let me do that now.)

The plan was separated into three milestones:

  1. Rush watches for file changes (instead of the build script doing the watching). Centralizing the watching turned out to be architecturally important enough to tackle first. Rush's watch loop will then invoke a non-looping heft build (or equivalent) in the appropriate project folders, in the appropriate order. Generally there is one "endpoint application" (or a small number of them) that hosts the webpack dev server on http://localhost, and thus needs to loop. These "endpoint" projects will be treated specially in milestone 1, with their loops running in parallel with Rush. In this first milestone, they will rebuild by watching for changes under their lib folder, with maybe some hack where Rush touches a file in that folder to force them to rebuild at the right time.

    This roughly corresponds to #1122, except it adds a hack so that webpack dev server can be used.

  2. Rush uses an IPC protocol to communicate with build scripts. This second milestone introduces a Node.js IPC protocol that allows Rush to communicate with a single-project toolchain. The protocol commands talk to a looping build script. Initially there are only two protocol commands: rebuild yourself and terminate yourself. This enables Rush's watcher to manage a pool of build script processes. This way projects can be recompiled as needed without the performance cost of a cold start for heft build. It would also completely eliminate any need for an "endpoint" project to watch for any filesystem changes itself (instead listening for the rebuild yourself message). This second milestone requires the toolchain to implement a Rush-specific IPC contract, and we would use Heft as the reference implementation for this protocol contract.

    This roughly corresponds to #1151, except we added a terminate yourself command. And we proposed that the pool size can be limited. The IPC protocol makes it straightforward for Rush to kill/restart least-recently-used processes to avoid having too many build scripts looping at once.

  3. Skip intermediary projects. Lastly, Microsoft specifically requested a feature for optionally skipping intermediary projects -- only rebuilding projects where source files were actually changed. Although this is theoretically incorrect, they believed it would do the right thing in most developer scenarios. So a little incorrectness would be worth it for a big speed gain.

    For example, suppose I save a change to the Guid.ts file in a core library that every project uses. In a massive monorepo, this may require recompiling 50 intermediary projects, which might take 5 minutes. That's way too long for "watch mode." But in many situations my change probably won't break other projects. (Examples: if I added a new method, or modified an existing method without changing its API signature, or modified a new method that is only used in one place. Even if I know that my change affects 3 other projects, I could manually save a file in each of those 3 projects to make them rebuild, rather than waiting for 50 projects.)
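The trade-off in milestone 3 can be sketched as a choice between two selection rules over the project graph. Everything named here (`DependentsGraph`, `projectsToRebuild`, the sample project names) is hypothetical and is not how Rush implements it; the point is only to contrast "changed projects plus everything downstream" with "changed projects only".

```typescript
// project -> projects that depend on it directly (hypothetical sample shape)
type DependentsGraph = { [project: string]: string[] | undefined };

function projectsToRebuild(
  changed: string[],
  dependents: DependentsGraph,
  skipIntermediary: boolean
): string[] {
  const result = new Set<string>(changed);
  if (!skipIntermediary) {
    // Correct mode: walk downstream and include every transitive dependent.
    const queue = changed.slice();
    while (queue.length > 0) {
      const project = queue.shift() as string;
      for (const dependent of dependents[project] ?? []) {
        if (!result.has(dependent)) {
          result.add(dependent);
          queue.push(dependent);
        }
      }
    }
  }
  // Fast mode: only projects whose own source files changed.
  return Array.from(result).sort();
}

const sampleGraph: DependentsGraph = {
  core: ["lib-a", "lib-b"],
  "lib-a": ["app"],
  "lib-b": ["app"],
};

console.log(projectsToRebuild(["core"], sampleGraph, false)); // ["app", "core", "lib-a", "lib-b"]
console.log(projectsToRebuild(["core"], sampleGraph, true));  // ["core"]
```

In the fast mode, saving a file in lib-a by hand would add it to `changed` on the next cycle, which matches the manual workaround described in the example above.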

We would implement these milestones one at a time, in the order listed above.
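For milestone 2, the two commands and the least-recently-used pool limit could look roughly like the sketch below. The command spellings, the `WatcherPool` class, and the injected `send` callback are assumptions for illustration; spawning the build script process is deliberately left out, and `maxSize` is assumed to be at least 1.

```typescript
// The two commands named in the plan; the spelling here is hypothetical.
type Command = "rebuild" | "terminate";

class WatcherPool {
  // A Map iterates in insertion order, so its first key is always the
  // least recently used project.
  private readonly live = new Map<string, true>();
  private readonly maxSize: number;
  private readonly send: (project: string, command: Command) => void;

  public constructor(
    maxSize: number,
    send: (project: string, command: Command) => void
  ) {
    this.maxSize = maxSize;
    this.send = send;
  }

  public rebuild(project: string): void {
    // Re-insert to mark this project as most recently used.
    this.live.delete(project);
    this.live.set(project, true);
    // Evict least-recently-used processes until the pool fits its limit.
    while (this.live.size > this.maxSize) {
      const oldest = this.live.keys().next().value as string;
      this.live.delete(oldest);
      this.send(oldest, "terminate");
    }
    this.send(project, "rebuild");
  }
}

const poolLog: string[] = [];
const pool = new WatcherPool(2, (project, command) => {
  poolLog.push(project + ":" + command);
});
["lib-a", "lib-b", "lib-a", "lib-c"].forEach((p) => pool.rebuild(p));
console.log(poolLog);
// ["lib-a:rebuild", "lib-b:rebuild", "lib-a:rebuild", "lib-b:terminate", "lib-c:rebuild"]
```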

dimfeld commented 3 years ago

For anyone wanting a less-robust solution right now, I created https://github.com/dimfeld/rush-dev-watcher a while back, which can run the dev package script for a package's dependencies and optionally the package itself as well.

It builds a DAG of the dependencies of a package and runs the dev commands in dependency order, using some output parsing to detect when a package has finished building. This command assumes that the dev commands of each package will handle whatever file watching is required.
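The dependency-order part of that approach is a depth-first topological sort. The sketch below is not taken from rush-dev-watcher; `DependencyMap` and `dependencyOrder` are illustrative names, and output parsing and process management are omitted.

```typescript
// project -> its direct dependencies (hypothetical sample shape)
type DependencyMap = { [project: string]: string[] | undefined };

// Return the target and its transitive dependencies, ordered so that every
// project appears after the projects it depends on.
function dependencyOrder(target: string, deps: DependencyMap): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (project: string): void => {
    const seen = state.get(project);
    if (seen === "done") {
      return;
    }
    if (seen === "visiting") {
      throw new Error("Dependency cycle detected at " + project);
    }
    state.set(project, "visiting");
    for (const dependency of deps[project] ?? []) {
      visit(dependency);
    }
    state.set(project, "done");
    order.push(project);
  };

  visit(target);
  return order;
}

const sampleDeps: DependencyMap = {
  app: ["lib-b", "lib-c"],
  "lib-b": ["lib-c"],
  "lib-c": [],
};

console.log(dependencyOrder("app", sampleDeps)); // ["lib-c", "lib-b", "app"]
```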

Feel free to take this and customize it for your own needs if you want. Just be warned that it's definitely less generic than the solution described above, and may require some minor customization for your particular build tools.

wbern commented 3 years ago

There's also @telia/rush-select, while we're listing workarounds.

dmichon-msft commented 3 years ago

There's an additional phase between (1) and (2) in which Rush uses IPC to communicate with the leaf watch process, which still does normal webpack-dev-server or jest watch, but can have both the dependency rebuild and the leaf watcher initialized from the same call to Rush.

octogonz commented 3 years ago

In this chat @scamden asked for Rush to spawn the dev server (heft start) via the Rush command line, which implies that the output of heft start and Rush's watch mode should share a single console. (Whereas in the current implementation, we instruct people to open separate shell windows for these two pieces.)

We could consider this to be a fourth work item (separate from the 3 items called out above):

  1. Rush launches the "endpoint" (e.g. dev server) processes and collates their console output. For example, suppose application A depends on library B which depends on library C. When the developer invokes rush build:watch, Rush would build C, then build B, then invoke heft start for A to launch the Webpack dev server. When Rush detects a change to C, it would temporarily suspend console output from heft start, so that it can display the progress for rebuilding C and B. Then it would resume console output for A, to show Webpack rebuilding the bundle and reloading the web browser. This is similar to what @rushstack/stream-collator already does today for B and C, but with a slight quirk that A doesn't terminate and doesn't close its stream; it merely gets temporarily paused while those other jobs are running.

(This feature does not seem to depend on any of the items 1-3. Maybe it is easy to implement. The first step would be for someone to create a GitHub issue proposing a design for how "endpoint" projects are designated, so that Rush can know to treat them specially. For example, maybe a config file setting or CLI parameter.)
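The "temporarily suspend, then resume" behaviour for the endpoint's console output can be sketched as a small buffering gate. This is not the @rushstack/stream-collator API; `PausableOutput` and its methods are hypothetical, and the sink is an array here rather than a real console.

```typescript
// Holds back lines from a never-terminating process while it is paused,
// then releases them in their original order on resume.
class PausableOutput {
  private paused = false;
  private held: string[] = [];
  private readonly sink: (line: string) => void;

  public constructor(sink: (line: string) => void) {
    this.sink = sink;
  }

  public write(line: string): void {
    if (this.paused) {
      this.held.push(line);
    } else {
      this.sink(line);
    }
  }

  public pause(): void {
    this.paused = true;
  }

  public resume(): void {
    this.paused = false;
    const pending = this.held;
    this.held = [];
    pending.forEach((line) => this.sink(line));
  }
}

const consoleLines: string[] = [];
const endpointA = new PausableOutput((line) => {
  consoleLines.push(line);
});

endpointA.write("A: dev server ready");
endpointA.pause();                       // Rush detected a change to C
endpointA.write("A: change detected");   // held back while C and B rebuild
consoleLines.push("C: rebuilt", "B: rebuilt");
endpointA.resume();                      // A's held output is released
console.log(consoleLines);
// ["A: dev server ready", "C: rebuilt", "B: rebuilt", "A: change detected"]
```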

dmichon-msft commented 3 years ago

The easiest option for output routing would be to have rush redirect its build output from the upstream watcher and hand it off to the heft process to integrate into its own console output at its leisure.

scamden commented 3 years ago

@octogonz that sounds great! i'd be happy to take a stab. seems like a simple first look would be to add a flag to the command-line.json config that would indicate the task is non terminating.

is there a format or guide for opening the design issue?

octogonz commented 3 years ago

is there a format or guide for opening the design issue?

No, just create a GitHub issue and structure it however makes sense. Here's a couple recent examples of design proposals (probably more elaborate than what we need for this feature): https://github.com/microsoft/rushstack/issues/2393 https://github.com/microsoft/rushstack/issues/2254

If you're not sure how to approach it, you could try to summarize the problems first, so people can discuss them, then post a proposed design somewhat later as a comment.

Some more topics to consider:

scamden commented 3 years ago

ok sounds good. i'll try to put something together when i get a few.

scamden commented 3 years ago

ok here's a start! https://github.com/microsoft/rushstack/issues/2582

VanCoding commented 1 year ago

Since handling non-terminating watch processes in task-runners is a common problem, maybe we can come up with a protocol, that can be used by all watch-tasks and task-runners.

Here's a proposal of mine: https://github.com/VanCoding/task-graph-protocol

elliot-nelson commented 1 year ago

Closing this issue as a duplicate of https://github.com/microsoft/rushstack/issues/3181.

(Now that Heft supports full watch mode for any phase of a Rush phased command, the usage described above should be possible -- let's open a new issue if there's a specific use case that's still missing.)

UberMouse commented 1 year ago

How is this possible currently? As far as I'm aware, you can't do the multi-project watching outlined here at the rush level. You can't even combine rush phases with heft watch commands, because a rush phased command must exit, which a command running in watch mode does not.

Am I missing something?

elliot-nelson commented 1 year ago

@D4N14L What do you think of the question above ☝️?

It's possible I've closed this issue prematurely, and the new heft watch phases are a step towards a true multi-project watch but don't actually deliver it without some additional changes in Rush to match them.

dmichon-msft commented 1 year ago

This has been on my backlog for a while, though tends to get delayed since watch mode per-project builds usually take shortcuts that make them not quite match the output of, e.g. rush build --watch. The main work items that need to happen for this are:

07akioni commented 10 months ago

What if I just want to run tsc --watch in all upstream projects? It seems rush will do full compiling.

Do I need rush watch to implement it, or does rush watch fit it?

If not what's the proper way?


VanCoding commented 10 months ago

It is not possible to implement this without the support from build tools. If the running build-tool process doesn't have some kind of API over which it can be told to rebuild, the only option is to restart the process.

This is why I tried to propose the following protocol: https://github.com/VanCoding/task-graph-protocol

But it sadly didn't get anywhere, there was not much interest so far.