[rush] Proposal: Rush mitigations for postinstall failures

octogonz commented 2 years ago

Summary

This is a proposal for two Rush changes to improve the developer experience in a large monorepo where postinstall scripts often fail:

Faster retries by separating the install and postinstall operations, so that postinstall can be retried without redoing pnpm install
Avoid unnecessary postinstalls by exposing PNPM's --ignore-scripts parameter for rush install and rush update. (When is a postinstall "unnecessary"? For example you need to install that package as part of rush update --full, but you aren't going to compile any projects that use it.)

(The term "postinstall" in this proposal includes multiple operations; see clarification below.)

Background

My employer has a large monorepo where postinstall fails unusually often. The root causes:

the monorepo has lots of recently imported projects with different tech stacks and versions, so the number of NPM packages with postinstall is unusually large
we also have multiple PNPM lockfiles, causing the same postinstall to run multiple times
CI jobs run in geolocations with unreliable network connections to USA CDNs -- postinstall scripts often download large binaries without retry mechanisms, and without providing a way to redirect their URL to a geolocated mirror

A full rush install of the monorepo does 3 kinds of work:

PNPM validates the lockfile, and in the case of rush update PNPM calculates an updated lockfile
PNPM downloads tarballs from the NPM registry and extracts them to make node_modules folders
Lastly, PNPM runs the "install", "postinstall", "prepare", etc lifecycle scripts for some NPM package dependencies. Typically these "postinstalls" download assets and/or invoke native compilers.

If step 3 fails, then we need to retry rush install repeatedly, for example to reattempt a failed download, or to find out if apt-get install has fixed a compile failure. Each iteration redoes all three steps from the beginning, which is painful because steps 1+2 are very slow.

What PNPM implements

PNPM already provides a mechanism for separating install+postinstall. It works like this:

# Install everything but skip postinstall
pnpm install --ignore-scripts

# Run the postinstall scripts for packages whose postinstall 
# has not already completed successfully
pnpm rebuild --pending

This "pending" status is tracked in ./node_modules/.modules.yaml in a pendingBuilds field, for example:

. . .
packageManager: pnpm@6.33.0
pendingBuilds:
  - /core-js-pure/3.23.1
  - /es5-ext/0.10.61
  - /node-sass/7.0.1
  - /encoding/0.1.13
  - /node-sass/7.0.0
prunedAt: Tue, 12 Jul 2022 03:27:29 GMT

Proposed Solution

The solution has two parts:

Part A: Faster retries

When you run rush install, it invokes PNPM something like this:

~/.rush/node-v14.20.0/pnpm-6.29.1/node_modules/pnpm/bin/pnpm.cjs \
  install --store ~/code/rushstack/common/temp/pnpm-store \
  --frozen-lockfile \
  --reporter ndjson \
  --strict-peer-dependencies \
  --recursive \
  --link-workspace-packages false

Let's ALWAYS add --ignore-scripts to the above command line, and call that Stage 1. After it completes, we run Stage 2 which is pnpm rebuild --pending. This way if a postinstall script fails, then Rush can skip Stage 1 and go straight to Stage 2.

To be clear, if you invoke rush install or rush update without --ignore-scripts, we're proposing that PNPM will still be invoked with --ignore-scripts:

rush install

# Stage 1
pnpm install --ignore-scripts

# Stage 2
pnpm rebuild --pending

rush install --ignore scripts

# Stage 1
pnpm install --ignore-scripts

# (Stage 2 is skipped)

Part B: Avoid unnecessary postinstalls

Suppose that project X depends on a package D whose postinstall would fail on my computer. I don't want to fix this failure. (For example maybe the fix involves compiling Python 3 for my old Debian 9 box, which is pointless if my local work won't use this NPM package, or if I can rely on a CI job to build that project.)

If I want to work on project Y that does NOT depend on D, then a subset install will avoid the problem: rush install --to Y
However as part of working on project X, maybe I need to run rush update --full for the entire monorepo. This will need to install D. But the postinstall for D is NOT relevant to my work, because this operation only cares about versions/lockfiles. I will not compile project Y.

Let's expose --ignore-scripts to rush update:

# Regenerate the lockfile for the entire monorepo,
# package D gets installed but not postinstalled
rush update --full --ignore-scripts

# Perform postinstall for dependencies of Y so that I
# can compile Y.  (But do NOT postinstall D because I
# will not compile X.)
rush install --to Y

Design considerations

In a world where postinstall always succeeds, deferring pnpm rebuild to Stage 2 might be slightly slower, because it cannot run in parallel with other downloads. (Does PNPM actually parallelize that?) Maybe we should provide a setting to opt-in to this two-stage install? Something like robustPostinstalls=true
What about NPM and Yarn? It seems that neither Yarn Classic nor NPM implement the pnpm rebuild --pending functionality. Yarn Plug'n'Play does implement it, but is not yet supported by Rush. After some consideration, we're okay with this being a PNPM-only feature for now. This proposal doesn't take away any functionality for Yarn/NPM nor does it enable any new scenarios; it merely provides a tool to improve reliability of existing operations.
How does rush install know whether postinstall has run already? We could store this state in a file similar to last-install.flag. We could also parse ./node_modules/.modules.yaml and check whether pendingBuilds is empty, but it would be better to avoid reliance on PNPM internals.
How does rush install --to X know whether postinstall has run for the subset of dependencies needed by X? A few possibilities: a. Rush could analyze the transitive dependencies of X. b. Rush could spawn pnpm build --pending --filter ...X every time? This will regress the execution time for an already up to date rush install --to X, but maybe that is negligible? c. We could disable the two-stage installation in the case of subset installs.

CC @chengcyber

octogonz commented 2 years ago

Clarification: In this proposal, the term "postinstall" used as a general term for invoking several different lifecycle scripts (preinstall, install, postinstall, prepublish, prepare) and also recreating the node_modules/.bin scripts.

iclanton commented 2 years ago

This seems useful to me, but wouldn't Part B be better handled by rush install --to <project you're working on> to avoid installing the problematic dependency at all?

octogonz commented 2 years ago

This seems useful to me, but wouldn't Part B be better handled by rush install --to <project you're working on> to avoid installing the problematic dependency at all?

@iclanton In the example, the person cannot solve it with rush install. Here's a more concrete scenario:

I need to upgrade the jest-canvas-mock dependency for project Y because it has a new API that I need for my unit tests
But ensureConsistent versions requires me to upgrade all the other projects in the monorepo to have a consistent version.

rush update fails because the postinstall failed for optipng-bin used by project X. The failing code looks like:

optipng-bin/lib/install.js

try {
      // From https://sourceforge.net/projects/optipng/files/OptiPNG/
      await binBuild.file(path.resolve(__dirname, '../vendor/source/optipng.tar.gz'), [
              `./configure --with-system-zlib --prefix="${bin.dest()}" --bindir="${bin.dest()}"`,🤦‍♂️
          'make install'🤦‍♂️
      ]);

      console.log('optipng built successfully');
} catch (error) {

It could take 30 minutes to track down the right prerequisites to make that succeed on my VM. But that's irrelevant to my work -- an upgrade of jest-canvas-mock is unlikely to break project Y, and if it did happen we'll find out when the CI job builds it.

Thus, I need rush update to regenerate the lockfile, whereas the postinstall of optipng-bin is irrelevant.

(Ideally we need to eliminate optipng-bin/lib/install.js from our build, but... one step at a time.🙃 )

microsoft / rushstack