microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/
Other
5.82k stars 594 forks source link

[rush] Proposal: Rush mitigations for postinstall failures #3530

Open octogonz opened 2 years ago

octogonz commented 2 years ago

Summary

This is a proposal for two Rush changes to improve the developer experience in a large monorepo where postinstall scripts often fail:

  1. Faster retries by separating the install and postinstall operations, so that postinstall can be retried without redoing pnpm install
  2. Avoid unnecessary postinstalls by exposing PNPM's --ignore-scripts parameter for rush install and rush update. (When is a postinstall "unnecessary"? For example you need to install that package as part of rush update --full, but you aren't going to compile any projects that use it.)

(The term "postinstall" in this proposal includes multiple operations; see clarification below.)

Background

My employer has a large monorepo where postinstall fails unusually often. The root causes:

A full rush install of the monorepo does 3 kinds of work:

  1. PNPM validates the lockfile, and in the case of rush update PNPM calculates an updated lockfile
  2. PNPM downloads tarballs from the NPM registry and extracts them to make node_modules folders
  3. Lastly, PNPM runs the "install", "postinstall", "prepare", etc lifecycle scripts for some NPM package dependencies. Typically these "postinstalls" download assets and/or invoke native compilers.

If step 3 fails, then we need to retry rush install repeatedly, for example to reattempt a failed download, or to find out if apt-get install has fixed a compile failure. Each iteration redoes all three steps from the beginning, which is painful because steps 1+2 are very slow.

What PNPM implements

PNPM already provides a mechanism for separating install+postinstall. It works like this:

# Install everything but skip postinstall
pnpm install --ignore-scripts

# Run the postinstall scripts for packages whose postinstall 
# has not already completed successfully
pnpm rebuild --pending

This "pending" status is tracked in ./node_modules/.modules.yaml in a pendingBuilds field, for example:

. . .
packageManager: pnpm@6.33.0
pendingBuilds:
  - /core-js-pure/3.23.1
  - /es5-ext/0.10.61
  - /node-sass/7.0.1
  - /encoding/0.1.13
  - /node-sass/7.0.0
prunedAt: Tue, 12 Jul 2022 03:27:29 GMT

Proposed Solution

The solution has two parts:

Part A: Faster retries

When you run rush install, it invokes PNPM something like this:

~/.rush/node-v14.20.0/pnpm-6.29.1/node_modules/pnpm/bin/pnpm.cjs \
  install --store ~/code/rushstack/common/temp/pnpm-store \
  --frozen-lockfile \
  --reporter ndjson \
  --strict-peer-dependencies \
  --recursive \
  --link-workspace-packages false

Let's ALWAYS add --ignore-scripts to the above command line, and call that Stage 1. After it completes, we run Stage 2 which is pnpm rebuild --pending. This way if a postinstall script fails, then Rush can skip Stage 1 and go straight to Stage 2.

To be clear, if you invoke rush install or rush update without --ignore-scripts, we're proposing that PNPM will still be invoked with --ignore-scripts:

rush install

# Stage 1
pnpm install --ignore-scripts

# Stage 2
pnpm rebuild --pending

rush install --ignore scripts

# Stage 1
pnpm install --ignore-scripts

# (Stage 2 is skipped)

Part B: Avoid unnecessary postinstalls

Suppose that project X depends on a package D whose postinstall would fail on my computer. I don't want to fix this failure. (For example maybe the fix involves compiling Python 3 for my old Debian 9 box, which is pointless if my local work won't use this NPM package, or if I can rely on a CI job to build that project.)

  1. If I want to work on project Y that does NOT depend on D, then a subset install will avoid the problem: rush install --to Y
  2. However as part of working on project X, maybe I need to run rush update --full for the entire monorepo. This will need to install D. But the postinstall for D is NOT relevant to my work, because this operation only cares about versions/lockfiles. I will not compile project Y.

Let's expose --ignore-scripts to rush update:

# Regenerate the lockfile for the entire monorepo,
# package D gets installed but not postinstalled
rush update --full --ignore-scripts

# Perform postinstall for dependencies of Y so that I
# can compile Y.  (But do NOT postinstall D because I
# will not compile X.)
rush install --to Y

Design considerations

  1. In a world where postinstall always succeeds, deferring pnpm rebuild to Stage 2 might be slightly slower, because it cannot run in parallel with other downloads. (Does PNPM actually parallelize that?) Maybe we should provide a setting to opt-in to this two-stage install? Something like robustPostinstalls=true

  2. What about NPM and Yarn? It seems that neither Yarn Classic nor NPM implement the pnpm rebuild --pending functionality. Yarn Plug'n'Play does implement it, but is not yet supported by Rush. After some consideration, we're okay with this being a PNPM-only feature for now. This proposal doesn't take away any functionality for Yarn/NPM nor does it enable any new scenarios; it merely provides a tool to improve reliability of existing operations.

  3. How does rush install know whether postinstall has run already? We could store this state in a file similar to last-install.flag. We could also parse ./node_modules/.modules.yaml and check whether pendingBuilds is empty, but it would be better to avoid reliance on PNPM internals.

  4. How does rush install --to X know whether postinstall has run for the subset of dependencies needed by X? A few possibilities: a. Rush could analyze the transitive dependencies of X. b. Rush could spawn pnpm build --pending --filter ...X every time? This will regress the execution time for an already up to date rush install --to X, but maybe that is negligible? c. We could disable the two-stage installation in the case of subset installs.

CC @chengcyber

octogonz commented 2 years ago

Clarification: In this proposal, the term "postinstall" used as a general term for invoking several different lifecycle scripts (preinstall, install, postinstall, prepublish, prepare) and also recreating the node_modules/.bin scripts.

iclanton commented 2 years ago

This seems useful to me, but wouldn't Part B be better handled by rush install --to <project you're working on> to avoid installing the problematic dependency at all?

octogonz commented 2 years ago

This seems useful to me, but wouldn't Part B be better handled by rush install --to <project you're working on> to avoid installing the problematic dependency at all?

@iclanton In the example, the person cannot solve it with rush install. Here's a more concrete scenario:

  1. I need to upgrade the jest-canvas-mock dependency for project Y because it has a new API that I need for my unit tests
  2. But ensureConsistent versions requires me to upgrade all the other projects in the monorepo to have a consistent version.
  3. rush update fails because the postinstall failed for optipng-bin used by project X. The failing code looks like:

    optipng-bin/lib/install.js

    try {
          // From https://sourceforge.net/projects/optipng/files/OptiPNG/
          await binBuild.file(path.resolve(__dirname, '../vendor/source/optipng.tar.gz'), [
                  `./configure --with-system-zlib --prefix="${bin.dest()}" --bindir="${bin.dest()}"`,🤦‍♂️
              'make install'🤦‍♂️
          ]);
    
          console.log('optipng built successfully');
    } catch (error) {

    It could take 30 minutes to track down the right prerequisites to make that succeed on my VM. But that's irrelevant to my work -- an upgrade of jest-canvas-mock is unlikely to break project Y, and if it did happen we'll find out when the CI job builds it.

Thus, I need rush update to regenerate the lockfile, whereas the postinstall of optipng-bin is irrelevant.

(Ideally we need to eliminate optipng-bin/lib/install.js from our build, but... one step at a time.🙃 )