npm / cli

the package manager for JavaScript
https://docs.npmjs.com/cli/
Other
8.48k stars 3.16k forks source link

[BUG] Prepare scripts in workspaces should wait for dependencies #3034

Closed timoxley closed 3 years ago

timoxley commented 3 years ago

Current Behaviour:

It seems that prepare scripts are now run in parallel, which means that if:

Then A's prepare script will fail because B's has not completed.

If I switch the package.json scripts from prepare to prepublish then it seems to work without a problem.

Is there an alternative lifecycle script to the deprecated prepublish?

This may be an arborist issue.

Expected Behaviour:

npm install should wait for dependent's prepare scripts to complete before running.

Steps To Reproduce:

I've tried to capture a reproduction here: https://github.com/timoxley/npm7-prepare-issue

git clone git@github.com:timoxley/npm7-prepare-issue.git
cd npm7-prepare-issue
npm install # this will break
npm install # works on second run
# run this to clean up and try again
npm run clean

Environment:

wraithgar commented 3 years ago

All of the lifecycle scripts have a pre and post event that runs before and after the main scripts. Their reason for existing is cases like this. They admittedly only work for very simple cases like this where there are only two packages, and having one go before the other solves it. More complex setups would require a whole new spec.

So, to be clear, before prepare scripts are run any preprepare scripts are run, and will have finished by the time prepare is running. postprepare scripts will then run and all prepare scripts will have finished by the time postprepare scripts have started.

wraithgar commented 3 years ago

If you would like to propose a more formal solution for this problem that allows for more complex situations to be addressed, the RFC process is a good place to suggest something: https://github.com/npm/rfcs

timoxley commented 3 years ago

Preprepare does not solve my actual problem because I have 6 packages in my workspace and a nesting depth of 3.

I can make a rfc but I feel like it's more of a bug than a new feature, npm should be running the scripts for child dependencies before the parents.

timoxley commented 3 years ago

@wraithgar Would like to reiterate this really is a bug and currently I believe this makes it impossible to use typescript in a workspace since you can't ensure a package is actually built before something tries to use it.

wraithgar commented 3 years ago

Sorry, didn't see that this was specifically related to workspaces when I replied.

npm install does not currently support workspaces. It is being worked on right now, and prepare script ordering will be something that is handled correctly when it does ship.

wraithgar commented 3 years ago

You can follow the progress of adding workspace support to all of arborist's reify commands in the cli here

timoxley commented 3 years ago

npm install does not currently support workspaces.

@wraithgar What does that mean? It's in the docs: https://docs.npmjs.com/cli/v7/using-npm/workspaces#installing-workspaces

wraithgar commented 3 years ago

You're right, looks like install is already pulling in workspaces, which is different than the enable workspaces context work being added to install. Sorry for the confusion. We're on the bleeding edge here of a new thing currently being implemented, some of the terminology is overlapping.

This is already on our radar but I'll reopen this so it's being explicitly represented here in this issue too.

isaacs commented 3 years ago

In the short term, you can force serialized operation by doing npm install --foreground-scripts.

More generally, this does get a bit tricky in the case where the dependency graph cycles back on itself. For example, a -> b -> c -> a, where all three are workspaces with prepare scripts. So we can't just naively say "wait until all your deps prepares have completed", because it'll deadlock. At least, though, we can say "wait until all your deps prepares have completed, except those that are in the waiting queue already". So in the a->b->c->a case, it would wait on a until b runs, then wait on b until c runs, then c would run immediately, since a is already waiting for it.

isaacs commented 3 years ago

Hm, exploring a bit further, there is really no way to do solve this bootstrapping problem in general that doesn't lead to either a traveling salesman problem or deadlocks.

Consider this dependency graph:

root -> (b)
a -> (b)
b -> (c, x)
c -> (d)
d -> (e, y)
e -> (a)
x -> (e, y)
y -> (c)

Let's say that:

  1. x, y, a, and e have prepare scripts
  2. all modules with prepare scripts require() their dependencies at least once within the prepare script and within the main module.
  3. all modules with prepare scripts cannot be loaded until their prepare scripts are run.

This is unresolvable. Picking any arbitrary point in the graph, we end up with a failure for at least 2 reasons:

Furthermore, even if we say we just won't handle cycles in this way, sorting by any combination of dependency relationships or depth within the dependency graph as a heuristic doesn't really work either. Take for example the c node. There are multiple routes to it:

When we look at the y node, we find:

So, if we have y and c in the queue to run a prepare script, it's impossible to reliably guess which one should go first. If we had all these paths, we could arbitrarily say we're going to sort them shortest to longest, and then use that, but that's literally the classic example of an NP-hard problem, and will spiral out of control very quickly when we have graphs with thousands of nodes.

Even if we run package scripts in serial like npm v6 did (and which you can do with the --foreground-scripts config option), or lay out the dependency graph maximally nested like npm v1 and v2 did (and which you can do with the --legacy-bundling config option), there's always going to be cases where this just doesn't work.

Simply put, there is no way we can guarantee, or ever could have guaranteed, that a package's dependencies will have their prepare scripts run prior to its own. This was working for you only accidentally, because you happened to have a non-cyclic dependency graph where sorting in alphabetical order happened to avoid the issue. You can still do this with --foreground-scripts, so if that was working before, then that is a workaround for now.

The safest assumption is that lifecycle events happen in a single platonic moment for all packages within the tree, and design your modules accordingly.

pro-wh commented 3 years ago

Thanks for the workaround with --foreground-scripts. I think that's suitable for the cases where I needed ordered prepares. And in the bigger picture, our workspaces will be prepared and uploaded separately, so our downstream users won't need to fiddle with their flags.

But I must say that the fact that cycles can merely be represented has not often been sufficient reason to give up on something like this. for loops in JavaScript exist despite for (;;) {} being expressible. Even require cycles aren't fatal, the quandary not resolved in the style of "every require initially returns an empty object."


This [a best effort ordering based on a traversal] was working for you only accidentally, because you happened to have a non-cyclic dependency graph where sorting in alphabetical order happened to avoid the issue.

I have had a different experience with workspaces, where it's rather not by accident that we have non-cyclic dependency graphs among workspaces.

  • all modules with prepare scripts require() their dependencies at least once within the prepare script and within the main module. ... [leading to analysis of algorithms known to be hard]

It might even be reasonable to suppose that:

and thus anything downstream would be needed, as probably as direct dependencies. I'd be happy to have --foreground-scripts behavior, even if npm didn't especially maximize the direct dependencies' availability.


Using the ordering from --foreground-scripts would be great for projects that have acyclic workspace dependencies; without it, setting up a project such as the one described in the first post from a repository checkout would fail every time.

Ordered prepare for projects with cyclic dependencies seems better too. If there is some permutation that would work, I'd prefer to have npm install either succeed or fail predictably rather than occasionally stumble on one of these sequences of events. Or more sinister still, it could sometimes expose outdated packages during dependents' build process, resulting in build errors that I could have sworn I already fixed.

Doing them all at once is faster for projects with packages that can be prepared independently, but among projects that have set up workspaces, I really don't know how common that is. Doing

The behavior of npm running the prepare scripts from the top level workspace is relatively new, first added in 7.20.0. Right now would be the best opportunity to make ordering like --foreground-scripts the default.