npm / rfcs

Public change requests/proposals & ideation
Other
726 stars 238 forks source link

[RRFC] Topological Sort #707

Open doug-wade opened 1 year ago

doug-wade commented 1 year ago

Motivation ("The Why")

I got this idea from a github issue.

Example

I am proposing that we add a sorting algorithm for ordering the graph traversal of npm commands that operate on the dependency graph express by a workspace.

A fully configured example in package.json might be as follows

{
  "workspaces": [
    "workspaces/transitive-dependency",
    "workspaces/dependency",
    "workspaces/dependent"
  ],
  "workspaces-sort": {
    "algorithm": "topological-parallel",
    "include": [
      "dependencies",
      "bundleDependencies",
      "devDependencies",
      "optionalDependencies",
      "peerDependencies"
    ]
  }
}

And a user might use it at the command-line as follows

npm rn test --workspaces --workspaces-sort-algorithm=topological-parallel --workspaces-sort-include=dev,bundle,optional,peeru

How

Current Behaviour

Currently, developers maintain a valid sort order manually in package.json, in the form of an ordered array in the workspaces key.

Desired Behaviour

Developers could list workspaces in any order, while still operating on packages in the order required for correctness.

References

doug-wade commented 1 year ago

I'm leaning pretty heavily on the draft I have written, rather than putting extensive detail in this issue, but if it makes it easier, I can certainly spend some time with the template below to beef it up.

ruyadorno commented 1 year ago

Hi @doug-wade, great stuff! While it's exciting seeing new efforts to move things forward, I'd like to provide some historical background in the hope that your attempt can be successful here.

Your draft would need to address the concerns described in https://github.com/npm/cli/issues/3034#issuecomment-885290017 in the past I tried to get topological sorting working for lifecycle scripts (ref: https://github.com/npm/arborist/pull/303) and I stumbled upon the issues described in Isaac's comment.

I'd also like to point a related (and very recent RRFC, ref: https://github.com/npm/rfcs/issues/706) that sounds to me a little more actionable at tackling the running-multiple-interdependent-scripts-in-a-project challenge.

trusktr commented 11 months ago

I think we should limit scope, and not let those issues stop something that is great for most people (already proven by Lerna, Yarn, Pnpm, and others).

We can limit this feature to workspaces commands (hence this does not include install, prepare, or any lifecycle script, those work as-is), as that what most people are going to use it for, in mono repos and super modules.

trusktr commented 11 months ago

I'd also like to point a related (and very recent RRFC, ref: #706) that sounds to me a little more actionable at tackling the running-multiple-interdependent-scripts-in-a-project challenge.

Upon taking a closer look, wireit does not seem to provide the a solution for topologically-ordered scripts across workspaces (unless I missed it (cc @justinfagnani)), and Wireit only allows explicitly-defined script dependencies.

Both Wireit, and topological workspace script running, would be complementary:

Wireit allows manually specifying cross-workspace script dependencies, which is in a way similar to specifying a list of workspaces in package.json and relying on that list for the order of execution, but from what I can tell it doesn't include automatic topological sorting.

If npm had topological sorting, then I think that wireit's cross-package dependency config would not be needed in average cases as with build scripts, although could still be handy sometimes (like one package's build script depends on some other script in a way that topological sorting would not be able to determine, hence could be a nice fallback).

Additionally, if topological ordering and wireit are both adopted, then the topological ordering should also take into consideration manually-specified cross-package dependencies and not run them more than once (in case one build script is defined to depend on another package's build script, but running the topological workspaces command would already run the other package's build script first, we would not want the dependent package to run the other package's build script a second time because we know the topological ordering already handled it). @doug-wade maybe this should be mentioned in the topological RFC too.


The downside of having to explicitly define script dependencies across packages is that if we change packages, we have to remember to update the config. This is also awkward if our packages are git submodules that, when inspected on their own, contain relative paths to outside of the repo to places that may or may not exist. When cloning such project on its own, the config will fail.


Possibly for some inspiration, also check out ultra-runner by @folke, the output and concept is pretty dang sweet: