purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry

Prune unused dependencies from manifests generated from spago.dhall files #667

Closed thomashoneyman closed 9 months ago

thomashoneyman commented 9 months ago

Fixes #662. The registry API will now detect unused dependencies in manifests generated from spago.dhall files. Those dependencies will be removed from the manifest before it is written to disk and we kick off the publishing process.

The general problem is that spago.dhall files frequently contain test dependencies, and we have no way to distinguish a test dependency from a non-test dependency using the config file alone. These dependencies are mixed together mainly because spago.dhall files have historically been internal-only: the package set repository contained the "correct" list of source-only dependencies (though this is frequently incorrect, too). When we import a legacy package that contains a spago.dhall file, we take its dependencies at face value, and the result is over-constrained package dependencies, as seen in #662.

This PR prunes unused dependencies from the manifests generated from legacy packages. Specifically, we prune unused dependencies any time we use the fetchLegacyManifest function, because that creates a manifest from the aggregate of bower.json, spago.dhall, and package-sets files.

The pruning process is as follows. As soon as we have the project source code and its generated manifest, we:

  1. Determine all of the project's dependencies (including transitive) by running the solver on the generated manifest.
  2. Download all dependencies so we have access to their source code.
  3. Provide the dependency directory and source globs to purs graph (only usable from 0.13.8 onward, FYI) to determine, for any given module name, its path and the modules it directly depends on.
  4. Associate each module in the graph with the package it belongs to by parsing the path segment. We can do this because packages are installed in a standard form, i.e. <tmp>/<package-name>-<version>/...
  5. Extract all modules that belong to the package source code (i.e. those that have a path which begins with the package source directory) from the graph.
  6. For each module in the package source code, determine its full transitive module dependencies, and merge all of them together into a set of module names reachable from the package source. Then, look up the package names these modules belong to, and merge that together to produce the set of package names containing a module reachable by the project source.
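
Steps 4 through 6 can be sketched roughly as follows. This is an illustrative Python sketch, not the registry's PureScript implementation: the function names (`package_of`, `reachable_modules`, `used_packages`), the example paths, and the sample graph are all hypothetical. It assumes the graph has the shape of `purs graph` JSON output, i.e. a map from module name to an object with `path` and `depends` fields.

```python
from pathlib import PurePosixPath


def is_under(path, directory):
    """True if `path` lies inside `directory`."""
    try:
        PurePosixPath(path).relative_to(directory)
        return True
    except ValueError:
        return False


def package_of(path, deps_dir):
    """Step 4: recover the package name from '<deps_dir>/<name>-<version>/...'."""
    try:
        rel = PurePosixPath(path).relative_to(deps_dir)
    except ValueError:
        return None  # not an installed dependency
    if not rel.parts:
        return None
    # Split on the last hyphen so hyphenated package names survive.
    name, _, _version = rel.parts[0].rpartition("-")
    return name or None


def reachable_modules(graph, roots):
    """Step 6 (first half): all modules transitively reachable from `roots`."""
    seen = set()
    stack = list(roots)
    while stack:
        module = stack.pop()
        if module in seen:
            continue
        seen.add(module)
        stack.extend(graph.get(module, {}).get("depends", []))
    return seen


def used_packages(graph, src_dir, deps_dir):
    """Steps 5 and 6: packages owning a module reachable from the package source."""
    roots = [m for m, node in graph.items() if is_under(node["path"], src_dir)]
    used = set()
    for module in reachable_modules(graph, roots):
        node = graph.get(module)
        if node is None:
            continue
        package = package_of(node["path"], deps_dir)
        if package is not None:
            used.add(package)
    return used


# Hypothetical graph: 'spec' is installed but never imported by the source.
graph = {
    "Main": {"path": "/tmp/pkg/src/Main.purs", "depends": ["Data.Maybe"]},
    "Data.Maybe": {"path": "/tmp/deps/maybe-6.0.0/src/Data/Maybe.purs", "depends": ["Prelude"]},
    "Prelude": {"path": "/tmp/deps/prelude-6.0.1/src/Prelude.purs", "depends": []},
    "Test.Spec": {"path": "/tmp/deps/spec-7.5.5/src/Test/Spec.purs", "depends": ["Prelude"]},
}

used = used_packages(graph, "/tmp/pkg/src", "/tmp/deps")
```

In this sample, `used` contains `maybe` and `prelude` but not `spec`, because no module under the source directory reaches `Test.Spec`.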

The result is a Set PackageName representing packages that are actually in use. We can then walk through the generated manifest checking each dependency to see if it is in the "used packages" set; if not, then it is removed from the manifest.
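
The pruning walk itself is small. As a hedged illustration in Python (not the registry's actual code), assume the manifest is a dictionary with a `dependencies` map from package name to version range; the function name `prune_unused` and the sample values are hypothetical.

```python
def prune_unused(manifest, used_packages):
    """Return a copy of `manifest` keeping only dependencies in `used_packages`."""
    pruned = dict(manifest)
    pruned["dependencies"] = {
        name: version_range
        for name, version_range in manifest["dependencies"].items()
        if name in used_packages
    }
    return pruned


# Hypothetical manifest generated from a spago.dhall that mixes in a test dependency.
manifest = {
    "name": "example",
    "dependencies": {
        "prelude": ">=6.0.0 <7.0.0",
        "maybe": ">=6.0.0 <7.0.0",
        "spec": ">=7.0.0 <8.0.0",
    },
}

pruned = prune_unused(manifest, {"prelude", "maybe"})
```

The original manifest is left untouched, so the caller can compare the two to report which dependencies were dropped.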

Finally, we write that updated manifest to disk and proceed with publishing the package. Publishing the package involves re-solving the manifest, downloading dependencies, and compiling, so we can be sure that the adjusted package indeed works. (This doesn't affect packages with a purs.json or spago.yaml file; they are solved only once, since we don't have to prune unused dependencies from them.)

Internal Notes