purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry

`compiler-versions` script: Compute supported compiler versions for a single package #632

Closed by colinwahl 1 year ago

colinwahl commented 1 year ago

This PR is a first step towards #255. It adds a new script, compiler-versions, which will be used to compute the supported compiler versions for packages.

Currently the script can only work on a single package@version that has no dependencies. No attempt is made to modify the Metadata type.

In follow-up PRs, I will start to introduce more sophisticated routines to get the supported compiler ranges for all existing packages, and eventually get to modifying the Manifest type.

Note: This currently computes a single Range of supported compiler versions. Because we use simple ranges, once a package@version has compiled with an earlier compiler and then fails to compile with a later one, we stop checking whether it becomes supported again by a future compiler. In general, I think this is fine, but I wanted to call it out explicitly.
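To make the behaviour in the note concrete, here is a minimal Python sketch of an oldest-first scan that stops at the first failure. It is not the script's actual PureScript code: `parse`, `oldest_range`, and the `compiles` callback are hypothetical names, and the exclusive upper bound is assumed to be the next patch version after the last working compiler.

```python
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    """Turn '0.15.10' into (0, 15, 10) so versions sort numerically."""
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def oldest_range(compilers: List[str], compiles: Callable[[str], bool]) -> Optional[str]:
    """Scan compilers oldest-first and return the first contiguous working run.

    `compiles` is a stand-in for actually building the package with a compiler.
    """
    first = last = None
    for compiler in sorted(compilers, key=parse):
        if compiles(compiler):
            if first is None:
                first = compiler
            last = compiler
        elif first is not None:
            # First failure after a success: stop, later compilers are never tried.
            break
    if first is None or last is None:
        return None
    major, minor, patch = parse(last)
    # Assumption: the exclusive upper bound is the patch after the last success.
    return f">={first} <{major}.{minor}.{patch + 1}"


# Hypothetical results: 0.15.3 fails, 0.15.4 works again but is never reached.
working = {"0.15.0", "0.15.2", "0.15.4"}
print(oldest_range(["0.15.0", "0.15.2", "0.15.3", "0.15.4"], lambda c: c in working))
# prints ">=0.15.0 <0.15.3"
```

In this made-up example the package would work with 0.15.4, but the range never records it, which is the trade-off the note calls out.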

I've run this on various versions of prelude - here's an example of the final output:

Found supported compiler versions for prelude@6.0.0: >=0.15.0 <0.15.11

f-f commented 1 year ago

Note: This currently computes a single Range of supported compiler versions. Because we use simple ranges, once a package@version has compiled with an earlier compiler and then fails to compile with a later one, we stop checking whether it becomes supported again by a future compiler. In general, I think this is fine, but I wanted to call it out explicitly.

Is this the behaviour we wish for? If a generally useful package works at some point and is then broken by a compiler regression that is later fixed, do we not want to consider it compatible with the new series of compilers?

I would say we should do it the other way around (start checking from the most recent compiler versions rather than the older ones) if we want to keep it a single range, or just use a list of Ranges so we can cover everything.

thomashoneyman commented 1 year ago

if we want to keep it a single range, or just use a list of ranges so we can cover everything

This is the big design question for the feature. Do we use single versions, like [ "0.13.0", "0.13.1" ], where every supported compiler is listed? Do we use ranges instead, like ">=0.13.0 <0.13.8"? Do we follow your suggestion and use a list of ranges, like [ ">=0.13.0 <0.13.8", ">=0.13.10 <0.13.12" ]?
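The three representations are closely related: given the list of known compiler releases, an exact list of supported versions can be collapsed into a list of ranges. A rough Python sketch follows, assuming `>=A <B` ranges with the upper bound taken as the next patch after the last version in each run; `to_ranges` and its helpers are hypothetical names, not registry code.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def bump_patch(v: str) -> str:
    major, minor, patch = parse(v)
    return f"{major}.{minor}.{patch + 1}"


def to_ranges(all_compilers: List[str], supported: List[str]) -> List[str]:
    """Collapse exact supported versions into contiguous '>=A <B' ranges.

    Contiguity is judged against `all_compilers`, the known compiler releases.
    """
    ok = set(supported)
    runs: List[List[str]] = []
    current: List[str] = []
    for compiler in sorted(all_compilers, key=parse):
        if compiler in ok:
            current.append(compiler)
        elif current:
            # An unsupported release ends the current run.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [f">={run[0]} <{bump_patch(run[-1])}" for run in runs]


# Hypothetical data: the package fails only on 0.13.3.
releases = ["0.13.0", "0.13.2", "0.13.3", "0.13.4"]
print(to_ranges(releases, ["0.13.0", "0.13.2", "0.13.4"]))
# prints ['>=0.13.0 <0.13.3', '>=0.13.4 <0.13.5']
```

A package with no gaps collapses to a single range, so the list-of-ranges form only costs extra space for packages that were actually broken and later fixed.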

The most correct thing to do is to list out every supported version, either via single versions or via ranges. The benefit is that we have accurately captured every supported compiler for every package. But there are downsides:

  1. This isn't compatible with using the solver to figure out what versions of dependencies to download based on what compilers they support. Right now we insert the compiler as a fake dependency into every package version based on what it's known to compile with (for example, prelude@6.0.0 gets a new dependency "purs": ">=0.15.0 <0.15.9"). Then, when solving a new package version, we can find all of its dependencies that are also valid with the indicated compiler. We need to do this to ensure we download dependencies that are compatible with the given compiler too, or else the solver will just download the latest versions regardless of compatibility and then we might get a false compilation failure.

  2. We'd no longer be able to set a cutoff for packages (such as "known not to compile from 0.14.0 onward"), so when we release a new compiler we would have to compile every package version in the registry again (~13,000 versions). Realistically, most packages with no dependencies would fail right away (like prelude). Packages with dependencies only get tried if their dependencies indicate they support the compiler in question, so they'd be filtered out by the solver. But we'd still be looking at and attempting to solve 13,000 package versions, the vast majority of which is redundant work, since we know that if a package version stopped compiling at 0.14.0 it's almost certainly not going to start working again at 0.15.9.

  3. (Weak) If we have a list of ranges or versions, then making the metadata files human readable via pretty-printing will cause these arrays to break over multiple lines, adding multiple lines to every version in a metadata file and ballooning the repo size. We might wish then to make metadata files not formatted, and just stringified instead, like how the manifest index is. Or switch to using dodo-printer so we can control this better.
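For the first point above, the fake-dependency idea can be sketched roughly as follows. This is a simplified Python illustration, not the registry's solver: the package `some-lib`, its versions, and the helper names are hypothetical, and only simple `>=A <B` ranges are handled.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def in_range(version: str, simple_range: str) -> bool:
    """Check a version against a simple range of the form '>=A <B'."""
    lower, upper = simple_range.split()
    return parse(lower[2:]) <= parse(version) < parse(upper[1:])


def add_compiler_dependency(deps: Dict[str, str], compiler_range: str) -> Dict[str, str]:
    """Insert the compiler as a fake 'purs' dependency next to the real ones."""
    return {**deps, "purs": compiler_range}


def compatible_versions(index: Dict[str, Dict[str, str]], compiler: str) -> List[str]:
    """Return the package versions whose 'purs' range admits the given compiler.

    `index` maps each version of one package to its dependency ranges.
    """
    matches = (v for v, deps in index.items() if in_range(compiler, deps["purs"]))
    return sorted(matches, key=parse)


# Hypothetical package and ranges, for illustration only.
some_lib = {
    "1.0.0": add_compiler_dependency({}, ">=0.14.0 <0.15.0"),
    "2.0.0": add_compiler_dependency({}, ">=0.15.0 <0.15.9"),
}
print(compatible_versions(some_lib, "0.14.5"))  # prints ['1.0.0']
```

With a single range per version, the compiler constraint fits the same shape as any other dependency. A list of ranges would need a solver that can handle a disjunction of ranges for one dependency, which is the compatibility problem the first point raises.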

In contrast, using a simple range is less correct — as you said, a compiler regression can cause a package version to falsely record that it doesn't work with some compilers that it does work with. But it is simple, and it allows us to be correct when compiling with the package version's dependencies, and in the vast majority of cases it will be correct too: when a package version stops compiling it almost certainly won't begin compiling again with new releases.

I am happy to use a list of ranges if we can figure out the solver issue. Also, we can migrate this discussion over to #255, as it affects more than this PR.

thomashoneyman commented 1 year ago

Oh — and I agree we should prioritize either a) producing the newest range of supported compilers or b) producing the largest range of supported compilers. For example, if you work with >=0.14.0 <0.15.0 and also >=0.15.4 <0.15.5 I feel like we should select the former range as your more "natural" range.

Maybe we could tweak things when we do the bulk version of this script so that we can see, in the registry today, how many packages would end up with multiple ranges if we chose to go that way. If there are quite a few then maybe we prioritize supporting a list of ranges. If there are very few, then maybe we just use a simple range.
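As a rough sketch of the two ideas in this comment (picking a "natural" range, and counting how many packages would end up with multiple ranges), assuming each range is represented as a run of working compiler versions ordered oldest first. All names and data below are hypothetical, and "largest" is assumed to mean the run covering the most known compiler releases.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]
Run = List[str]  # a contiguous run of working compiler versions, oldest first


def parse(v: str) -> Version:
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def largest_run(runs: List[Run]) -> Run:
    """Pick the run covering the most compilers; ties go to the newer run."""
    return max(runs, key=lambda run: (len(run), parse(run[-1])))


def newest_run(runs: List[Run]) -> Run:
    """Pick the run that ends at the most recent compiler."""
    return max(runs, key=lambda run: parse(run[-1]))


def count_multi_range(packages: Dict[str, List[Run]]) -> int:
    """Count the package versions that would need more than one range."""
    return sum(1 for runs in packages.values() if len(runs) > 1)


# Abbreviated, made-up runs mirroring the >=0.14.0 <0.15.0 and >=0.15.4 <0.15.5 example.
runs = [["0.14.0", "0.14.1", "0.14.2"], ["0.15.4"]]
print(largest_run(runs))  # prints ['0.14.0', '0.14.1', '0.14.2']
print(newest_run(runs))   # prints ['0.15.4']
print(count_multi_range({"a@1.0.0": runs, "b@1.0.0": [["0.15.0"]]}))  # prints 1
```

The two selection rules disagree on this example, so the bulk count would show how often the choice actually matters in practice.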

f-f commented 1 year ago

Good analysis - note that I'm not advocating for a list of ranges. I'm just saying that, of all the options we have available to implement this feature (oldest working range, newest working range, list of ranges), the current one (oldest working range) is the least correct, so we should really do something else.

While #255 is the general ticket for tracking this change, I do think that this is the right place for this discussion, as this simple assumption is the basis for further work.

colinwahl commented 1 year ago

Thanks for the discussion - yeah, after reading through it, oldest working range is probably not what we want.

Ideally down the line we'd be able to compute whether or not each compiler version works for each package version - maybe for now I can change this script to just determine the full list of supported compiler versions for a single package, and as we look further into implementing the bulk check we can see if that's viable or if we want to go for a different approach with ranges.

When I was first implementing this script I was thinking it'd be nice to cut off at some point - but the side effect is that the range isn't really "correct". I suppose I was optimizing too early - at least for this script, computing the whole list is fine.

thomashoneyman commented 1 year ago

I think that's the best approach for now, and we can always switch to list-of-ranges or simple range later if need be. At least this gives us the exact versions that are usable.

colinwahl commented 1 year ago

The latest commit changes the script to compute all supported compiler versions for a single package version. Here's an example of the output:

Found supported compiler versions for prelude@6.0.0: 0.15.0, 0.15.2, 0.15.3, 0.15.4, 0.15.5, 0.15.6, 0.15.7, 0.15.8, 0.15.9, 0.15.10
Found supported compiler versions for prelude@3.0.0: 0.13.0, 0.13.2, 0.13.3, 0.13.4, 0.13.5, 0.13.6, 0.13.8, 0.14.0, 0.14.1, 0.14.2, 0.14.3, 0.14.4, 0.14.5, 0.14.6, 0.14.7, 0.14.8, 0.14.9
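
The exhaustive approach amounts to trying every known compiler and keeping the ones that succeed, with no early cutoff. A minimal Python sketch, assuming a `compiles` callback that stands in for a real build; the function names and the `example@1.0.0` package are hypothetical and do not reflect the script's actual PureScript implementation.

```python
from typing import Callable, List, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def supported_compilers(compilers: List[str], compiles: Callable[[str], bool]) -> List[str]:
    """Try every compiler, oldest first, and keep each one that succeeds."""
    return [c for c in sorted(compilers, key=parse) if compiles(c)]


def report(package: str, supported: List[str]) -> str:
    """Format the result in the same shape as the output lines above."""
    return f"Found supported compiler versions for {package}: {', '.join(supported)}"


# Hypothetical run: pretend only 0.15.3 fails to build the package.
found = supported_compilers(["0.15.2", "0.15.0", "0.15.3", "0.15.4"], lambda c: c != "0.15.3")
print(report("example@1.0.0", found))
# prints "Found supported compiler versions for example@1.0.0: 0.15.0, 0.15.2, 0.15.4"
```

Unlike a single range, this keeps 0.15.4 even though 0.15.3 failed, at the cost of one build attempt per compiler.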