purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry
97 stars 80 forks source link

PureScript Registry Development

deploy tests

This repository hosts the code for the PureScript Registry and its affiliated projects. If you are new to the registry and are interested in its development, then the following should be helpful:

If you are interested in the products of the registry, then you should see:

Finally, as always, package documentation is hosted on Pursuit.

Below, you can see the original RFC for a PureScript registry as it was written once the Bower registry shut down.


Problem at hand

PureScript needs a way to distribute packages. We used to rely on the Bower registry for that, but this is not possible anymore.

Goals for a PureScript package registry

Here's a non-comprehensive list of desiderable properties/goals that we'd generally like to achieve when talking of a PureScript registry, and that we think this design covers:

This section has been informed by happenings, discussions, and various other readings on the internet, such as this one and this one.

Non-goals of this design

Things we do not aim to achieve with this design:

Proposed design: "Just a GitHub Repo"

Two main ideas here:

This repo will contain:

All of the above is about metadata, while the real data (i.e. the package tarballs) will live on various storage backends.

The Package Manifest

A Manifest stores all the metadata (e.g. package name, version, dependencies, etc) for a specific version of a specific package.

Packages are expected to version in their sources a purs.json file, that conforms to the Manifest schema to ensure forwards-compatibility with future schemas. This means that new clients will be able to read old schemas, but not vice-versa. And the reason why forward (rather than backwards) compatibility is needed is because package manifests are baked in the (immutable) package tarballs forever, which means that any client (especially old ones) should always be able to read that manifest.

This means that the only changes allowed to the schema are:

For more info about the different kinds of schema compatibility, see here

All the pre-registry packages will be grandfathered in, see here for details. You can find some examples of the Manifest that has been generated for them in the examples folder.

Note: the Location schema includes support for packages that are not published from the root of the repository, by supplying the (optional) subdir field. This means that a repository could potentially host several packages (commonly called a "monorepo").

Registry Versions & Version Ranges

The PureScript registry allows packages to specify a version for themselves and version ranges for their dependencies.

We use a restricted version of the SemVer spec which only allows versions with major, minor, and patch places (no build metadata or prerelease identifiers) and version ranges with the >= and < operators.

This decision keeps versions and version ranges easy to read, understand, and maintain over time.

Package Versions

Package versions always take the form X.Y.Z, representing major, minor, and patch places. All three places must be natural numbers. For example, in a manifest file:

{
  "name": "my-package",
  "version": "1.0.1"
}

If a package uses all three places (ie. it begins with a non-zero number, such as 1.0.0), then:

If a package only uses two places (ie. it begins with a zero, such as 0.1.0), then:

If a package uses only one place (ie. it begins with two zeros, such as 0.0.1), then all changes are potentially breaking changes.

Version Ranges

Version ranges are always of the form >=X.Y.Z <X.Y.Z, where both versions must be valid and the first version must be less than the second version.

When comparing versions, the major place takes precedence, then the minor place, and then the patch place. For example:

All dependencies must take this form. For example, in a manifest file:

{
  "name": "my-package",
  "license": "MIT",
  "version": "1.0.1",
  "dependencies": {
    "aff": ">=1.0.0 <2.0.0",
    "prelude": ">=2.1.5 <2.1.6",
    "zmq": ">=0.1.0 <12.19.124"
  }
}

The Registry API

The Registry should support various automated (i.e. no/little human intervention required) operations:

Adding a new package

As package authors the only thing that we need to do in order to have the Registry upload our package is to tell it where to get it.

We can do that by opening an issue containing JSON that conforms to the schema of an Addition.

Note: this operation should be entirely automated by the package manager, and transparent to the user. I.e. package authors shouldn't need to be aware of the inner workings of the Registry in order to publish a package, and they should be able to tell to the package manager "publish this" and be given back either a confirmation of success or failure, or a place to follow updates about the fate of the publishing process.

Implementation detail: how do we "automatically open a GitHub issue" while at the same time not requiring a GitHub authentication token from the users? The idea is that if a package manager wants to avoid doing that then it's possible to generate a URL that the user can navigate to, so that they can preview the issue content before opening it. This is an example of such link.

Once the issue is open, the CI in this repo will:

The CI will post updates in the issue as it runs the various operations, and close the issue once all the above operations have completed correctly.

Once the issue has been closed the package can be considered published.

Publishing a new version of an existing package

It is largely the same process as above, with the main difference being that the body of the created issue will conform to the schema of an Update.

Unpublishing a package/release

Unpublishing a version for a package can be done by creating an issue containing JSON conforming to the schema of an Unpublish.

CI will verify that all the following conditions hold:

If these conditions hold, then CI will:

Unpublishing is allowed for security reasons (e.g. if some package was taken over maliciously), but it's allowed only for a set period of time because of the leftpad problem (i.e. breaking everyone's builds).

Exceptions to this rule are legal concerns (e.g. DMCA takedown requests) for which Trustees might have to remove packages at any time.

Package metadata

Every package will have its own file in the packages folder of this repo.

You can see the schema of this file here, and the main reasons for this file to exist are to track:

Package Sets

As noted in the beginning, Package Sets are a first class citizen of this design.

This repo will be the single source of truth for the package-sets - you can find an example here - from which we'll generate various metadata files to be used by the package manager. Further details are yet to be defined.

Making your own Package Set

While the upstream package sets will only contain packages from the Registry, it is common to have the need to create a custom package set that might contain with packages that are not present in the Registry.

In this case the format in which the extra-Registry packages will depend on what the client accepts.

One of such clients will be Spago, where we'll define an extra-Registry package as:

let Registry = https://raw.githubusercontent.com/purescript/registry/master/v1/Registry.dhall

let SpagoPkg =
      < Repo : { repo : Registry.Location, ref : Text }
      | Local : Registry.Prelude.Location.Type
      >

..that is, an extra-Registry package in Spago could either point to a local path, or a remote repository.

Here's an example of a package set that is exactly like the upstream, except for the effect package, that instead points to some repo from GitHub:

-- We parametrize the upstream package set and the Address type by the package type that our client accepts:
let upstream = https://raw.githubusercontent.com/purescript/registry/master/v1/sets/20200418.dhall SpagoPkg
let Address = Registry.Address SpagoPkg

let overrides =
    { effect = Address.External (SpagoPkg.Repo
        { ref = "v0.0.1"
        , repo = Registry.Repo.GitHub
            { subdir = None Text
            , githubOwner = "someauthor"
            , githubRepo = "somerepo"
            }
        })
    }

in { compiler = upstream.compiler, packages = upstream.packages // overrides }

Registry Trustees

The "Registry Trustees" mentioned all across this document are a group of trusted janitors that have write access to this repo.

Their main task will be that of eventually publish - under very specific conditions - new versions/revisions of packages that will need adjustments.

The reason why this is necessary (vs. only letting the authors publish new versions) is that for version-solving to work in package managers the Registry will need maintenance. This maintenance will ideally be done by package authors, but for a set of reasons authors sometimes become unresponsive.

And the reason why such maintenance needs to happen is because otherwise older versions of packages with bad bounds will still break things, even if newer versions have good bounds. Registries which don’t support revisions will instead support another kind of "mutation" called "yanking", which allows a maintainer to tell a solver not to consider a particular version any more when constructing build plans. You can find a great comparison between the two here illustrating the reason why we support revisions here.

Trustees will have to work under a set of constraints so that their activity will not cause disruption, unreproducible builds, etc. They will also strive to involve maintainers in this process as much as possible while being friendly, helpful, and respectful. In general, trustees aim to empower and educate maintainers about the tools at their disposal to better manage their own packages. Being a part of this curation process is entirely optional and can be opted-out from.

Trustees will try to contact the maintainer of a package for 4 weeks before publishing a new revision, except if the author has opted out from this process, in which case they won't do anything.

Trustees will not change the source of a package, but only its metadata in the Manifest file.

Trustees are allowed to publish new revisions (i.e. versions that bump the pre-release segment from SemVer), to:

Note: there is no API defined yet for this operation.

Name squatting and reassigning names

If you'd like to reuse a package name that has already been taken, you can open an issue in this repo, tagging the current owner (whose username you can find in the package's metadata file).

If no agreement with the current owner has not been found after 4 weeks, then Registry Trustees will address it.

For more details see the policy that NPM has for this, that we will follow when not otherwise specified.

The Package Index

I.e. the answer to the question:

How do I know which dependencies package X at version Y has?

Without an index of all the package manifests you'd have to fetch the right tarball and look at its purs.json.

That might be a lot of work to do at scale, and there are usecases - e.g. for package-sets - where we need to lookup lots of manifests to build the dependency graph. So we'll store all the package manifests in a separate location yet to be defined (it's really an implementation detail and will most likely be just another repository, inspired by the same infrastructure for Rust).

Storage Backends

As noted above, this repository will hold all the metadata for packages, but the actual data - i.e. package tarballs - will be stored somewhere else, and we call each of these locations a "storage backend".

Clients will need to be pointed at place they can store package tarballs from, so here we'll store a mapping between "name of the storage backend" to a function that given (1) a package name and (2) a package version then returns the URL where the tarball for that package version can be fetched.

We maintain the list of all the Storage Backends and the aforementioned mappings here.

We also provide a small utility to demonstrate how to use the mappings.

There can be more than one storage backend at any given time, and it's always possible to add more - in fact this can easily be done by:

Downloading a package

A package manager should download a specific version of a package in the following way:

  1. given "package name" and "version", the URL to fetch the tarball can be computed as described above
  2. fetch the tarball from one of the backends
  3. lookup the SHA256 for that tarball in the package metadata file
  4. verify that the SHA256 of the tarball downloaded in (2) matches the one from (3)
  5. unpack the tarball and use the sources as desired

Note: we are ensuring that the package we download is the same file for all backends because we are storing the SHA256 for every tarball in a separate location from the storage backends (this repo).

Implementation plan

It is paramount that we provide the smoothest migration path that we can achieve with the resources we have. This is because we feel the ecosystem is already close to maturity (at this point breaking changes happen very rarely in practice), and we don't want to unnecessarily mess up with everyone's workflow, especially if it's possible to avoid that with some planning.

So a big chunk of our work is going towards ensuring that Bower packages are gracefully grandfathered into the new system. This basically means that for each of them we will:

What has happened already:

What is happening right now:

What will happen after this:

The Registry CI

All the Registry CI is implemented in PureScript and runs on GitHub Actions. Source can be found in the ci folder, while the workflows folder contains the various CI flows.

Checks on new packages

Yet to be defined: see this issue

Mirroring the Registry

As noted above, "The Registry" is really just:

Mirroring all of this to an alternative location would consist of:

Additionally we could keep some kind of "RSS feed" in this repo with all the notifications from package uploads, so other tools will be able to listen to these events and act on that information.

FAQ

Why not use X instead?

We have of course investigated other registries before rolling one.

Our main requirement is to have "dependency flattening": there should be only one version of every package installed for every build.

All the general-purpose registries (i.e. not very tied to a specific language) that we looked at do not seem to support this.

E.g. it would be possible for us to upload packages to NPM, but installing the packages from there would not work, because NPM might pull multiple versions of every package from there according to the needs of every package.

Why not a webserver like everyone else?

These are the main reasons why we prefer to handle this with git+CI, rather than deploying a separate service:

How do I conform JSON to a Dhall type?

Install dhall, then:

$ cat "your-file.json" | json-to-dhall --records-loose --unions-strict "./YourDhallType.dhall"

Authors of this proposal

This design is authored by @f-f, with suggestions and ideas from:

Development

Setup

Create a .env file based off .env.example and fill in the values for the environment variables:

cp .env.example .env

If you are running scripts in the repository, such as the legacy registry import script, then you may wish to use the provided Nix shell to make sure you have all necessary dependencies available.

$ nix develop