zopsicle / cp6t

Chloé’s Perl 6 tooling — superseded!
https://github.com/chloekek/raku-nix

Introduce sets #16

Open zopsicle opened 5 years ago

zopsicle commented 5 years ago

A nice way to guarantee stability and compatibility is immutable distribution sets.

zopsicle commented 5 years ago

Alright, I've drafted an idea:

Requirements

* The tooling will typically be driven by the CI job that builds distribution sets; see below.

Prior work

Nixpkgs (not to be confused with Nix, which is the tool that Nixpkgs is built on top of) is a similar system that is used for all sorts of packages, typically programs for workstations and servers.

Stackage is a similar system used for Haskell libraries.

cp6t already uses Nixpkgs to get the tools necessary to build Rakudo, such as gcc and glibc, and this has so far worked out great. Nix will be used for building distributions, since it satisfies the properties we want: tarballs with hashes, ability to override versions, global cache (e.g. https://cachix.org). In fact I already have some Nix code to build distributions: https://github.com/chloekek/cp6t/blob/master/perl6-on-nix/default.nix.

Sets

There will be the concept of sets. A set is a collection of distributions, each pinned at a specific version. There will be no more than one version of each distribution in the set. The set is also associated with a specific version of Rakudo.

Sets are append-only: once a set is created, new distributions can be added to it, but not deleted from it, and their versions cannot be changed.

New sets are created periodically, with newer versions of distributions.
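To make the append-only, one-pinned-version-per-distribution semantics concrete, here is a minimal sketch. It's Python purely for illustration (cp6t itself isn't written in Python), and every name in it is hypothetical:

```python
class DistributionSet:
    """A collection of distributions, each pinned at exactly one version,
    tied to a specific Rakudo version.

    Append-only: once a distribution is in the set, it can never be
    removed and its pinned version can never change."""

    def __init__(self, rakudo_version):
        self.rakudo_version = rakudo_version
        self._pins = {}  # distribution name -> pinned version

    def add(self, name, version):
        if name in self._pins:
            raise ValueError(
                f"{name} is already pinned at {self._pins[name]}; "
                "sets are append-only")
        self._pins[name] = version

    def version_of(self, name):
        return self._pins[name]

s = DistributionSet(rakudo_version="2019.03")
s.add("JSON::Fast", "0.10")
s.version_of("JSON::Fast")  # "0.10"
# s.add("JSON::Fast", "0.11") would raise ValueError
```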

Each set is stored in the repository as its own directory.

Development of a new set occurs in an unstable directory. Once the set is deemed comprehensive and all tests pass, it is matured into a numbered set and released.

cp6t-propose-set

The cp6t-propose-set program will replace the current cp6t-ecosystem program, and it will work as follows:

  1. Create the database, empty.
  2. Retrieve a list of archives on CPAN.
  3. Retrieve p6c and use git ls-remote to find the commit hashes, then construct tarball URLs.
  4. For each archive:
    1. Store the URL of the archive in the database.
    2. Use nix-prefetch-url to find the hash of the archive.
    3. Store the hash of the archive in the database.
    4. Read META6.json from the archive.
    5. Store metadata in the database.
    6. If there’s an earlier version of the distribution in the database, delete it.
  5. Sort the distributions topologically using dependency information.
  6. For each distribution:
    1. Generate a Nix expression.
    2. Build the Nix derivation.
    3. Upload the artifacts to the global cache.
    4. Run the tests.
    5. Store in the database that the distribution is successful.
  7. Generate a Nix expression for the entire set.

If at any point processing a distribution fails, store this in the database and continue with the next distribution.
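Step 5 is the most algorithmic part of the plan; it could be a plain Kahn's algorithm over the dependency names read from each META6.json (a sketch in Python for illustration; this is not the actual cp6t code):

```python
from collections import defaultdict, deque

def toposort(deps):
    """Order distributions so every distribution comes after its
    dependencies.  `deps` maps name -> set of dependency names, as
    would be read from each distribution's META6.json."""
    indegree = {name: 0 for name in deps}
    dependents = defaultdict(list)
    for name, ds in deps.items():
        for d in ds:
            if d in indegree:  # ignore dependencies outside the set
                indegree[name] += 1
                dependents[d].append(name)
    queue = deque(n for n, k in indegree.items() if k == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

toposort({"JSON::Fast": set(), "Cro": {"JSON::Fast"}})
# ["JSON::Fast", "Cro"]
```

Raising on a cycle (rather than building what we can) matches the keep-going failure handling above only partially; in practice the cycle members could instead be marked failed in the database.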

The program cp6t-propose-set can be invoked either at step 1 or at any later step, in which case it will use the existing database to continue.

CI will run cp6t-propose-set periodically and create pull requests on this repository. The pull request will include a detailed report of what went well and what went wrong.

Open questions

How to deal with p6c? It's a bit annoying since META.list contains so little information. But it's probably doable if we accept using the Git master of each package. :cold_sweat:

What format to use for the database? SQLite is annoying to use with Git. Perhaps JSON or a custom text-based format that can easily be merged.
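One merge-friendly option: one record per line, sorted by key, so two branches adding different distributions merge cleanly line-by-line in Git. A sketch (the tab-separated format and the function names are my own suggestion, nothing decided in this issue):

```python
def dump_db(records):
    """Serialize {url: hash} as sorted, tab-separated lines.
    Sorting keeps the file stable across runs, which keeps Git
    diffs and merges small."""
    return "".join(f"{url}\t{hash_}\n"
                   for url, hash_ in sorted(records.items()))

def load_db(text):
    """Parse the format written by dump_db back into a dict."""
    records = {}
    for line in text.splitlines():
        url, hash_ = line.split("\t")
        records[url] = hash_
    return records

db = {"https://example.org/Foo-1.0.tar.gz": "sha256-fakehash"}
assert load_db(dump_db(db)) == db
```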

Windows and macOS support? First of all Nix doesn't work on Windows. And I don't want to maintain these anyway since they're unfree operating systems and I don't want to install them since they cost money and contain malware. For now this will be Linux-only.

zopsicle commented 5 years ago

Development is going well regarding CPAN and p6c: I now have two subroutines with the same interface, one giving a seq of CPAN tarball URLs and one giving a seq of p6c tarball URLs.

The latter works by running git ls-remote on each Git repository mentioned in source-url in projects.json, taking the commit hash for HEAD, and then constructing a GitHub tarball URL. It doesn't yet work for Bitbucket and GitLab, but that is easy to add.
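That HEAD-to-tarball step could look roughly like this (a Python sketch, not the actual subroutines; it assumes GitHub's `/archive/<commit>.tar.gz` URL scheme and that the first column of `git ls-remote <url> HEAD` output is the commit hash):

```python
import re
import subprocess

def head_commit(repo_url):
    """Return the commit hash HEAD points to, via git ls-remote.
    Output lines look like '<hash>\tHEAD'; take the first column."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()[0]

def github_tarball_url(repo_url, commit):
    """Construct a GitHub source tarball URL for a given commit."""
    m = re.match(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$",
                 repo_url)
    if m is None:
        raise ValueError(f"not a GitHub URL: {repo_url}")
    owner, repo = m.groups()
    return f"https://github.com/{owner}/{repo}/archive/{commit}.tar.gz"

github_tarball_url("https://github.com/chloekek/raku-nix", "abc123")
# "https://github.com/chloekek/raku-nix/archive/abc123.tar.gz"
```

Supporting Bitbucket and GitLab would then just be two more URL patterns in the same shape.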

zopsicle commented 5 years ago

Now that I can retrieve tarball URLs, I can start working on using nix-prefetch-url to download the tarballs and compute their hashes. Then we can create a file that maps tarball URLs to their hashes, and be certain that later downloads will result in the exact same file. :)
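That step could be wrapped like this (a sketch assuming nix-prefetch-url is on PATH; it prints the hash on the first stdout line, followed by the store path when --print-path is given, and the parsing helper here is my own):

```python
import subprocess

def parse_prefetch_output(stdout):
    """Extract the hash from nix-prefetch-url's stdout: the hash is
    the first line (a store path may follow with --print-path)."""
    return stdout.strip().splitlines()[0]

def prefetch_hash(url):
    """Download `url` with nix-prefetch-url and return its hash."""
    out = subprocess.run(
        ["nix-prefetch-url", url],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_prefetch_output(out)
```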

zopsicle commented 5 years ago

I now have two files, one with a list of archive URLs, and one with a hash for each archive URL.

zopsicle commented 5 years ago

I now have a file that contains META data from most of the distributions.