mighty-gerbils / gerbil

Gerbil Scheme
https://cons.io
GNU Lesser General Public License v2.1

RFI: search and info commands in gxpkg #105

Open belmarca opened 6 years ago

belmarca commented 6 years ago

In the absence of a native Gambit module/package system, I am using Gerbil's, which is quite nice to work with.

In order to facilitate adoption, we could have a searchable package metadata repository. This could enable gxpkg usage such as:

gxpkg search BLAS

| package | version | runtime        | author   | release date |
| ------- | ------- | -------------- | -------- | ------------ |
| scmblas | v x.y.z | gambit         | feeley   | YYYY-MM-DD   |
| blas    | v x.y.z | gerbil         | vyzo     | YYYY-MM-DD   |
| gblas   | v x.y.z | gambit, gerbil | belmarca | YYYY-MM-DD   |

gxpkg info gblas

Description: Gambit FFI bindings to BLAS.
Author: Marc-André Bélanger
Runtime: Gambit, Gerbil
Repo: github.com/X/YZ
Version: x.y.z
Release date: YYYY-MM-DD
Commit: hash123

The search and info commands would simply query an HTTP package metadata repository. A list of all packages could be kept locally and updated at will. A call to gxpkg install my-package would then clone the proper repository to ~/.gerbil/pkg/my-package and run the Makefile.
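
To make this concrete, here is a rough sketch, in Gerbil, of what the client side of gxpkg search could look like. It assumes a hypothetical metadata server answering GET /search?q=<term> with a JSON array of package records; the host, route, and response shape are placeholders, not an existing API.

```scheme
;; Hypothetical client-side sketch for `gxpkg search`, for illustration only.
;; The server host, the /search route, and the response shape are assumptions.
(import :std/net/request
        :std/text/json)

(def metadata-server "https://pkgs.example.org")

(def (search-packages term)
  ;; Ask the metadata server for packages matching `term` and parse the
  ;; JSON body of the response.
  (let (req (http-get (string-append metadata-server "/search?q=" term)))
    (string->json-object (request-text req))))
```

gxpkg info would be the same kind of request against a per-package route, rendered like the example output above.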

We could require the Makefile to contain, at minimum, a gerbil rule used to compile the library with gxc. gxpkg would then simply call this rule and the rest would fall into place. The trouble of actually building the required object files (or whatever else needs to be done) is thus left to the library/package author, and requires only one assumption on our part: the existence of the gerbil rule. So if an author wants to write tests, they can, but we don't disallow untested code, and so on.
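
As an illustration of that convention, a hypothetical package Makefile could look like the following; the file names and the optional test rule are placeholders of my own, not part of any existing package.

```make
# Hypothetical package Makefile. The only rule gxpkg would rely on is
# `gerbil`, which compiles the library with gxc; everything else is up
# to the author.
.PHONY: gerbil test

gerbil:
	gxc -O gblas/gblas.ss

# Optional: authors can add tests, FFI object builds, etc.
test: gerbil
	gxi gblas/gblas-test.ss
```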

The metadata repository's state could be updated by git (or other VCS) hooks. As an author, I can thus write my library locally and push it to GitHub (or Bitbucket, or whichever provider). Ideally, our metadata server is notified of the latest metadata with a simple POST. However, git doesn't have post-push hooks on the client side, so that could be a little annoying.
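
For example, the notification could be a simple POST of the package metadata, sent by a hook or by the author manually. A hedged sketch, where the endpoint and payload are assumptions:

```scheme
;; Hypothetical sketch of the metadata notification; the endpoint is a
;; placeholder, and metadata-json is the package metadata already encoded
;; as a JSON string.
(import :std/net/request)

(def (notify-metadata-server! metadata-json)
  (let (req (http-post "https://pkgs.example.org/packages"
                       headers: '(("Content-Type" . "application/json"))
                       data: metadata-json))
    (request-status req)))
```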

Package versioning could be handled relatively simply. Instead of having a master package whose HEAD tracks whatever commit is in the metadata repository, we could use a directory structure such as:

~/.gerbil/pkg/my-package
~/.gerbil/pkg/my-package/current
~/.gerbil/pkg/my-package/hash123
~/.gerbil/pkg/my-package/hash456
~/.gerbil/pkg/my-package/tagXYZ

Here current would be used whenever (import :user/my-package) is called, while a call such as (import :user/my-package#hash123) or (import :user/my-package 'tagXYZ) would use the library at a particular commit. This allows different versions of a package to be used in different REPLs. If, on the contrary, a call such as (import :user/my-package 'tagXYZ) simply checked out that commit in place (a capability that is not undesirable in itself), only a single version of the package would be available at any given time (unless one wants to mess with starting different processes at long enough intervals for each checkout to complete, etc.).
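
As a minimal sketch of the resolution rule, under the directory layout proposed above (the helper name and its behavior are illustrative assumptions only):

```scheme
;; Hypothetical helper: map a package name and optional version tag to a
;; directory under ~/.gerbil/pkg, defaulting to the `current` version.
(def (package-version-path name (version "current"))
  (string-append (getenv "HOME") "/.gerbil/pkg/" name "/" version))

;; (package-version-path "my-package")            ;; => "<$HOME>/.gerbil/pkg/my-package/current"
;; (package-version-path "my-package" "hash123")  ;; => "<$HOME>/.gerbil/pkg/my-package/hash123"
```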

This is obviously an incomplete proposal. I haven't discussed important details such as authentication/authorization (who gets to write to the metadata repository?), signing of packages, or how much trust to place in said packages (gxc is involved, after all).

Hope this gets the ball rolling :)

fare commented 6 years ago

Note that in the above example, a segfault is the happy failure: obvious, easy to detect, and easy to fix. Data corruption, persistent database corruption, months of wasted work, customers losing a lot of money, companies going bankrupt, and people dying are less obvious possible consequences of a version mismatch in libraries.

fare commented 6 years ago

As for the issue with termite and passing data over the wire between incompatible versions: it's a problem with integrations, not with source code, and the source code's build system is the wrong place to deal with it.

Indeed, integrations include details like target CPU, compiler version and options, versions of foreign libraries used, etc., that don't belong in the source code itself!

If you want to deal properly with version mismatch in a distributed system, then you need either or both of:

  1. Version-aware atomic distributed system deployment, when there is a common authority for all the components, using e.g. NixOps or DisNix, plus explicit support for schema upgrade where relevant.
  2. Proactive LangSec-style validation of inputs to avoid not just failures but deliberate attacks, as well as cryptographic hashes and signatures to ensure proper versioning, validation and authority in your input data. http://langsec.org/

vyzo commented 6 years ago

Not to mention that current best practice for what passes for mobile code in the wild (e.g. JS, Java) is to use either source or bytecode (WebAssembly, Java class files). And there is no support for arbitrary state recovery, just clean-slate sandbox execution. [edit: not quite true for Java; there is pretty good serialization there, and they get by fine without versioning :]

I fear that trying to implement the old 80s/90s dream of mobile code in the modern era is like chasing chimeras. And the last place that should care about mobile code versioning is the module system itself.

belmarca commented 6 years ago

> Now, if I use 100 libraries that of course don't use the exact same integrations as I, I must fork and maintain git repos for 100 libraries. That's completely crazy and backward.

No need to fork; just refer to a particular tag or commit and use that. As long as the git commit history isn't modified (nothing prevents this), you're good.

So, what metadata do you all want to see in a package? I have a mock implementation that uses the following fields (a sketch of such a record follows the list):

    name
    author
    description
    version
    runtime (pure gambit or gerbil needed?)
    license
    last_updated (set by metadata server)
    repo (where do we get the code?)
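
For concreteness, a record with these fields could look like the sketch below; all values are placeholders mirroring the gxpkg info example above, and the wire format (JSON, plist, ...) is still an open question.

```scheme
;; Hypothetical metadata record for one package; all values are placeholders.
(def gblas-metadata
  '((name . "gblas")
    (author . "belmarca")
    (description . "Gambit FFI bindings to BLAS")
    (version . "x.y.z")
    (runtime . (gambit gerbil))
    (license . "LGPL-2.1")
    (last_updated . "YYYY-MM-DD")  ;; set by the metadata server
    (repo . "github.com/X/YZ")))
```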

vyzo commented 6 years ago

That sounds fine for prototype purposes.

fare commented 6 years ago

Either the exact version of each and every dependency has to be included in the source of each library (what I understood you were proposing), in which case you need to fork the entire world to modify the source every time you have a new integration; or integrations are actually orthogonal to the source code (what I propose), in which case there is indeed no need to fork.

belmarca commented 6 years ago

Hi,

I have pushed a basic Python implementation of a metadata server. See https://github.com/belmarca/gxpkgd_python.

vyzo commented 6 years ago

Also created a repo for the pure gerbil implementation: https://github.com/vyzo/gxpkg-daemon

vyzo commented 6 years ago

Opened a couple of issues to discuss the API and canonical package metadata: