mighty-gerbils / gerbil

Gerbil Scheme
https://cons.io

RFI: search and info commands in gxpkg #105

Open · belmarca opened this issue 6 years ago

belmarca commented 6 years ago

In the absence of Gambit's own native module/package system, I am using Gerbil's, which is quite nice to work with.

In order to facilitate adoption, we could have a searchable package metadata repository. This could enable gxpkg usage such as:

gxpkg search BLAS

| package | version | runtime | author | release date |
| --- | --- | --- | --- | --- |
| scmblas | v x.y.z | gambit | feeley | YYYY-MM-DD |
| blas | v x.y.z | gerbil | vyzo | YYYY-MM-DD |
| gblas | v x.y.z | gambit, gerbil | belmarca | YYYY-MM-DD |

gxpkg info gblas

Description: Gambit FFI bindings to BLAS.
Author: Marc-André Bélanger
Runtime: Gambit, Gerbil
Repo: github.com/X/YZ
Version: x.y.z
Release date: YYYY-MM-DD
Commit: hash123

The search and info commands would simply query an HTTP package metadata repository. A list of all packages could be kept locally and updated at will. A call to gxpkg install my-package would then clone the proper repository to ~/.gerbil/pkg/my-package and run the Makefile.

We could require the Makefile to contain, at minimum, a gerbil rule used to compile the library with gxc. gxpkg would then simply call this rule and the rest would fall into place. The trouble of actually building the required object files (or whatever else needs to be done) is thus left to the library/package author and requires only one assumption from us: the existence of the gerbil rule. So if an author wants to write tests, they can, but we don't disallow untested code, etc.
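As a rough illustration of this proposed install flow (not gxpkg's actual implementation), here is a minimal sketch in Gerbil, assuming :std/misc/process and a hypothetical install-package helper:

```scheme
(import :std/misc/process)

;; Hypothetical sketch of the proposed `gxpkg install` flow:
;; clone the package's repo under ~/.gerbil/pkg/<name> and invoke
;; its `gerbil` make rule, leaving the actual build to the author.
(def (install-package name repo-url)
  (let ((dest (string-append (getenv "HOME") "/.gerbil/pkg/" name)))
    (run-process ["git" "clone" repo-url dest])
    (run-process ["make" "-C" dest "gerbil"])))
```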

The metadata repository's state could be updated by git (or other VCS) hooks. As an author I can thus write my library locally and push it to GitHub (or Bitbucket or whichever provider). Ideally, our metadata server is notified of the latest metadata with a simple POST. However, git doesn't have post-push hooks, so that could be a little annoying.

Package versioning could be handled relatively simply. Instead of having a master package whose HEAD tracks whatever commit is in the metadata repository, we could use a directory structure such as:

~/.gerbil/pkg/my-package
~/.gerbil/pkg/my-package/current
~/.gerbil/pkg/my-package/hash123
~/.gerbil/pkg/my-package/hash456
~/.gerbil/pkg/my-package/tagXYZ

Here, current would be used whenever (import :user/my-package) is called. A call such as (import :user/my-package#hash123) or (import :user/my-package 'tagXYZ) could then use the library at any particular commit. This allows the use of different versions of a package in different REPLs. If, on the contrary, a call such as (import :user/my-package 'tagXYZ) simply checked out the particular commit (a functionality that is not undesirable), there would be a single package version available at any given time (unless one wants to mess with starting different processes at long enough intervals to let the checkout from one process complete, etc.).

This is obviously an incomplete proposal. I haven't discussed important details such as authentication/authorization (who gets to write to the metadata repository?) and signing of packages as well as how much trust to put into said packages (gxc is involved after all).

Hope this gets the ball rolling :)

vyzo commented 6 years ago

Well, the Makefile is redundant -- the package assumes a build script which builds with the standard build tool (:std/make).

But this is a great proposal overall, I would very much like to have searchable metadata and info for packages.

belmarca commented 6 years ago

See https://github.com/vyzo/gerbil/wiki/The-Gerbil-Package-Manager#a-word-of-caution for more discussion about security, signing of packages, etc.

@vyzo Regarding :std/make, I will have to think about it. As I mentioned in chat, I think the package system should be native to gambit and not necessarily particular to gerbil. Thus using gerbil's build scripts by default would more tightly couple the project to gerbil. However I am being realistic about the efforts needed to implement the equivalent of gerbil's package management in native gambit and I will consider using a build.ss.

As for sandboxing, this can indeed be a tricky problem. Maybe the package definition/manifest should declare the files it creates. Then we could run the compilation in a firejail (https://github.com/netblue30/firejail) and copy out only the set of files mentioned in the manifest?

I am absolutely not the person to make recommendations on sandboxing, but that looks like an interesting option.

belmarca commented 6 years ago

Here is an example of a package repository: https://github.com/belmarca/gxpkg-example.

vyzo commented 6 years ago

I am opposed to using Makefiles as the build vehicle.

The package manager already utilizes the build script which covers all basic functionality for building gerbil (and properly namespaced gambit) code, handles dependencies and build order, and so on. Also, there is a simple wrapper macro that defines the build script with just a couple lines of code -- see :std/build-script.

So I don't see any good reason to make the build system depend on make, it's a step backwards. Nonetheless, you could include a Makefile that calls the build script itself!
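For reference, the :std/build-script wrapper mentioned above keeps the build script to a few lines; a minimal sketch (the module names are placeholders):

```scheme
#!/usr/bin/env gxi
;; build.ss -- minimal sketch; "mylib/util" and "mylib/main" are placeholders
(import :std/build-script)

(defbuild-script
  '("mylib/util"
    "mylib/main"))
```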

vyzo commented 6 years ago

Now, with regards to the basic functionality of search and metadata.

Firstly, we need additional metadata in packages. This can be simply done by having a metadata: field in the gerbil.pkg plist. This could include tags and anything else we deem useful.
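For illustration, a gerbil.pkg plist extended with the proposed metadata: field might look like this; the keys inside metadata: are only examples, not a settled schema:

```scheme
(package: belmarca
 depend: ("github.com/vyzo/some-dep")
 ;; proposed extension: tags and anything else we deem useful
 metadata: (description: "Gambit FFI bindings to BLAS."
            tags: ("blas" "ffi" "linear-algebra")
            license: "LGPL-2.1"))
```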

The second issue is discovery of packages. We could have a server at gxpkg.cons.io that allows you to register packages by linking to the GitHub repo. The server could then fetch the metadata (and periodically update it, perhaps via GitHub integration with commit hooks) and store it locally for answering queries.

The implementation of gxpkg search could then query the metadata server. The implementation of gxpkg info can answer queries about locally installed packages or perform a query to the remote server.

We can also cache the package repo metadata locally to avoid having to hit the server for every query.
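A rough sketch of the client side of gxpkg search, assuming a JSON endpoint at gxpkg.cons.io (the URL path and response shape are made up for illustration):

```scheme
(import :std/net/request)

;; Hypothetical `gxpkg search` client: GET the metadata server and
;; return the decoded JSON results. The keyword is not URL-encoded here.
(def (search-packages keyword)
  (let ((req (http-get (string-append "https://gxpkg.cons.io/search?q=" keyword))))
    (if (= (request-status req) 200)
      (request-json req)
      (error "metadata server query failed" (request-status req)))))
```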

feeley commented 6 years ago

As you know I have been designing Gambit's module system with @FredericHamel and have some solutions to the issues you raise. But first I'd like to discuss the "build script" as this seems to be contentious. @belmarca suggests a Makefile and @vyzo a Scheme script. Another option we have considered is a shell script. I can see the following pros and cons:

So perhaps the build can be automated from the dependencies, and if a build shell script or Scheme script exists then it can be used for more complex situations.

Your thoughts?

vyzo commented 6 years ago

gxpkg can also work with a shell script, provided it supports a couple of meta-commands (deps and compile). But it won't be able to clean packages without the spec meta-command, which returns an :std/make build-spec.

fare commented 6 years ago

My experience with Common Lisp was that translating ASDF's bootstrap build system from a Makefile plus shell (and perl!) scripts to a pure Common Lisp "script" was intensely satisfying. (ASDF builds CL code from CL, the bootstrap build system builds ASDF from the shell, vanquishing all portability and dependency issues, with lots of additional targets for testing, releasing, etc.)

My co-maintainer Robert Goldman, though, didn't like it: he ran into a whole lot of portability issues that I had to fix, and they eventually led him to deprecate the system. Having to deal with 10 major implementations, most of which run on each of 3 major OS families, must have played a role in it, though.

I would like a pure Scheme build system... but then, having worked on XCVB, ASDF and Bazel, in addition to using a lot of different build systems... well, I have my grand ideas of what a build system could and should be... https://ngnghm.github.io/blog/2016/04/26/chapter-9-build-systems/

feeley commented 6 years ago

@fare I'll take a look at your writings... sounds interesting.

Portability is a really, really, really important feature of Gambit, so I'd like a build system that is not dependent on the OS. So shell scripts and Makefiles are not ideal. But I worry that a Scheme script will be awkward for building modules that are more OS-dependent (for example an interface to a C library with lots of dependencies on other C libraries), where "good old standard" tools are more appropriate. I believe there is a spectrum of build situations that require a gradually more detailed/low-level build procedure. I feel there is no single best method, so several build situations should be supported. Something like

This is only to give the general idea... I'm not (yet) proposing this exact hierarchy. The idea is to make the programmer's life as easy as possible for a particular build situation (level of detail).

fare commented 6 years ago

My experience is that Lisp or Scheme scripting is so much more pleasant than shell-scripting that just... wow.

On the other hand, yes, it's important to be able to call out to external programs, including shell scripts, and to shell out a pipe or two in some cases. In CL, I maintained an awkward uiop:run-program compatibility layer and inferior-shell:run as a more usable layer on top. In Scheme, well, I wrote Gerbil's std/misc/process, which wraps around Gambit's open-process. It's OK for simple uses, but it is missing a lot of the features available in CL (and even more that CL didn't provide either), and isn't extensible.
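For context, a simple use of std/misc/process looks roughly like this (hedging on the exact keyword defaults):

```scheme
(import :std/misc/process)

;; Shell out to git and read the first line of its output;
;; run-process is a thin wrapper over Gambit's open-process.
(def (git-head dir)
  (run-process ["git" "-C" dir "rev-parse" "HEAD"]
               coprocess: read-line))
```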

Ideally, you'd want a way to handle, e.g. resetting signal masks before execve, handling arbitrary redirection of arbitrary file descriptors (including user-defined extensions for pipes, ptys, sockets, etc.), tweaking OS personality and arbitrary configuration system calls in the child, not just chdir, etc. Of course, making that portable, too, is a lot of work, and requires experimentation. GC, multiprocessing and other features should probably be disabled in the child pre-execve, though.

At least, having chosen Gambit, we don't have to deal with 10 different implementations' internals...

fare commented 6 years ago

In a nutshell, my view is:

  1. Build more primitives, macros, and modules and sublanguages in Scheme.
  2. Anything usable at the base-level should be usable at the meta-level, and vice-versa... and there should be system support for hygiene or renaming across levels ("these are not the source/target/installed files/variables you're looking for!").
  3. Yes, hermetic, deterministic, pure functional reactive programming is a great way to build stuff incrementally... so implement it to make it available at the base-level, and use it at the meta-level!

fare commented 6 years ago

Or, things not to do:

  1. BAD: Have a closed solution supposed to be perfect, instead of lots of exposed APIs that can be used to hack better solutions. So you have automated module dependency detection? GOOD: Expose that! Extensible OO APIs are good (pure typeclasses preferred, see e.g. my ILC 2012 paper).
  2. BAD: make special-purpose evaluation tools (say, pattern-matching of filenames, rule-based backtracking evaluation, string interpolation, etc.) that are only available at the meta-level but not the base-level, or vice-versa. GOOD: if it's good for a build DSL, it's good for other DSLs, so expose it!
  3. BAD: Give up on hermeticity, determinism, purity, functional style, reactivity, etc. GOOD: It's OK to not have everything in the beginning, but keep the goal in mind, allow for future extensions, and steer the boat in the right direction.

feeley commented 6 years ago

Here's another important issue... how are modules/libraries identified unambiguously? Obviously this is important to attribute an unambiguous meaning for a given module. Our approach is to use a URL to name modules, specifically a github URL or other publicly accessible git repo. The repo tag is part of the URL to give it an unambiguous meaning, for example (import (github.com/feeley/digest/tree/1.0.0)). There are also Gambit "builtin" modules that are unambiguously identified by a short name, for example (import (gambit regex)). Among other things, this simplifies the migration of code from one node of a distributed system to another (the migrated code contains a reference to the library name and the receiving node, if it does not have that library, can automatically download it, checkout the specific commit/tag, build the library and load it to continue executing it).

fare commented 6 years ago

I believe there is no escaping managing your own module namespace.

URLs won't do it: a project may change hosting, it may be forked, it may be replaced (mostly) compatibly by something else (from the same authors or different authors), etc. What is the URL for the libc? Some obscure page from some standards body? The GNU libc webpage? That of *BSD? What about musl? uclibc? klibc? dietlibc? Magic autodetection of whatever was used? Even if it's a local fork?

I recommend embracing the fact that you're going to have to manage a new namespace.

FredericHamel commented 6 years ago

The URL is not encoded in the library's source code; it is determined at compile time. This means that if a project changes hosting, the namespace also changes to the new URL.

fare commented 6 years ago

Still a bad idea. The URL has no advantage and only disadvantages.

The "real" source code for a library dependency is NOT the choice of the author of the using library, but that of the integrator who builds a complete system from a set of precisely versioned libraries. The exact URL of the source, fork, commit, patch, etc., used for the library depends on the integrator, not on the author of any library.

Just adopt a short namespace, and let the integrators do the mapping not just to URL, but to specific commits/tarballs.

belmarca commented 6 years ago

Thanks for the discussion. It seems there are multiple intertwined arguments going on, though. I will not concern myself with the internals of the actual build system. I will simply agree with @feeley that we should not impose too much on authors. Having read most of the text by @fare, I understand why Gerbil wants to use a build system written in Gerbil. However, forcing its use upon package maintainers, library authors, etc. is not the way to go IMO.

If broader usage of Gambit and Gerbil is a goal, I think Marc's multi-level proposal is the one to follow. That is, Gerbil should have its build system written in gerbil, but any newcomer should also be able to write and share a package using whatever build method suits them best. This only facilitates adoption. One could trivially add a tag to packages to indicate if they are built using anything other than exclusively gerbil's (or gambit's future) build system. Call these "pure" vs "impure" or "dirty" vs "clean" packages. Or even just tag the packages with any and all combinations of "make", "shell" or "build-script" meta tags. This is simple to do and allows users to forbid installation of certain classes of packages. Personally, if I am forced to learn a new build system while also learning a new language and a new runtime, I might well not bother. These things take time and not forcing their usage seems like a good middle ground.

As for unambiguously identifying modules, one way would be to build a DAG in which vertices are modules/packages and edges are dependencies. Packages point to particular commits of the packages they depend on. Thus if package B at commit b depends on package A at commit a, the particular state of package B at commit b can be rebuilt even if package A gets more commits. Package B can then be marked as having stale dependencies, for example.

In practice this means that the author metadata is inferred by the package metadata server by looking at the GitHub repo (e.g. github.com/author1/packageB). Installation of said package will clone the repo, check out the particular commit if needed, then clone the dependencies (github.com/author2/packageA in our case), check out that package at its required commit, and build.

Am I making any sense?

belmarca commented 6 years ago

Note that this scheme does not imply tight coupling to any provider whatsoever. It shouldn't matter where you get your code from. The use of GitHub is for convenience and solves the authorization issue (I can't publish code under author names that I do not control).

fare commented 6 years ago

The "not imposing a build system" bit reminds me of this Koan about Sussman: http://www.catb.org/jargon/html/koans.html#id3141241

As for pointing to a specific commit of a package dependency in a specific repository: once again, it's the job of the integrator, and emphatically not of the package author. Anything else is crazy... and yes, it's been tried... then when you patch one tiny library for a security bug, you must recursively update all transitive using libraries... nuts.

fare commented 6 years ago

Please re-read the section of my essay about the four roles around a library. https://ngnghm.github.io/blog/2016/04/26/chapter-9-build-systems/

belmarca commented 6 years ago

@fare Let's say my code depends on yours and you go through a major refactoring and large but needed API changes. What am I to do? By pinning my code to a particular commit of yours, I can ensure that it works even if you refactor (unless you destroy the commit in question). You do not have to do anything in particular.

And about the build system: it depends on who is your target audience. If you want "ordinary" people to contribute packages (I consider myself one of them), you cannot impose too much IMO. Otherwise it gets in the way. Maybe I can change my mind if the build system has great examples and documentation. It's just less trouble I would think. But then again it depends on who is your target audience.

belmarca commented 6 years ago

I didn't find that section in particular very enlightening. I preferred the following section (Pure Functional Reactive Programming) and agree with the conclusions. I'm not sure where we are diverging especially with regards to build determinism. How can I make sure my build is deterministic through time if I do not peg dependencies to particular commits?

fare commented 6 years ago

You confuse many things. For your local installation, you're an integrator, so pick whichever version you want.

As you publish your code into the wild, others want and need their own integration, and you cannot and should not try to impose your own.

fare commented 6 years ago

If there's a long-term fork between the code your library needs and the upstream, then it's a fork, so it needs a different name in the namespace.

belmarca commented 6 years ago

I might be confusing things but I don't see how what you propose solves this situation:

Package A has a hard dependency on package B. Last known working version of package A used package B at commit b. That is guaranteed by the author of package A.

What you seem to propose is that the package manager treats all package dependencies as referring to HEAD and then the particular integrators, on their own machine, either figure out which particular commit to use or simply make changes to the library?

fare commented 6 years ago

You confuse the build system and the package manager.

Libraries cannot and should not specify a specific version of their dependencies. That's the job of the integrator / distribution.

Look at how Nix packages are defined: each package captures its direct content, but it's the overall repository (plus overrides) that defines the mapping of names to versioned packages, and does a fixpoint when building.

fare commented 6 years ago

Library A depends on library B. Library C depends on library B. Which library author gets to set the version of B ? The one with the biggest penis? Nope. Neither. The integrator does.

FredericHamel commented 6 years ago

Library A sets which version of library B it depends on. Library C sets which version of library B it depends on.

The system could allow multiple versions of the same library to be installed.

fare commented 6 years ago

That's not how any of this works.

Unless B is pure functional and there is no indirect interaction between A and C, e.g. via D that depends on A and C.

When I fix a crucial security hole in openssl (real story), I don't want to propagate that everywhere in every single damn package. I let the package manager compute the fixpoint for me.

fare commented 6 years ago

The whole point of having a library is to provide an intent that insulates the users from the contents of the library.

This intent has to be coherent between different simultaneous users of the library, i.e. they must see the same content.

If the library depends on installed files to read at runtime, unless you use Nix, there will be conflict.

And if files maintaining read/write state are involved, then even if you use Nix, there will be conflict.

belmarca commented 6 years ago

You confuse the build system and the package manager.

That's what I meant by

It seems there are multiple intertwined arguments going on though.

:smile:

Library A depends on library B. Library C depends on library B. Which library author gets to set the version of B ?

@fare This is a non sequitur IMO. The author of package B is free to do whatever they want, whenever they want (even destroy the commit history). So in the case of openssl, you just commit code to fix the security hole. You don't care about any package that depends on your library.

Then, as @FredericHamel mentions, it's the dependent libraries that set the depended-upon library's particular commit. Thus inside package A we have a reference to library B in the state it was at commit b0, and inside package C we have a reference to library B in the state it was at commit b1.

Now if B is vulnerable, and a patch has been made available, it is the responsibility of the maintainers of A and C to fix their dependencies.

The package metadata repository could notify users that package B at commits before b2 is insecure. This is extra work but can be done.

How does Nix do it? I'm just starting to read the documentation, maybe you could point me to the particular concepts I am missing?

fare commented 6 years ago

If I have to make a recommendation for the namespace, I'll just say: keep it mapped to the filesystem (modulo search function / mounting / mapping / indirections configurable by the user), just like Perl or Python, and without the idiotic prefixing of Java.

fare commented 6 years ago

When you're thinking at the level of integration, Nix / GUIX should be the source of inspiration.

fare commented 6 years ago

D depends on A and C, that both depend on B. Who decides which version of B is used?

(Extra points: E depends on D. F depends on E and directly on B. etc. Generate more non-tree-like DAGs ad nauseam.)

I'm not against authors sharing their successful integrations, if they like, especially as part of unit testing. But these obviously cannot be authoritative for anyone else, and should NOT be part of the build files as such.

fare commented 6 years ago

It's not ignorance that does so much damage; it's knowing so darned much that ain't so. — Josh Billings

belmarca commented 6 years ago

D depends on A and C, that both depend on B. Who decides which version of B is used?

Both, or none if you wish. They each use their own namespaced functions from B.

In A: (package-b#commitX#some-function arguments)

In C: (package-b#commitY#some-function arguments)

The author of A does not care about the version of B the author of C uses, and vice versa. He calls (import :author-b/package-b commitX) or whatever the syntax, and then the functions from B are namespaced under b#commitX. If one calls import without a commit, HEAD is assumed and the functions are simply called under the b# namespace. The same thing goes for the author of C.

Now D depends on A and C, and thus should never directly call functions from B, so there is no problem here.

What exactly is your objection? More specifically, where does it lie? I cannot figure where you want to put the version pinning in practice. A cursory look at Nix doesn't give me the information I need to see where I'm being mistaken. Just point me to a particular thing to read and I will oblige, but I have to say I don't understand how my proposition wouldn't work.

feeley commented 6 years ago

I'm with @belmarca on this. @fare's development model has the integrator assigning semantics to the modules. In other words, it is the integrator that chooses which version of a module to use. But this is madness! When I write a program I must pin down the combination of module versions that works for my program. If the integrator has a say in which versions will actually be used, this could break my program. The integrator (who doesn't know my code in depth) could think his choice of versions is OK, but subtle problems could come up.

Perhaps I don't understand the different roles (integrator, author, other?).

In the system I have designed with @FredericHamel, different versions of a module can be installed on a system, and different versions of a module can be linked with a program. With git this is rather easy to achieve. If versions X and Y of module M are required, then two copies of M's repo are created and a "git checkout X" is done in one copy, and a "git checkout Y" is done in the other, and each one is built independently.

belmarca commented 6 years ago

@feeley The git method is what I initially proposed. Different versions of package B are installed in the same top-level directory, package-b, yet HEAD is actually in package-b/current and the other installed versions are in package-b/commit-hash. The imported lib thus gets the namespace package-b#commit-hash.

Perhaps I don't understand the different roles (integrator, author, other?).

@fare I believe this might be my case as well. I will think more about your model.

I'm not against authors sharing their successful integrations, if they like, especially as part of unit testing. But these obviously cannot be authoritative for anyone else, and should NOT be part of the build files as such.

I believe this is where I don't quite follow the argument. Can you write a simple example to show me how this would be done in practice? A gist is fine, or a repo.

feeley commented 6 years ago

Let me work through a concrete example to help clarify the different models. Say I have a program P that depends on modules A and B, and module A depends on module M, and module B depends on module M. However A uses v1 of M and B uses v2 of M.

In @fare's model (for lack of a better name) the source code indicates the dependencies without the version information. The dependencies are thus:

- P depends on A and B
- A depends on M
- B depends on M

Note that P, A, B and M could have been authored by different developers.

In this model, when P is linked a specific version of A, B and M must be chosen by the integrator. So it is the responsibility of the integrator to ensure that the version of M that is chosen is appropriate for the uses by A and B (either v1, v2 or some other version). That seems like a daunting task as the integrator did not author A or B, so how can he be sure his choice is OK? Moreover it is possible that no version of M is suitable because of incompatible versions (A may depend on a feature of v1 of M that has been removed in v2, and B depends on a feature of v2 that did not exist in v1).

In the model @FredericHamel and I are implementing, the source code indicates these dependencies:

- P depends on A and B
- A depends on M/v1
- B depends on M/v2

When the system is built, both M/v1 and M/v2 are built and linked with P. In some cases, this will just work.

One case where this may not work is when some data created by M/v1 through A/vX is passed to M/v2 through B/vY. For example M could be a regexp library and A/vX uses it to compile a regexp, and B/vY calls M/v2 to execute that compiled regexp. Because M/v1 and M/v2 are different, the data created by M/v1 may be unusable by M/v2 (for example the regexp compiler and execution engine may have evolved in incompatible ways). Note that this situation can be detected by changing the type of the data between versions.

But this situation is not hopeless. A new version of A can be created that replaces the dependency on M/v1 with a dependency on M/v2. Call this version A/vZ. Now the source code indicates these dependencies:

- P depends on A/vZ and B
- A/vZ depends on M/v2
- B depends on M/v2

Note that version vZ of A can be obtained by forking A, or by asking A's author to make the change.

This can be viewed as a kind of integration work. However it is a more modular form of integration than the global integration needed by @fare's model.

The "incompatible evolution of M" situation can be addressed by creating M/v3 and new versions of A and B (by asking the authors of A, B and M or creating forks, or a combination). This may be a difficult task, but possible, and no more difficult than in @fare's model.

vyzo commented 6 years ago

I believe that tightly integrating the module system with versioning is a mistake -- you get versioning hell, npm, and worse.

I believe by default the system should build versionless from master (which is what gxpkg currently does).

Explicit versioning should be reserved for major events with interface-breaking changes, and those can be handled with a git tag (or (gasp) a git commit anchor). Note that multiple versions of a library in the same namespace is an idea I very much frown upon -- code and object duplication, and naming/referring to things becomes difficult.

If there are specific library versions that need to be pinned, then you can make a fresh GERBIL_PATH and pin there with specific commit tags for integration purposes. Note that pinning is not yet explicitly supported by the tooling, mainly because it hasn't been a problem yet, but it's straightforward to add. Nonetheless, that's something that the integrator of the system can handle, and it comes into play only for application deployment that absolutely needs to depend on some specific version.

But at the end of the day, development goes forward. Instead of depending on an obsolete interface, you should update your package to match the latest interface of your dependencies. Otherwise you just create a roadblock that requires a version pin which holds the entire package hierarchy back. And if it's not your package, well, it's still open source -- open a PR and if the author is not responsive, you can fork!

Also, before you enter yet another version quagmire, read on what the go people have to say about the issue: https://github.com/golang/proposal/blob/master/design/24301-versioned-go.md

vyzo commented 6 years ago

I would also like to add that Scheme|Gerbil is not so brittle with regard to interface changes, because there are no type signatures -- just functions (and macros). So as long as you keep your export list tidy with a well-defined interface, it's unlikely that your dependents will blow up because of internal refactorings.

vyzo commented 6 years ago

Another point that is not salient here is that Gerbil's explicit control of the namespace (with the namespace and package declarations, together with gerbil.pkg hierarchies) allows both new and old interfaces to co-exist.

So when you make a new interface for your module you can keep the old one and introduce a new sub-namespace for your new interface by virtue of namespace control.

So let's say your initial interface (call it v1.0) is:

package: vyzo/my-lib
(export ...)

You can introduce the new interface while preserving the old one in the hierarchy. You can also simply rename your module's main API interface to be:

vyzo/my-lib/v1 ; for the old interface
vyzo/my-lib/v2 ; for the new interface

All this is possible with simple operations on the module system and its hierarchical structure. You don't have the baggage of types and interfaces and all those transitive versions of things. You just have an api module that exports your library's interface. When you introduce a new major api, with whatever external refactorings, you keep the old api stub and you can rename it and move it elsewhere in the namespace hierarchy.

This makes upgrading dependent packages to pin to a specific api trivial -- the import becomes (import :vyzo/my-lib/v1) instead of (import :vyzo/my-lib).
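As a sketch of what such a stub could look like, a v1 module might simply re-export the old interface (the module paths here are hypothetical):

```scheme
;; my-lib/v1.ss -- hypothetical stub that pins the old interface
(import :vyzo/my-lib/old-api)
(export (import: :vyzo/my-lib/old-api))
```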

All this indicates to me that separating the module system from the build system, with package-level integration and programmer-controlled selection of namespaces, solves the problem at the language level together with a common-sense social contract. We don't have to invent versioning to make a stable package ecosystem that can evolve without friction.

vyzo commented 6 years ago

Note that namespaces can be populated by multiple independent packages. So you made a shiny new interface for your library and don't want to keep the old one around? Fork and re-namespace things so that v1 and v2 co-exist.

vyzo commented 6 years ago

To finalize my tirade:

I am philosophically opposed to polluting programming with versioning cognitive burden.

The module system should let you control your namespace and your library structure so that you don't have to care about those things. You should simply (import :clan/stuff) and not care where it came from. And if you need to pin to an interface version, then there should be a sub-namespace for that specific interface -- which can live in an entirely different package at the distribution level. So you want a versioned deprecated interface? Use (import :clan/stuff/v1). It comes from some package at the distribution level, fork or not.

As for the details of the version and the exact commit of some package? I would argue that this is the job of the package manager/build system, which is designed exactly to handle assembly of binary artifacts from various sources.

feeley commented 6 years ago

Philosophically, when a programmer writes a module A that imports module B, how can the programmer give guarantees about the behaviour of A? In other words, what is the meaning of A? The meaning of A will only be precise if the meaning of B is precise. So the programmer can only guarantee that A works correctly if the meaning of B is known precisely.

For our work on Termite Scheme this is a critical issue for code migration (the ability for one node of a distributed system to send a procedure or continuation to a receiving node, and to resume the computation there). If a procedure or continuation from module A is sent to the receiving node and the receiving node has a different meaning for B (different version installed for example) then the code's meaning will change and the resumed continuation will not behave identically. Having the version information in the module name allows giving it an unambiguous meaning, and code migration can be transparent (there is no need for an initial installation of the same modules on all the nodes of the distributed system, something that is impossible when the distributed system is large and has a disorganized evolution, such as the Web).

Note that currently our module system does not force using version information in the module names. But it allows doing so with a syntax that matches the one from the vcs, for example (import (github.com/feeley/digest/tree/1.0.0)) gets you precisely version 1.0.0 of the digest module, but (import (github.com/feeley/digest)) gets you the repo's HEAD, which is ambiguous because it changes with time. An (import (foo/bar)) gets you the currently installed module foo/bar (no specific version). A program containing an import to an unversioned module name is likely to cause problems in a distributed system.

Another feature we are designing is module name aliases. A module's source code tree can contain a file that defines aliases of the form (define-module-alias digest github.com/feeley/digest/tree/1.0.0) so that within the module's source code tree any (import (digest)) will be equivalent to (import (github.com/feeley/digest/tree/1.0.0)). This makes it easier to change the version of dependent upon modules.

So I have the feeling that both approaches offer the same functionality, just in different ways.

vyzo commented 6 years ago

In real life, there is no program of sufficient complexity that is actually correct, no matter what versions you choose -- all real programs have bugs, inconsistencies, and what have you.

Software is a living thing; libraries evolve, bugs get fixed, stuff gets introduced. If you add versioning to the module system you just added cognitive burden and overhead to the development of code. You now have to track versions everywhere and change them every time there is a change for some fix, or you risk becoming the bottleneck -- suddenly you are the roadblock that forces duplicate versions of some library. And have I ranted on the madness of this? Good god, I think not yet.

For the programmer, what matters is the high level interface of the library and that's what you program against. This is best expressed by saying (import :feeley/digest) rather than involving distribution peculiarities.

At integration and build time, you can pin specific versions if you want, for instance to get hermetic reproducible behaviour with exact semantics. But if you put it in the module system, you just made everyone obsess about versions and make a mess every time there is an update.

feeley commented 6 years ago

In real life, there is no program of sufficient complexity that is actually correct, no matter what versions you choose -- all real programs have bugs, inconsistencies, and what have you.

I'm not just talking about "sufficiently complex" programs or the existence of bugs... a 10 line program that imports an unversioned feeley/digest will no longer work if that module removes a procedure definition that has been determined to be redundant but happens to be used in the 10 line program, even if the old and new versions of the digest module contain no bugs.

But let's move the discussion away from preferences and opinions... how would you tackle the code migration issue mentioned above in practical terms?

vyzo commented 6 years ago

I'm not just talking about "sufficiently complex" programs or the existence of bugs... a 10 line program that imports an unversioned feeley/digest will no longer work if that module removes a procedure

That's an interface-breaking change... We want to encourage an ecosystem of well-thought-out and cooperatively designed software for the commons, where this stuff just doesn't happen. Why was that redundant procedure exported? And why was it then removed -- was it too much of a burden?

And still, the broken program's programmer can just go ahead and fix it, or fork the old version and link to that, or, with more advanced functionality from gxpkg, pin a specific version for the build.

Note that my reference to real-life programs was more in response to your "precision" argument; I don't buy that -- you can't really make precise statements about real-life software, stuff is just too complex...

how would you tackle the code migration issue mentioned above in practical terms?

I think you are thinking naively about code migration, and you even assume that it's a good idea :)

Let's say you had a complete versioned reference for the code you want to migrate. How would that load in the new environment? It requires precisely the same version of the system (and how you are going to make that happen in a distributed system of any scale is an open question). Or are you going to be loading more libraries dynamically to support that mobile code?

And then you reach that point where you need a sandbox for your mobile code, and then you might as well use whatever code you are linked with and let it break if there is a version mismatch. Or you can use a more intelligent payload format that conveys the version and refuses to load if there is a mismatch.

Point is, either you have the exact same version of the code required for your payload (which you can do without baking it into the module system, simply by hashing stuff) and your code runs, or you don't and it won't migrate. And if you are thinking of loading modules live from the Internet to support migration of a mismatched version of code, what can possibly go wrong?

So no, I don't think that code migration is something that is calling for hard baking of versions within the module system.

vyzo commented 6 years ago

Also, lest we forget: I neglected to mention the word "state" in the mobile-code stuff, but it is the actual hard problem in recovering the mobile-code environment necessary to run your payload.

fare commented 6 years ago

@feeley, once again, your model only works if M is pure and P has no interaction between the instances of M imported from A and from B.

Let's say M defines an entity e that is modified (adding, removing, renaming, or modifying some record field or function in an incompatible way) -- indeed, by the very hypothesis, if there was any meaningful change in M, then for some e there is an incompatible difference between A.M.e and B.M.e. By hypothesis, you can never safely use two different versions of M together whenever P wants the two to interact -- yet this interaction is often the whole point: take a math library M, compute some things with algorithms from A, then feed the result into algorithms from B -- oops, segfault, because the internal representation changed between the two copies of M.

Marc's solution is that

  1. Every library author must include his integrations in the source code.
  2. Every library author must then recursively fork each and every one of his transitive dependencies to mimic his integration.

Now, if I use 100 libraries that of course don't use the exact same integrations as I do, I must fork and maintain git repos for 100 libraries. That's completely crazy and backward.

In my solution (which isn't original to me), you achieve the same effect with no fork whatsoever.

Note that the morons at Hackage did a small fraction of what you propose, by demanding that library authors publish a maximum version of the libraries they depend on as part of the library itself, and that has been causing a very large amount of pain and strife within the Haskell community. I have proudly rejected, several times, the antifeature of enabling maximum-version specification in ASDF. If there is an essential reason why new versions won't ever work and you won't support them, you must fork the dependency.