Well, the Makefile is redundant -- the package assumes a build script which builds with the standard build tool (:std/make).
But this is a great proposal overall, I would very much like to have searchable metadata and info for packages.
See https://github.com/vyzo/gerbil/wiki/The-Gerbil-Package-Manager#a-word-of-caution for more discussion about security, signing of packages, etc.
@vyzo Regarding :std/make, I will have to think about it. As I mentioned in chat, I think the package system should be native to gambit and not necessarily particular to gerbil. Thus using gerbil's build scripts by default would more tightly couple the project to gerbil. However, I am being realistic about the effort needed to implement the equivalent of gerbil's package management in native gambit, and I will consider using a build.ss.
As for sandboxing, this can indeed be a tricky problem. Maybe the package definition/manifest should define the created files. Then maybe we can execute the compilation in a firejail (https://github.com/netblue30/firejail), and only copy the set of files mentioned in the manifest from the firejail?
I am absolutely not the person to make recommendations on sandboxing, but that looks like an interesting option.
Here is an example of a package repository: https://github.com/belmarca/gxpkg-example.
I am opposed to using Makefiles as the build vehicle.
The package manager already utilizes the build script which covers all basic functionality for building gerbil (and properly namespaced gambit) code, handles dependencies and build order, and so on.
Also, there is a simple wrapper macro that defines the build script with just a couple lines of code -- see :std/build-script.
So I don't see any good reason to make the build system depend on make, it's a step backwards. Nonetheless, you could include a Makefile that calls the build script itself!
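For reference, a minimal build.ss using that wrapper looks something like this (the module names are placeholders for the package's own modules):

```scheme
#!/usr/bin/env gxi
;; build.ss -- minimal sketch using the :std/build-script wrapper;
;; "my-package/util" and "my-package/main" stand in for the package's modules
(import :std/build-script)

(defbuild-script
  '("my-package/util"
    "my-package/main"))
```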
Now, with regards to the basic functionality of search and metadata.
Firstly, we need additional metadata in packages. This can be simply done by having a metadata: field in the gerbil.pkg plist. This could include tags and anything else we deem useful.
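Something along these lines, say (the metadata: keys below are just a sketch):

```scheme
;; gerbil.pkg -- sketch only; the metadata: field and its keys are hypothetical
(package: belmarca
 depend: ()
 metadata: (description: "An example package"
            tags: ("example" "demo")
            license: "MIT"))
```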
The second issue is discovery of packages. We could have a server at gxpkg.cons.io that allows you to register packages by linking to the github repo. The server could then fetch the metadata (and periodically update, perhaps with github integration with commit hooks) and store them locally for answering queries.
The implementation of gxpkg search could then query the metadata server.
The implementation of gxpkg info can answer queries about locally installed packages or perform a query to the remote server.
We can also cache the package repo metadata locally to avoid having to hit the server for every query.
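A rough sketch of the client side of gxpkg search, assuming the server speaks JSON over HTTP (the endpoint and the reply shape below are made up for illustration):

```scheme
;; sketch only -- the endpoint and reply format are assumptions
(import :std/net/request)

(def (search-packages keyword)
  (let* ((req (http-get (string-append "https://gxpkg.cons.io/search?q=" keyword)))
         (result (request-json req)))  ; e.g. a list of package descriptors with tags
    (request-close req)
    result))
```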
As you know I have been designing Gambit's module system with @FredericHamel and have some solutions to the issues you raise. But first I'd like to discuss the "build script" as this seems to be contentious. @belmarca suggests a Makefile and @vyzo a Scheme script. Another option we have considered is a shell script. I can see the following pros and cons:
Makefile: a fairly standard Unix tool, but there is still a dependency on the existence of make (Windows, iOS, ...) and on the semantic variations of the make utility; it also does not automatically take into account the library dependencies indicated in the source code, so these have to be duplicated in the Makefile.
Shell script: even lower level, but can call up any tool (make, gsc, git, gfortran, ...), possibly with a bit of work to make the script portable.
Scheme script: no need for external tools for simple (normal?) build situations; however, for simple situations the build process can be automated (just compile all of the dependencies and link), so maybe a Scheme script doesn't add much value.
So perhaps the build can be automated from the dependencies, and if a build shell script or Scheme script exists then it can be used for more complex situations.
Your thoughts?
gxpkg can also work with a shell script, provided it supports a couple of meta-commands (deps and compile).
But it won't be able to clean packages without the spec meta-command, which returns an :std/make build-spec.
My experience with Common Lisp was that translating ASDF's bootstrap build system from a Makefile plus shell (and perl!) scripts to a pure Common Lisp "script" was intensely satisfying. (ASDF builds CL code from CL, the bootstrap build system builds ASDF from the shell, vanquishing all portability and dependency issues, with lots of additional targets for testing, releasing, etc.)
My co-maintainer Robert Goldman, though, didn't like it: he experienced a whole lot of portability issues that I had to fix, and they still led him to deprecate the system. Having to deal with 10 major implementations, most of which run on each of 3 major OS families, must have played a role in it, though.
I would like a pure Scheme build system... but then, having worked on XCVB, ASDF and Bazel, in addition to using a lot of different build systems... well, I have my grand ideas of what a build system could and should be... https://ngnghm.github.io/blog/2016/04/26/chapter-9-build-systems/
@fare I'll take a look at your writings... sounds interesting.
Portability is a really really really important feature of Gambit so I'd like a build system that is not dependent on the OS. So shell scripts and Makefiles are not ideal. But I worry that a Scheme script will be awkward for building modules that are more OS dependent (for example an interface to a C library with lots of dependencies on other C libraries), where "good old standard" tools are more appropriate. I believe there is a spectrum of build situations that require a gradually more detailed/low-level build procedure. I feel there is no single best method, so several build situations should be supported. Something like
This is only to give the general idea... I'm not (yet) proposing this exact hierarchy. The idea is to make the programmer's life as easy as possible for a particular build situation (level of detail).
My experience is that Lisp or Scheme scripting is so much more pleasant than shell-scripting that just... wow.
On the other hand, yes, it's important to be able to call out to external programs including shell-scripts, and to shell out a pipe or two in some cases. In CL, I maintained an awkward uiop:run-program compatibility layer and inferior-shell:run as a more usable layer on top. In Scheme, well, I wrote Gerbil's std/misc/process, that wraps around Gambit's open-process. It's OK for simple uses, but is missing a lot of the features available in CL, even more features that CL didn't provide, and isn't extensible.
Ideally, you'd want a way to handle, e.g., resetting signal masks before execve, handling arbitrary redirection of arbitrary file descriptors (including user-defined extensions for pipes, ptys, sockets, etc.), tweaking OS personality and arbitrary configuration system calls in the child, not just chdir, etc. Of course, making that portable, too, is a lot of work, and requires experimentation. GC, multiprocessing and other features should probably be disabled in the child pre-execve, though.
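For the simple end of that spectrum, here is roughly what using Gambit's open-process layer looks like (the command and arguments below are arbitrary examples):

```scheme
;; spawn a command, read one line of its stdout, and wait for its exit status
(define (run-and-read-line cmd args)
  (let ((p (open-process (list path: cmd
                               arguments: args
                               stdout-redirection: #t))))
    (let ((line (read-line p)))
      (let ((status (process-status p)))  ; blocks until the child terminates
        (close-port p)
        (list line status)))))

;; example: (run-and-read-line "git" '("rev-parse" "HEAD"))
```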
At least, having chosen Gambit, we don't have to deal with 10 different implementations' internals...
In a nutshell, my view is:
Or, things not to do:
Here's another important issue... how are modules/libraries identified unambiguously? Obviously this is important for giving a module an unambiguous meaning. Our approach is to use a URL to name modules, specifically a github URL or other publicly accessible git repo. The repo tag is part of the URL to give it an unambiguous meaning, for example (import (github.com/feeley/digest/tree/1.0.0)). There are also Gambit "builtin" modules that are unambiguously identified by a short name, for example (import (gambit regex)). Among other things, this simplifies the migration of code from one node of a distributed system to another (the migrated code contains a reference to the library name, and the receiving node, if it does not have that library, can automatically download it, check out the specific commit/tag, build the library and load it to continue executing the code).
I believe there is no escaping managing your own module namespace.
URLs won't do it: a project may change hosting, it may be forked, it may be replaced (mostly) compatibly by something else (from the same authors or different authors), etc. What is the URL for the libc? Some obscure page from some standards body? The GNU libc webpage? That of *BSD? What about musl? uclibc? klibc? dietlibc? Magic autodetection of whatever was used? Even if it's a local fork?
I recommend embracing the fact that you're going to have to manage a new namespace.
The URL is not encoded in the library's source code; it is determined at compile time. This means that if a project changes hosting, the namespace also changes to the new URL.
Still a bad idea. The URL has no advantage and only disadvantages.
The "real" source code for a library dependency is NOT the choice of the author of the using library, but that of the integrator who builds a complete system from a set of precisely versioned libraries. The exact URL of the source, fork, commit, patch, etc., used for the library depends on the integrator, not on the author of any library.
Just adopt a short namespace, and let the integrators do the mapping not just to URL, but to specific commits/tarballs.
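Concretely, that mapping can be as simple as a data file the integrator owns, entirely outside any library's source tree; something like this (every name, repo and commit below is a placeholder, not authoritative):

```scheme
;; integrator-owned mapping from short library names to exact sources
(define integration-map
  '((feeley/digest . ("github.com/feeley/digest" "1.0.0"))
    (vyzo/my-lib   . ("github.com/vyzo/my-lib"   "3b1c2ad"))))
```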
Thanks for the discussion. It seems there are multiple intertwined arguments going on though. I will not concern myself with the internals of the actual build system. I will simply agree with @feeley that we should not impose too much on authors. Having read most of the text by @fare , I understand why gerbil wants to use a build system written in gerbil. However forcing its use upon the package maintainers, library authors etc is not the way to go IMO.
If broader usage of Gambit and Gerbil is a goal, I think Marc's multi-level proposal is the one to follow. That is, Gerbil should have its build system written in gerbil, but any newcomer should also be able to write and share a package using whatever build method suits them best. This only facilitates adoption. One could trivially add a tag to packages to indicate if they are built using anything other than exclusively gerbil's (or gambit's future) build system. Call these "pure" vs "impure" or "dirty" vs "clean" packages. Or even just tag the packages with any and all combinations of "make", "shell" or "build-script" meta tags. This is simple to do and allows users to forbid installation of certain classes of packages. Personally, if I am forced to learn a new build system while also learning a new language and a new runtime, I might well not bother. These things take time and not forcing their usage seems like a good middle ground.
As for unambiguously identifying modules, one way would be to build a DAG in which vertices are modules/packages and edges are dependencies. Packages point to particular commits of dependent packages. Thus if package B at commit b depends on package A at commit a, the particular state of package B at commit b can be rebuilt even if package A gets more commits. Then package B can be marked as having stale dependencies, for example.
In practice this means that the author metadata is inferred by the package metadata server by looking at the github repo (i.e. github.com/author1/packageB). Installation of said package will clone the repo, check out the particular commit if needed, then clone the dependencies (github.com/author2/packageA in our case), check that package out at its required commit, and build.
Am I making any sense?
Note that this scheme does not imply tight coupling to any provider whatsoever. It shouldn't matter if you get your code from wherever. The use of GitHub is for convenience and solves the authorization issue (I can't publish code under author names that I do not control).
The "not imposing a build system" bit reminds me of this Koan about Sussman: http://www.catb.org/jargon/html/koans.html#id3141241
As for pointing to a specific commit of a package dependency in a specific repository: once again, it's the job of the integrator, and emphatically not of the package author. Anything else is crazy... and yes, it's been tried... then when you patch one tiny library for a security bug, you must recursively update all transitive using libraries... nuts.
Please re-read the section of my essay about the four roles around a library. https://ngnghm.github.io/blog/2016/04/26/chapter-9-build-systems/
@fare Let's say my code depends on yours and you go through a major refactoring and large but needed API changes. What am I to do? By fixing my code to a particular commit of yours, I can assure that it works even if you refactor (unless you destroy the commit in question). You do not have to do anything in particular.
And about the build system: it depends on who is your target audience. If you want "ordinary" people to contribute packages (I consider myself one of them), you cannot impose too much IMO. Otherwise it gets in the way. Maybe I can change my mind if the build system has great examples and documentation. It's just less trouble I would think. But then again it depends on who is your target audience.
I didn't find that section in particular very enlightening. I preferred the following section (Pure Functional Reactive Programming) and agree with the conclusions. I'm not sure where we are diverging especially with regards to build determinism. How can I make sure my build is deterministic through time if I do not peg dependencies to particular commits?
You confuse many things. For your local installation, you're an integrator, so pick whichever version you want.
As you publish your code into the wild, others want and need their own integration, and you cannot and should not try to impose your own.
If there's a long-term fork between the code your library needs and the upstream, then it's a fork, so it needs a different name in the namespace.
I might be confusing things but I don't see how what you propose solves this situation:
Package A has a hard dependency on package B. Last known working version of package A used package B at commit b. That is guaranteed by the author of package A.
What you seem to propose is that the package manager treats all package dependencies as referring to HEAD and then the particular integrators, on their own machine, either figure out which particular commit to use or simply make changes to the library?
You confuse the build system and the package manager.
Libraries cannot and should not specify a specific version of their dependencies. That's the job of the integrator / distribution.
Look at how Nix packages are defined: each package captures its direct content, but it's the overall repository (plus overrides) that defines the mapping of names to versioned packages, and does a fixpoint when building.
Library A depends on library B. Library C depends on library B. Which library author gets to set the version of B ? The one with the biggest penis? Nope. Neither. The integrator does.
Library A sets which version of library B it depends on. Library C sets which version of library B it depends on.
The system could allow multiple versions of the same library to be installed.
That's not how any of this works.
Unless B is purely functional and there is no indirect interaction between A and C, e.g. via D, which depends on A and C.
When I fix a crucial security hole in openssl (real story), I don't want to propagate that everywhere in every single damn package. I let the package manager compute the fixpoint for me.
The whole point of having a library is to provide an intent that insulates the users from the contents of the library.
This intent has to be coherent between different simultaneous users of the library, i.e. they must see the same content.
If the library depends on installed files to read at runtime, unless you use Nix, there will be conflict.
And if files maintaining read/write state are involved, then even if you use Nix, there will be conflict.
You confuse the build system and the package manager.
That's what I meant by
It seems there are multiple intertwined arguments going on though.
:smile:
Library A depends on library B. Library C depends on library B. Which library author gets to set the version of B ?
@fare This is a non sequitur IMO. The author of package B is free to do whatever they want, whenever they want (even destroy commit history). So in the case of openssl, you just commit code to fix the security hole. You don't care about any package that depends on your library.
Then, like @FredericHamel mentions, it's the dependent libraries that set the depended upon library's particular commit. Thus inside package A we have a reference to library B in the state it was at commit b0 and inside package C we have a reference to library B in the state it was at commit b1.
Now if B is vulnerable, and a patch has been made available, it is the responsibility of the maintainers of A and C to fix their dependencies.
The package metadata repository could notify users that package B at commits below b2 is insecure. This is extra work but can be done.
How does Nix do it? I'm just starting to read the documentation, maybe you could point me to the particular concepts I am missing?
If I have to make a recommendation for the namespace, I'll just say: keep it mapped to the filesystem (modulo search function / mounting / mapping / indirections configurable by the user), just like Perl or Python, and without the idiotic prefixing of Java.
When you're thinking at the level of integration, Nix / GUIX should be the source of inspiration.
D depends on A and C, that both depend on B. Who decides which version of B is used?
(Extra points: E depends on D. F depends on E and directly on B. etc. Generate more non-tree-like DAGs ad nauseam.)
I'm not against authors sharing their successful integrations, if they like, especially as part of unit testing. But these obviously cannot be authoritative for anyone else, and should NOT be part of the build files as such.
It's not ignorance that does so much damage; it's knowing so darned much that ain't so. — Josh Billings
D depends on A and C, that both depend on B. Who decides which version of B is used?
Both, or none if you wish. They each use their own namespaced functions from B.
In A: (package-b#commitX#some-function arguments)
In C: (package-b#commitY#some-function arguments)
The author of A does not care about the version of B the author of C uses, and vice versa. He calls (import :author-b/package-b commitX) or whatever the syntax, and then the functions from B are namespaced under b#commitX. If one calls import without a commit, HEAD is assumed and the functions are simply called under the b# namespace. The same thing goes for the author of C.
Now D depends on A and C, and thus should never directly call functions from B, so there is no problem here.
What exactly is your objection? More specifically, where does it lie? I cannot figure where you want to put the version pinning in practice. A cursory look at Nix doesn't give me the information I need to see where I'm being mistaken. Just point me to a particular thing to read and I will oblige, but I have to say I don't understand how my proposition wouldn't work.
I'm with @belmarca on this. @fare's development model has the integrator assigning semantics to the modules. In other words it is the integrator that chooses which version of a module to use. But this is madness! When I write a program I must pin down the combination of module versions that works for my program. If the integrator has a say in which versions will actually be used, this could break my program. The integrator (which doesn't know my code in depth) could think his choice of versions is OK, but subtle problems could come up.
Perhaps I don't understand the different roles (integrator, author, other?).
In the system I have designed with @FredericHamel, different versions of a module can be installed on a system, and different versions of a module can be linked with a program. With git this is rather easy to achieve. If versions X and Y of module M are required, then two copies of M's repo are created and a "git checkout X" is done in one copy, and a "git checkout Y" is done in the other, and each one is built independently.
@feeley The git method is what I initially proposed. Different versions of package B are installed in the same top level directory, package-b, yet HEAD is actually in package-b/current and the other installed versions are in package-b/commit-hash. Importing the lib thus has namespace package-b#commit-hash.
Perhaps I don't understand the different roles (integrator, author, other?).
@fare I believe this might be my case as well. I will think more about your model.
I'm not against authors sharing their successful integrations, if they like, especially as part of unit testing. But these obviously cannot be authoritative for anyone else, and should NOT be part of the build files as such.
I believe this is where I don't quite follow the argument. Can you write a simple example to show me how this would be done in practice? A gist is fine, or a repo.
Let me make a concrete example to help clarify the different models. Say I have a program P that depends on modules A and B, and module A depends on module M, and module B depends on module M. However A uses v1 of M and B uses v2 of M.
In @fare's model (for lack of a better name) the source code indicates the dependencies without the version information. The dependencies are thus
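```
P → A, B
A → M
B → M
```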
Note that P, A, B and M could have been authored by different developers.
In this model, when P is linked a specific version of A, B and M must be chosen by the integrator. So it is the responsibility of the integrator to ensure that the version of M that is chosen is appropriate for the uses by A and B (either v1, v2 or some other version). That seems like a daunting task as the integrator did not author A or B, so how can he be sure his choice is OK? Moreover it is possible that no version of M is suitable because of incompatible versions (A may depend on a feature of v1 of M that has been removed in v2, and B depends on a feature of v2 that did not exist in v1).
In the model @FredericHamel and I are implementing, the source code indicates these dependencies
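```
P → A/vX, B/vY
A/vX → M/v1
B/vY → M/v2
```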
When the system is built, both M/v1 and M/v2 are built and linked with P. In some cases, this will just work.
One case where this may not work is when some data created by M/v1 through A/vX is passed to M/v2 through B/vY. For example M could be a regexp library and A/vX uses it to compile a regexp, and B/vY calls M/v2 to execute that compiled regexp. Because M/v1 and M/v2 are different, the data created by M/v1 may be unusable by M/v2 (for example the regexp compiler and execution engine may have evolved in incompatible ways). Note that this situation can be detected by changing the type of the data between versions.
But this situation is not hopeless. A new version of A can be created that replaces the dependency on M/v1 by a dependency on M/v2. Call this version A/vZ. Now the source code indicates these dependencies
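```
P → A/vZ, B/vY
A/vZ → M/v2
B/vY → M/v2
```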
Note that version vZ of A can be obtained by forking A/vX and changing its dependency from M/v1 to M/v2.
This can be viewed as a kind of integration work. However it is a more modular form of integration than the global integration needed by @fare's model.
The "incompatible evolution of M" situation can be addressed by creating M/v3 and new versions of A and B (by asking the authors of A, B and M or creating forks, or a combination). This may be a difficult task, but possible, and no more difficult than in @fare's model.
I believe that tightly integrating the module system with versioning is a mistake -- you get versioning hell, and npm and worse.
I believe by default the system should build versionless from master (which is what gxpkg currently does).
Explicit versioning should be reserved for major events, with interface-breaking changes, and those can be handled with a git tag (or (gasp) a git commit anchor). Note that having multiple versions of the library in the same namespace is an idea I very much frown upon -- code and object duplication, and naming/referring to things becomes difficult.
If there are specific library versions that need to be pinned, then you can make a fresh GERBIL_PATH and pin there with specific commit tags for integration purposes. Note that pinning is not yet explicitly supported by the tooling, mainly because it hasn't been a problem yet, but it's straightforward to add.
Nonetheless, that's something that the integrator of the system can handle, and it comes into play only for application deployment that absolutely needs to depend on some specific version.
But at the end of the day, development goes forward. Instead of depending on an obsolete interface, you should update your package to match the latest interface of your dependencies. Otherwise you just create a roadblock that requires a version pin which holds the entire package hierarchy back. And if it's not your package, well, it's still open source -- open a PR and if the author is not responsive, you can fork!
Also, before you enter yet another version quagmire, read on what the go people have to say about the issue: https://github.com/golang/proposal/blob/master/design/24301-versioned-go.md
I would also like to add that Scheme|Gerbil is not so brittle with regards to interface changes, because there are no type signatures -- just functions (and macros). So as long as you keep your export list tidy with a well-defined interface, it's unlikely to have your dependents blow up because of internal refactorings.
Another point that is not salient here, is that Gerbil's explicit control of the namespace (with the namespace and package declarations, together with gerbil.pkg hierarchies) allows both new and old interfaces to co-exist.
So when you make a new interface for your module you can keep the old one and introduce a new sub-namespace for your new interface by virtue of namespace control.
So let's say your initial (call it v1.0) interface is:
package: vyzo/my-lib
(export ...)
You can introduce the new interface while preserving the old one in the hierarchy. You can also simply rename your module's main api interface to be:
vyzo/my-lib/v1 ; for the old interface
vyzo/my-lib/v2 ; for the new interface
All this is possible with simple operations at the module system and its hierarchical structure. You don't have the baggage of types and interfaces and all those transitive versions of things. You just have an api module that exports your library's interface. When you introduce a new major api, with whatever external refactorings, you keep the old api stub and you can rename it and move it elsewhere in the namespace hierarchy.
This makes upgrading dependent packages to pin to a specific api trivial -- the import becomes (import :vyzo/my-lib/v1) instead of (import :vyzo/my-lib).
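Concretely, the layout could be something like this (names purely illustrative):

```scheme
;; hypothetical layout in a package whose gerbil.pkg says "package: vyzo":
;;   my-lib/v1.ss -> :vyzo/my-lib/v1   (the frozen old interface)
;;   my-lib/v2.ss -> :vyzo/my-lib/v2   (the new interface)
;;   my-lib.ss    -> :vyzo/my-lib      (the default, tracking the latest)

;; my-lib.ss
(import :vyzo/my-lib/v2)
(export (import: :vyzo/my-lib/v2))
```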
All this indicates to me that separating the module system from the build system, with package-level integration and programmer-controlled selection of namespaces, solves the problem at the language level, together with a common-sense social contract. We don't have to invent versioning to get a stable package ecosystem that can evolve without friction.
Note that namespaces can be populated by multiple independent packages.
So you made a new shiny interface for your library and don't want to keep around the old one?
Fork and re-namespace stuff so that v1 and v2 co-exist.
To finalize my tirade:
I am philosophically opposed to polluting programming with versioning cognitive burden.
The module system should let you control your namespace and your library structure so that you don't have to care about those things.
You should simply (import :clan/stuff) and not care where it came from.
And if you need to pin to an interface version, then there should be a sub-namespace for that specific interface -- which can live in an entirely different package at the distribution level.
So you want a versioned deprecated interface? Use (import :clan/stuff/v1). It comes from some package at the distribution level, fork or not.
As for the details of the version and the exact commit of some package? I would argue that this is the job of the package manager/build system, which is designed exactly to handle assembly of binary artifacts from various sources.
Philosophically, when a programmer writes a module A that imports module B how can the programmer give guarantees on the behaviour of A? In other words, what is the meaning of A? The meaning of A will only be precise if the meaning of B is precise. So the programmer can only guarantee that A works correctly if the meaning of B is known precisely.
For our work on Termite Scheme this is a critical issue for code migration (the ability for one node of a distributed system to send a procedure or continuation to a receiving node, and to resume the computation there). If a procedure or continuation from module A is sent to the receiving node and the receiving node has a different meaning for B (different version installed for example) then the code's meaning will change and the resumed continuation will not behave identically. Having the version information in the module name allows giving it an unambiguous meaning, and code migration can be transparent (there is no need for an initial installation of the same modules on all the nodes of the distributed system, something that is impossible when the distributed system is large and has a disorganized evolution, such as the Web).
Note that currently our module system does not force using version information in the module names. But it allows doing so with a syntax that matches the one from the vcs, for example (import (github.com/feeley/digest/tree/1.0.0)) gets you precisely version 1.0.0 of the digest module, but (import (github.com/feeley/digest)) gets you the repo's HEAD, which is ambiguous because it changes with time. An (import (foo/bar)) gets you the currently installed module foo/bar (no specific version). A program containing an import of an unversioned module name is likely to cause problems in a distributed system.
Another feature we are designing is module name aliases. A module's source code tree can contain a file that defines aliases of the form (define-module-alias digest github.com/feeley/digest/tree/1.0.0) so that within the module's source code tree any (import (digest)) will be equivalent to (import (github.com/feeley/digest/tree/1.0.0)). This makes it easier to change the version of depended-upon modules.
So I have the feeling that both approaches offer the same functionality, just in different ways.
In real life, there is no program of sufficient complexity that is actually correct, no matter what versions you choose -- all real programs have bugs, inconsistencies, and what have you.
Software is a living thing; libraries evolve, bugs get fixed, stuff gets introduced. If you add versioning in the module system you just added cognitive burden and overhead to the development of code. You now have to track versions everywhere and change them every time there is a change for some fix, or you risk becoming the bottleneck, and suddenly you are the roadblock that forces duplicate versions of some library -- and have I ranted on the madness of this? Good god, I think not yet.
For the programmer, what matters is the high level interface of the library and that's what you program against.
This is best expressed by saying (import :feeley/digest) rather than involving distribution peculiarities.
At integration and build time, you can pin specific versions if you want, for instance to get hermetic reproducible behaviour with exact semantics. But if you put it in the module system, you just made everyone obsess about versions and make a mess every time there is an update.
In real life, there is no program of sufficient complexity that is actually correct, no matter what versions you choose -- all real programs have bugs, inconsistencies, and what have you.
I'm not just talking about "sufficiently complex" programs or the existence of bugs... a 10 line program that imports an unversioned feeley/digest will no longer work if that module removes a procedure definition that has been determined to be redundant but happens to be used in the 10 line program, even if the old and new versions of the digest module contain no bugs.
But lets move the discussion away from preferences and opinions... how would you tackle the code migration issue mentioned above in practical terms?
I'm not just talking about "sufficiently complex" programs or the existence of bugs... a 10 line program that imports an unversioned feeley/digest will no longer work if that module removes a procedure
That's an interface breaking change... We want to encourage an ecosystem of well thought out and cooperatively designed software for the commons, where this stuff just doesn't happen. Why was that redundant procedure exported? And why was it then removed, was it too much burden?
And still, the broken program's programmer can just go ahead and fix it or fork the old version and link to that, or with more advanced functionality from gxpkg, pin a specific version for build.
Note that my reference for real-life programs was more in response to your "precision" argument; i don't buy that, you can't really make precision statements about real-life software, stuff is just too complex...
how would you tackle the code migration issue mentioned above in practical terms?
I think you are thinking naively about code migration, and you even assume that it's a good idea :)
Let's say you had a complete versioned reference of the code you want to migrate. How would that load in the new environment? It requires precisely the same version of the system (and how you are going to make that happen in a distributed system of any scale is an open question). Or are you going to be loading more libraries dynamically to support that mobile code?
And then you reach that point where you need a sandbox for your mobile code, and then you might as well use whatever code you are linked with and let it break if there is a version mismatch. Or you can use a more intelligent payload format that conveys the version and refuses to load if there is a mismatch.
Point is, either you have the exact same version of the code required for your payload (which you can do without baking it in the module system, simply by hashing stuff) and your code runs, or it isn't and it won't migrate. And if you are thinking loading modules live from the Internet to support migration of a mismatched version of code, what can possibly go wrong?
So no, I don't think that code migration is something that is calling for hard baking of versions within the module system.
Also, lest we forget: I neglected to mention the word "state" in the mobile code stuff, but it is the actual hard problem in recovering the mobile code environment necessary to run your payload.
@feeley, once again, your model only works if M is pure and P has no interaction between instances of M imported from A and from B.
Let's say M defines an entity e that is modified (adding, removing, renaming, modifying some record field or function in an incompatible way) — indeed by very hypothesis, if there was any meaningful change in M, then for some e, there is an incompatible difference between A.M.e and B.M.e. By hypothesis, you can never safely use two different versions of M together whenever P wants the two to interact—Yet this interaction is often the whole point: take a math library M, compute some things with algorithms from A then feed the result into algorithms from B, oops, segfault because the internal representation changed between the two copies of M.
Marc's solution is that, when the pinned versions don't match, you create a new version of the using library -- a fork -- that points at the version you need.
Now, if I use 100 libraries that of course don't use the exact same integrations as I do, I must fork and maintain git repos for 100 libraries. That's completely crazy and backward.
In my solution (which isn't original to me), you achieve the same effect with no fork whatsoever.
Note that the morons at Hackage did a small fraction of what you propose, by demanding that library authors publish a maximum version of the libraries they depend on as part of the library itself, and that has been causing a very large amount of pain and strife within the Haskell community. I have proudly rejected, several times, the antifeature of enabling maximum version specification in ASDF. If there is an essential reason why new versions won't ever work and you won't support them, you must fork the dependency.
Short of Gambit's own native module/package system I am using gerbil's, which is quite nice to work with.
In order to facilitate adoption, we could have a searchable package metadata repository. This could enable gxpkg usage such as:
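For example (the exact command syntax is open for discussion):

```
$ gxpkg search http
$ gxpkg info github.com/belmarca/gxpkg-example
```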
The search and info commands would simply query an HTTP package metadata repository. A list of all packages could be kept locally and updated at will. A call to gxpkg install my-package would then clone the proper repository to ~/.gerbil/pkg/my-package and run the Makefile.
We could require the Makefile to contain at minimum the gerbil rule, used to compile the library with gxc. gxpkg would then simply call this rule and the rest would fall into place. Thus the trouble of actually building the required object files (or whatever else needs to be done) is left to the library/package author and requires only one assumption from us, the existence of the gerbil rule. So if an author wants to write tests, they can, but we don't disallow untested code. Etc.
The metadata repository's state could be mutated by git (or another VCS) hooks. As an author I can thus write my library locally and push it to GitHub (or BitBucket or whichever provider). Ideally, our metadata server is notified of the latest metadata with a simple POST. However git doesn't have post-push hooks, so that could be a little bit annoying.
Package versioning could be handled relatively simply. Instead of having a master package whose HEAD tracks whatever commit is in the metadata repository, we could use a directory structure such as:
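(a sketch; the exact layout is open to discussion)

```
~/.gerbil/pkg/my-package/current        <- tracks HEAD, used by default
~/.gerbil/pkg/my-package/<commit-hash>  <- a specific pinned commit
~/.gerbil/pkg/my-package/<tag>          <- a specific tag
```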
Here, current is used whenever (import :user/my-package) is called. A call such as (import :user/my-package#hash123) or (import :user/my-package 'tagXYZ) could then use the library at any particular commit. This allows the use of different versions of a package in different REPLs. If, on the contrary, a call such as (import :user/my-package 'tagXYZ) simply checked out the particular commit (a functionality that is not undesirable), there would be a single package version available at all times (unless one wants to mess with starting different processes at long enough intervals to let the checkout from one process complete, etc.).
This is obviously an incomplete proposal. I haven't discussed important details such as authentication/authorization (who gets to write to the metadata repository?) and signing of packages, as well as how much trust to put into said packages (gxc is involved after all).
Hope this gets the ball rolling :)