masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage
Artistic License 2.0

Implement the 'package' level of modules #462

Open masak opened 5 years ago

masak commented 5 years ago

There's a risk that when we've implemented modules — that is, being able to import a file from another through the import statement — there's a whole part missing in order to be able to snap together the three modules in the conscious-risks ecosystem ([1] [2] [3]), and we all go "OK, now what". This issue is meant to look one corner ahead and address that.

There are three common meanings flying around for modules, all of which become relevant to 007 users at one point or other:

That last level is a bit hidden to people. It's more like "I just want to install this module". It's one of those things (kinda like with block scopes vs stack frames) where we might be doing best in upholding the illusion of simplicity for the user and going "yes indeed, you want a module — fine", even though they are something that contains modules, not modules themselves.

On the other hand, I really really really don't want to build another package installer. It's supposed to be extremely hard to get right, and there's very little in there that would benefit 007 the language-experiment-for-doing-structured-language-extension.

I want to walk a delicate balance between the simplest thing that could possibly work and aargh no please not another package manager.

Here's my proposal, taken for the particular example of the conscious-risks ecosystem.

Let's say I want to add the two dependencies ascii.header and boxify to my conscious-risks game. I'd issue these two commands:

$ 007-dep add https://github.com/claes-magnus/007-ascii-header-printer/ ascii.header
$ 007-dep add https://github.com/claes-magnus/007-boxify boxify

Both these commands leave zero output if all goes well, but they create or change a package.yaml file to look something like this:

---
provides: {}
depends-on:
  ascii.header:
    uri: https://github.com/claes-magnus/007-ascii-header-printer
    sha1: eac0a54739e56c22dc3bd48d3e16e0c53276197f
    path: lib/ascii/header.007
  boxify:
    uri: https://github.com/claes-magnus/007-boxify
    sha1: 8dfaf262a4fac9a9c3ec7587a8afc4b71376a28d
    path: lib/ascii/boxify.007

(That is correct YAML. I checked. I'm wary of adopting this format, but it's also quite clean, and we could mandate using only the subset we see above: nested dicts with strings. sYAML.)
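
To make the "nested dicts with strings" restriction concrete, here is a minimal Python sketch of a check that an already-parsed manifest stays inside that subset. The function name `is_syaml` is hypothetical, and turning the YAML text into Python dicts is assumed to have happened elsewhere.

```python
def is_syaml(value):
    """Return True if value is a string, or a dict with string keys whose
    values are themselves strings or such dicts (the proposed subset)."""
    if isinstance(value, str):
        return True
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_syaml(v) for k, v in value.items())
    # Lists, numbers, booleans and None all fall outside the subset.
    return False


# The manifest shown above, as it might look after parsing.
manifest = {
    "provides": {},
    "depends-on": {
        "ascii.header": {
            "uri": "https://github.com/claes-magnus/007-ascii-header-printer",
            "sha1": "eac0a54739e56c22dc3bd48d3e16e0c53276197f",
            "path": "lib/ascii/header.007",
        },
    },
}
```

A list or a number anywhere in the structure makes the check fail, which is the point of mandating the subset.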

Long story short, when we later run risks.007 and it contains, say, an import statement pulling in ascii.header, the 007 runtime will know not to look among the project's own modules, because there's a declaration in package.yaml saying which repository, which Git revision, and which file path to go to in order to find that module file.
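
A rough Python sketch of that lookup follows. It is only an illustration under assumptions that are not part of the proposal: the function name `resolve_import` is made up, a dependency is assumed to be cached at `.dep-cache/<sha1>/<path>`, and a project's own module `a.b` is assumed to live at `lib/a/b.007`.

```python
from pathlib import PurePosixPath


def resolve_import(name, manifest, cache_dir=".dep-cache"):
    """Map an imported module name to the file the runtime would load."""
    deps = manifest.get("depends-on", {})
    if name in deps:
        # Declared third-party module: look in the cache, not in the project.
        dep = deps[name]
        return PurePosixPath(cache_dir) / dep["sha1"] / dep["path"]
    # Otherwise assume it is one of the project's own modules (assumed layout).
    parts = name.split(".")
    return PurePosixPath("lib", *parts[:-1], parts[-1] + ".007")


manifest = {
    "provides": {},
    "depends-on": {
        "ascii.header": {
            "uri": "https://github.com/claes-magnus/007-ascii-header-printer",
            "sha1": "eac0a54739e56c22dc3bd48d3e16e0c53276197f",
            "path": "lib/ascii/header.007",
        },
    },
}
```

With this manifest, `resolve_import("ascii.header", manifest)` points into the cache directory, while a name that is not declared under depends-on falls back to the project's own lib/ directory.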

Note first that we're thereby taking on a kind of dependency on Git. Not GitHub particularly, but on Git. (A project on GitLab, or even something hosted on someone's server, should work just fine.) I'm OK with that. The way to lock onto a SHA1 is my way of completely sidestepping versions and semver and whatnot. It's not super-elegant, but it's the kind of extreme simplicity I'm looking for.

Note second that going to fetch those dependencies should happen as we run the 007-dep add commands. We can't have it happen every time we run risks.007... so we cache the result in a hidden directory called .dep-cache. Like with node_modules, you're meant to git ignore this directory. As you're also supposed to commit and push your package.yaml file, when your colleagues (or whatever) download your repo, they have to 007-dep install all the third-party stuff. A suitable error message when .dep-cache doesn't exist or doesn't contain things declared in package.yaml will push people in the right direction if they accidentally try to compile something with a missing third-party dependency.
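
The "suitable error message" could be produced by a check along these lines. This is a sketch, not a description of any existing tool: the function name `cache_error`, the message wording, and the cache layout `.dep-cache/<sha1>/<path>` are all assumptions.

```python
from pathlib import Path


def cache_error(manifest, project_root):
    """Return an error message if declared dependencies are missing from
    .dep-cache, or None if everything declared is present."""
    cache = Path(project_root) / ".dep-cache"
    missing = [
        name
        for name, dep in sorted(manifest.get("depends-on", {}).items())
        # Assumed layout: one directory per SHA1, then the declared path.
        if not (cache / dep["sha1"] / dep["path"]).is_file()
    ]
    if not missing:
        return None
    return (
        "Missing third-party dependencies: " + ", ".join(missing)
        + ". Run '007-dep install' to fetch them."
    )
```

Running this before compilation would cover both cases mentioned above: a .dep-cache directory that does not exist at all, and one that exists but lacks something declared in the manifest.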

I think that's it. The two depended-on modules in the conscious-risks ecosystem would in time need to (a) move things into the lib/ directory and rename files correctly, and (b) get a package.yaml file of their own. But that's a fair price to pay, in my opinion. I can make the appropriate PRs for that after we have things well-tested on the 007 end of things.

The provides field would be used by the ascii.header and boxify packages to expose their respective modules. The last argument of 007-dep add could be omitted if there's only one module in that field. I think there should be a 007-dep init command to help create the package.yaml file, because, as usual, life is too short to hand-write YAML.
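
The "omit the last argument" rule can be sketched in a few lines of Python. The shape of the provides field is not specified above, so this assumes it maps module names to file paths; the function name `default_module` and the example path are hypothetical too.

```python
def default_module(provides):
    """Pick the module to import when `007-dep add` gets only a URI.

    Only unambiguous when the dependency exposes exactly one module.
    """
    names = sorted(provides)
    if len(names) == 1:
        return names[0]
    raise ValueError(
        "cannot infer module name: dependency provides %d modules" % len(names)
    )
```

For a dependency whose provides field is `{"boxify": "lib/boxify.007"}` this returns `"boxify"`; with zero or several entries the user would have to name the module explicitly.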

I'm almost a little pleased that this scheme avoids an npm-like central package authority. Instead we rely on GitHub for that, or rather, on URIs. There's no way to steal the name boxify for ever and ever.

There's supposed to be a 007-dep remove command, not described here. Presumably there could also be a 007-dep update command, for those brave enough to update to a dependency's latest commit.

masak commented 5 years ago

While not true for the conscious-risks ecosystem, in general dependencies can have dependencies. Only the direct dependencies will end up under depends-on in package.yaml, but all of the transitive dependencies will end up in .dep-cache.

So, for a given project we're really looking at a dependency tree. No, wait, a dependency DAG; two projects somewhere in the DAG can very well depend on exactly the same URI-and-SHA1. (The .dep-cache directory should probably be SHA1s on the first level down, and then just copies of the inside of lib/ directories.)
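
As a sketch of how the cache contents could be computed, the following Python walks the dependency DAG and collects each distinct URI-and-SHA1 pair once. Everything here is illustrative: `collect_cache_entries` is a hypothetical name, and the `manifests` lookup table stands in for actually fetching each dependency's manifest from its repository.

```python
def collect_cache_entries(root_manifest, manifests):
    """Return the set of (uri, sha1) pairs reachable from root_manifest.

    `manifests` maps (uri, sha1) to that dependency's own parsed manifest.
    A pair reached along several paths is only collected once.
    """
    seen = set()
    stack = [root_manifest]
    while stack:
        manifest = stack.pop()
        for dep in manifest.get("depends-on", {}).values():
            key = (dep["uri"], dep["sha1"])
            if key in seen:
                continue  # shared node in the DAG, already handled
            seen.add(key)
            stack.append(manifests[key])
    return seen
```

Because entries are keyed on the pair rather than on the URI alone, the same project pulled in at two different SHA1s simply yields two entries.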

From this, I think it's even fine for different parts of the dependency DAG to pull in the same project at different SHA1s. That should just transparently work.

What's not OK is cycles. Under reasonable assumptions, SHA1s make sure that things are "well-founded" and don't refer to each other cyclically forever. Maybe that's good enough. What I mean by "reasonable assumptions" is that someone might put in the work and compute a special pair of SHA1s of projects that could refer to each other. In that case, I almost feel they deserve whatever error message we put in for that scenario.

There's still the question of projects referring to each other cyclically when SHA1s are discounted. This, I think, we could detect — unfortunately not at 007-dep add time because at that point we don't know the URI of the current project. My feeling is that this will be very rare in practice, though, so I'm fine with not fretting about it so much. Again, in any case, the SHA1s are guaranteed not to be cyclical.
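
Detecting such a cycle "when SHA1s are discounted" amounts to an ordinary cycle search on a graph whose nodes are project URIs. A minimal Python sketch, assuming the graph has already been assembled as a dict from URI to the set of URIs it depends on (the function name `find_uri_cycle` is hypothetical):

```python
def find_uri_cycle(graph):
    """Return a list of URIs forming a cycle (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {uri: WHITE for uri in graph}
    path = []

    def visit(uri):
        color[uri] = GREY
        path.append(uri)
        for nxt in sorted(graph.get(uri, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[uri] = BLACK
        return None

    for uri in sorted(graph):
        if color[uri] == WHITE:
            found = visit(uri)
            if found:
                return found
    return None
```

The returned path could go straight into the error message, so the user sees which projects refer to each other.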

ghost commented 5 years ago

I guess there would be technical difficulties related to the following proposal, and I haven't thought about the details. So take it for what it is, a random thought. And please excuse me if this is not the place for that kind of activity.

One solution could perhaps be to make use of a separate repository, a library of sorts. If you don't plan to scale up your ambitions with 007 (an idea which I, as you might have gathered from mail conversations, would love, but on the other hand I'm not the one who would have to do the actual work :)), would it be possible in practice? That would involve the narrower, partly (or wholly) wrongful view of what a module is. But in a sense, does it really matter what you call things. I am thinking in terms of a .h file. And also a .h file that would be meaningful to share with others. This would mean that the header and the box module would be one kind of module (because other people perhaps would want to use them in their own projects), while the game would also be a module, but of another kind; to be more precise, the kind of module that other people would use, but not re-use in their own projects. This would, I think, limit the number of projects included in this library-repository; perhaps this repository could be more anarchistic in that case, and use an even freer license (a no rules license, besides the rule no rules-ish/MIT).

As a developer you naturally would want to make your own different modules locally, just as you intend. And you'd also want to make use of other people's modules without being forced to clone their projects and extract the function you need. By use of an anarchistic 'public' library of this sort, the need for a package manager would disappear and you and other people in the 007-team could concentrate on other tasks.

(I know that you can make 'private' (not the keyword) .h files in C/C++, but you get my point... If it is an idea that is good or not, that I don't know.)

Just a thought. :)

masak commented 5 years ago

But in a sense, does it really matter what you call things.

Careless phrasing on your part, perhaps, but... yes, it matters? 😄

<romeo> A rose by any other name would smell just as sweet.
<bart> Not if you called it "stinkflower"!

Or, to be more precise, welcome to 007. Here, it really matters what you call things.

I am thinking in terms of a .h file.

Modula-2 (one of the first languages to implement modules) also makes a split between "interface" and "implementation", the way .h files do. To me, the interface is declared implicitly, or at least very much inline, by the export statements in a module file. I consider that a feature (and one I don't take credit for; JavaScript does the same) — things can't grow inconsistent if they're only one declaration instead of two files with partially repeated declarations.

And you'd also want to make use of other people's modules without being forced to clone their projects and extract the function you need.

I think we're on the same page here. That's what I'm trying to do with my musings in this issue — allowing dependencies between projects/modules without (or "rather than") copy-paste.

By use of an anarchistic 'public' library of this sort, the need for a package manager would disappear and you and other people in the 007-team could concentrate on other tasks.

I'm reminded of the "monorepo" structure some projects have chosen. (Though that'd be a single repository, not two.)

I dunno. I think your proposal might solve some problems and cause others. Actually, there's nothing stopping anyone from creating a large repository of everyone's modules like that. But I think it loses a thing I didn't point out above: the project/repository as the boundary of updates/releases. I might expand on that at some point.

Also, while I don't have any illusions 007 will ever grow a sizable community, I have very mixed feelings about shutting such a community into a single repo and telling it to play there.

Please don't consider my answer final. 😄 I'm still mulling over these things.

ghost commented 5 years ago

About the word part. Yes, I totally agree with you. It WAS a very careless, clumsy formulation. Words matter. But in this case, I still think there's a point to this. Only partly a point, and only because I lack the accurate terminology and don't know what to call it. And I actually think that because of what you said the other day (that's why I wrote like that), 'module' is not the word I'm looking for, since a game would also be a module. :)

I have very mixed feelings about shutting such a community into a single repo and telling it to play there.

I understand that. But I didn't mean this to be the only way, more of A way to handle the situation. No one would stop anyone from creating another lib-repository. I think it would be quite handy to collect all modules in one place, as long as one could choose what to actually include in her/his project.

But you know what, you've already convinced me this is a bad idea. Now I know what I think; I was a bit ambivalent, and what way is better than 'testing' your thoughts on someone else? And you don't seem to mind spar-n-correct.

masak commented 5 years ago

I am thinking in terms of a .h file.

By coincidence I ran into this criticism of C# C++ compile speeds. Yes, part of the reason is that C++ encourages putting too much in its .h files — not just interface details, but implementation, too. Thus things need to recompile too often.

Funnily enough, that section ends with the sentence "One suggested solution is to use a module system".

This is also touched upon in the outstanding C++ FQA.

vendethiel commented 5 years ago

That « C# » probably doesn’t want to be here

masak commented 5 years ago

Indeed. Fixed; thank you.

masak commented 5 years ago

Coming back to this one, and thinking about ergonomics:

Let's say I want to add the two dependencies ascii.header and boxify to my conscious-risks game. I'd issue these two commands:

$ 007-dep add https://github.com/claes-magnus/007-ascii-header-printer/ ascii.header
$ 007-dep add https://github.com/claes-magnus/007-boxify boxify

I'm very tempted to go with @claes-magnus's idea of having a library repository, except (a) not as a thing separate from 007/Alma itself, and (b) only listing names of third-party dependencies, linking them to URLs.

That is, in the case of the above invocation, I'd be able to get away with

$ 007-dep add ascii.header
$ 007-dep add boxify

which is of course a lot nicer.
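
A name list of this kind could be as small as a mapping from module names to URLs. The Python sketch below is hypothetical throughout (the `REGISTRY` table and `resolve_add` function are not part of any existing tool); the two entries are just the ones used as examples in this thread.

```python
# Hypothetical built-in list linking third-party module names to URLs.
REGISTRY = {
    "ascii.header": "https://github.com/claes-magnus/007-ascii-header-printer",
    "boxify": "https://github.com/claes-magnus/007-boxify",
}


def resolve_add(args, registry=REGISTRY):
    """Return (uri, module_name) for the arguments given to `007-dep add`."""
    if len(args) == 2:
        # Long form: explicit URI followed by the module name.
        uri, name = args
        return uri, name
    if len(args) == 1 and args[0] in registry:
        # Short form: look the name up in the list.
        return registry[args[0]], args[0]
    raise ValueError("unknown module or wrong arguments: " + " ".join(args))
```

Passing a different dict as `registry` is one way the additional third-party lists mentioned below could plug in later.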

An extra level of nicety would be for users to be able to easily have additional (third-party) lists, somehow. But that doesn't have to be in a minimum viable implementation.

masak commented 2 years ago

I want to add this discursive post about building package managers to this issue. I've skimmed it; need to go back and read it more carefully (and then maybe write a thoughtful summary here). I found it in one of rsc's articles about Go package management.

masak commented 6 months ago

There's also this blog post praising the tip-of-the-iceberg utter simplicity of go run main.go. I want to take something away from that which can be easily summarized. Maybe it's simply that, if you do your build system right, including package management and reproducible builds (as Go does), then the equivalent of go run main.go is the sweet, sweet payoff for you and all of your users.

vendethiel commented 6 months ago

I was re-reading this ticket and "Reason for Modules" recently, as I added exports/imports/modules to a toy Lisp I have on the side. Since I did the simplest thing that could possibly work, I didn't even try to consider how I'd build a package manager. But it's something that was in the back of my head. I think for the most part, you can start (and stay) with an existing option. PureScript used to use bower, and now uses npm, both of which were targeted for the JS ecosystem.

masak commented 6 months ago

I think for the most part, you can start (and stay) with an existing option. PureScript used to use bower, and now uses npm, both of which were targeted for the JS ecosystem.

That is a good point. Using something existing is good not just because of the decreased workload, but also because it creates an affinity with something already established.

After recently seeing Herb Sutter praise backwards compatibility to the skies, it has been on my mind that taking "uncompromising interop" (with something, C or Java or JavaScript or Raku) is a really good idea, or at least something to seriously consider. It's the kind of design thing that has to be done from day 1, and can't be bolted on later. But Alma's design was never that beholden to anyone or anything, and it's never too late to have a better day 1 if we want.

masak commented 4 months ago

This post with an overview of Python environment management and packaging tools makes me think. I guess the question for now is "how many of those five circles ought one design in from the start?".

For now, I have no simple answer. Need to think.

masak commented 2 months ago

In this discursive post defending Rust productivity, I found a compelling argument for Rust's smaller "module" level and bigger "crate" level:

Rust is one of the few languages which has first-class concept of libraries. Rust code is organized on two levels:

  • as a tree of inter-dependent modules inside a crate
  • and as a directed acyclic graph of crates

Cyclic dependencies are allowed between the modules, but not between the crates. Crates are units of reuse and privacy: only crate's public API matters, and it is crystal clear what crate's public API is. Moreover, crates are anonymous, so you don’t get name conflicts and dependency hell when mixing several versions of the same crate in a single crate graph.

This makes it very easy to make two pieces of code not depend on each other (non-dependencies are the essence of modularity): just put them in separate crates. During code review, only changes to Cargo.tomls need to be monitored carefully.