Implement the simplest possible module system

masak / bel

An interpreter for Bel, Paul Graham's Lisp language

GNU General Public License v3.0

26 stars 1 forks source link

Implement the simplest possible module system #391

Open masak opened 3 years ago

masak commented 3 years ago

Inspired by this proposal found in the Goo repo. (Edit: There's also another one right next to it.)

From what I can see for a first minimal iteration:

There's export and use; both of them macros that expand to code that runs at runtime.
(export NAME) expands to something that pushes NAME onto the global list module-exports, which I think we'll initialize in every compilation unit to nil.
(use MODULE) finds MODULE in the file system, executes the whole file in a pristine environment, and then makes sure that the names from the module-exports in that environment are also bound in the current environment.

There's something pleasingly symmetric and small about this set of requirements; there is nothing more to take away — export does the least bit of work to export something, and use (although complicated) does the least bit of work to import something.

Further down the line, there will be fun complications. Let's acknowledge them but not tackle them for now:

Cyclic imports? Disallowed by default; we can even put in a detector that fails gracefully. Later, there may or may not be a good reason to allow this.
Should the exported things be exported as values, or immutable bindings? (The latter option would be what allows cycles. No idea whether it even fits with the rest of the design of Bel, though.)
To what extent should we disallow export and use on anything but the top level? Should we disallow use that runs conditionally, in an if statement?
Related, should we require that the way use/export is used in practice can be "shifted left" and done completely at some static "compile time" phase? How hard should that requirement be — should we fail outright, or just give a warning?
- Relatedly: such a static phase could reasonably be expected to "keep track of names", being able to reason about when imported things override each other, or things defined in the current unit, or are overridden by things in the current unit.
Surely it should still work if exported function A uses non-exported value B? By what mechanism should that work? What if there's also an unrelated B in the unit that imported A?

But enough worrying about the future — let's do a simple module system!

...oh, one last worry:

Since use and export will end up "polluting" the regular global namespace, should there be a flag or a configuration somehow to get a "standard" Bel without them? Something like --standard to get a guaranteed Bel-compatible global namespace. And maybe --no-modules for this specific set.

masak commented 3 years ago

To what extent should we disallow export and use on anything but the top level? Should we disallow use that runs conditionally, in an if statement?

After writing this, I realized that Python allows importing something conditionally, in an if statement. A lot of things happen at runtime in Python, apparently — including imports.

My thinking about it now is that it's a natural thing to want and a natural thing to allow. It makes things a little bit harder for anyone who expects a totally static view of the import structure — but... I think I'm fine with that.

masak commented 3 years ago

Surely it should still work if exported function A uses non-exported value B? By what mechanism should that work? What if there's also an unrelated B in the unit that imported A?

I realized Kernel's $provide! offers an answer here. (And, similarly, Scheme and JavaScript's idiom of exporting a closure or an object of closures.)

The import macro could stuff the imported module code in an anonymous function, and have the function return exactly the list of required exports (whether explicit import list or implicit complete set of exports).

This might also answer the questions about cyclic imports and immutable bindings; at least the above strategy seems to indicate that neither of those can happen in that way.

masak commented 2 years ago

I just found Wren's page about its module system. One refreshing thing is that (as far as I can see) you are forced to state which names you are importing, with no "just take it all" option. Part of me likes that; part of me wonders whether it would be too tedious to list things.

I'm also tickled by the fact that in Wren's model, cyclic imports work.

masak commented 2 years ago

There's export and use [...]

Just wanted to come in here to point out that I called it use in the OP, but then basically assumed from thereon that the macro would be called import. (Edit: Which is funny, because I think I did exactly the same years ago with Alma.) I'm going to take that as a cue to myself to give up and just call it import when implementing it. It has a nice symmetry with export, too.

masak commented 2 years ago

Quoting this blog post:

How things are brought in from the standard library or general foreign code is interesting:
const std = @import("std");
const print = std.debug.print;
There is a builtin compiler marco @import that does the heavy lifting of pulling in the code, and then you assign this into a const some_var variable. This is really neat because you could call that whatever you wanted (to avoid naming conflicts). Also when you want to pull in definitions from within an imported package you just use the same mechanism of assigning the package.with.a.thing.in-it into a constant variable. Most other languages have a using foo::bar::haz::baz; type mechanism for this, but having it use the same mechanism for a bunch of different things means that you don’t have to switch in your head to another tool. I hadn’t considered this language concept before using Zig, and its a very good idea!

Blog post has a point. I want to consider it, even though I haven't come to a conclusion here.

It seems that "import mechanisms" sometimes import the module itself as a namespace, and sometimes just the exported names inside of it. Languages like Python and JavaScript offer both options. Me, I'm somewhat torn between them; I would like something that's simple, with few moving parts, but which works for 95% of the use cases.

masak commented 2 years ago

It seems that "import mechanisms" sometimes import the module itself as a namespace, and sometimes just the exported names inside of it. [...] Me, I'm somewhat torn between them; [...]

Still torn. As fodder for deciding, I notice that A History of Clojure talks glowingly about reifying Clojure namespaces as tangible data at runtime. (Section 3.1.4.)

That is nice, I guess. It maybe counts as a cheap form of "runtime reflection":

It is possible to resolve symbolic names to vars and classes, find documentation, source and other metadata, and directly manipulate the symbolic naming system in ways that in other languages might be relegated to the compiler or loader.

Not sure that pushes me all the way in any particular direction, but it does feel like an actual factor to consider.

masak commented 2 years ago

I'm 12 minutes into this talk, and realizing two things which I need to write down here:

Doing (import MODULE) in the REPL should of course work, and do the appropriate thing. Clojure has as a kind of fundamental principle that importing a module is in fact equivalent to evaluating it, in the appropriate way, on the REPL. A kind of "nothing-up-my-sleeve" principle, which I thoroughly endorse. (This doesn't preclude doing clever things at compile time, in the cases where this proves possible. But it does place the dynamic behavior at center-stage.)
The REPL can import the same module multiple times, with changes in-between. If the first time the module gets imported, it has a function rectangle (from the example in the talk), and the second time it doesn't (because rectangle in the module has been renamed to rect), things might still work in the REPL even though they are actually broken in the module source, due to the rectangle name sticking around. The speaker has a solution reload.cj in a project lazytest (now deprecated). Not having thought enough of the problem yet, I won't say anything more about possible/desirable solutions.

masak commented 2 years ago

Much of the thinking-out-loud in this issue is about looking ahead, and trying to arrive at a module/imports system with, let's say, nice scalability properties. The simplest possible version was nailed down already in the OP:

(export NAME) — export NAME from this module
(import MODULE) — import (the exported names from) MODULE into the current namespace

Short-term, I'm willing to live with a module system that does this in non-perfect ways — for example, overwriting non-exported things in the current namespace.

The reason I'm willing to live with a compromise is that I think it's important to get the modules thinking going. I particularly feel this for a test module. Currently, the code for doing unit tests has been copied indiscriminately into various repositories, instead of being imported.

Some local experimentation gave some disappointing results, though:

The read built-in is too slow — used to read/parse the imported file
The bel built-in is too slow — used to execute the imported file

Both of these are (presumably) blocking on the compiler.

masak commented 2 years ago

Back when I was thinking about "pods" — which I fear I might not have written down anywhere in this repo — I had a few more requirements.

Briefly, pods would be module-like, yes, but primarily they would be independent processes, more like software components or actors. The axis of composition would still be imports/exports, but with a clear focus on lexical dependencies. (I.e. you should be able to import a thing a that depended on another thing b even if you didn't import b and even if you had your own local b which was unrelated.)

What pods also allowed was deleting or replacing definitions. A "delete" would be a reified thing that could be exported. I don't recall if I ever made a decision about what should happen in the case where there were dependents on the deleted thing, or in the case where the thing never existed on the importing side, or was deleted by some other import. Let's assume it's possible to assign all that a consistent, non-annoying semantics. Either way, there are happy paths where none of those cases apply.

A "replace" is similar to a delete; it's a reified deliberate change of something that existed before. Even here, there would be corner cases to consider.

As I write this down, it strikes me I should study Racket's import system a bit more. I know it's rich and cares about fairly advanced things, like provides clauses. The thing discussed here is distinct but maybe related.

masak commented 2 years ago

The README.md mentions using (bootstrap) to mark the place where you want to use the evaluator in the current globals as your evaluator. It would be possible to do your imports, and then immediately run (bootstrap) after that, but we might also want to have a way to signal doing both in a single import directive.

masak commented 2 years ago

A "replace" is similar to a delete; it's a reified deliberate change of something that existed before. Even here, there would be corner cases to consider.

I also realize that there's a special case of "replace" which we could call "extend". In this case, the parameter list is in the obvious subtyping relationship with the original's parameter list, the return type (whether declared or not) is in the other obvious subtyping relationship with the original's, and — probably the hard part — the new version is extensionally equivalent in all the cases where it overlaps with the original. Decidability issues aside, I think that might be doable in a large fraction of cases. The "extend" case is a bit milder than a full "replace", since in some sense it's "the same" entity, only extended.

masak commented 2 years ago

Briefly, pods would be module-like, yes, but primarily they would be independent processes, more like software components or actors. The axis of composition would still be imports/exports, but with a clear focus on lexical dependencies. (I.e. you should be able to import a thing a that depended on another thing b even if you didn't import b and even if you had your own local b which was unrelated.)

Specifically, taking the "independent processes" at face value, a pod A ought to be able to import a pod B, and then (bootstrap) a new evaluator without that affecting the imports from B in any way.

I state that requirement without any force of conviction. Mostly just mapping out logical consequences here. It feels that this would make pods less like modules and more like actors/components; and the function calls going between them would be (in general) more like remote procedure calls between not-necessarily compatible machines.

masak commented 2 years ago

Just doing some drive-by-commenting here: Bel is fundamentally a very interpreted/dynamic language, and modules are a feature that reaches towards the compiled/static end of things. Not that they clash, as such; it's more like they express different preferences. I would like the module system to favor both REPL-based, interactive, live development, while also working really well with the more offline, IDE-based, corporate style of development. Again, the two are not in opposition — it's more like they have different form factors.

masak commented 1 year ago

The README.md mentions using (bootstrap) to mark the place where you want to use the evaluator in the current globals as your evaluator. It would be possible to do your imports, and then immediately run (bootstrap) after that, but we might also want to have a way to signal doing both in a single import directive.

I was curious if I had written on this point in this issue. In writing the above, I seem to assume that we want the client/importing module to do the bootstrapping. But in a way, that feels a bit disconnected — a module is either written to modify the (client's) current evaluator, or it isn't. Introducing even the choice of running bootstrap after an import raises issues both of under-use (forgetting) or over-use (needlessly calling (bootstrap)).

On the other hand, it's not immediately obvious to me how it would look if it was controlled from the provider/module end. Maybe as a different kind of export? The whole thing feels a little bit like declaring static methods in Java, in the sense that the method is technically declared on a whole different level.

masak commented 1 year ago

Briefly, pods would be module-like, yes, but primarily they would be independent processes, more like software components or actors. The axis of composition would still be imports/exports, but with a clear focus on lexical dependencies. (I.e. you should be able to import a thing a that depended on another thing b even if you didn't import b and even if you had your own local b which was unrelated.)

Related to this, I recently started out implementing a language design (with the working name Dodo) whose main feature is that function values close not just over their lexical environment, but over their local evaluator. Different modules could have different evaluators, but their functions could still call each other over a kind of inter-evaluator calling protocol.

Think about how a small metacircular evaluator normally implements a call, for example this one in Ipso (and its Raku translation). We do three things:

Evaluate the operands (ASTs) into arguments (values)
Starting from the outer environment stored as part of the function value, non-destructively append a pairing-up of parameter names and evaluated arguments (from left to right, so that later parameters can shadow earlier ones), creating a "function body environment"
- I guess this is where we also check that those lists are of equal length, or else
- Ipso doesn't do this, but if we run out of arguments but a parameter has a default expression (as in myParam = defaultExpr), then we evaluate the default expression and use its value in the binding
- Ipso doesn't do this either, but if the last parameter is slurpy/rest, then we can bind it to a list of all the (possibly zero) remaining arguments
Evalute the body in the function body environment

Because this typically happens using the same evaluator throughout, we consider this to be one contiguous bit of code. But now picture that this instead is a "handshake" between two evaluators. More like messaging between two actors. (In fact, I think "messaging between actors" should be the underlying primitive here.) In that case, step 1 happens in the caller evaluator, and steps 2 and 3 happen in the callee evaluator. By necessity, the messaging happens in a kind of "step 1.5" in-between, and there is further messaging either on return of a value, or when signaling an error. (This could be handled via some kind of continuation-passing, I guess?)

Dodo complicates things one step further, because it also has operatives, which means that sometimes step 1 shouldn't happen and we should send over the operands (ASTs) instead. Bel's macros work similarly, but with an extra evaluation "flourish" after getting back the result.

Anyway, to tie this back to modules. I think this approach could be very clean and attractive. It sort of hides an actor system in a module/import system, and I especially like how it provides some "stability" in the sense that a function gets to evaluate in the evaluator where it was defined. That's important in a system where the evaluator can change — guaranteeing evaluator stability is similar to guaranteeing lexical scoping.

Both the calling protocol and the module/imports protocol turn into points of stability; within a module/evaluator, things are allowed to change wildly, but as long as the protocols hold, they can all talk to each other. (I don't remember where I read or heard the phrase "communicating with aliens"; probably somewhere in the vicinity of Kay. But that's what's going on here. With actors, we don't get to assume anything about the way a message is received or understood; but we do have some basic guarantees about the message protocol itself.)

Anyway, that's how I envision pods: module-like, actor-like entities whose innards can change wildly (because each pod controls its own evaluator), but whose external contracts and interfaces remain somewhat stable, thanks to the import/export mechanism being rooted in the static.

masak commented 1 year ago

(I don't remember where I read or heard the phrase "communicating with aliens"; probably somewhere in the vicinity of Kay. [...])

Ah, I remember now. I heard it in this Bret Victor talk: The Future of Programming. He credits the idea to Licklider, who considered the problem of how two machines on the network "who just met" would be able to talk to each other.

When you have this global network of computers, you run into what Licklider called 'the "communicating with aliens" problem'. [...] 'How do you get communication started among totally uncorrelated "sapient" beings?' I'll explain what he means by that. [...] These two programs know nothing about each other. [...] They have to talk to each other. [...] Now they need to be able to communicate. So how are they gonna do that? There's only one real answer to that that's scales, that's actually going to work, which is: they have to figure out how to talk to each other.

masak commented 1 year ago

The REPL can import the same module multiple times, with changes in-between. If the first time the module gets imported, it has a function rectangle (from the example in the talk), and the second time it doesn't (because rectangle in the module has been renamed to rect), things might still work in the REPL even though they are actually broken in the module source, due to the rectangle name sticking around. The speaker has a solution reload.cj in a project lazytest (now deprecated). Not having thought enough of the problem yet, I won't say anything more about possible/desirable solutions.

Been thinking more about this one. It feels like the modules/import version of the fragile base class problem. Paraphrasing from the Wikipedia page: "seemingly safe modifications to [the source file], when [imported] by the [REPL], may cause [a sum total of definitions and behaviors that differ from just loading the file]".

Maybe the right attitude to it all is that this is really a version control problem! (I just had this idea.) In other words, the source file is like an upstream, the REPL is like a local branch, importing and re-importing is like fetching from the upstream and then attempting to fast-forward cleanly, and conflicts manifest when different/incompatible updates have been made to the same definition both locally and upstream.

Notably, a function definition being removed in the source file would be "tracked" in the sense that a re-import would "update" that function definition by removing it. This is definitely not as lightweight as parsing the source file and evaluating it into the REPL, but it does seem to solve the above rectangle issue.

I'm quite curious to trying that out in a prototype, at least.

An obvious improvement to the idea is to support "automatic re-import" when an imported source file is saved. This would happen via some kind of file listeners, which I remember are fairly straightforward, at least on Linux/BSD.

masak commented 1 year ago

Just doing some drive-by-commenting here: Bel is fundamentally a very interpreted/dynamic language, and modules are a feature that reaches towards the compiled/static end of things. Not that they clash, as such; it's more like they express different preferences. I would like the module system to favor both REPL-based, interactive, live development, while also working really well with the more offline, IDE-based, corporate style of development. Again, the two are not in opposition — it's more like they have different form factors.

I just wrote about this in https://github.com/masak/alma/issues/302 — in summary, Bel modules (unlike normal Bel code) need to be "scrutable", which means that they are static enough that you can expand the code until you see all the definitions, so that you can statically build an export list.

The main point is that this requirement of scrutability doesn't limit you much in practice. You can still syntactically hide definitions inside of other macros, for example.

masak commented 5 months ago

Something quite close to the idea of pods seems to come up in Motoko's idea of canisters. These are actor-like persistent compilation units communicating with the outside world via asynchronous messages.

Particularly, the idea of orthogonal persistence seemingly falls out of this. It's basically hot-swapping — that is, you get to keep a canister's internal state while upgrading its API and implementation. Persistence happens somewhere (and is enabled by the stable keyword), but the details are entirely abstracted away.