Make an RFC about modules, units, linklets, and linking

jeapostrophe commented 5 years ago

Racket has two user-facing mechanisms for modularity: modules and units. Modules are purely compile-time entities and always refer to their imports by name, so they are inherently not parameterized. Units are run-time values, and much of their functionality and power, such as mutual recursion, comes from that. Units are implemented purely as macros and work hard to be partially separately compilable (for example, by allowing macros only inside unit signatures and not in the units themselves). At the VM level, Racket also supports linklets, which are like a limited kind of unit. Modules are implemented by compilation to sets of linklets.
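To make the contrast concrete, here is a minimal sketch of a unit that is abstracted over its import and linked at run time. The names logger^, app@, quiet-logger@, loud-logger@, and run-app-with are invented for illustration; only the racket/unit forms themselves are real.

```racket
#lang racket/base
(require racket/unit)

;; Hypothetical signature: the one name the app needs from its import.
(define-signature logger^ (log-message))

;; app@ refers to log-message only through the signature, so it is
;; parameterized over whichever unit ends up supplying logger^.
(define-unit app@
  (import logger^)
  (export)
  (log-message "started"))

;; Two interchangeable implementations of the import.
(define-unit quiet-logger@
  (import)
  (export logger^)
  (define (log-message msg) (string-append "quiet: " msg)))

(define-unit loud-logger@
  (import)
  (export logger^)
  (define (log-message msg) (string-append "LOUD: " msg)))

;; Linking is an ordinary run-time operation: the logger unit arrives
;; as a function argument and is linked and invoked on each call.
(define (run-app-with logger@)
  (invoke-unit
   (compound-unit
     (import)
     (export)
     (link (((L : logger^)) logger@)
           (() app@ L)))))

(run-app-with quiet-logger@)
(run-app-with loud-logger@)
```

A module, by contrast, would name its logger in a require form, and there would be no place to pass a different one.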

I would like a modularity mechanism more similar to ML's module system where a module can be abstracted over its imports. I think it is valuable for thinking about programs and performance for this to be a compile-time decision, but I think it is interesting to think about whether the same specification mechanisms could be used for run-time and compile-time situations.

I think there are big risks of making the linking language too complicated to use for small things (which is what I perceive as a major issue with units) and of forcing too much tooling outside of the program (which I perceive as a big issue in ML). I think we can deal with the first by being less ambitious in terms of power and separate compilability, and I think the second is solved by a DSL that is like require/provide.

As a tiny concrete thing, I think it would be extremely useful for parameterized modules (functors in ML) to specify a default instance of their imports that is used when the module is run itself and when no linking specification is used.
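As a rough run-time approximation of that idea using today's units (not the compile-time mechanism I'm asking for), the default instance could be an optional argument. All names here (greeting^, greeter@, default-greeting@, formal-greeting@, instantiate-greeter) are made up for the sketch.

```racket
#lang racket/base
(require racket/unit)

(define-signature greeting^ (greeting))

;; Hypothetical default instance of the import.
(define-unit default-greeting@
  (import)
  (export greeting^)
  (define greeting "hello"))

;; An alternative instance a client could link in instead.
(define-unit formal-greeting@
  (import)
  (export greeting^)
  (define greeting "good day"))

;; The parameterized part: it only knows the greeting^ signature.
(define-unit greeter@
  (import greeting^)
  (export)
  (string-append greeting ", world"))

;; When no linking argument is given, fall back to the default instance.
(define (instantiate-greeter [greeting@ default-greeting@])
  (invoke-unit
   (compound-unit
     (import)
     (export)
     (link (((G : greeting^)) greeting@)
           (() greeter@ G)))))

;; Running the file itself uses the default instance.
(module+ main
  (displayln (instantiate-greeter)))
```

A client that wants something else would call (instantiate-greeter formal-greeting@); everyone else gets the default without writing any linking code.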

I think this requires a connection to #9

sorawee commented 5 years ago

Here's something I want to be able to do:

Say, fancy-app is a package that turns (f _ 1 _) to (lambda (x y) (#%app f x 1 y)). reverse-app is a package that turns (f a b c) to (#%app f c b a).

To use fancy-app or reverse-app, I would write: (require (fancy-app racket)) or (require (reverse-app racket)) respectively. That is, fancy-app and reverse-app are module functors.

If I want my #%app to have both functionalities, I could either write: (require (fancy-app (reverse-app racket))) or (require (reverse-app (fancy-app racket))).

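The first would transform (f _ _ 1) to (lambda (x y) (f 1 y x)), because fancy-app fills the holes before reverse-app reverses the arguments. The second would transform (f _ _ 1) to (lambda (x y) (f 1 x y)), because the arguments are reversed before the holes are filled.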
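To check the two orders, here is a small model that treats each module functor as a function from an application expander to an application expander, working on quoted forms rather than real syntax. It is only a sketch: base-app, the fixed pool of parameter names, and the use of plain lists are all assumptions of the model, not how the packages would actually be written.

```racket
#lang racket/base
(require racket/list)

;; An expander maps an application form (function first, then arguments)
;; to its expansion. The base expander leaves the form alone.
(define (base-app form) form)

;; Model of reverse-app: reverse the arguments, then defer to the
;; expander of the language it was applied to.
(define ((reverse-app inner) form)
  (inner (cons (car form) (reverse (cdr form)))))

;; Model of fancy-app: replace each _ with a parameter, defer to the
;; inner expander, and wrap the result in a lambda.
;; The name pool is hypothetical and supports at most three holes.
(define ((fancy-app inner) form)
  (define holes (count (lambda (a) (eq? a '_)) (cdr form)))
  (cond
    [(zero? holes) (inner form)]
    [else
     (define params (take '(x y z) holes))
     (define filled
       (let loop ([args (cdr form)] [ps params])
         (cond
           [(null? args) '()]
           [(eq? (car args) '_) (cons (car ps) (loop (cdr args) (cdr ps)))]
           [else (cons (car args) (loop (cdr args) ps))])))
     `(lambda ,params ,(inner (cons (car form) filled)))]))

;; Models (require (fancy-app (reverse-app racket))).
(define fancy-then-reverse (fancy-app (reverse-app base-app)))
;; Models (require (reverse-app (fancy-app racket))).
(define reverse-then-fancy (reverse-app (fancy-app base-app)))

(fancy-then-reverse '(f _ _ 1))
(reverse-then-fancy '(f _ _ 1))
```

In this model the outer functor sees the form first, which is what makes the two nestings produce different parameter orders.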

(thanks to @AlexKnauth for the discussion)

rocketnia commented 5 years ago

One way to handle that scenario within the current module system would be for the fancy-app and reverse-app libraries to export macro-defining macros:

(require (only-in racket/base [#%app base-app]))
(require (only-in fancy-app define-fancy-app))
(require (only-in reverse-app define-reverse-app))

(define-fancy-app my-fancy-app base-app)
(define-reverse-app #%app my-fancy-app)

; ... code that uses the #%app we just defined ...

Similarly, I think a lot of use cases for ML-style functors are attainable using submodule-defining macros.
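For reference, one of the macro-defining macros used above might look roughly like this. This is a guess at what a reverse-app library could export, not its actual code, and rapp is just a name chosen so the sketch doesn't shadow #%app.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; (define-reverse-app name base-app) binds name as a macro that reverses
;; the arguments of an application and then delegates to base-app.
;; The (... ...) escapes produce literal ellipses in the generated macro.
(define-syntax-rule (define-reverse-app name base-app)
  (define-syntax (name stx)
    (syntax-case stx ()
      [(_ f arg (... ...))
       (with-syntax ([(rev (... ...))
                      (reverse (syntax->list #'(arg (... ...))))])
         #'(base-app f rev (... ...)))])))

;; Layer the reversing behavior on top of racket/base's own #%app.
(define-reverse-app rapp #%app)

;; The three arguments are reversed before list is applied.
(rapp list 1 2 3)
```

Because base-app is a parameter of the defining macro, the same definition can be layered on top of any other application form, which is what lets the two libraries compose in either order.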

rocketnia commented 3 years ago

These past few months, I've been thinking about approaches to parameterized modules. While a module-generating macro can work in a pinch, it means we're compiling the same module lots of times, and none of their types are compatible. It would be much better to be able to cache compilation results and run-time module instantiations, just like Racket already does for non-parameterized modules.
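A run-time analogy for the type incompatibility, and for what a cache buys us: a function that declares a struct is generative, so two calls yield two unrelated types unless the result is memoized. The names make-point-module, instantiation-cache, and instantiate-cached are hypothetical, and this only models the problem at run time; it says nothing about where compiled code would live.

```racket
#lang racket/base

;; Stand-in for a parameterized module: each call declares a fresh
;; struct type and returns its constructor paired with its predicate.
(define (make-point-module)
  (struct point (x y) #:transparent)
  (cons point point?))

;; Two independent instantiations, as two call sites might produce.
(define m1 (make-point-module))
(define m2 (make-point-module))

;; A value built by m1 is not recognized by m2's predicate, and
;; structurally identical values from m1 and m2 are not equal?.
((cdr m2) ((car m1) 1 2))
(equal? ((car m1) 1 2) ((car m2) 1 2))

;; Memoizing on a key that identifies the module and its arguments
;; makes every requester share a single instantiation.
(define instantiation-cache (make-hash))
(define (instantiate-cached key make-module)
  (hash-ref! instantiation-cache key make-module))

(define c1 (instantiate-cached '(point-module) make-point-module))
(define c2 (instantiate-cached '(point-module) make-point-module))
((cdr c2) ((car c1) 1 2))
```

The open question in the rest of this comment is where the compile-time equivalent of instantiation-cache should be stored.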

But where do we store the compilation results for each invocation of a module? I can imagine several possible answers:

[Dedicated Location] Perhaps compiled module invocations could be stored in independent .zo files in some deterministic location in the filesystem or some kind of code database.
- ⚠️ I think this would involve extensions to Racket's compiler. I'm not sure how to do this for a #lang, but maybe there's a way.
- ℹ️ If extensions are needed, maybe that's within scope for Rhombus.
[Call Sites] Perhaps a compiled module invocation could be stored as part of each invocation site's .zo.
- ⛔ This gives us diamond dependency problems: The call sites in modules {M1, M2, ..., Mn} each compile their own variants of module A with the same arguments. Module Z, which depends on {M1, M2, ..., Mn}, will notice that the types each of them gets from A aren't compatible with each other. (Or worse yet, maybe it won't notice, and any effort to patch it up with consistent versions will actually break it.)
  - 🔨 We can potentially trust module Z's author to resolve the conflict manually. They could use something like namespace-attach to attach the appropriate variant of module A to a "compilation registry" and then compile new versions of {M1, M2, ..., Mn} which draw upon that registry.
  - ⛔ Unfortunately, this approach means Z's .zo contains its own compiled variants of A and {M1, M2, ..., Mn}, not to configure them any differently, but just to resolve a versioning issue. We could prune this down a little by reusing the variant of A from M1's .zo and reusing the M1 variant that's already compiled for that A, but then Z would still compile its own {M2, M3, ..., Mn}. Further generations of dependency on these modules will have to repeat this effort all over again, recompiling A, {M1, M2, ..., Mn}, and Z in order to reconcile Z's chosen variants of A and {M1, M2, ..., Mn} with the variants other modules are using. So the only stage of this process that won't be duplicated effort will be the one that compiles just about everything into the final application entrypoint's .zo. In other words, in a worst-case scenario, this strategy is so incapable of reusing compilation results that we might wish we were doing just one whole-program compilation.
- ⛔ Depending on how greedy we are about coalescing compilation results into a compilation registry, this strategy can also cause subtle bugs when a module Z starts to rely on the behavior it gets when it doesn't supply a peer dependency A to its dependencies {M1, M2}. If module A is promoted to the base distribution (or gets automatically coalesced into Z's compilation registry by some other means), then Z's latent bug will start to show up. Since getting a result of #f from equal? isn't a noisy error, and since the person observing the bug might be a complete bystander who has merely upgraded some packages, this could be a difficult bug to diagnose.
[Abstraction Site] Perhaps compiled module invocations could be stored as part of the module's own .zo.
- ⛔ How does the module know what arguments it'll be called with? Alternatively, how do any of the argument-type-defining modules know what other arguments they'll be combined with and what modules they'll be passed to?
  - 🔨 Maybe each invocation of a module needs to have a corresponding declaration in its package's info.rkt, so that one of these defining modules can use find-relevant-directories to find all the call sites.
  - ⛔ But this doesn't help for call sites that appear in one-off scripts and applications. It could only work for packages.
  - ⛔ It also means quite a few compiled files would have logical dependencies on the set of packages installed, which would mean whole-program recompilation whenever a new package was added.
[Data-Defining Sites] Perhaps each compiled module invocation could be stored in the .zos of the modules that define the types used in the arguments of the call.
- ⛔ Which one? If we pick more than one, we have diamond dependency problems (see above).
- ⛔ How does it know what modules these arguments will be passed to? The find-relevant-directories approach has problems (see above).
[Designated Entrypoints] Perhaps compiled module invocations could be stored only where the user specifically requests for them to be, most likely the application entrypoint or the entrypoints of particularly ambitious libraries.
- ⛔ This will effectively be whole-program compilation. Either modules compile their own dependencies as they're compiled, or they ask their caller to pass those dependencies in (or some combination); either way, a whole subtree of transitive dependencies is being compiled together.
- 😅 But at least this way, we're only performing whole-program compilation once, not once for each module in the program.
- ⛔ Since compilation of the same module with the same arguments might be requested in multiple places, we'll have dependency hell to worry about.
- ℹ️ The module-generating macros I mentioned earlier are an example of this kind of system. Designs emphasizing generative functors would likely be examples of this as well.
  - ℹ️ (Most examples of generative functors are in languages without macroexpansion, and in that case, they don't necessarily have the same whole-program compilation problem. If a functor's arguments vary only at run time, then it has a trivial compile-time cache which only needs to be populated with one entry.)

I've done a lot of thinking these past few months about the Designated Entrypoints and Call Sites options. The Call Sites option has been interesting to think about in a lot of ways, and this week I was writing up a proposal based on it, but I'm starting to see its dependency hell as a rather important point against it.

At this point, the Dedicated Location option is the only one I find fully compelling, even if new compiler features turn out to be needed for that one.

What do you all think of the tradeoffs here?

benknoble commented 3 years ago

I wrote a fair bit of ML before switching to Racket, and (while I haven't yet programmed with them) it seems units with import lists are the equivalent of ML functors? Perhaps I'm missing something… or perhaps the original proposal wants to coalesce some features of both modules and units? It's not entirely clear to me.

mfelleisen commented 3 years ago

Units were directly inspired by ML functors and weaknesses we experienced with those. At the time, ML (spec, implementation) did not allow recursion and dynamic loading of functors, nor could functors accept functors. (The inspiration was a particular kind of extensible language interpretation. It's now known as algebraic-effect interpretation.)

rocketnia commented 3 years ago

> perhaps the original proposal wants to coalesce some features of both modules and units?

Yeah, Racket module imports retrieve compile-time entities (like macros) and run-time entities (like functions). Units, as they exist now at least, aren't a full solution for abstracting over Racket module imports because they can only handle the run-time parts.

mfelleisen commented 3 years ago

That's not completely correct (see GPCE 2005), but yes, it's definitely a shortcoming of units.

racket / rhombus

Make an RFC about modules, units, linklets, and linking #75