Top-level namespacing - Githubissues

srackham commented 2 years ago

How can you disambiguate same-named components and classes from different source files (aka the module problem)?

I see that this issue was addressed as "Future Work" in the 2013 paper https://dl.acm.org/doi/10.1145/2491956.2491962

The current EBNF grammar file contains "import" and "export" keywords but I haven't been able to find a description of their semantics:

https://github.com/titzer/virgil/blob/3038dead280099b736f312e2b091b053cb0cfbf7/doc/virgil-grammar.ebnf#L5

https://github.com/titzer/virgil/blob/3038dead280099b736f312e2b091b053cb0cfbf7/doc/virgil-grammar.ebnf#L35

titzer commented 2 years ago

Thanks for the question, I see you are finding the dark corners of the language!

Virgil doesn't yet have a namespacing mechanism. For that, I envision that a single namespace foo.bar.baz; declaration at the top of a file will suffice. The internal type resolution machinery has most of the support needed. Namespacing will also be important for keeping the Virgil runtime's names separate from the application's.

The import/export mechanism you stumbled on is currently only used for the wasm target. It directs the compiler to emit imports and/or exports matching the names declared there. Thus, with that mechanism, a Virgil program can be compiled to a .wasm module that accesses any API.

The import/export mechanism needs some additional features to be useful on other targets. E.g. on the jvm target, it would be useful to be able to import classes from the Java namespace. Similarly, on native targets, the import/export mechanism could be used to generate symbols for linking with other code.

srackham commented 2 years ago

Something like this?

Top-level module statement:

module <module-name>;
Moves top-level declarations to the <module-name> namespace and only exposes them to other files when explicitly imported.

Top-level import statement:

import <module-name> [as <qualifier>];
Imports the top-level module declarations.
The optional as clause puts the declarations into the <qualifier> namespace.

Example files:

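// File 1: declares module M.
module M;
class C {}
def f() {}
var v = 42;

// File 2: unqualified import; M's declarations are used by their bare names.
import M;
def c = C.new();
f();
def w = v;

// File 3: qualified import; M's declarations are reached through N.
import M as N;
def c = N.C.new();
N.f();
def w = N.v;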

This feature would not change the existing behaviour of Virgil's implicitly promiscuous imports.
Since module-based imports are not promiscuous it would make sense to relax the "top-level variables are private" rule and simply expose all top-level declarations except for those explicitly labelled private.
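
To make the proposed lookup rules concrete, here is a minimal sketch in Python that models them with plain dictionaries. It is only an illustration of the semantics described above, not how the Virgil compiler resolves names; the Program class, its methods and the "ambiguous name" check are all assumptions made for the example.

```python
# Hypothetical model of the proposed module/import semantics.
# Nothing here reflects the actual Virgil compiler.

class Program:
    def __init__(self):
        # module name -> {declaration name: value}
        self.modules = {}

    def declare(self, module, name, value):
        """Record a top-level declaration from a file that starts with `module <module>;`."""
        self.modules.setdefault(module, {})[name] = value

    def scope_for(self, imports):
        """Build the names visible to a file from its import list.

        `imports` is a list of (module, qualifier) pairs. A qualifier of None
        models `import M;` and a string models `import M as N;`.
        """
        scope = {}
        for module, qualifier in imports:
            for name, value in self.modules[module].items():
                key = name if qualifier is None else f"{qualifier}.{name}"
                if key in scope:
                    # Assumed behaviour: clashing names are an error.
                    raise NameError(f"ambiguous name: {key}")
                scope[key] = value
        return scope


program = Program()
program.declare("M", "C", "class C")
program.declare("M", "f", "def f")
program.declare("M", "v", 42)

unqualified = program.scope_for([("M", None)])  # import M;
qualified = program.scope_for([("M", "N")])     # import M as N;
```

With the unqualified import the name v is visible directly; with the qualified import only N.v is, which is what lets two modules that both declare a class C coexist in one file.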

titzer commented 2 years ago

Yeah, that is roughly what I was thinking, modulo the keywords. (E.g. "import" is being used for the Wasm case that I described above--maybe want a different keyword).

Another thing I have thought about is having a convention where a file module.v3 in a directory is automatically interpreted as a module for the whole directory, so that not every file has to repeat it. (One thing I find annoying about other languages is the amount of boilerplate repeated in every file.)

srackham commented 2 years ago

Another thing I have thought about is having a convention where a file module.v3 in a directory is automatically interpreted as a module for the whole directory, so that not every file has to repeat it.

Wouldn't this tie language semantics to the underlying file storage mechanism? One of the neat things about Virgil is that the language is independent of file locations, file names and file ordering. In my opinion, directory layouts, file paths and URLs should be relegated to the language tools not the language.

titzer commented 2 years ago

It does slightly depend on the file order, because that is the order in which component initializers (and top-level initializations) are run, which is observable (e.g. you can make internal lists and such).

srackham commented 2 years ago

"import" is being used for the Wasm case that I described above--maybe want a different keyword

require? uses?

titzer commented 2 years ago

In my opinion, directory layouts, file paths and URLs should be relegated to the language tools not the language.

I generally agree on that. E.g. I really like that the compiler's job is just "take huge list of files, generate binary" and not go hunting around for packages and other magic. For example, Wizard just has a build.sh file that goes off and assembles the list of files for the target and hands them off to v3c. But I am torn because I hate repeating myself, particularly for all the files that are logically in a module, which 99% of the time is just a directory anyway. That's one thing that drives me nuts about C header files and Java imports alike--all that repeating yourself about what namespace your code is in and what other code it is about to use.

Another idea, that does have some downsides, is that the file->module grouping could be done at the command-line level, e.g. v3c -module=Foo <foo files> -module=Bar <bar files> ... Then build tooling could just assemble up a command line for that. That doesn't deal with imports, though, and it's a pain to type that out if you have to do that manually, which I do a lot in testing.
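
A rough sketch, in Python, of how build tooling might assemble such a command line. The -module= flag is only the idea floated above and is not an option v3c is known to accept; the function name and the file names in the usage note are made up for illustration.

```python
def build_v3c_command(groups, compiler="v3c"):
    """Assemble an argument list that groups files under a module flag.

    `groups` is an ordered list of (module_name, [files]) pairs. Order is
    preserved because file order determines initialization order.
    The `-module=` flag is hypothetical.
    """
    argv = [compiler]
    for module, files in groups:
        argv.append(f"-module={module}")
        argv.extend(files)
    return argv
```

For example, build_v3c_command([("Foo", ["a.v3", "b.v3"]), ("Bar", ["c.v3"])]) yields the list v3c, -module=Foo, a.v3, b.v3, -module=Bar, c.v3, which a build script could hand to a process launcher.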

srackham commented 2 years ago

I really like that the compiler's job is just "take huge list of files, generate binary" and not go hunting around for packages and other magic.

The "just compile the files" is refreshingly transparent -- no magic paths or directory layouts.

I hate repeating myself, particularly for all the files that are logically in a module

The module statement is not too onerous (two words at the start of the file that seldom change); it's the import statements that are tedious, because they change so much during the development process and there are multiple import statements per file. This is where I find modern editors and IDEs really helpful:

When you reference an external declaration that is not resolved you get prompted with a list of import candidates and your selection is inserted in the header automatically.
When an import is no longer needed you get prompted for its removal.
Import lists are sorted.

The LSP has revolutionised IDEs and editors to the point where "Quick fixes" like these, along with auto-completion, pop-up documentation and semantic syntax highlighting, are fast becoming an accepted, if not mandatory, norm. I would find it very difficult to go back to a world of monochrome text-only editors.

Another idea, that does have some downsides, is that the file->module grouping could be done at the command-line level, e.g. v3c -module=Foo -module=Bar ...

Command-line language semantics are a bit too magical for my tastes.

titzer commented 2 years ago

Yeah, I think I am convinced now that at least a module Foo; at the beginning of a file isn't so bad.

As for IDEs, I was a huge IntelliJ fan for many years. IDE refactoring is absolutely wonderful, particularly the more advanced things like inline/extract method, navigation, etc.

I'd like to have a VSCode plugin for Virgil, complete with LSP, but every time I go down that road I stumble on the basics. I've been using emacs for so many years that I usually end up getting by with basic syntax highlighting and TAGS. I think if I had IntelliJ-level IDE support I'd be even more productive.

Ok, another idea: if Virgil had, e.g., a using statement as you suggest, it would be easy to make a separate build tool that scans the first few lines of .v3 files, consults its own package dictionary which maps module names to directories/files, and assembles the list of files to be supplied to the v3c command. That'd be hidden in each program's build script.
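
Such a tool could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the module and using keywords are not part of Virgil today, the ten-line scan limit is arbitrary, and file contents are passed in as a dictionary so the example stays self-contained (a real tool would read them from disk).

```python
import re

# Matches hypothetical header lines such as `module Foo;` or `using Bar.Baz;`.
HEADER = re.compile(r"^\s*(module|using)\s+([A-Za-z_][\w.]*)\s*;")


def scan_header(text, max_lines=10):
    """Return (module, [used modules]) found in the first few lines of a file."""
    module, uses = None, []
    for line in text.splitlines()[:max_lines]:
        m = HEADER.match(line)
        if not m:
            continue
        if m.group(1) == "module":
            module = m.group(2)
        else:
            uses.append(m.group(2))
    return module, uses


def collect_files(roots, packages, sources):
    """Assemble the ordered file list to hand to the compiler.

    roots:    files named by the build script
    packages: module name -> list of files (the tool's own dictionary)
    sources:  file path -> file contents
    """
    ordered, seen_files, seen_modules = [], set(), set()
    pending = list(roots)
    while pending:
        path = pending.pop(0)
        if path in seen_files:
            continue
        seen_files.add(path)
        ordered.append(path)
        _, uses = scan_header(sources[path])
        for module in uses:
            # Each module is expanded once, so cyclic `using` chains terminate.
            if module not in seen_modules:
                seen_modules.add(module)
                pending.extend(packages[module])
    return ordered
```

The compiler itself still just receives a flat list of files; all knowledge of where modules live stays in the tool's dictionary.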

That would still scale fairly well, as the Virgil parser is ~1-2 MLOC/s. Eventually for really big programs an incremental compilation solution would be needed. I'd want it to still "feel" like v3c compiled everything from source, so some transparent caching behind the scenes could speed this up. Generally the compiler spends only about 10-20% of its time in the frontend, so saving only parsing work would only go so far. It'd probably need a binary format for either an intermediate bytecode or target code that can be linked. I'd prefer to hide that format so that there was never any .o hell for users to ever see.
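
The figures above can be turned into a quick back-of-the-envelope check. The Python sketch below applies Amdahl's law to the quoted 10-20% frontend share and estimates parse time from the quoted 1-2 MLOC/s rate; the function names and the one-million-line program size are illustrative assumptions.

```python
def max_speedup(cached_fraction):
    """Amdahl's law upper bound when `cached_fraction` of compile time drops to zero."""
    return 1.0 / (1.0 - cached_fraction)


def parse_seconds(lines_of_code, lines_per_second):
    """Time to parse a program at a given parser throughput."""
    return lines_of_code / lines_per_second


for fraction in (0.10, 0.20):
    print(f"frontend {fraction:.0%} of compile time -> at most {max_speedup(fraction):.2f}x faster")

# Hypothetical 1 MLOC program at the quoted 1-2 MLOC/s parse rate.
for rate in (1_000_000, 2_000_000):
    print(f"{parse_seconds(1_000_000, rate):.1f} s to parse at {rate // 1_000_000} MLOC/s")
```

Even if the whole frontend were cached away, the build would be at most about 1.25x faster, and parsing is only part of the frontend, which is why a linkable intermediate or target format would be needed for larger gains.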

srackham commented 2 years ago

@titzer

Ok, another idea: if Virgil had, e.g., a using statement as you suggest, it would be easy to make a separate build tool that scans the first few lines of .v3 files, consults its own package dictionary which maps module names to directories/files, and assembles the list of files to be supplied to the v3c command. That'd be hidden in each program's build script.

Thinking about it, I'm not sure adding module/import statements is necessary or a good idea. Not only is it adding boilerplate and adding to the language, it also introduces (possibly cyclic) import recursion (these issues don't arise with the current scheme).

Why not just extend the existing DEPS file format from <path> to <path> [from <url>] [as <namespace>]? This would obviate the need for module/import statements and leave the language untouched. Examples:

# .v3 files in top-level namespace (current semantics):
./lib/util/*.v3

# All .v3 files in ./baz directory in top-level namespace:
./baz

# All .v3 files in baz directory in qux namespace:
baz as qux

# All .v3 files in Github repo directory lib/baz at Git tag '0.42' in qux namespace:
lib/baz from https://github.com/joebloggs/foobar@0.42 as qux

# All .v3 files in baz directory in baz namespace:
baz as baz

This could be implemented using your command-line file->module grouping idea i.e. scan the DEPS file and generate a file list interspersed with -module=<namespace> options (-module= for top-level). Maybe -namespace would be more accurate than -module.
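
A minimal Python sketch of that scan, assuming the extended <path> [from <url>] [as <namespace>] syntax and '#' comment lines as written in the examples above. The -module= option is the hypothetical flag from earlier in the thread, and the sketch neither expands globs nor fetches URLs.

```python
import re

# <path> [from <url>] [as <namespace>], all fields free of whitespace.
ENTRY = re.compile(
    r"^(?P<path>\S+)"
    r"(?:\s+from\s+(?P<url>\S+))?"
    r"(?:\s+as\s+(?P<namespace>\S+))?$"
)


def parse_deps(text):
    """Parse DEPS text into a list of (path, url, namespace) tuples.

    Missing optional fields are returned as None.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"malformed DEPS entry: {line!r}")
        entries.append((m.group("path"), m.group("url"), m.group("namespace")))
    return entries


def to_args(entries):
    """Intersperse hypothetical -module= options with paths (empty means top level)."""
    args = []
    for path, _url, namespace in entries:
        args.append(f"-module={namespace or ''}")
        args.append(path)
    return args
```

For instance, the entry "baz as qux" parses to the path baz with namespace qux and no URL, and to_args turns it into -module=qux followed by baz.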

srackham commented 2 years ago

Why not just extend the existing DEPS file format from <path> to <path> [from <url>] [as <namespace>]?

My previous idea won't work, you can't unilaterally rug-pull namespaces across the entire code base, namespace reassignment has to be done by the client modules themselves. So back to the module/import statements solution. There's a reason why Go and Rust have both taken this approach. My "why haven't I seen this before?" alarm bells should have sounded.

Deno is interesting, there is no separate package manager, it just leverages TypeScript's export * as <namespace> statement in conjunction with an optional deps.ts file cf. the DEPS files used in Virgil projects. See Manage Dependencies | Manual | Deno.

titzer commented 2 years ago

My previous idea won't work, you can't unilaterally rug-pull namespaces across the entire code base, namespace reassignment has to be done by the client modules themselves.

Yeah, I started to realize that module membership is basically intrinsic to the declaring module, so at the very least the module statement should allow a file to declare which module it belongs to.

srackham commented 2 years ago

Module publication and subscription is a surprisingly knotty problem, but without a way for developers to painlessly contribute and consume libraries and applications a language can't scale horizontally and won't gain traction.

Virgil is such a nice language and (by pure serendipity?) a perfect fit for Wasm+WASI. It would be a shame not to take advantage of this window of opportunity.

titzer commented 2 years ago

Virgil is such a nice language

Thanks!

I generally agree with what you wrote. I am pulled in many directions and so far have focused on solving my own problems as they pop up. Wasm+WASI is within reach. I chip away at that when I get a chance but can use some help.

Modules would be a great addition to the language so I am really glad to discuss on this thread how this might come about.

srackham commented 2 years ago

Wasm+WASI is within reach.

Great to hear.

I chip away at that when I get a chance but can use some help.

I really do empathise. The amount and quality of the work you've done is phenomenal but there's only so much one person can do.

If I knew how to make a software project go viral based on merit alone I'd be famous. It's frustrating, but the old adage "make a better mouse trap and they will come" doesn't apply to software.

titzer commented 2 years ago

lol, no worries. It's alright being kinda low-profile but not totally dead. Virgil's been a slow burn for a sizeable chunk of my life now and really my bandwidth for explaining it all lags my imagination to create more stuff, so I write few papers and instead chisel on a thing optimized for my chiseling.

srackham commented 2 years ago

I write few papers and instead chisel on a thing optimized for my chiseling.

Yeah, just stick to what you enjoy.

Maybe the whole module pub/sub issue, along with the need for standard libraries, could be sidestepped with Wasm+WASI+Component Model module imports? Module management could then be farmed out to an external language-agnostic Wasm package manager such as wapm.

It's still early days regards WASI and the Wasm Component Model but once the dust settles Wasm may well deliver on Java's promises of portability and security.

I don't know if the seamless integration of Wasm+WASI modules is feasible, but from the looks of the Wizard Engine project you have mastered Wasm and have a head start in that direction.

titzer commented 2 years ago

Yeah, that's a good idea. It might help both WASI and Virgil to have a GC'd language targeting WASI from the get go.

I was struggling to reverse-engineer the intended semantics of wasi_snapshot_preview1 from various engines' implementations. Implementing both the producer (Virgil) and consumer (Wizard) of that equation, it is easy to just misinterpret and implement the wrong thing. And debugging on most engines is terrible, so I haven't had the time to get on it. I'm looking forward to having a higher-level interface for WASI but also a little unsure how much work that will be, or whether it will lead to a mismatch.

titzer / virgil

Top-level namespacing #79