Closed: zerbina closed this 3 months ago.
For comparison, a MirNode currently has a size of 16 bytes. 8 bytes is a good size because it means that a node fits into a single register (on a 64-bit architecture).
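To illustrate the single-register point, here is a minimal sketch. It assumes a hypothetical layout of a 32-bit kind plus a 32-bit type-erased value; the actual field widths aren't stated here. Both halves pack into one unsigned 64-bit integer, which is exactly one general-purpose register on a 64-bit machine.

```python
import struct

# Hypothetical node layout: 32-bit kind + 32-bit type-erased value, no padding.
NODE_FORMAT = "<II"  # little-endian, two unsigned 32-bit fields -> 8 bytes total

def pack_node(kind: int, value: int) -> int:
    """Pack (kind, value) into a single unsigned 64-bit word."""
    if not (0 <= kind < 2**32 and 0 <= value < 2**32):
        raise ValueError("kind and value must each fit into 32 bits")
    return kind | (value << 32)

def unpack_node(word: int) -> tuple[int, int]:
    """Recover (kind, value) from a packed 64-bit word."""
    return word & 0xFFFF_FFFF, word >> 32
```

A 16-byte node, by contrast, needs two registers (or a memory load) to move around as a unit.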
The next step will be implementing a tool that comprehends the grammar from the Markdown files (I'm already using an early implementation of it locally). My plan is that it is responsible for:
- … the spec module based on the (merged) grammars

That'll provide a solid base for quick and easy iteration.
There's the general question of how testing passes should work, as in: […]
Right now, the second option is chosen, but I think it should be a mixture of both. Whether a language should be compiled down to bytecode and then run should be configurable via a command-line option, so that during local development only a pass' output is checked, whereas in CI it's also fully compiled and run.
Fully compiling the output of a pass during testing could work by serializing the output to disk and then treating it as a test file for the runner of the lower-level language, repeating the process until reaching the VM tester. This has the benefit of not having to implement the whole pipeline in every runner, but the overhead of serialization, file I/O, and deserialization might be too much.
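The chaining idea can be sketched as follows, assuming each pass is modelled as a plain text-to-text function (a real runner would deserialize and serialize S-expressions) and with hypothetical `stageN.test` file names. Each stage reads the previous stage's file and writes its own, and the last file is what would be handed to the VM tester.

```python
from pathlib import Path
from typing import Callable

# Hypothetical: a pass maps the serialized text of one language to the
# serialized text of the next lower-level language.
Pass = Callable[[str], str]

def run_stage(test_file: Path, lower: Pass, out_file: Path) -> Path:
    """One runner: read a test file, apply its pass, write the result to disk."""
    out_file.write_text(lower(test_file.read_text()))
    return out_file

def run_chain(test_file: Path, passes: list[Pass], workdir: Path) -> Path:
    """Feed each stage's output file to the next runner as its test file."""
    current = test_file
    for i, lower in enumerate(passes):
        current = run_stage(current, lower, workdir / f"stage{i + 1}.test")
    return current  # the file that would be handed to the VM tester
```

Every stage boundary costs one write and one read, which is the overhead mentioned above.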
To keep the cost of changing languages at a reasonable level, I think tests should stick to only covering each language feature in isolation, at least for now.
@saem: Regarding the naming, given that there can (and will) be multiple target languages, I believe having L0 refer to the source language would be better (though maybe a bit confusing, since lowering then corresponds to the number going up).
However, since we're developing the languages bottom-to-top, having L0 mean "target language" is easier for now, I'd say; otherwise, a renaming is necessary whenever a new higher-level IL is added. For the in-development top-level candidate, we could use Lx (or similar) as the name.
I think since we're going to be stacking languages on top, and the earlier (bottom) languages are going to be around longer, it's fair to assume they'll be more stable over time. Starting the numbering with the bottom language as L0 is likely to be the easiest (fewest renames and most stable references over time).
Although, if at some point we need a secondary numbering scheme that does go from source -> target, we could maybe use S0 for source-zero and then increment the number as the languages get lower. This would be the opposite of the Lx scheme, but it would allow us to reason about depth from the source, if and when required.
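The relationship between the two schemes is a simple reflection. The sketch below assumes a single linear stack L0 (target) up to L{top} (source); the helper names are hypothetical.

```python
def l_to_s(l_index: int, top: int) -> int:
    """Bottom-up L-number -> top-down S-number (S0 is the source language)."""
    if not 0 <= l_index <= top:
        raise ValueError("language index out of range")
    return top - l_index

def s_to_l(s_index: int, top: int) -> int:
    """Top-down S-number -> bottom-up L-number (L0 is the target language)."""
    if not 0 <= s_index <= top:
        raise ValueError("language index out of range")
    return top - s_index
```

Adding a new top-level language (top + 1) shifts every S-number by one but leaves every L-number untouched, which is why L-numbers make the more stable primary scheme.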
@saem: I've addressed the review comments, extended the test coverage, made some small language changes, and fixed some bugs. There are still tests missing (arithmetic and comparison operations have no coverage yet), but I think it's okay to add them later, so that further work depending on lang0 can commence already.
All tests now also make sure the produced bytecode matches the expected one. Beyond making the workings of pass0 easier to understand, this also ensures that the produced bytecode doesn't silently change when making changes to pass0.
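A sketch of such a golden (expected-output) bytecode check, assuming bytecode is compared as a listing with one instruction per line; the instruction names in the usage below are made up. Reporting every mismatching position, rather than a bare pass/fail, is what makes a silent change in pass0's output easy to spot.

```python
def check_bytecode(produced: list[str], expected_listing: str) -> list[str]:
    """Compare produced instructions against the expected listing.

    Returns one message per mismatching position; an empty list means a match.
    """
    expected = expected_listing.splitlines()
    mismatches = []
    for i in range(max(len(produced), len(expected))):
        got = produced[i] if i < len(produced) else "<missing>"
        want = expected[i] if i < len(expected) else "<missing>"
        if got != want:
            mismatches.append(f"instruction {i}: expected {want!r}, got {got!r}")
    return mismatches
```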
Summary
PackedTree implementation for use as the passes' IR
Details
Language Design
In its current form, the language is likely too high-level, but it's a good start to base further language development on.
Pass Design
The pass currently operates on whole modules at a time (though there's no multi-module support at the moment). Beyond some minor assertions, syntax and proper typing are not checked by the pass: the idea is that during normal compilation, passes trust that their input is sane and sound, with validation implemented separately.
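The split between a trusting pass and separate validation can be sketched like this. The node kinds (IntVal, Add), the postfix node order, and the output instruction names are all hypothetical. The validator collects every syntax and soundness problem, while the pass only carries a cheap assertion and otherwise assumes its input is valid.

```python
Node = tuple[str, int]  # hypothetical (kind, value) pairs, in postfix order

def validate(nodes: list[Node]) -> list[str]:
    """Separate validation: known kinds (syntax) plus operand counts (soundness)."""
    errors = []
    depth = 0  # number of values currently on the conceptual operand stack
    for i, (kind, _) in enumerate(nodes):
        if kind == "IntVal":
            depth += 1
        elif kind == "Add":
            if depth < 2:
                errors.append(f"node {i}: 'Add' needs two operands")
            depth = max(depth - 1, 1)  # Add consumes two values, produces one
        else:
            errors.append(f"node {i}: unknown kind {kind!r}")
    if depth != 1:
        errors.append("expression must produce exactly one value")
    return errors

def lower(nodes: list[Node]) -> list[str]:
    """The pass trusts its input: only an assertion, no re-validation."""
    code = []
    for kind, value in nodes:
        if kind == "IntVal":
            code.append(f"PushInt {value}")
        else:
            assert kind == "Add", f"unexpected node kind {kind!r}"
            code.append("AddInt")
    return code
```

During normal compilation only `lower` would run; `validate` would be run where input can't be trusted, e.g. on hand-written test input.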
Storage
A simple generic PackedTree type (based on NimSkull's MirTree) is used as the IR for the pass. The node kind enum to use is provided by the spec module, alongside S-expression serialization/deserialization.
Nodes only store the kind and a type-erased value. Extra information, such as types, has to be stored in separate nodes. This keeps nodes small (currently 8 bytes) and serialization easy.
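A minimal sketch of the storage idea: a flat, pre-order list of (kind, value) nodes with S-expression round-tripping. It is not the PR's PackedTree. The Kind enum stands in for the one from the spec module. The assumption that a subtree node's value holds its child count belongs to this sketch only. Python objects are of course not 8 bytes; only the logical layout is shown. Note how the type is simply another node rather than a field.

```python
from dataclasses import dataclass
from enum import IntEnum

class Kind(IntEnum):
    """Hypothetical stand-in for the node kind enum from the spec module."""
    IntVal = 0  # leaf: value is the integer literal
    Type = 1    # leaf: value is a type ID (types are stored as nodes, too)
    Add = 2     # subtree: value is the number of direct children
    Conv = 3    # subtree: value is the number of direct children

LEAVES = {Kind.IntVal, Kind.Type}

@dataclass(frozen=True)
class Node:
    kind: Kind
    val: int  # type-erased: literal, type ID, or child count

def to_sexpr(nodes: list[Node], i: int = 0) -> tuple[str, int]:
    """Serialize the subtree starting at index i; return (text, next index)."""
    n = nodes[i]
    if n.kind in LEAVES:
        return f"({n.kind.name} {n.val})", i + 1
    parts, i = [n.kind.name], i + 1
    for _ in range(n.val):
        text, i = to_sexpr(nodes, i)
        parts.append(text)
    return "(" + " ".join(parts) + ")", i

def from_sexpr(text: str) -> list[Node]:
    """Parse a well-formed S-expression into the flat pre-order node list."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    nodes: list[Node] = []
    pos = 0

    def parse() -> None:
        nonlocal pos
        kind = Kind[tokens[pos + 1]]  # tokens[pos] is "("
        pos += 2
        if kind in LEAVES:
            nodes.append(Node(kind, int(tokens[pos])))
            pos += 2  # skip the value and the closing ")"
            return
        index = len(nodes)
        nodes.append(Node(kind, 0))  # placeholder until the children are counted
        count = 0
        while tokens[pos] != ")":
            parse()
            count += 1
        pos += 1
        nodes[index] = Node(kind, count)

    parse()
    return nodes
```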
To-Do