nim-works / phy

compiler and vm experiments
MIT License

implement first iteration of the lowest-level language #7

Closed zerbina closed 3 months ago

zerbina commented 3 months ago

Summary

Details

Language Design

In its current form, the language is likely too high-level, but it's a good start to base further language development on.

Pass Design

The pass currently operates on whole modules at a time (though there's no multi-module support at the moment). Beyond some minor assertions, syntax and proper typing are not checked by the pass -- the idea is that during normal compilation, passes trust that their input is sane and sound, with validation implemented separately.

Storage

A simple generic PackedTree type (based on NimSkull's MirTree) is used as the IR for the pass. The node kind enum to use is provided by the spec module, alongside S-expression serialization/deserialization.
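
As a rough illustration of what an S-expression round trip for such a tree can look like, here is a minimal Python sketch. The node kinds (`Add`, `IntVal`) and the helper names are hypothetical; the real kinds come from the spec module, and the actual implementation is in Nim.

```python
# Hypothetical sketch: a tree is a nested list such as
# ["Add", ["IntVal", 1], ["IntVal", 2]]; atoms are kind names or integers.

def to_sexp(tree):
    """Serialize a nested-list tree into an S-expression string."""
    if isinstance(tree, list):
        return "(" + " ".join(to_sexp(item) for item in tree) + ")"
    return str(tree)

def from_sexp(text):
    """Parse an S-expression string back into a nested-list tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # skip the closing ")"
            return items
        # Integer payloads become ints; everything else stays a symbol.
        try:
            return int(tok)
        except ValueError:
            return tok

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing input after S-expression")
    return result

tree = ["Add", ["IntVal", 1], ["IntVal", 2]]
text = to_sexp(tree)
```

A textual form like this is what makes it practical to write test inputs and expected outputs by hand.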

Nodes only store the kind and a type-erased value. Extra information, such as types, has to be represented as nodes of its own. This keeps nodes small (currently 8 bytes) and serialization easy.
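
To make the "kind plus type-erased value" idea concrete, the following Python sketch packs a tree into a flat pre-order sequence of 8-byte nodes. The layout (a 32-bit kind tag followed by a 32-bit payload), the kind names, and the use of the payload as a child count are all assumptions made for illustration, not the actual PackedTree definition.

```python
import struct

# Assumed layout (not confirmed by the source): 32-bit kind tag followed by a
# 32-bit type-erased payload, 8 bytes in total.
NODE = struct.Struct("<II")

# Hypothetical node kinds; the real ones are provided by the spec module.
KINDS = ["IntVal", "Type", "Add"]
KIND_ID = {name: i for i, name in enumerate(KINDS)}
LEAVES = {"IntVal", "Type"}

def pack_tree(tree, out=None):
    """Flatten a nested (kind, ...) tuple into pre-order 8-byte nodes.

    For leaves the payload is the value itself; for all other nodes it is
    the number of direct children.
    """
    if out is None:
        out = bytearray()
    kind, *rest = tree
    if kind in LEAVES:
        out += NODE.pack(KIND_ID[kind], rest[0])
    else:
        out += NODE.pack(KIND_ID[kind], len(rest))
        for child in rest:
            pack_tree(child, out)
    return out

def unpack_tree(buf, pos=0):
    """Rebuild the nested tuple; returns (tree, index of the next node)."""
    kind_id, val = NODE.unpack_from(buf, pos * NODE.size)
    kind = KINDS[kind_id]
    pos += 1
    if kind in LEAVES:
        return (kind, val), pos
    children = []
    for _ in range(val):
        child, pos = unpack_tree(buf, pos)
        children.append(child)
    return (kind, *children), pos

# The type of the addition is not a field on the node; it is its first child.
tree = ("Add", ("Type", 0), ("IntVal", 1), ("IntVal", 2))
buf = pack_tree(tree)
```

Here the four nodes occupy 32 bytes, and the type information costs one extra node instead of widening every node.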


To-Do

zerbina commented 3 months ago

For comparison, a MirNode currently has a size of 16 bytes. 8 bytes is a good size, because it means that a node fits into a single register (on a 64-bit architecture).

Next Step

The next step will be implementing a tool that comprehends the grammar from the Markdown files (I'm already using an early implementation thereof locally). My plan is that it is responsible for:

That'll provide a solid base for quick and easy iteration.

Thoughts on Testing

There's the general question of how testing passes should work, as in:

  1. should the pass output be compared against an expected version?
  2. should the pass output be compiled down to bytecode and then run?

Right now, the second option is chosen, but I think it should be a mixture of both. Whether a language should be compiled down into bytecode and then run should be configurable via a command line option, so that during local development, only a pass's output is checked, whereas in CI, it's also fully compiled and run.
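
A minimal Python sketch of such a configurable tester is shown below. The flag name `--full`, the function names, and the stand-in pass and backend are all hypothetical; the sketch only shows how the two modes could share one entry point.

```python
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(description="pass tester (sketch)")
    # Hypothetical flag: CI would pass it, local development runs would not.
    parser.add_argument("--full", action="store_true",
                        help="also compile the pass output to bytecode and run it")
    return parser.parse_args(argv)

def run_test(pass_fn, source, expected, full=False, compile_and_run=None):
    """Run one test; with `full`, the pass output is also compiled and run."""
    output = pass_fn(source)
    # Option 1: compare the pass output against the expected version.
    if output.strip() != expected.strip():
        return False
    if full:
        # Option 2: hand the output to the rest of the pipeline.
        return compile_and_run(output)
    return True

# Stand-ins so that the sketch runs on its own; they are not real passes.
def demo_pass(source):
    return source.upper()

backend_calls = []
def demo_backend(output):
    backend_calls.append(output)
    return True

local_ok = run_test(demo_pass, "(a)", "(A)", full=parse_args([]).full,
                    compile_and_run=demo_backend)
ci_ok = run_test(demo_pass, "(a)", "(A)", full=parse_args(["--full"]).full,
                 compile_and_run=demo_backend)
```

In the local run the backend is never invoked; in the CI-style run it is invoked only after the output comparison has passed.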

Fully compiling the output of a pass during testing could work by serializing the output to disk and then treating it as a test file for the runner of the lower-level language, repeating the process until reaching the VM tester. This has the benefit of not having to implement the whole pipeline in every runner, but the serialization + file IO + deserialization overhead might be too much.
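
The chaining idea could be sketched roughly as follows in Python. The runner functions are toy stand-ins that only rewrite a label, and the file naming is an assumption; the point is that every stage communicates with the next one solely through a file on disk.

```python
import tempfile
from pathlib import Path

def run_chain(source, runners, workdir):
    """Feed `source` through each runner, going via a file between stages.

    `runners` is ordered from the highest-level language down to the VM
    tester. Each runner takes the path of a test file and returns its
    serialized output as text.
    """
    text = source
    for level, runner in enumerate(runners):
        test_file = workdir / f"stage{level}.test"  # hypothetical naming
        test_file.write_text(text)   # serialization + file IO
        text = runner(test_file)     # the runner deserializes on its own
    return text

# Toy stand-ins for the per-language runners (not real passes).
def lower_l1(path):
    return path.read_text().replace("L1", "L0")

def lower_l0(path):
    return path.read_text().replace("L0", "bytecode")

def vm_tester(path):
    return "ran: " + path.read_text()

with tempfile.TemporaryDirectory() as tmp:
    result = run_chain("(L1 program)", [lower_l1, lower_l0, vm_tester], Path(tmp))
    stage_files = sorted(p.name for p in Path(tmp).iterdir())
```

Each runner only needs to know about its own language, at the cost of one write and one read per stage.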

To keep the cost of changing languages at a reasonable level, I think tests should stick to only covering each language feature in isolation, at least for now.

zerbina commented 3 months ago

@saem: Regarding the naming, given that there can (and will) be multiple target languages, I believe having L0 refer to the source language would be better (though maybe a bit confusing, since lowering then corresponds to the number going up).

However, since we're developing the languages bottom-to-top, having L0 mean "target language" is easier for now, I'd say; otherwise, a renaming is necessary whenever adding a new higher-level IL. For the in-development top-level candidate, we could use Lx (or similar) as the name.

saem commented 3 months ago

I think since we're going to be stacking languages on top, and the earlier (bottom) languages are going to be around longer, it's fair to assume they'll be more stable over time. I think starting the numbering where the bottom is L0 is likely to be the easiest (fewest renames and most stable references over time).

Although, if at some point we need a secondary numbering scheme, that does go from source -> target, then we could maybe do S0 for source-zero, and then increment the number as it gets lower. This would be the opposite of the Lx scheme, but it would allow us to reason about depth from source, if and when required.
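
The relationship between the two schemes can be written down as a small conversion, sketched here in Python under the assumption of a single linear chain of languages (the function name is hypothetical):

```python
def l_to_s(l_index, num_languages):
    """Convert a bottom-up L index into a top-down S index.

    The mapping is its own inverse, so it also converts S back to L.
    """
    if not 0 <= l_index < num_languages:
        raise ValueError("index out of range")
    return num_languages - 1 - l_index

# With three languages, L0 (the lowest) is S2 and the source language L2 is S0.
three = [l_to_s(i, 3) for i in range(3)]

# Stacking a fourth language on top keeps every L name but shifts every
# existing S index by one.
four = [l_to_s(i, 4) for i in range(3)]
```

This shows why the L names are the more stable references while languages are still being added at the top.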

zerbina commented 3 months ago

@saem: I've addressed the review comments, extended the test coverage, made some small language changes, and fixed some bugs. There are still tests missing (arithmetic and comparison operations have no coverage yet), but I think it's okay to add them in post, so that further work depending on lang0 can commence already.

All tests now also make sure the produced bytecode matches the expected one. Beyond making the workings of pass0 easier to understand, this also ensures that the produced bytecode doesn't silently change (when making changes to pass0).