nim-works / phy

compiler and vm experiments
MIT License
3 stars 2 forks source link

rework the languages and passes below `L10` #64

Closed zerbina closed 1 week ago

zerbina commented 3 weeks ago

There are a lot of issues with the lower-level ILs, which are going to lead to problems or inefficiencies later on.

This PR aims to address the aforementioned issues, by redesigning the languages and adjusting the passes accordingly, taking into the account the lessons learned so far.


Notes For Reviewers

This is part of a larger rework of the ILs. The first step (i.e., this PR) is focused on reworking the existing ones, with new languages/passes only being introduced where they're needed for pass10 to keep working.

zerbina commented 3 weeks ago

The flow through the languages changes quite a bit. The out-of-SSA transformation now immediately precedes VM code generation (pass0). Aggregate locals are already gone at that stage -- they're now first lowered into stack locations (L6 -> L4), which are then reified into real memory locations (L4 -> L3). Using abstract stack locations (Locs) first and reifying them later means that earlier passes don't have to worry about stack layout and allocation.

In general, the passes become a bit more narrow, which means more passes, but also that each pass needs to do less (in the grander scheme of things).

Another significant change is with how "pinned" locals (i.e., locals that need to have a fixed location in memory) are handled. Instead of the awkward Rename block argument modifier and mixed SSA (i.e., some locals using SSA, some not), all locals that name stack locations rather than merely being a handle to an abstract location need to use Loc. Locs (abstract stack locations) don't use SSA -- instead, they're scoped to the procedure and use explicit markers (StorageLive and StorageEnd) for delimiting their live range.

Goto-with-value is gone too. The small convenience it provided isn't worth complicating the Continue (now Goto) syntax for.

zerbina commented 2 weeks ago

The structure of aggregate type descriptions also changes a bit. First, record types must now specify an alignment requirement, and second, there is now a dedicated Union type.

The lack of any form of memory alignment would have become a problem sooner or later, and for union types, having a dedicated type better communicates and enforces the layout requirements. While the L3 doesn't strictly need the Union (re-using the Record type, like it was done previously, is enough), higher-level languages do, and since removing/modifying types does have overhead, it's better to just keep the type until pass3, where it's removed together with the other aggregate types.

zerbina commented 2 weeks ago

It's not something that I'd do as part of the pass rework (this PR is large enough already), but Select in its current form should not reach pass0.

The Select exit picks a jump target out of list based on an supplied value, where each jump target is associated with either a unique value or range of values. It's used to implement the case construct.

The problem with Select is that in order to correctly emit the multiple comparisons, pass0 needs to spawn a temporary local, as the expression to select the target with might have side-effects (or be non-trivial to recompute). Register/local allocation already took place at that point, so this is effectively not possible. My solution to get around having to allocate a temporary was to restrict the Select operand to a small set of operands that have no side-effects and translate to a single instruction.

What I think would be a better solution is to introduce a BranchTable exit, which is used like so:

(BranchTable e (Goto 1) (Goto 2) (Goto 3))

The first Goto specifies the block to jump to when e == 0, the second Goto the block to jump to when e == 1, and the last Goto the block to jump to otherwise. Select would then be lowered into either BranchTable or a sequence of Branch blocks around the L25 language.

zerbina commented 1 week ago

The pass ordering and work distribution is better than before, but it's still not good. Turning the aggregate locals into stack locations effectively requires an out-of-SSA transformation, which would duplicate parts of the register allocation pass.

More fundamentally, I think the idea of requiring explicit block parameters for every local moving between blocks is wrong. On one hand, it encodes in the IR which locals are alive at the entry of the block explicit (which is good), but on the other hand it increases IR size and requires patching the parameter and argument lists (which is especially annoying with the Cursor API). I think it's better if locals are implicitly available in blocks that are dominated by its definition.

Finally, the Cursor API is too cumbersome to use and too restrictive in what it allows. I have an idea for a better way to build changesets, though it'll requires macros.

I'm going to start over with the rework in a new PR.