Closed zerbina closed 1 week ago
The flow through the languages changes quite a bit. The out-of-SSA transformation now immediately precedes VM code generation (pass0
). Aggregate locals are already gone at that stage -- they're now first lowered into stack locations (L6
-> L4
), which are then reified into real memory locations (L4
-> L3
). Using abstract stack locations (Loc
s) first and reifying them later means that earlier passes don't have to worry about stack layout and allocation.
In general, the passes become a bit more narrow, which means more passes, but also that each pass needs to do less (in the grander scheme of things).
Another significant change is with how "pinned" locals (i.e., locals that need to have a fixed location in memory) are handled. Instead of the awkward Rename
block argument modifier and mixed SSA (i.e., some locals using SSA, some not), all locals that name stack locations rather than merely being a handle to an abstract location need to use Loc
. Loc
s (abstract stack locations) don't use SSA -- instead, they're scoped to the procedure and use explicit markers (StorageLive
and StorageEnd
) for delimiting their live range.
Goto-with-value is gone too. The small convenience it provided isn't worth complicating the Continue
(now Goto
) syntax for.
The structure of aggregate type descriptions also changes a bit. First, record types must now specify an alignment requirement, and second, there is now a dedicated Union
type.
The lack of any form of memory alignment would have become a problem sooner or later, and for union types, having a dedicated type better communicates and enforces the layout requirements. While the L3
doesn't strictly need the Union
(re-using the Record
type, like it was done previously, is enough), higher-level languages do, and since removing/modifying types does have overhead, it's better to just keep the type until pass3
, where it's removed together with the other aggregate types.
It's not something that I'd do as part of the pass rework (this PR is large enough already), but Select
in its current form should not reach pass0
.
The Select
exit picks a jump target out of list based on an supplied value, where each jump target is associated with either a unique value or range of values. It's used to implement the case
construct.
The problem with Select
is that in order to correctly emit the multiple comparisons, pass0
needs to spawn a temporary local, as the expression to select the target with might have side-effects (or be non-trivial to recompute). Register/local allocation already took place at that point, so this is effectively not possible. My solution to get around having to allocate a temporary was to restrict the Select
operand to a small set of operands that have no side-effects and translate to a single instruction.
What I think would be a better solution is to introduce a BranchTable
exit, which is used like so:
(BranchTable e (Goto 1) (Goto 2) (Goto 3))
The first Goto
specifies the block to jump to when e == 0
, the second Goto
the block to jump to when e == 1
, and the last Goto
the block to jump to otherwise. Select
would then be lowered into either BranchTable
or a sequence of Branch
blocks around the L25
language.
The pass ordering and work distribution is better than before, but it's still not good. Turning the aggregate locals into stack locations effectively requires an out-of-SSA transformation, which would duplicate parts of the register allocation pass.
More fundamentally, I think the idea of requiring explicit block parameters for every local moving between blocks is wrong. On one hand, it encodes in the IR which locals are alive at the entry of the block explicit (which is good), but on the other hand it increases IR size and requires patching the parameter and argument lists (which is especially annoying with the Cursor
API). I think it's better if locals are implicitly available in blocks that are dominated by its definition.
Finally, the Cursor
API is too cumbersome to use and too restrictive in what it allows. I have an idea for a better way to build changesets, though it'll requires macros.
I'm going to start over with the rework in a new PR.
There are a lot of issues with the lower-level ILs, which are going to lead to problems or inefficiencies later on.
This PR aims to address the aforementioned issues, by redesigning the languages and adjusting the passes accordingly, taking into the account the lessons learned so far.
Notes For Reviewers
This is part of a larger rework of the ILs. The first step (i.e., this PR) is focused on reworking the existing ones, with new languages/passes only being introduced where they're needed for
pass10
to keep working.