Closed zerbina closed 2 weeks ago
I'm going to hold off on merging this addition until the listed passes are actually needed. While I think the language is a solid choice for the stated purpose, it's possible that there's a better approach that I haven't yet considered, so I'm not rushing to add the IL (just to remove it again later, should a better alternative emerge).
The first of the listed passes that I think will be needed is drop injection.
To make sure the pass is working correctly, and to also get some feedback on performance, I've run pass7 on L7 code translated from the fully processed MIR produced by the NimSkull compiler for the repl.nim program (~2.5MB of packed nodes).
Besides discovering multiple issues with the data-flow analysis (which are fixed by #54), this also showed that the pass takes up too much time. In a normal debug build, the pass takes 30 seconds (!) to process all procedures, while in -d:release mode, it takes 4 seconds. Considering that repl.nim is only a small to medium sized program (~1000 procedures), this is far too much time.
The causes, in order of significance:
1. Issue with Changeset.replace. The changeset's node sequence is, for some reason, not moved into the builder, resulting in a full copy of the whole sequence.
2. PackedSet is slow. Or at least the operations relevant to the data-flow analysis (i.e., union and intersection) are. In addition, a PackedSet instance itself has a very large static size (320 bytes!), ballooning the static size of BBlock to 688 bytes! This quickly adds up, especially since there are usually a lot of basic blocks in a single procedure.
3. Type lookup is slow. Looking up a type via its index requires skipping over all predecessor nodes in the tree, which takes longer the further the index is away from the start.
4. PackedSet.len is slow. This is especially wasteful when it is only used to test whether the set is empty or not.
Number 1 is easy to fix, and 2 and 4 can be addressed by using a Table-based sparse set implementation. With the aforementioned three things fixed, the pass only takes ~550ms in release mode, which, while a lot better, is still too long.
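To illustrate what a table-based sparse set could look like, here is a minimal sketch in Python (not the project's language, and not the implementation used in the pass). The class name `SparseBitSet`, the 64-bit chunk size, and all method names are assumptions made for the example. The key points are that union and intersection only touch chunks that actually exist, and that the emptiness test does not need to count elements.

```python
class SparseBitSet:
    """Sparse set of non-negative integers, stored as chunk index -> bit mask.

    Hypothetical sketch: a hash table holds one 64-bit mask per occupied chunk.
    """

    BITS = 64

    def __init__(self, items=()):
        self.chunks = {}  # invariant: only non-zero masks are stored
        for item in items:
            self.add(item)

    def add(self, item):
        key, bit = divmod(item, self.BITS)
        self.chunks[key] = self.chunks.get(key, 0) | (1 << bit)

    def __contains__(self, item):
        key, bit = divmod(item, self.BITS)
        return (self.chunks.get(key, 0) >> bit) & 1 == 1

    def is_empty(self):
        # Constant time: thanks to the invariant, an empty table means an empty set.
        return not self.chunks

    def union_with(self, other):
        """In-place union; returns True if this set changed."""
        changed = False
        for key, mask in other.chunks.items():
            old = self.chunks.get(key, 0)
            new = old | mask
            if new != old:
                self.chunks[key] = new
                changed = True
        return changed

    def intersect_with(self, other):
        """In-place intersection; returns True if this set changed."""
        changed = False
        for key in list(self.chunks):
            old = self.chunks[key]
            new = old & other.chunks.get(key, 0)
            if new != old:
                changed = True
                if new:
                    self.chunks[key] = new
                else:
                    del self.chunks[key]  # keep the "no zero masks" invariant
        return changed

    def __iter__(self):
        for key in sorted(self.chunks):
            mask = self.chunks[key]
            base = key * self.BITS
            while mask:
                low = mask & -mask  # isolate the lowest set bit
                yield base + low.bit_length() - 1
                mask ^= low
```

Returning a "changed" flag from the in-place operations is a common convenience for fixpoint-style data-flow loops, since the loop can stop as soon as no set changes anymore.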
Reducing the number of basic blocks (by combining them where possible), reducing the maximum number of variables live at the same time (by adding "end of storage duration" markers to the L7), some general optimization of the pass itself, as well as improving type lookup efficiency (possibly through using a skip list) should together be able to shrink the pass' run time to somewhere below 100ms, which seems acceptable.
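The type lookup cost can be illustrated with a small Python sketch. It assumes a pre-order packed tree in which every node only records its number of children, so reaching the n-th child means walking over all earlier sibling subtrees. The `Node` type and the function names are hypothetical. Note that the second variant is a plain precomputed offset table rather than the skip list mentioned above; it is only meant to show the general idea of trading a one-time scan for cheap lookups.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str
    num_children: int = 0


def subtree_end(nodes, pos):
    """Return the position just past the subtree rooted at `pos`."""
    remaining = 1  # number of nodes still to be visited
    while remaining > 0:
        remaining += nodes[pos].num_children - 1
        pos += 1
    return pos


def child_linear(nodes, parent, index):
    """Position of the index-th child; skips every preceding sibling subtree."""
    pos = parent + 1
    for _ in range(index):
        pos = subtree_end(nodes, pos)
    return pos


def build_child_index(nodes, parent):
    """Scan the children once; afterwards each lookup is a single list access."""
    offsets = []
    pos = parent + 1
    for _ in range(nodes[parent].num_children):
        offsets.append(pos)
        pos = subtree_end(nodes, pos)
    return offsets


# A hypothetical type section with three type definitions.
types = [
    Node("TypeDefs", 3),
    Node("ProcTy", 2), Node("Int"), Node("Int"),
    Node("Int"),
    Node("Record", 1), Node("Float"),
]
index = build_child_index(types, 0)
```

With the table in place, `index[2]` yields the same position as `child_linear(types, 0, 2)`, but without revisiting the nodes of the first two types.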
Okay, Except support is now implemented and the fixes from main are merged. Some tests are missing, but otherwise the bulk of the work should be done.
I did quite a bit of testing with real-world code bases, and I'm now fairly certain that the structure of and idea behind L7 are the right choice. There are a few things that need to be changed (relative to the current state), namely:
- An "end of life" marker for locals (something like StorageEnd or StorageDead), in order to mark them as dead. Right now, locals have to be considered alive after their first write (or possible write), which leads to locals that have their address taken usually living much longer than necessary.
- A Move operator. It functions similarly to Copy, with the addition of communicating that the source location is not used afterwards. An L7 Move would be translated directly to an L4 Move.
- Requiring locals to be initialized prior to being used. (Applies to both the L10 and L7.) This would simplify some code, by removing the need for auto-spawning, and, more importantly, it removes a case of undefined behaviour (i.e., what's the content of an uninitialized local?). If not initializing locals prior to their first use should be allowed in the source language (like it is in NimSkull), there needs to be a separate pass that initializes the problematic locals with their type's default value.
This PR is only concerned with splitting up pass10, so the changes should happen via follow-up PRs.
The language is somewhat of a reinvention of NimSkull's MIR, but without its problems and unnecessary complexity. Most notably:
- finally. Finally sections are a major source of complexity in the MIR, also being the reason why target lists (another source of complexity) have to exist. They complicate the MIR's structure, the data-flow analysis, and especially code generation. It makes much more sense to lower try/finally early (into try/except and block), instead of just prior to code generation, where it's much harder and more complex to do so.
- if. The idea was to keep some structure in order to ease code generation, effectively making it a workaround for shortcomings of the C/JS code generators. The L25 allows arbitrary branching control-flow (as long as it points forward and doesn't cross into loops).
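The restriction on branches (forward only, no jumping into a loop) can be expressed as a small validity check. The sketch below is in Python and uses a hypothetical instruction encoding with explicit `loop`/`end` delimiters; how the L25 actually represents loops and labels is not specified here, so treat every name as an assumption.

```python
def check_gotos(body):
    """Return a list of error strings for a flat procedure body.

    Hypothetical encoding:
      ("loop",) / ("end",)   open and close a loop
      ("label", name)        a jump target
      ("goto", name)         an unconditional jump
    """
    # First pass: record the position and enclosing loops of every label.
    labels = {}
    stack, next_id = [], 0
    for pos, ins in enumerate(body):
        if ins[0] == "loop":
            stack.append(next_id)
            next_id += 1
        elif ins[0] == "end":
            stack.pop()
        elif ins[0] == "label":
            labels[ins[1]] = (pos, tuple(stack))

    # Second pass: validate every goto against its target.
    errors = []
    stack, next_id = [], 0
    for pos, ins in enumerate(body):
        if ins[0] == "loop":
            stack.append(next_id)
            next_id += 1
        elif ins[0] == "end":
            stack.pop()
        elif ins[0] == "goto":
            name = ins[1]
            if name not in labels:
                errors.append(f"goto {name} at {pos}: unknown label")
                continue
            target, loops = labels[name]
            if target <= pos:
                errors.append(f"goto {name} at {pos}: jumps backward")
            elif tuple(stack[:len(loops)]) != loops:
                # The target sits in a loop that does not enclose the goto.
                errors.append(f"goto {name} at {pos}: jumps into a loop")
    return errors


valid = [("goto", "exit"), ("loop",), ("goto", "exit"), ("end",), ("label", "exit")]
backward = [("label", "top"), ("goto", "top")]
into_loop = [("goto", "inner"), ("loop",), ("label", "inner"), ("end",)]
```

Jumping out of a loop is accepted, because the loops enclosing the target are then a prefix of the loops enclosing the goto; only targets inside a loop the goto is not part of are rejected.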
Summary
Add the L25 language and integrate it into the pass pipeline. The L25 features a flat procedure structure and goto-esque control-flow, without using a basic block structure and SSA yet. It comes after L30 and before L4.
Details
The new IL is planned to be the first in a series of ILs that all use flat procedure bodies with goto-esque control-flow constructs. This structure works well for the stage during compilation where data- and control-flow analysis is needed, but where the live range(s) of locals can still change.
Future passes that are currently planned to use this structure are: borrow checking, cursor inference, move analysis, destructor injection, and inlining.
Implementation
pass30 is effectively split into two passes. Lowering the structured control-flow constructs (i.e., If, Case, Loop, etc.) stays in pass30, while the data-flow analysis and SSA transformation move to pass25 (without being modified).
Most of the pass25 tests are former pass30 tests (those concerning data-flow analysis and SSA transformation); the rest are new tests covering the basic L25 to L4 translation.
To-Do
Except support in pass7
Notes for Reviewers