nim-works / phy

compiler and vm experiments
MIT License
3 stars 2 forks source link

implement the `L25` intermediate language #52

Closed zerbina closed 2 weeks ago

zerbina commented 1 month ago

Summary

Add the L25 language and integrate it into the pass pipeline. The L25 features a flat procedure structure and goto-esque control-flow, without using a basic block structure and SSA yet. It comes after L30 and before L4.

Details

The new IL is planned to be the first in a series of ILs that all use flat procedure bodies with goto-esque control-flow constructs. This structure works well for the stage during compilation where data- and control-flow analysis is needed, but where the live range(s) of locals can still change.

Future passes that are currently planned to use this structure are: borrow checking, cursor inference, move analysis, destructor injection, and inlining.

Implementation

pass30 is effectively split into two passes. Turning the structure control-flow constructs (i.e., If, Case, Loop, etc.) stays in pass30, while the data-flow analysis and SSA transformation moves to pass25 (without being modified).

Most of the pass25 are former pass30 tests (those concerning data- flow analysis and SSA transformation), the rest are new tests covering the basic L25 to L4 translation.


To-Do

Notes for Reviewers

zerbina commented 1 month ago

I'm going to wait with merging this addition until the listed passes are actually needed. While I think the language is a solid choice for the state purpose, it's possible that there's a better approach that I haven't yet considered, so I'm not rushing to add the IL (just to remove it again later, should a better alternative emerge).

The first pass out of the listed one that I think will be needed is drop injection.

zerbina commented 1 month ago

For making sure the pass is working correctly, and to also get some feedback on performance, I've run pass7 on L7 code translated from the fully processed MIR produced by the NimSkull compiler for the repl.nim program (~2.5MB of packed nodes).

Besides discovering multiple issues with the data-flow analysis (which are fixed by #54), this also showed that the pass takes up too much time. In a normal debug build, the pass takes 30 seconds (!) to process all procedures, while in -d:release mode, it takes 4 seconds. Considering that the repl.nim is only a small to medium sized program (~1000 procedures), this is far too much time.

Reasons for Slowness

In order of significance:

  1. Issue with Changeset.replace. The changeset's node sequence is, for some reason, not moved into the builder, resulting in a full copy of the whole sequence

  2. PackedSet is slow. Or at least the operations relevant to the data-flow analysis (i.e., union and intersection) are. In addition, a PackedSet instance itself has a very large static size (320 byte!), ballooning up the static size of BBlock to 688 byte! This quickly adds up, especially since there are usually a lot of basic blocks in a single procedure.

  3. Type lookup is slow. Looking up a type via its index requires skipping over all predecessor nodes in the tree, which takes longer the further the index is away from the start.

  4. PackedSet.len is slow. Especially if only used to test whether the set is empty or not.

Number 1 is easy to fix, and 2 and 4 can be addressed by using a Table-based sparse set implementation. With the aforementioned three things fixed, the pass only takes ~550ms in release mode, which - while a lot better - is still too long.

Reducing the number of basic blocks (by combining them where possible), reducing the maximum number of variables live at the same time (by adding "end of storage duration" markers to the L7), some general optimization to the pass itself, as well as improving type lookup efficiency (possibly through using a skip list) should together be able to shrink the pass' run time to somewhere below 100ms, which seems acceptable.

zerbina commented 1 month ago

Okay, Except support is now implemented and the fixes from main are merged. Some tests are missing, but otherwise the bulk of the work should be done.


I did quite a bit of testing with real-world code bases, and I'm now fairly certain that the structure and idea with L7 is the right choice. There are a few things that need to be changed (relative to the current state), namely:

This PR is only concerned with splitting up pass10, so the changes should happen via follow-up PRs.

zerbina commented 3 weeks ago

The language is somewhat of a reinvention of NimSkull's MIR, but without its problems and unnecessary complexity. Most notably: