microvm / microvm-meta

We have moved: https://gitlab.anu.edu.au/mu/general-issue-tracker

How to represent merging of variables / Phi functions #18

Open eliotmoss opened 9 years ago

eliotmoss commented 9 years ago

This is an update of a proposal from a year ago.

The current Mu definition of SSA-form has labels, branches to labels, and Phi functions just after labels.

I propose an alternative, which we might call "goto with values". It is similar to continuation passing style except that all the continuations are statically defined (one for each label). This alternative would work as follows:

Example: x = 3; y = 25; goto l(x, y); ... l: (x2:int<32>,y2:int<8>)

This form makes it clear that the association ("assignment") of values to phi-variables has to happen as part of the control transfer. Phi functions are simply one way of representing that, but not a way that seems immediately helpful for code generation.

I further observe that the current definition apparently allows any value to be passed in to a Phi, but I think it should be restricted to global/constant SSA variables or SSA variables defined in the sending block. The new form perhaps makes that clear, not least because with it you can explicitly disallow referring to local SSA variables defined in other blocks.

Putting this another way: non-global SSA variables are in scope only from where they are defined to the end of their block. Some may be defined at the label that starts the block and others may be defined later, but all live values must be mentioned at branches and explicitly passed on.

Compared with traditional SSA form this may appear verbose. However it has at least a couple of advantages:

wks commented 9 years ago

This goto-with-values is actually the way used by PyPy's internal representation.

I think it is equivalent to the PHI node. I don't really object to this idea, but PHI looks better to me. We may internally move the PHI to the jump site to help the code generator.

However, in the µVM, branching is not the only instruction that causes control-flow transfer. We also have SWITCH (like C's switch statement), INVOKE (call a function and catch exceptions), TRAP (go to the client and possibly return to the µVM with an exception) and IINVOKE (call an "intrinsic function", which will be renamed "common instruction", and possibly receive exceptions).

In the future, even more instructions will have exceptional jump targets. They are:

If the values are to be added at the jump site, all those instructions must be augmented to carry the values for their destinations. In fact, doing so does not change the complexity of the code at all (when using PHI functions, all the same cases must be covered), but PHI functions simplify the instruction format.

Some simple comparison:

Arithmetic exceptions:

%entry:
    %result = DIV <@i64> %param1 %param2 EXC(%nor %exc)

%nor:
    %a = PHI <@i64> { %entry: %param1, %other_bb: ...; }
    %b = PHI <@float> { %entry: 0.0f, %other_bb: ...; }
    %c = PHI <@i64> {%entry: %result, %other_bb: ...; }
    ...

%exc:
    %error_code = PHI <@i32> { %entry: @SIGFPE, %other_error_source: ...; }
    ...
The same example in the goto-with-values form:

%entry:
    %result = DIV <@i64> %param1 %param2 EXC(%nor(%a=%param1, %b=0.0f, %c=%result) %exc(%error_code=@SIGFPE))

%nor(%a:@i64, %b:@float, %c:@i64):
    ...

%exc(%error_code:@i32):
    ...

Function call with exception:

NOTE: The CALL instruction with EXC(...) clause is the proposed future grammar. Currently it is a separate "INVOKE" instruction.

%entry:
    %retval = CALL <@func_sig> @func (%arg1 %arg2 ...) EXC(%nor %exc)

%nor:
    ...

%exc:
    %the_exception = LANDINGPAD    // Do we want to remove this, too?
    %another_val = PHI <@i64> { %entry: 42; %other_bb: 43; }
    ...
The same example in the goto-with-values form:

%entry:
    %retval = CALL <@func_sig> @func (%arg1 %arg2 ...) EXC(%nor() %exc(%the_exception=THE_EXCEPTION, %another_val=42))
    // We use the special THE_EXCEPTION keyword to pass the value to another basic block

%nor():
    ...

%exc(%the_exception:@REF_TO_VOID, %another_val:@I64):
    ...

Alternatively, we can treat exceptions specially:

%entry:
    %retval = CALL <@func_sig> @func (%arg1 %arg2 ...) EXC(%nor() %exc(%another_val=42))

%nor():
    ...

%exc(%another_val:@I64):
   %the_exception = LANDINGPAD   // This is still too similar to PHI 
   ...
wks commented 9 years ago

I further observe that the current definition apparently allows any value to be passed in to a Phi, but I think it should be restricted to global/constant SSA variables or SSA variables defined in the sending block. The new form perhaps makes that clear, not least because with it you can explicitly disallow referring to local SSA variables defined in other blocks.

I agree that there are many restrictions on the validity of the control flow graph, the define-use relation, the phi function and so on. Specifically,

  1. The definition of any variable must dominate all of its uses, except uses in phi functions.
  2. For the phi function, any right-hand-side variable must dominate the end of the basic block where the control flow comes from. I think this is easier to reason about in the branch-with-value form than phi functions.
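These two rules can be checked mechanically once dominator sets are known. Below is an illustrative sketch (not the Mu verifier) computing dominator sets with a simple iterative fixpoint; the CFG encoding and all names are hypothetical.

```python
# Illustrative sketch: iterative fixpoint computation of dominator sets.
# `preds` maps each block name to its list of predecessor block names;
# `entry` is the entry block.

def dominators(preds, entry):
    blocks = set(preds)
    dom = {b: set(blocks) for b in blocks}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            if not preds[b]:
                continue
            # A block is dominated by itself, plus whatever dominates
            # every one of its predecessors.
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom
```

With this, rule 1 amounts to: for every use of a variable in block B, the defining block must appear in `dominators(preds, entry)[B]` (or be B itself, with the definition appearing earlier in the block).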

Special care must be taken for branching instructions that may produce a value. For example, the CALL instruction yields a return value when it returns normally, but does not yield that value when an exception is thrown. So

%entry:
    %rv = CALL <...> ... (...) EXC(%bb1 %bb2)

%bb1:
    %v1 = PHI <...> { %entry: %rv; ...: ...; }
    %the_non_existing_exception = LANDINGPAD       // ERROR: the exception is not thrown.
    ...

%bb2:
    %the_exception = LANDINGPAD
    %v2 = PHI <...> { %entry: %rv; ...: ...; }      // ERROR: %rv is not defined because the function does not return normally.
wks commented 9 years ago

More about the goto-with-value form

NOTE: This is not going to the next milestone. Maybe a later milestone.

In recent meetings, we agreed that the goto-with-value form has useful properties.

Eliot (@eliotmoss) mentioned that this form can push the obligation of liveness analysis to the Client so that the µVM can do less work.

Based on this form, the Client can submit one basic block at a time rather than whole functions. The compiler can either compile one basic block at a time for quick compiling, or a whole function at a time for generating more optimised code.

Another idea is to replace the concept of "function" with "basic block" and make "tailcall" the default control flow transfer mechanism. This assumption will make all basic blocks globally callable and may force some "calling convention", which may be undesirable. For this article, I will stick to the function-basicblock-instruction model.

The main advantage over the phi-function form is fast basic-block-at-a-time compilation at the cost of some code quality. For generating high-quality code, this form is equivalent to the traditional phi-function form, and conversion algorithms exist (see "SSA is Functional Programming").
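One direction of that conversion is almost purely mechanical: each PHI in a block becomes a block parameter, and each incoming edge passes the corresponding PHI operand as a branch argument. A toy sketch, using hypothetical data structures rather than Mu's actual internals:

```python
# Toy sketch of converting PHI nodes into block parameters.
# Each block is {'phis': [(dest_var, {pred_block: incoming_value})],
#                'succs': [successor block names]}.

def phis_to_params(blocks):
    # The new parameter list of each block: one parameter per former PHI.
    params = {name: [dest for dest, _ in blk['phis']]
              for name, blk in blocks.items()}
    # The argument list each branch must now pass, keyed by CFG edge.
    branch_args = {}
    for name, blk in blocks.items():
        for succ in blk['succs']:
            branch_args[(name, succ)] = [
                incoming[name] for _, incoming in blocks[succ]['phis']
            ]
    return params, branch_args
```

The reverse direction (block parameters back to PHIs) is equally direct, which is why the two forms carry the same information.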

Yi (@qinsoon) is going to do some experiments with the new form and see the difference at the code generator level.

The new form

Example

.typedef @i64 = int<64>
.typedef @void = void

.const @I64_0 <@i64> = 0
.const @I64_1 <@i64> = 1
.const @I64_8 <@i64> = 8
.const @I64_12 <@i64> = 12
.const @I64_16 <@i64> = 16

.funcsig @gcd_sig = @i64 (@i64 @i64)
.funcdef @gcd VERSION @gcd_v1 <@gcd_sig> {
  %entry(<@i64> %a <@i64> %b):
    BRANCH
    goto { dest: %head(%a %b) } // not allowed to branch to entry. so use a head basic block.

  %head(<@i64> %a <@i64> %b):
    %b_zero = EQ <@i64> %b @I64_0
    BRANCH2 %b_zero
    goto {
      iftrue: %exit(%a)
      iffalse: %body(%a %b)
    }

  %body(<@i64> %a <@i64> %b):
    %b2 = SREM <@i64> %a %b
    BRANCH
    goto { dest: %head(%b %b2) }

  %exit(<@i64> %a):
    RET <@i64> %a
    goto {}
}

.funcsig @gcd_user_sig = @void
.funcdef @gcd_user VERSION @gcd_user_v1 <@gcd_user_sig> {
  %entry():
    %v1 = CALL <@gcd_sig> @gcd (@I64_12 @I64_8)    // just call
    %v2 = CALL <@gcd_sig> @gcd (@I64_16 @I64_12)    // call and catch exceptions
    goto {
      normal: %bb2(%v1 %v2)
      exception: %handler(%v1, EXCEPTION)  // EXCEPTION is a keyword for the exception
      stack_overflow: %handler2()  // a more serious problem
    }

  %bb2(<@i64> %x <@i64> %y):
    CALL <@gcd_sig> @gcd (@I64_12 @I64_8) // does not catch exception, but handle stack overflow
    goto {
      normal: %bb3(%x %y)
      stack_overflow: %handler2()
    }

  %bb3(<@i64> %x <@i64> %y):
    %a = NEW <@i64>
    %b = NEW <@i64>
    goto {
      normal: %bb4(%a %b %x %y)
      out_of_memory: %handler2()
    }

  %bb4(<@irefi64> %a <@irefi64> %b <@i64> %x <@i64> %y):
    RETVOID
    goto {}

  %handler(<@refvoid> %exception):
    ...
    goto {}

  %handler2():
    ...
    goto {}

}

SWAPSTACK example. Traditional form example is here: https://github.com/microvm/microvm-spec/wiki/instruction-set#swapstack-instruction

.typedef @i64 = int<64>
.typedef @stack = stack
.typedef @void = void
.typedef @refvoid = ref<@void>
.typedef @StopIterationException = ...

.const @I64_1 <@i64> = 1
.const @I64_2 <@i64> = 2
.const @I64_3 <@i64> = 3

.funcsig @sig1 = @void (@stack)

// This coroutine (a generator) yields 1, 2 and 3 sequentially and stop.
// param %from: the stack of the main coroutine, i.e. the "caller" stack
.funcdef @one_two_three VERSION @one_two_three_v1 <@sig1> {
    %entry(<@i64> %from):
        SWAPSTACK %from RET_WITH <@void> PASS_VALUE <@i64> @I64_1
        SWAPSTACK %from RET_WITH <@void> PASS_VALUE <@i64> @I64_2
        SWAPSTACK %from RET_WITH <@void> PASS_VALUE <@i64> @I64_3

        %exc = NEW <@StopIterationException>
        SWAPSTACK %from KILL_OLD THROW_EXC %exc

        THROW @NULLREF  // unreachable
        goto {}
}

.funcsig @main_sig = @void ()

.funcdef @main VERSION @main_v1 <@main_sig> {
    %entry():
        // Get the reference to the current stack
        %cur_stack = COMMINST @uvm.current_stack

        // Create a new stack
        %coro = NEWSTACK <@sig1> @one_two_three (%cur_stack)
        BRANCH
        goto { dest: %head(%cur_stack %coro) }

    %head(<@stack> %cur_stack <@stack> %coro):
        // Go to the coroutine
        %cur_num = SWAPSTACK %coro RET_WITH <@i64> PASS_VOID
        goto {
            normal: %body(%cur_stack %coro %cur_num)
            exception: %exit(EXCEPTION)
        }

    %body(<@stack> %cur_stack <@stack> %coro <@i64> %cur_num):
        // Do something with %cur_num
        CALL <...> @print_num (%cur_num)

        // Get the next number
        BRANCH
        goto { dest: %head(%cur_stack %coro) }

    %exit(<@refvoid> %exc):
        COMMINST @thread_exit
        goto {}
}

Structure

A goto clause has the following form:

goto {
    reason1: basic_block1(<ty1> arg1 <ty2> arg2 ...)
    reason2: basic_block2(<ty1> arg1 <ty2> arg2 ...)
    ...
}

List of terminators

A terminator is an instruction that terminates a basic block. It can be: (actually none changed)

Among them, only TAILCALL, RET, RETVOID and THROW leave the current function. The COMMINST @uvm.thread_exit instruction is currently an ordinary instruction, but since no thread can go past it, it is effectively a "leaver", too. All other terminator instructions branch to another basic block and do not leave the current function. Some instructions may throw exceptions out of the function: CALL, TRAP, WATCHPOINT and SWAPSTACK.

Instructions removed

List of "reasons"

TODO: Think about a better word for "reason". Maybe "destination name"

Open question: What does the SWITCH instruction look like? One possible option: keep the current form and use an empty goto clause.

An even more alternative form

Allow the goto clause to use some inter-function transfer instructions. This will further reduce the number of basic blocks.

%cond = EQ <..> %v1 %v2
BRANCH %cond
goto {
    iftrue: RET <@T> %val
    iffalse: THROW %some_exception
}

CALL <@sig> @func (%arg1 %arg2 ...) goto {
    normal: TAILCALL <@sig2> @func2 (...)
    exception: TAILCALL <@sig3> @func3 (%arg1 %arg2 EXCEPTION %arg3 ...)
    stack_overflow: COMMINST @uvm.thread_stop
}
wks commented 9 years ago

About "bundle".

Currently, a "bundle" (the counterpart of a JVM class file) is the unit of code delivery from the client to the µVM. It will remain so. Currently, a "function" is the unit of code redefinition. It may remain so, or it may be changed.

In the new form, it may be the case that a basic block can be left undefined and "refined" in a subsequent bundle. Then there will be two levels of code (re-)definition: function and basic block.

Further possibilities are:

The first approach is cleaner, because all defined code must be preserved for stack introspection in OSR, since any code may have activations deep in the stack. Not allowing basic block-level redefinition eliminates the need for a "version of a basic block".

I propose having two levels of redefinition.

The first is the function-level redefinition. It has the same syntax as before.

The second is basic block-level redefinition, which I call "refinement". It has a slightly different syntax:

In an old bundle:

.funcdef @funcname VERSION @funcvername {
...
%basicblockname(params) UNDEFINED
}

In a new bundle:

.funcdef REFINE @funcname VERSION @funcvername {
%basicblockname(params):
  Instructions...
  goto {...}

%newbasicblockname(params):
  Instructions...
  goto {...}
}

The UNDEFINED keyword indicates that a basic block is not defined. Branching to such a block will trap to the client. The REFINE keyword indicates that it will not define a new function version, but redefine an existing version. It can define the missing basic blocks and introduce more basic blocks. This will give some HipHopVM-like behaviour.

Open questions:

  1. There are two compilation strategies: basic-block-at-a-time and function-at-a-time. They have different costs and different code quality. The client may want to choose between them.
  2. Redefining code makes it self-modifying code. The only safe way to synchronise multiple threads is to let all threads stop at yielding points before doing code redefinition, so the CPU will not execute (or even prefetch) the code being modified. Efficiency is a question.
  3. Is the regular TAILCALL instruction (calls another function) efficient enough to eliminate the need to introduce basic block-level definition?
qinsoon commented 9 years ago

I roughly tested the idea. I implemented a trivial local liveness analysis and made two of my micro-benchmarks work correctly. The others do not work so far, due to some difficulties with my current implementation, which I will describe below.

A rough conclusion from me is that "goto with values" can reduce the compilation time, but it also imposes some difficulties in the compiler implementation. It is not a plug-in feature, and requires various compiler phases to co-operate with it.

With the old implementation (the µVM needs to do global liveness analysis), for a test case (~64 virtual registers, 113 lines of IR including blanks), the breakdown of compilation time is as follows:

Total                   414ms
Parsing                 193(0.466184)
Liveness Analysis       63(0.152174)
Register Coalescing     15(0.036232)
Linear Scan             25(0.060386)
Code Emission           76(0.183575)
(Trivial phases, each taking less than 10 ms, are ignored.)

A local liveness analysis on this code took ~9ms (trivial).

When using fewer virtual registers (the same testcase but with ~32 virtual registers, 65 lines of IR), the breakdown with global liveness analysis (old impl.) looks like this:

Total                   310ms
Parsing                 159(0.512903)
Liveness Analysis       29(0.093548)
LinearScan              16(0.051613)
Code Emission           62(0.200000)
(Trivial phases, each taking less than 10 ms, are ignored.)

A local liveness analysis on this code took ~7ms (trivial).

These rough results give an idea of how much "goto with values" can reduce compilation time. Global liveness analysis is affected greatly by the number of virtual registers and the lines of code, but local liveness analysis, though still affected, is always trivial.
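For intuition on why the local analysis stays trivial: it is a single backward pass over one block, assuming (as the goto-with-values form guarantees) that the live-out set is exactly the set of arguments the terminator passes to its destinations. A sketch with a hypothetical instruction encoding:

```python
# Sketch of a local (single-basic-block) liveness scan. Each instruction is a
# (defs, uses) pair of variable-name lists; `live_out` is the set of variables
# the block's terminator passes on to its destinations.

def local_liveness(instructions, live_out):
    live = set(live_out)
    for defs, uses in reversed(instructions):
        live -= set(defs)   # a definition kills liveness above it
        live |= set(uses)   # a use makes the variable live above it
    return live             # the block's live-in set
```

The cost is linear in the block's length, independent of the rest of the function, which matches the ~7-9 ms figures above.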

Then, the difficulties with this approach:

The liveness of variables is declared at the IR level, but linear scan (the phase that uses that liveness information) is performed at a much later stage, on the machine-code level. Even in the simplest backend implementation, there are a few non-trivial phases between the original µVM IR and linear scan: µVM IR -> internal IR -> instruction selection -> code transformation (e.g. combining returns, leaving SSA form) -> (liveness analysis) -> linear scan.

This means:

(1) A relationship between temporaries at any level and the variables declared as live-in needs to be preserved so that we still have valid liveness information for linear scan.

This restricts freedom in code generation. For example, my implementation of instruction selection (tree pattern matching) does not necessarily preserve the original temporaries; sometimes it renames or deletes them. Special consideration is then needed to preserve a mapping between the new temporaries and the old ones that are declared live. Besides, this may also restrict code motion.

So, by saving the time spent on liveness analysis, we may pay in time and implementation difficulty to preserve the liveness information across several compilation phases. This is the reason why fully implementing this idea on my current design will take a lot more time and hasn't been finished.

(2) Any code transformation that happens in between, if it changes the liveness or the control flow, needs to take responsibility for correcting the liveness information as well. Compared to (1), this might be trivial and not error-prone.

(3) I am not sure whether graph colouring can take advantage of this liveness information. I guess we do not want the µVM IR to carry information that is only useful to one register allocation algorithm, and we do not want the IR to assume any particular underlying implementation.

wks commented 9 years ago

The requirement that "all live variables at the beginning of a basic block are exactly those in the parameter list" makes it not equivalent to the "minimal SSA form".

The goto-with-value form is equivalent to SSA form, as shown by a conversion algorithm in SSA is Functional Programming.

However, the placement of PHI nodes is quite liberal in programs in SSA form. Here is an example:

%bb1:
    %a = ...
    BRANCH2 %condition %bb21 %bb22

%bb21:
    ....
    BRANCH %bb3

%bb22:
    ....
    BRANCH %bb3

%bb3:
    // %b = PHI { %bb21: %a; %bb22: %a; } // Redundant PHI node.
    use(%a)
    // use(%b)

The PHI node for %b is redundant because the definition of %a dominates the basic block %bb3. The minimal SSA form only places PHI nodes in the dominance frontier of a variable's definitions.
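The dominance frontier itself is cheap to compute once immediate dominators are known; the Cooper-Harvey-Kennedy formulation walks up from each join point's predecessors. A sketch with a hypothetical CFG encoding:

```python
# Dominance frontiers per Cooper, Harvey and Kennedy: for each join point
# (a block with >= 2 predecessors), walk each predecessor's immediate-dominator
# chain up to the join point's own immediate dominator, adding the join point
# to the frontier of every block passed on the way.

def dominance_frontiers(preds, idom):
    """preds: block -> predecessor list; idom: block -> immediate dominator."""
    df = {b: set() for b in preds}
    for b in preds:
        if len(preds[b]) >= 2:          # only join points contribute
            for p in preds[b]:
                runner = p
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df
```

On the example above, %bb3 lies in the frontier of %bb21 and %bb22 but not of %bb1, which is exactly why minimal SSA places no PHI for %a there.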

However, if we convert the above program to the goto-with-value form while still requiring all live variables to be explicitly listed, then the program becomes:

%bb1():
    %a = ...
    BRANCH2 %condition { iftrue: %bb21(%a); iffalse: %bb22(%a); }

%bb21(%x):
    ....
    BRANCH { dest: %bb3(%x); }

%bb22(%y):
    ....
    BRANCH { dest: %bb3(%y); }

%bb3(%z):
    use(%z)

The parameters in %bb3(%x) and %bb3(%y) make it equivalent to having a PHI node in %bb3:

%bb3:
    %z = PHI { %bb21: %x; %bb22: %y; } // Redundant

According to Cytron et al.:

The optimizations that depend on SSA form are still valid if there are some extraneous PHI-functions beyond those that would appear in minimal SSA form. However, extraneous PHI-functions can cause information to be lost, and they always add unnecessary overhead to the optimization process itself. Thus, it is important to place PHI-functions only where they are required.

The "information loss" is certain. But it is questionable whether the uVM still needs a lot of information for optimisation.

However, if the worst case is introducing MOV instructions, we may (really?) assume that modern CPUs have had zero-latency MOV instructions since Ivy Bridge (how about other architectures?), or that the cost of a MOV is insignificant (really?).

Appendix: what is the minimal SSA form really equivalent to? I think it is a hierarchically nested functional form, where basic block Y is nested in basic block X if X dominates Y.

function bb1() {
    var a = ...;
    if (...) { bb21(); } else { bb22(); }

    function bb21(){
        bb3();
    }
    function bb22(){
        bb3();
    }
    function bb3() {
        use(a)
    }
}
wks commented 8 years ago

A branch of the specification is created to adopt this form. https://github.com/microvm/mu-spec-goto

Current decisions are:

%basic_block(<@T1> %p1 <@T2> %p2 <@T3> %p3) [%exc]:
    // instructions go here. %exc (optional) receives the exception (always ref<void>)
    BRANCH %another_basic_block(%p3 %p1 %p2)

%another_basic_block(<@T1> %p1 <@T2> %p2 <@T3> %p3):
    // more instructions...
wks commented 8 years ago

Migrated to the goto-with-values form. This commit shows what has changed (including other unrelated minor fixes, too): https://github.com/microvm/mu-spec-goto/commit/338c63e020ef922174867dd2be0d609b35ac91a1

The reference implementation is still outdated (as it already was; many changes have been made in the last month).

wks commented 8 years ago

The goto-with-values branch of the reference implementation (https://github.com/microvm/microvm-refimpl2/tree/goto-with-values) now implements the goto-with-values form and passes the existing tests (meaning it does not break old features).

sampsyo commented 8 years ago

Hi! Apropos of nothing except that I just saw Steve give a talk where he mentioned this feature of mu IR: I read this slide deck about the mid-level IR for Swift recently, where they talk about the very similar approach used there. They call it "basic block arguments," but it looks almost identical to "goto with values." Cool convergence!

eliotmoss commented 8 years ago

On 11/12/2015 2:25 PM, Adrian Sampson wrote:

Hi! Apropos of nothing except that I just saw Steve give a talk where he mentioned this feature of mu IR: I read this slide deck about the mid-level IR for Swift http://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighLevelIR.pdf recently, where they talk about the very similar approach used there. They call it "basic block arguments," but it looks almost identical to "goto with values." Cool convergence!

It's not novel to us, but yes, it is an interesting convergence of approaches to representing code. It's related to expressing semantics using continuations, and to continuation passing form in compilers.

Regards -- Eliot