eliotmoss commented 9 years ago

We have concluded that while the official form of Mu IR is SSA form (but see Issue #18 for current thoughts on how to represent that form), many clients will find it more convenient to generate something that is mostly Mu IR but that is not in SSA form, and that is it further desirable to offer a standard tool to convert from some "pre-SSA" form to proper SSA form. This tool may operate in a stand alone manner or be more in bed with an implementation of Mu.

We propose the following specific pre-SSA form, according to how it differs from SSA-form Mu.

"SSA-variables" may be assigned more than once; however, any individual such variable must be used in a type-consistent manner.
PHIs may be omitted (or, in the proposal of #18, values may be omitted at branches and variables omitted at labels)
For convenience we introduce a "copy" operator, var = ID arg, which takes one argument arg of type T and assigns it to variable var. This operator seems to be convenient sometimes from a client perspective.

The converter to SSA-form will perform live-ness analysis and add variables to labels and values to branches as necessary, checking for type consistency. If some variable is live but not initialized, then the converter will insert a safe initialization (to 0 or 0.0 for numeric types, null for a pointer, etc.) at the latest possible point that does not interfere with existing assignments to the variable. (Optimization may move the initialization earlier as deemed appropriate.)

We will undertake to develop the converter in Scala or Java.

wks commented 9 years ago

Sometimes we need to uniquely identify TRAPs and OSR points. Then the client needs to find out the generated SSA Mu name because the trap handler tells the client the Mu name of the current instruction, not the non-SSA variable.

Example: The client cannot tell which TRAP is triggered unless unique names are given:

%bb1:
  %value_from_client = TRAP <@i64>
  BRANCH %exit

%bb2:
  %value_from_client = TRAP <@i64>
  BRANCH %exit

%exit:
  RET %value_from_client

I have two solutions:

Introduce some new syntax. It gives the client more control, but is a bit complex.
Require the client to use non-duplicated non-SSA names. It is cleaner, but always requires the client to ask the converter about the mapping after conversion. If this extra step is okay for the client, I will prefer this approach for simplicity.

Solution 1: new syntax

We use a different sigil for non-SSA variables: $var. Then @globalName is still a global (SSA) Mu name, and %localName is still a syntax sugar of @FuncVersionGlobalName.localName, or @BasicBlockGlobalName.localName in the new goto-with-values form.

We use a special syntax for explicit names: $var = [%ssavar] TRAP <...> ... .... The [%ssavar] is optional. If omitted, the SSA variable name will be automatically generated. The [] is on the right side of = because the name labels the instruction, not the non-SSA variable.

An example:

.funcdef @foo VERSION %v1 <@foo.sig> ($param1 $param2) {
%entry:
  $value_from_client = [%trap1] TRAP <@i64>
  $retval = [%callsite1] CALL <...> @some_func ($value_from_client)
  $value_from_client = [%trap2] TRAP <@i64>

  RET $value_from_client
}

Both of the traps will assign the value returned from the client to the non-SSA variable $value_from_client, but the trap handler can identify which TRAP is triggered by looking at the (SSA) Mu names: @foo.v1.entry.trap1 and @foo.v1.entry.trap2. If OSR happens in @some_func, the client can identify @foo.v1.entry.callsite1 as the "current PC".

Solution 2: use the `ID` pseudo-instruction and ask the converter

Instead of introducing the $ sigil and using explicit naming, we let the to-SSA converter tell the client about the mapping between non-SSA variables and SSA variables. For example in this form:

.funcdef @foo VERSION %v1 <@foo.sig> (%param1 %param2) {
%entry:
  %lt = SLT <@T> %param1 %param2
  BRANCH2 %lt %bb1 %bb2

%bb1:
  %value_from_client = TRAP <@i64>
  %retval = CALL <...> @some_func (%value_from_client)
  BRANCH %exit

%bb2:
  %value_from_client = TRAP <@i64>
  BRANCH %exit

%exit:
  RET %value_from_client
}

The converter will tell the client that

%value_from_client -> [@ssa_name1, @ssa_name3]
%retval -> [@ssa_name2]

Then the client can still find the SSA name for %retval, but cannot determine which TRAP is which from the variable name. To work around this, the client can write the program differently using the ID pseudo-instruction:

.funcdef @foo VERSION %v1 <@foo.sig> (%param1 %param2) {
%entry:
  %lt = SLT <@T> %param1 %param2
  BRANCH2 %lt %bb1 %bb2

%bb1:
  %trap1 = TRAP <@i64>
  %value_from_client = ID %trap1
  %retval = CALL <...> @some_func (%value_from_client)
  BRANCH %exit

%bb2:
  %trap2 = TRAP <@i64>
  %value_from_client = ID %trap2
  BRANCH %exit

%exit:
  RET %value_from_client
}

The converter will tell the client something like:

%trap1 -> [@ssa_name1]
%retval -> [@ssa_name2]
%trap2 -> [@ssa_name3]
%value_from_client -> [@ssa_name1, @ssa_name3]

Although %value_from_client is still ambiguous, both %trap1 and %trap2 are uniquely identified.

eliotmoss commented 9 years ago

I can't say that I am totally "getting" the problem you are pointing out and trying to address, Kunshan, but I am pretty sure I prefer the second approach, since it fits within the proposal as I outlined it. I can see that having a converter does mean that clients that use it need some kind of reverse mapping from the ultimately developed SSA variables back to their variables (and that it may need to be a per-block thing, i.e., this original variable si this SSA variable in block 1, this other SSA variable in block 2, etc.). This is analogous to what a register allocator will need to do inside a Mu implementation to enable mapping back from register contents (and stack locations) to SSA variables.

microvm / microvm-meta

Pre-SSA form #44

Solution 1: new syntax

Solution 2: use the `ID` pseudo-instruction and ask the converter

microvm / microvm-meta

Pre-SSA form #44

Solution 1: new syntax

Solution 2: use the ID pseudo-instruction and ask the converter

Solution 2: use the `ID` pseudo-instruction and ask the converter