poorna2152 / nballerina

WebAssembly Backend for the nBallerina compiler
https://ballerina.io/
Apache License 2.0
4 stars 0 forks source link

`BIR` to SCF #14

Closed poorna2152 closed 2 years ago

poorna2152 commented 2 years ago

Suggestion to add the following field to bir:FunctionCode Region[] regions

type Region readonly & record {|
    Label entry;
    Label? exit;
    Label? parent;
    boolean isLoop;
    Label label;
|};
poorna2152 commented 2 years ago

A Region has blocks in it. The entry field represent the first block in the region and exit represents the last block. So it assumed that the blocks from the entry->exit belongs to that region.

In function codegenWhileStmt,

Is the following assumptions correct? So rather than storing entry and exit do we need to store the set of blocks belonging to the region as an array?

manuranga commented 2 years ago

loopBody and loopExit have to be considered as separate regions.

This doesn't make sense to me. Think loopBody loopBody and loopExit together form a single region. Please think about single entry single exit requirement.

Here a block for loopBody is created. Then here loopExit block is created before loopBody is processed. So loopBody entry->exit cannot be considered as continuos.

What you are trying to saying is loop numbering is not always increasing when you traverse the basic block graph. This is true. I think we only use backwards edges in loops (please verify) so if we picked Region correctly still all of Region blocks should fall between entry and exit.

I am not sure if we should rely on it though.

wback can figure out Region's blcoks by traversing the basic block graph by looking at each ones terminators.

So rather than storing entry and exit do we need to store the set of blocks belonging to the region as an array?

Don't think it is a good option to keep the full list of block as an array in Region since BIR is a publicly exposed format and we want to keep compact.

poorna2152 commented 2 years ago

For the following program

import ballerina/io;

public function main() {
    int x = 10;
    int i = 0;
    int sum = 0;
    while i < x {
        sum = sum + i;
        i = i + 1;
    }
    io:println(sum);
}

Regions-Page-6 drawio

jclark commented 2 years ago

The region for the loop as a entry label that is the label of the "i < x" block and an exit label that is the label for the block starting "io:println". The loop contains all blocks that can be reached from the entry label up to but not including the exit label. (Not sure if these will always be a contiguous numeric range.)

Not sure what the "label" field in your record is doing.

poorna2152 commented 2 years ago

Label field was not needed. It was a mistake. Created regions for if-else statements. Added a new field ty which represents the RegionType instead of the isLoop field.

public enum RegionType {
    Simple,
    Loop,
    Multiple
}
public type Region record {|
    Label entry;
    Label? exit = ();
    Label? parent = ();
    RegionType ty;
|};

Multiple type is for if-else

poorna2152 commented 2 years ago

If the following program is considered,

import ballerina/io;

public function main() {
    int x = 10;
    int i = 0;
    int sum = 0;
    while i < x {
        if i %2 == 0 {
            continue;
        }
        sum = sum + i;
        i = i + 1;
    }
    if sum%2 == 1 {
        io:println(1);
    }
    else {
        io:println(0);
    }
}

In the above region structure diagram the exit block of the Region#2 is also the loopHeader. Is that correct?

Regions-Page-5 drawio

For the above program generated the following WAT file without the use of the relooper.

(module
  (import "console" "log" (func $println (param i64))) 
  (export "main" (func $main)) 
  (func $main 
     (local $0 i64) 
     (local $1 i64) 
     (local $2 i64) 
     (local $3 i32) 
     (local $4 i64) 
     (local $5 i32) 
     (local $6 i64) 
     (local $7 i64) 
     (local $8 i64) 
     (local $9 i32) 
     (local $10 i32) 
     (local $11 i32) 
     (local $12 i32) 
     (block 
       (local.set $0 
         (i64.const 10)) 
       (local.set $1 
         (i64.const 0)) 
       (local.set $2 
         (i64.const 0)) 
       (loop $block$1$break 
         (block 
           (local.set $3 
             (i64.lt_s 
               (local.get $1) 
               (local.get $0))) 
           (if 
             (local.get $3) 
             (block 
               (local.set $4 
                 (i64.rem_s 
                   (local.get $1) 
                   (i64.const 2))) 
               (local.set $5 
                 (i64.eq 
                   (local.get $4) 
                   (i64.const 0))) 
               (if 
                 (local.get $5) 
                 (br $block$1$break)) 
               (block 
                 (local.set $6 
                   (i64.add 
                     (local.get $2) 
                     (local.get $1))) 
                 (local.set $2 
                   (local.get $6)) 
                 (local.set $7 
                   (i64.add 
                     (local.get $1) 
                     (i64.const 1))) 
                 (local.set $1 
                   (local.get $7)) 
                 (br $block$1$break)))))) 
       (block 
         (local.set $8 
           (i64.rem_s 
             (local.get $0) 
             (i64.const 2))) 
         (local.set $9 
           (i64.eq 
             (local.get $8) 
             (i64.const 0))) 
         (if 
           (local.get $9) 
           (call $println 
             (i64.const 0)) 
           (call $println 
             (i64.const 1))) 
         (return))))) 
jclark commented 2 years ago

The way you are drawing diagrams is confusing things. Leave block diagram as is then add regions around blocks using different line style (eg dashed). Draw block diagrams so that text order is the same as in the source. Have arrows coming out of bottom of blocks coming in to top of blocks.

poorna2152 commented 2 years ago

Updated the comment. Is the above regions correct. Is block1 the exit of Region#2?

jclark commented 2 years ago

Regions are not correctimage

poorna2152 commented 2 years ago

Does that mean when a region is created for a loop, entry and exit for that region would both correspond to the loopHeader block

jclark commented 2 years ago

No: the exit for the loop region is the arrow going out of the region into block 3.

poorna2152 commented 2 years ago

It is not a backward branch that I need. I think I said it wrongly in the meeting. I need to identify the break, continue and backward branches when inside a loop body. The function codeGenBreakContinueStmt generates a Branch instruction when there is a continue or a break in the loop. I can set a boolean value to represent breaks and continues in it.

Rather than having a field backward in the BranchInsn can I have field which represent all three types mentioned above. Or should I only represent backward branches and derive the existence of a break instruction by traversing the block structure.

jclark commented 2 years ago

Or should I only represent backward branches and derive the existence of a break instruction by traversing the block structure.

Do this

jclark commented 2 years ago

If you are in a loop region, and you encounter an unconditional branch insn whose destination label is the exit label of the loop region, then I believe you can infer that the branch insn corresponds to a break.

poorna2152 commented 2 years ago
jclark commented 2 years ago

I would suggest you build a data structure from the list of BasicBlocks and Regions that allows you to find this information directly, without traversing anything.

poorna2152 commented 2 years ago

It was mentioned previously that exit block of a region does not belong to that region. If so, I'm considering following as a program which has distinct regions with the same entry block.

import ballerina/io;

public function printBoolean(boolean b){
    if b {
        io:println(1);
    }
    else {
        io:println(0);
    }
}

This currently produces the following region representation,

Regions-Page-8 drawio

Here since block3 is the exit block of the Region#1 it should not belong to the Region#1. Assuming that every block belongs to some region Region#0 is added to contain the block3. block0 is the entry of Region#0 and Region#1.

jclark commented 2 years ago

I agree. So we need region labels distinct from block labels.

poorna2152 commented 2 years ago

In the above case going by exit block definition block3 cannot be considered as the exit of Region#0 but the control flow converges at that point. So does Region#0 has an exit or not.

jclark commented 2 years ago

Not.

poorna2152 commented 2 years ago

Following examples illustrate the Regions generated with and without the addition of REGION_SIMPLE in the codegenFunction

public function main() { int n = 1 + 2; io:println(n); }

![simple](https://user-images.githubusercontent.com/58025190/158783926-a758df9b-aece-45a9-958e-a19249635d47.png)

Here Block0 doesnt belong to any Region.

- Loop

public function main(){ int x = 3; int i =0; int sum =0; while i < x { sum = sum + i; i = i + 1; } io:println(sum); }

![loop](https://user-images.githubusercontent.com/58025190/158784045-480ba779-b72d-47b3-9296-323cf3175581.png)

Here Block0 and Block3 doesnt belong to any Region.

- If condition

public function main(){ boolean b = true; if b { io:println(1); } else { io:println(0); } }



![multiple](https://user-images.githubusercontent.com/58025190/158784158-05a40dcc-57a7-481d-980d-088b94572eed.png)

- Here Block0 doesnt belong to any Region.

So if we are to omit that `REGION_SIMPLE` then the `BasicBlock` with the `RetInsn` will not belong to any region. Isn't this a contradiction to our assumption that every BasicBlock should belong to a Region?