willi19 / swpp202301-compiler-team6

MIT License
0 stars 0 forks source link

Simplification due to infinite possibilities of loops. #23

Open germanium32 opened 1 year ago

germanium32 commented 1 year ago

I would like to simplify the target scope of the loop2sum pass due to high complexity. (which means I could not find a general method that covers 'all' or 'most' types of loop sums) Supplementary implementation can be researched during Review, Sprint3, or Wrap-up periods.

Actually, the terminology, 'loop-sum' itself is already vague. In our natural perspective, it can be simply interpreted as the given pseudocode:

while(criterion) {
  // Operations
  sum += ResultOfOperations
}

However, this itself is extremely vague in code-sense, since there are a myriad of ways that has such form. For example, the sum may be a variable(or register in LLVM IR), or a pointer to a memory. The Operations part can be unexpectedly large and complex. Actually, if the Operations part contains one or more branch operations(which can be derived from if statements), validating whether vectorization is possible, and the actual vectorization can be extremely cumbersome.

Loops in LLVM IR comes in infinitely may types. However, most programs follow the common case. In the assumption under optimizing with 'SimplifyCFG', I suggest that there are simple types of loop-sum that cover most of the cases.

We first define the terminology of loops: Header: The basic block that the loop starts. Latch(Tail): The basic block that can return to the header directly; blocks that have backedges. Body: The basic blocks that all operations within the loop occurs. Induction Variable: The variable that controls recursion. for(i) -> i is the induction variable Accumulation Variable: The variable that stores the cumulative sum. Usually denoted as sum, result, etc.

Type 1 : 1-Block Loop-Sum The loop is consisted of a single basic block. (Head = Latch = Body) Do - While structure

Type 2 : 2-Block Loop Sum The loop is consisted of two basic blocks. (Head / Latch = Body) While - Do structure

Do-While structure is the more common method in assembly, but I checked that most c-to-ll conversions with loop-sum structures resulted in the Type 2 structure. Hence, I first implement the optimization over the 2-Block type. The SimplifyCFG removes/merges redundant 'increment blocks' and 'end blocks' that separate Latch and Body blocks, which is not good for vectorization. If all instructions are compressed in one block(body=latch), vectorization is expected to be easier.

goranmoomin commented 1 year ago

The SimplifyCFG removes/merges redundant 'increment blocks' and 'end blocks' that separate Latch and Body blocks, which is not good for vectorization.

FYI, AFAIK the SimplifyCFG's loop canonicalization probably is helping the LLVM's loop passes to detect them.

sharaelong commented 1 year ago

So it means you are struggling now in implement 2-block type optimization pass?