During the offline meeting on 06/01, we concluded that Loop2Sum is overly complex and that the pass should be split into simpler pieces. We propose a simpler pass structure: LoopVectorize + Add2Sum.
For LoopVectorize, we can either adapt our own implementation or use a pre-existing pass. Most of the code implemented in loop2sum can be reused in LoopVectorize; alternatively, we can use a slightly more refined pre-existing pass: InnerLoopVectorizer.
For Add2Sum, the task is fairly simple: traverse the add instructions and check whether each one is part of a chain of simple additions. That is:
res = a + b + c + d + e
will be lowered into ll as a chain of iterated additions, which can then be optimized into a single sum over all the operands.
Such a pass is more widely applicable than Loop2Sum: it can optimize multiple additions even when there is no loop structure. Combining only 3 operands is cost-neutral, while combining 4 operands already yields a cost reduction.
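To make the chain detection and the cost argument concrete, here is a minimal, LLVM-free sketch. The `Expr` tree, `collectOperands`, `chainCost`, `worthRewriting`, and the constant `kSumCost` are all hypothetical names introduced for illustration; a real pass would walk add instructions in the IR instead. The cost model (one unit per add, two units per sum) is an assumption inferred from the break-even at 3 operands, not a measured value.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature expression tree standing in for IR values.
struct Expr {
    std::string name;                // non-empty for leaves such as "a"
    std::shared_ptr<Expr> lhs, rhs;  // both set for add nodes
    bool isAdd() const { return lhs && rhs; }
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string& n) {
    return std::make_shared<Expr>(Expr{n, nullptr, nullptr});
}
ExprPtr add(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{"", std::move(l), std::move(r)});
}

// Flatten a chain of additions into its leaf operands, left to right.
void collectOperands(const ExprPtr& e, std::vector<std::string>& out) {
    if (e->isAdd()) {
        collectOperands(e->lhs, out);
        collectOperands(e->rhs, out);
    } else {
        out.push_back(e->name);
    }
}

// ASSUMPTION: each add costs 1 unit and one sum costs 2 units.
constexpr std::size_t kSumCost = 2;

// A chain over n operands needs n - 1 additions.
std::size_t chainCost(std::size_t numOperands) {
    return numOperands == 0 ? 0 : numOperands - 1;
}

// Rewrite only when the chain is strictly more expensive than one sum.
bool worthRewriting(std::size_t numOperands) {
    return chainCost(numOperands) > kSumCost;
}
```

Under this assumed model, a chain over 3 operands costs 2 units, the same as one sum, so it is left alone; a chain over 4 operands costs 3 units and is worth rewriting.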
If we vectorize the loop with LoopVectorize and then apply Add2Sum, the result will be identical to our original objective. I will first try to implement the more widely applicable pass, Add2Sum, and then try to modify LoopVectorize.
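The claim that the two-step pipeline reaches the same result as the original objective can be illustrated with a small hand-written model. This is only a sketch: `sum4`, `scalarReduce`, and `vectorizedReduce` are hypothetical names, the width of 4 is an arbitrary choice, and the "vectorized" loop is unrolled by hand rather than being actual output of any LLVM pass. Integers are used so that reassociating the additions cannot change the result; floating-point reductions would need reassociation to be permitted.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the single sum that Add2Sum would emit.
int sum4(const std::array<int, 4>& lanes) {
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// The original scalar reduction loop.
int scalarReduce(const std::vector<int>& v) {
    int acc = 0;
    for (int x : v) acc += x;
    return acc;
}

// Step 1 (vectorization, modelled by hand): consume four elements per
// iteration and handle the remainder with a scalar loop.
// Step 2 (Add2Sum): the four-way addition chain becomes one sum4 call.
int vectorizedReduce(const std::vector<int>& v) {
    int acc = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        std::array<int, 4> lanes{v[i], v[i + 1], v[i + 2], v[i + 3]};
        acc += sum4(lanes);
    }
    for (; i < v.size(); ++i) acc += v[i];  // scalar remainder
    return acc;
}
```

For any input whose total does not overflow `int`, both functions return the same value, which is the equivalence the plan relies on.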