sharaelong opened 1 year ago
You can check our Discord channel to download the log-2 zip file.
By the way, I don't know why branchpredict and arithmetic show exactly the same results in the benchmark, which differ from the results of the default passes alone.
Also, the default passes (simplifyCFG, promote) hurt performance badly on two test cases in particular: gcd and collatz. I didn't check collatz, but for gcd, a simple diff suggests that the regression comes from simplifycfg canonicalizing branches. The result of diff gcd/src/gcd.ll <(opt --passes='simplifycfg' -S gcd/src/gcd.ll) is:
10c10
< br i1 %cmp, label %if.then, label %if.end
---
> br i1 %cmp, label %return, label %if.end
12,14d11
< if.then: ; preds = %entry
< br label %return
<
17c14
< br i1 %cmp1, label %if.then2, label %if.end3
---
> br i1 %cmp1, label %return, label %if.end3
19,21d15
< if.then2: ; preds = %if.end
< br label %return
<
40,41c34,35
< return: ; preds = %if.end7, %if.then2, %if.then
< %retval.0 = phi i64 [ %y, %if.then ], [ %x, %if.then2 ], [ %call, %if.end7 ]
---
> return: ; preds = %if.end, %entry, %if.end7
> %retval.0 = phi i64 [ %call, %if.end7 ], [ %y, %entry ], [ %x, %if.end ]
This suggests that, if we implement BranchPredictPass properly, we should be able to get all of the performance back.
I can't follow what happened here… There doesn't seem to be a reason why the simplified branches would perform worse than the naive ones. Can you give a more detailed explanation of why this harms performance?
My reasoning was that since the differences only occur in branch instructions, I assumed the true and false branches had been swapped. But on closer look it just seems to eliminate a block, and I have no idea why that would cause any trouble at all.
Hello, I made scripts for automatic benchmarking. They run in the utils/ directory at the repository root. Use them in the order compile.sh -> benchmark.c. Baseline data has to be located in the same directory as log-2 (or change the hardcoded name in benchmark.c if you want). This is a very naive structure, since it is not the main change for our project, but it will be improved. Also note that it doesn't check whether the output is consistent with the baseline!

Furthermore, I tested the combinations of our sprint 1 passes one by one. Applying only the original LLVM passes (simplifyCFG, promote) gives this:
With the default passes and the load2aload pass applied:
With the default passes and the branchpredict pass applied:
Last, the default passes and the arithmetic pass:
Finally, this is our overall result, applying every pass we implemented in sprint 1.
It seemed strange that the combined optimization performance is far from a simple accumulation of each pass's individual effect; the most likely explanation is that the load2aload pass 'eats' IR that earlier passes had already optimized. Also, the default passes (simplifyCFG, promote) hurt performance badly on two test cases in particular: gcd and collatz. This needs more analysis.
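For readers who don't want to dig into utils/, the benchmark flow described above can be sketched as a dry run. Everything here is an assumption about the real scripts: the pass names, the per-test-case directory layout, and the log-2 baseline name are the hardcoded values mentioned in the thread, and this sketch only prints the commands instead of executing opt.

```shell
#!/bin/sh
# Hypothetical dry-run sketch of the utils/ flow (not the real compile.sh).
PASSES='simplifycfg,mem2reg'      # the default pair: simplifyCFG + promote
BASELINE_DIR='log-2'              # baseline data, hardcoded in benchmark.c

for tc in gcd collatz; do
  # compile.sh step: run the pass pipeline over each test case's IR
  printf 'opt --passes=%s -S %s/src/%s.ll -o %s/src/%s.opt.ll\n' \
    "$PASSES" "$tc" "$tc" "$tc" "$tc"
done

# benchmark.c step: time optimized binaries against the stored baseline
printf 'compare timings against baseline in %s/\n' "$BASELINE_DIR"
```

Swapping the contents of PASSES (e.g. appending load2aload, branchpredict, or arithmetic) reproduces the per-pass combinations measured above.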