Closed goranmoomin closed 1 year ago
Current status on the miscompilation: I'm running the benchmark against a version of the pass that doesn't consider costs; it convert all load
s into aload
s without cost consideration, and the resulting program is failing to run on the interpreter (with an out-of-bounds error). Alive2 isn't helping, calling the transformations valid. I've tried removing aload
s and just pushing load
s upwards, and they're not working as well.
The miscompilation is found from the ricehub.c
file; The runtime error occurs when the optimization pass pushes 10 loads upwards (pushing %14 = load i32, i32* @R, align 4
in front of: %i.1 = phi i32 [ 1, %if.end ], [ %inc60, %for.inc59 ]
). 9 pushes are fine, though I'm not sure if the other 9 optimizations are correct.
The runtime error is happening in the .if.end47
basic block, and the optimization that causes the error is in the .for.cond21
block. The .for.cond21
block is the for loop condition: i <= R - mid
, with the R
access optimized. The .if.end47
block is the line hab += X[j] - X[x];
, and I don't know yet what part is erroring out.
I think checking whether this instruction is from swpp class, and get expected cost will be useful to others. So I think it would be better to make it is another file. Moreover, to decrease your code length, it will be benificial to add it with other pull request with short code. And further improve it on additional branch.
To do this, I think we should slightly fix the code.
From line 58~60 of load2alod.cpp
"""
if (isa
"""
Also for this pull request, I think Instead of switch-case, using vector to save instruction with the same cost and check whether opcode of given instruction is inside the vector will decrease the length of the code. I wonder you have some update of your current status. Otherwise we can look for it tommorow.
One idea to reduce PR line-diff; instead of using lengthy switch-case arguments, you may use some "set"-type data structure to simplify it. Similarly, you can simplify the lengthy if-else structures. (I think such simple code change will not hurt comilation time limits.)
Just a slight concern about ~200 line-diff issues. You may use the residues for better uses.
Your code seems to raise errors due to invalid aload pushes; which you may also know.
I may have understood your code wrongly, so if the following suggestion may be invalid. The implementation checks whether its valid using the "canStoreMemory()" function. I expect the logic to be similar to the code in the pass which I'm in charge of: "mem2reg.cpp > PromoteMemoryToRegister.cpp > isAllocaPromotable()".
Since the function checks if the memory uses only loads and stores, not other lethal instructions, you may consider the logic used there.
In fact, your function checks for StoreInst, CallBase, IntrinsicInst, but the isAllocaPromotable() function checks for LoadInst, StoreInst, isVolatile, InstrinsicInst, BitCastInst, GetElementPtrInst, AddrSpaceCast. Check some where you may have misconsidered. Since *.c -> *.ll generates some superfluous but toxic instructions, some problems may have occured here.
Quick note: this probably have basic bugs. Seems like phi nodes have to be at the top of the BB.
Overview
This WIP PR adds an
Load2AloadPass
that converts existingload
instructions into anaload_i*
intrinsic, and pushes them up until a certain threshold.Checklist
Below is a listup of issues that needs to be resolved at the time of the current PR.