Open llvmbot opened 7 years ago
I'm not sure how to implement this. This assembly
test: // @test
// BB#0: // %entry
mul x8, x1, x0
lsr x8, x8, #​58
ldr w0, [x2, x8, lsl#2]
ret
I get when I skip fold (shl (srl x, c1), c2) -> (and (srl x, (sub c1, c2), MASK). I'm not sure what opt-addr-mode means. As I understood I need to add an if statement with canFoldInAddressingMode before fold to check add mode. Please, correct me.
Here's the test in IR:
define i32 @test(i64 %a, i64 %b, i32* nocapture readonly %table) {
entry:
%mul = mul i64 %b, %a
%shr = lshr i64 %mul, 58
%arrayidx = getelementptr inbounds i32, i32* %table, i64 %shr
%0 = load i32, i32* %arrayidx, align 4
ret i32 %0
}
And the initial DAG:
Initial selection DAG: BB#0 'test:entry'
SelectionDAG has 19 nodes:
t0: ch = EntryToken
t13: i64 = Constant<0>
t6: i64,ch = CopyFromReg t0, Register:i64 %vreg2
t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
t7: i64 = mul t4, t2
t9: i64 = srl t7, Constant:i64<58>
t11: i64 = shl t9, Constant:i64<2>
t12: i64 = add t6, t11
t15: i32,ch = load<LD4[%arrayidx]> t0, t12, undef:i64
t17: ch,glue = CopyToReg t0, Register:i32 %W0, t15
t18: ch = AArch64ISD::RET_FLAG t17, Register:i32 %W0, t17:1
In DAGCombiner.cpp around line 5590 we have this target-independent combine which is folding the 'srl' and introducing the 'and'.
// fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1), MASK) or
// (and (srl x, (sub c1, c2), MASK)
// Only fold this if the inner shift has no other uses -- if it does, folding
// this will increase the total number of instructions.
@llvm/issue-subscribers-good-first-issue
is this still open to work on ?
Still seems to be a problem in trunk: https://simd.godbolt.org/z/595f991fP
can this issue be resolved by disabling optimization fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1), MASK)
if code size-related option enabled?
Extended Description
Test case:
Current assembly:
We should be able to remove the 'and' for a code size with as follows:
I'm not interested in pursuing this optimization, but I figured I'd file the bug, regardless.