llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.7k stars 11.87k forks source link

[ppc] Slow code for rotate shift and add #32164

Open echristo opened 7 years ago

echristo commented 7 years ago
Bugzilla Link 32817
Version trunk
OS All
CC @weiguozhi,@hfinkel,@nemanjai

Extended Description

Seems to go about 25% slower than the gcc equivalent.

echristo@athyra ~/tmp> cat bar.c unsigned int bar(int k, unsigned int x) { return ((1 + (x >> k)) + k); } echristo@athyra ~/tmp> clang -target powerpc64le-linux-gnu -S -O2 -o - bar.c .text .abiversion 2 .file "bar.c" .globl bar .p2align 4 .type bar,@function bar: # @​bar .Lfunc_begin0:

BB#0:

srw 4, 4, 3
add 3, 3, 4
addi 3, 3, 1
clrldi   3, 3, 32
blr
.long   0
.quad   0

.Lfunc_end0: .size bar, .Lfunc_end0-.Lfunc_begin0

.ident  "clang version google3-trunk (trunk r301047)"
.section    ".note.GNU-stack","",@progbits

echristo@athyra ~/tmp> powerpc64le-linux-gnu-gcc -S -O2 -o - bar.c .file "bar.c" .abiversion 2 .section ".toc","aw" .section ".text" .machine power8 .align 2 .p2align 4,,15 .globl bar .type bar, @​function bar: addi 9,3,1 srw 4,4,3 add 3,9,4 rldicl 3,3,0,32 blr .long 0 .byte 0,0,0,0,0,0,0,0 .size bar,.-bar .ident "GCC: (Google_crosstoolv18-gcc-4.9.x-powerpc64le-grtev4-linux-gnu) 4.9.x-google" .section .note.GNU-stack,"",@progbits

nemanjai commented 7 years ago

Anyone have an opinion on which approach we want to pursue to solve this issue?

nemanjai commented 7 years ago

I see, your patch specifically test following case ((k + 1) + (x >> k))

It looks too constrained. If I have following DAG

((x >> k) + k) + 1

Or a little different expression

((m + 1) + (x >> k))

LLVM still can't handle them.

Fair enough, the last one still has the problem. Of course, completely disabling this reassociation in the SDAG makes the problem go away on all of them.

So I guess we should decide how to proceed with this. I don't think the MachineCombiner has the infrastructure to easily undo this. We have a few options:

weiguozhi commented 7 years ago

I see, your patch specifically test following case ((k + 1) + (x >> k))

It looks too constrained. If I have following DAG

((x >> k) + k) + 1

Or a little different expression

((m + 1) + (x >> k))

LLVM still can't handle them.

nemanjai commented 7 years ago

OK, I called it "path length" you call it "dependence chain". They're the same thing aren't they? With that DAG combine, you can't do the shift and the add-immediate in parallel. Disabling the DAG combine allows that to happen. The code LLVM emits with that patch is:

srw 4, 4, 3 addi 3, 3, 1 add 3, 3, 4 clrldi 3, 3, 32

weiguozhi commented 7 years ago

Actually MachineCombiner may have a very hard time with something like this due to a couple of reasons:

  • We only want to run it at -O3
  • Different opcodes

It is actually reassociation on the SDAG (in the DAGCombiner) that messes this up to begin with. It is done without regard to whether it will increase path length.

I don't really see how this reduction in ILP is a gain on any processor. This patch fixes this particular issue and if you think it is worth posting on Phabricator, I can certainly do so (with this test case and possibly some for the other affected nodes):

Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp

--- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 304396) +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy) @@ -880,7 +880,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc, return DAG.getNode(Opc, DL, VT, N0.getOperand(0), OpNode); return SDValue(); }

  • if (N0.hasOneUse()) {
  • // Reassociation will increase the path length if N0 and N1 use the same
  • // value. Don't reassociate in this case.
  • if (N0.hasOneUse() && !N0.getOperand(0).isOperandOf(N1.getNode())) { // reassoc. (op (op x, c1), y) -> (op (op x, y), c1) iff x+c1 has one // use SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N0.getOperand(0), N1); @@ -900,7 +902,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc, return DAG.getNode(Opc, DL, VT, N1.getOperand(0), OpNode); return SDValue(); }
  • if (N1.hasOneUse()) {
  • // Reassociation will increase the path length if N0 and N1 use the same
  • // value. Don't reassociate in this case.
  • if (N1.hasOneUse() && !N1.getOperand(0).isOperandOf(N0.getNode())) { // reassoc. (op x, (op y, c1)) -> (op (op x, y), c1) iff x+c1 has one // use SDValue OpNode = DAG.getNode(Opc, SDLoc(N0), VT, N0, N1.getOperand(0));

P.S. The patch does not cause any failures on a double bootstrap build.

The gcc version is faster is not because of one of (op x, (op y, c1)) and (op (op x, y), c1) is faster than another one, the reason is it has shorter dependence chain.

nemanjai commented 7 years ago

Actually MachineCombiner may have a very hard time with something like this due to a couple of reasons:

It is actually reassociation on the SDAG (in the DAGCombiner) that messes this up to begin with. It is done without regard to whether it will increase path length.

I don't really see how this reduction in ILP is a gain on any processor. This patch fixes this particular issue and if you think it is worth posting on Phabricator, I can certainly do so (with this test case and possibly some for the other affected nodes):

Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp

--- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 304396) +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp (working copy) @@ -880,7 +880,9 @@ SDValue DAGCombiner::ReassociateOps(unsigned Opc, return DAG.getNode(Opc, DL, VT, N0.getOperand(0), OpNode); return SDValue(); }

P.S. The patch does not cause any failures on a double bootstrap build.

nemanjai commented 7 years ago

We can certainly turn on reassociation to try to address this problem. But reassociation currently only considers instructions that have the same opcode. We'd need to make it understand the relation between reg/reg and reg/imm forms - for example PPC::ADD4/PPC::ADDI.

weiguozhi commented 7 years ago

Feel free to take it.

llvmbot commented 7 years ago

Carrot, are you working on this? If so, can you please assign it to yourself? If not, someone from our team can take a look at it. Thanks.

weiguozhi commented 7 years ago

It is expression reassociation problem. But in PPC backend:

1 PPCInstrInfo::getMachineCombinerPatterns enables it with -O3 only.

2 PPCInstrInfo::isAssociativeAndCommutative enables it for a very limited number of floating point operations.