Tests:
bool test1(int a, int b) { return (a + b) == 0; }
bool test2(int a, int b) { return (a + b) != 0; }
Clang currently generates:
test1: // @test1
// BB#0: // %entry
neg w8, w1
cmp w8, w0
cset w0, eq
ret
test2: // @test2
// BB#0: // %entry
neg w8, w1
cmp w8, w0
cset w0, ne
ret
However, we should generate:
test1: // @test1
// BB#0: // %entry
cmn w0, w1
cset w0, eq
ret
test2: // @test2
// BB#0: // %entry
cmn w0, w1
cset w0, ne
ret
Reassociation canonicalizes these tests to 0-x == y and 0-x != y. Then, for X86, a DAG combine converts these back to x+y == 0 and x+y != 0, respectively.
Here's a patch for AArch64 that is a straight cut-and-paste from X86 to AArch64:
diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 083ca21..8286838 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -545,6 +545,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   setTargetDAGCombine(ISD::MUL);
@@ -10204,6 +10205,34 @@ static SDValue performNVCASTCombine(SDNode *N) {
   return SDValue();
 }
+static SDValue performSetCCCombine(SDNode *N, SelectionDAG &DAG) {
 SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
                                                  DAGCombinerInfo &DCI) const {
   SelectionDAG &DAG = DCI.DAG;
@@ -10249,6 +10278,8 @@ SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
     break;
   case ISD::STORE:
     return performSTORECombine(N, DCI, DAG, Subtarget);
+  case ISD::SETCC:
+    return performSetCCCombine(N, DAG);
   case AArch64ISD::BRCOND:
     return performBRCONDCombine(N, DCI, DAG);
   case AArch64ISD::TBNZ:
The above patch will fix the problem, but I'm not sure it's the best solution, and it doesn't impact any of the workloads I care about. Therefore, I'm not interested in pursuing this further, but figured I'd file a bug.
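For what it's worth, the elided +34-line hunk presumably holds the body of performSetCCCombine. A pseudocode-level, untested sketch of what that body might look like, modeled on the analogous X86 combine that folds (seteq/setne (sub 0, X), Y) into (seteq/setne (add X, Y), 0); this is my guess at the shape of the code, not the actual patch contents:

```
static SDValue performSetCCCombine(SDNode *N, SelectionDAG &DAG) {
  ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(2))->get();
  SDValue LHS = N->getOperand(0);
  SDValue RHS = N->getOperand(1);

  // Fold (seteq/setne (sub 0, X), Y) -> (seteq/setne (add X, Y), 0),
  // which lets ISel pick cmn instead of neg+cmp.
  if ((CC == ISD::SETEQ || CC == ISD::SETNE) && LHS.getOpcode() == ISD::SUB &&
      isNullConstant(LHS.getOperand(0)) && LHS.hasOneUse()) {
    SDLoc DL(N);
    EVT OpVT = LHS.getValueType();
    SDValue Add = DAG.getNode(ISD::ADD, DL, OpVT, LHS.getOperand(1), RHS);
    return DAG.getSetCC(DL, N->getValueType(0), Add,
                        DAG.getConstant(0, DL, OpVT), CC);
  }
  return SDValue();
}
```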