llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.04k stars 11.98k forks source link

Patterns to generate RAX1 and XAR on AArch64 #61584

Closed bwesterb closed 11 months ago

bwesterb commented 1 year ago

LLVM already generates BCAX and EOR3 from the ARMv8.2-SHA crypto extension set. It doesn't for RAX1 and XAR yet.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64

jedisct1 commented 1 year ago

That would indeed provide a significant speedup of some cryptographic primitives on AArch64.

llvmbot commented 1 year ago

@llvm/issue-subscribers-good-first-issue

BK1603 commented 1 year ago

Hi! I'd like to work on this issue.

From what I understand from the shared example:

We generate a bcax instr for st[i * 5 + j] = t[j] ^ (~t[(j + 1) % 5] & t[(j + 2) % 5]);

And for st[j * 5 + i] ^= t[(i + 4) % 5] ^ math.rotl(@Vector(l, T), t[(i + 1) % 5], 1) We generate:

        ushr.2d v25, v12, #63
        shl.2d  v31, v12, #1
        orr.16b v31, v31, v25
        eor3.16b        v25, v10, v31, v5
        eor3.16b        v14, v10, v31, v20
        eor3.16b        v20, v10, v31, v21
        eor3.16b        v21, v10, v31, v22
        eor3.16b        v24, v10, v31, v23

This uses an unsigned shift right and a shift left to first perform the rotation, and then performs a 3 way xor to get the results. Instead of this, we want to see something like:

        rax1.2d        v31, v12, v10
        eor.16b        v25, v31, v5
        eor.16b        v14, v31, v20
        eor.16b        v20, v31, v21
        eor.16b        v21, v31, v22
        eor.16b        v24, v31, v23

Does this look right?

davemgreen commented 1 year ago

These should match the instructions in isolation: https://godbolt.org/z/WMv8csvo4 I believe the rax1 matches what you are saying.

See https://reviews.llvm.org/D120112 and https://reviews.llvm.org/D108793 for where the existing eor3 and bcax instructions were added.

bwesterb commented 1 year ago

Does this look right?

@BK1603 Yeah. For reference, I expect roughly something like this in the end.

BK1603 commented 1 year ago

Thanks @davemgreen and @bwesterb

I'll start working on this. Will raise more questions here in case I get stuck.

davemgreen commented 1 year ago

It sounds like you are on the right track. If you take a test file like:

define <2 x i64> @rax1(<2 x i64> %x, <2 x i64> %y) {
    %a = call <2 x i64> @llvm.fshl.v2i64(<2 x i64> %y, <2 x i64> %y, <2 x i64> <i64 1, i64 1>)
    %b = xor <2 x i64> %x, %a
    ret <2 x i64> %b
}

declare <2 x i64> @llvm.fshl.v2i64(<2 x i64>, <2 x i64>, <2 x i64>)

And run it like this:

bin/llc -mtriple aarch64 -mattr=+sha3 test.ll -debug

You should see debug output that includes:

Optimized legalized selection DAG: %bb.0 'rax1:'
SelectionDAG has 14 nodes:
  t0: ch,glue = EntryToken
  t4: v2i64,ch = CopyFromReg t0, Register:v2i64 %1
      t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %0
        t22: v2i64 = AArch64ISD::VSHL t4, Constant:i32<1>
        t24: v2i64 = AArch64ISD::VLSHR t4, Constant:i32<63>
      t20: v2i64 = or t22, t24
    t8: v2i64 = xor t2, t20
  t10: ch,glue = CopyToReg t0, Register:v2i64 $q0, t8
  t11: ch = AArch64ISD::RET_GLUE t10, Register:v2i64 $q0, t10:1

Those are the nodes that will need to be matched, some of which will be AArch64 specific nodes.

BK1603 commented 1 year ago

Hi @davemgreen

Thanks a lot for the help! The debug output from llc was immensely helpful. I have made a commit and pushed to fabricator here:

https://reviews.llvm.org/D147887

I had a question, we chose to use the arm instructions vshl vlshr for pattern matching. Although in the debug output, I saw that we did create slr and shl nodes as well. Is there a reason as to why we did not use them as inputs for pattern matching?

Something like:

def : Pat <(xor (or ((slr (vi264 V128:$Vm), (i32 63)), (shl (v2i64 V128:$Vm), (i32 1))), (v2164 V128:$Vn)),
                (RAX1 (v2i64 V128:$Vm), (v2i64 V128:$Vn)>
BK1603 commented 1 year ago

Hi @davemgreen

I fixed the operand order in my commit. Also I seem to be stuck again. For XAR we want to be able to rotate the element by any variable value. For which the dag nodes generated are as follows:

Optimized legalized selection DAG: %bb.0 'xar_54:'
SelectionDAG has 14 nodes:
  t0: ch,glue = EntryToken
    t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %0
    t4: v2i64,ch = CopyFromReg t0, Register:v2i64 %1
  t5: v2i64 = xor t2, t4
      t24: v2i64 = AArch64ISD::VSHL t5, Constant:i32<10>
      t26: v2i64 = AArch64ISD::VLSHR t5, Constant:i32<54>
    t22: v2i64 = or t24, t26
  t10: ch,glue = CopyToReg t0, Register:v2i64 $q0, t22
  t11: ch = AArch64ISD::RET_GLUE t10, Register:v2i64 $q0, t10:1

I was trying something like:

def SubtractFrom64 : SDNodeXForm<timm, [{ return 64 - N->getZextValue(); }]>;

def : Pat<(or (Aarch64vshl (xor (v2i64 V128:$Vn), (v2i64 V128:$Vm)), (timm0_63:$imm)), (AArch64vlshr (xor (v2i64 V128:$Vn), (v2i64 V128:$Vm)), (SubtractFrom64 timm0_63:$imm)))>;

But ran into an error saying I can not use SubtractFrom64 as an input pattern (why is that so?). Could you please tell me if there are any instructions with similar requirements that I could learn from? Are there no ways to look for immediates that follow a constraint in tablegen?

I asked around at discord and was told that I won't be able to get a pattern to match for this. I'd have to look at both immediates and ensure that they add upto 64. And that this would be possible by altering the switch statement in AArch64DAGToDAGISel::Select

davemgreen commented 1 year ago

Yeah that might be correct. It may be possible to add a complex pattern that detects the or(AArch64ISD::VSHL, AArch64ISD::VLSHR) with matching shift amounts, but at that point you might as well do the matching in AArch64DAGToDAGISel.

BK1603 commented 1 year ago

Hi @davemgreen I was trying something like this:

bool AArch64DAGToDAGISel::SelectXAR(SDNode *N) {
  assert(N->getOpcode() == ISD::OR && "Expected OR instruction");

  SDNode *N0 = cast<SDNode>(N->getOperand(0));
  SDNode *N1 = cast<SDNode>(N->getOperand(1));

  if (!(N0->getOpcode() == AArch64ISD::VSHL) || !(N1->getOpcode() == AArch64ISD::VLSHR))
    return false;

  if (!(N0->getOperand(0)->getOpcode() == ISD::XOR) || !(N1->getOperand(0)->getOpcode() == ISD::XOR))
    return false;

  SDNode *XOR = cast<SDNode>(N0->getOperand(0));
  SDValue R1 = XOR->getOperand(0);
  SDValue R2 = XOR->getOperand(1);
  SDValue imm = N1->getOperand(1);

  unsigned HsAmt = cast<ConstantSDNode>(N0->getOperand(1))->getZExtValue();
  unsigned ShAmt = cast<ConstantSDNode>(N1->getOperand(1))->getZExtValue();

  if (!(HsAmt + ShAmt == 64))
    return false;

  SDValue Ops[] = { R1, R2, imm };
  CurDAG->SelectNodeTo(N, AArch64::XAR, MVT::i64, Ops);

  return false;
}

(I know there are a lot of rough edges to this, please let me know things that I can fix that you can spot right away.)

and then in the Select function if we can use the following condition check:

  case ISD::OR:
    if (tryBitfieldInsertOp(Node))
      return;
    // Try matching XAR instruction here
    if (SelectXAR(Node))
      return;
    break;

But this leads to the following assert failing in SelectCodeCommon

 `!NodeToMatch->isMachineOpcode() && "Node already selected!"' 

I think it is happening becuase I used AArch64::XAR opcode. I tried using AArch64ISD::XAR but there is no such value present in the enum. Can you please guide me at how I should proceed with this?

BK1603 commented 1 year ago

Found the issue, return should be true, I'll make a patch and send it for review.

BK1603 commented 1 year ago

Hi @davemgreen Made a patch for this here: https://reviews.llvm.org/D150076

Please review and let me know, thanks!

jedisct1 commented 1 year ago

Any plans on picking this up?

BK1603 commented 11 months ago

Sorry for the inactivity on the patch, @davemgreen I have pushed another patch resolving your comments. Can you please review? https://reviews.llvm.org/D150076

BK1603 commented 11 months ago

Hi @davemgreen Updated the patch again, notifying here as well in case phabricator notifications do not work post workflow change.

Edit: If everything looks good can you please merge the change as well? I don't have the permissions for a merge. Can use the following for sign off: Shreyansh Chouhan <chouhan.shreyansh2702@gmail.com>