llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
25.95k stars 10.59k forks source link

Riscv branch peephole opt #90451

Open zengdage opened 2 weeks ago

zengdage commented 2 weeks ago

After block-placement and machine-cp, the following situations may exist and require optimization. I don't know how to do this optimization, so I try to add a new pass and run it after machine-cp. Maybe this is not the right way to do it.

    bne a0, a0, %bb.2  ->  remove it
    blt a0, a0, %bb.2  ->  remove it
    bltu a0, a0, %bb.2 ->  remove it

    beq a0, a0, %bb.2  ->  jal x0, %bb.2
    bge a0, a0, %bb.2  ->  jal x0, %bb.2
    bgeu a0, a0, %bb.2 ->  jal x0, %bb.2
llvmbot commented 2 weeks ago

@llvm/pr-subscribers-backend-risc-v

Author: Zhijin Zeng (zengdage)

Changes After `block-placement` and `machine-cp`, the following situations may exist and require optimization. I don't know how to do this optimization, so I try to add a new pass and run it after ` machine-cp`. Maybe this is not the right way to do it. ``` bne a0, a0, %bb.2 -> remove it blt a0, a0, %bb.2 -> remove it bltu a0, a0, %bb.2 -> remove it beq a0, a0, %bb.2 -> jal x0, %bb.2 bge a0, a0, %bb.2 -> jal x0, %bb.2 bgeu a0, a0, %bb.2 -> jal x0, %bb.2 ``` --- Full diff: https://github.com/llvm/llvm-project/pull/90451.diff 6 Files Affected: - (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1) - (modified) llvm/lib/Target/RISCV/RISCV.h (+3) - (added) llvm/lib/Target/RISCV/RISCVMIPeepholeOpt.cpp (+121) - (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+2) - (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1) - (added) llvm/test/CodeGen/RISCV/peephole-branch.mir (+464) ``````````diff diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt index 8715403f3839a6..429a22a5323795 100644 --- a/llvm/lib/Target/RISCV/CMakeLists.txt +++ b/llvm/lib/Target/RISCV/CMakeLists.txt @@ -45,6 +45,7 @@ add_llvm_target(RISCVCodeGen RISCVISelLowering.cpp RISCVMachineFunctionInfo.cpp RISCVMergeBaseOffset.cpp + RISCVMIPeepholeOpt.cpp RISCVOptWInstrs.cpp RISCVPostRAExpandPseudoInsts.cpp RISCVRedundantCopyElimination.cpp diff --git a/llvm/lib/Target/RISCV/RISCV.h b/llvm/lib/Target/RISCV/RISCV.h index d405395dcf9ec4..d808eca6169662 100644 --- a/llvm/lib/Target/RISCV/RISCV.h +++ b/llvm/lib/Target/RISCV/RISCV.h @@ -78,6 +78,9 @@ void initializeRISCVRedundantCopyEliminationPass(PassRegistry &); FunctionPass *createRISCVMoveMergePass(); void initializeRISCVMoveMergePass(PassRegistry &); +FunctionPass *createRISCVMIPeepholeOptPass(); +void initializeRISCVMIPeepholeOptPass(PassRegistry &); + FunctionPass *createRISCVPushPopOptimizationPass(); void initializeRISCVPushPopOptPass(PassRegistry &); diff --git a/llvm/lib/Target/RISCV/RISCVMIPeepholeOpt.cpp b/llvm/lib/Target/RISCV/RISCVMIPeepholeOpt.cpp new file mode 100644 index 00000000000000..e7d8328985ceda --- /dev/null +++ b/llvm/lib/Target/RISCV/RISCVMIPeepholeOpt.cpp @@ -0,0 +1,121 @@ +//===- RISCVMIPeepholeOpt.cpp - RISC-V MI peephole optimization pass ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This pass performs below peephole optimizations on MIR level. +// +// 1. Remove redundant branch instrunctions which may be generated by +// block-placement. +// +// bne a1, a1, %bb.1 +// blt a1, a1, %bb.1 +// bltu a1, a1, %bb.1 +// +// These instruction is bound to fallthrough to next basic block, rather +// than into the %bb.1, so it should be removed here. +// +// 2. beq a1, a1, %bb.1 -> j %bb.1 +// bge a1, a1, %bb.1 -> j %bb.1 +// bgeu a1, a1, %bb.1 -> j %bb.1 +// +// These instruction is bound to go into %bb.1, so it should be replaced by +// j pseudo instruction. +// +//===----------------------------------------------------------------------===// + +#include "RISCVInstrInfo.h" +#include "llvm/ADT/Statistic.h" +#include "llvm/CodeGen/LiveIntervals.h" +#include "llvm/CodeGen/MachineFunctionPass.h" +#include "llvm/CodeGen/MachineInstrBuilder.h" +#include "llvm/CodeGen/MachineRegisterInfo.h" +#include "llvm/Support/Debug.h" + +using namespace llvm; + +#define DEBUG_TYPE "riscv-mi-peephole-opt" + +namespace { + +struct RISCVMIPeepholeOpt : public MachineFunctionPass { + static char ID; + + RISCVMIPeepholeOpt() : MachineFunctionPass(ID) { + initializeRISCVMIPeepholeOptPass(*PassRegistry::getPassRegistry()); + } + + const RISCVInstrInfo *TII; + + bool visitBranch(MachineInstr &MI); + bool runOnMachineFunction(MachineFunction &MF) override; + + StringRef getPassName() const override { + return "RISC-V MI Peephole Optimization pass"; + } +}; + +char RISCVMIPeepholeOpt::ID = 0; + +} // end anonymous namespace + +INITIALIZE_PASS(RISCVMIPeepholeOpt, "riscv-mi-peephole-opt", + "RISC-V MI Peephole Optimization", false, false) + +bool RISCVMIPeepholeOpt::visitBranch(MachineInstr &MI) { + Register FirstReg = MI.getOperand(0).getReg(); + Register SecondReg = MI.getOperand(1).getReg(); + + if (FirstReg != SecondReg) + return false; + + switch (MI.getOpcode()) { + default: + break; + case RISCV::BEQ: + case RISCV::BGE: + case RISCV::BGEU: + BuildMI(*MI.getParent(), &MI, MI.getDebugLoc(), TII->get(RISCV::JAL)) + .addReg(RISCV::X0) + .add(MI.getOperand(2)); + break; + } + MI.eraseFromParent(); + + return true; +} + +bool RISCVMIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) { + if (skipFunction(MF.getFunction())) + return false; + + TII = static_cast(MF.getSubtarget().getInstrInfo()); + + bool Changed = false; + + for (MachineBasicBlock &MBB : MF) { + for (MachineInstr &MI : make_early_inc_range(MBB)) { + switch (MI.getOpcode()) { + default: + break; + case RISCV::BNE: + case RISCV::BEQ: + case RISCV::BLT: + case RISCV::BLTU: + case RISCV::BGE: + case RISCV::BGEU: + Changed |= visitBranch(MI); + break; + } + } + } + + return Changed; +} + +FunctionPass *llvm::createRISCVMIPeepholeOptPass() { + return new RISCVMIPeepholeOpt(); +} diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp index 0876f46728a10c..1af33aac601698 100644 --- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp +++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp @@ -111,6 +111,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() { initializeRISCVCodeGenPreparePass(*PR); initializeRISCVPostRAExpandPseudoPass(*PR); initializeRISCVMergeBaseOffsetOptPass(*PR); + initializeRISCVMIPeepholeOptPass(*PR); initializeRISCVOptWInstrsPass(*PR); initializeRISCVPreRAExpandPseudoPass(*PR); initializeRISCVExpandPseudoPass(*PR); @@ -501,6 +502,7 @@ void RISCVPassConfig::addPreEmitPass() { void RISCVPassConfig::addPreEmitPass2() { if (TM->getOptLevel() != CodeGenOptLevel::None) { addPass(createRISCVMoveMergePass()); + addPass(createRISCVMIPeepholeOptPass()); // Schedule PushPop Optimization before expansion of Pseudo instruction, // ensuring return instruction is detected correctly. addPass(createRISCVPushPopOptimizationPass()); diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll index 4121d111091117..731d1aea8c14a0 100644 --- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll +++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll @@ -195,6 +195,7 @@ ; CHECK-NEXT: Machine Optimization Remark Emitter ; CHECK-NEXT: Stack Frame Layout Analysis ; CHECK-NEXT: RISC-V Zcmp move merging pass +; CHECK-NEXT: RISC-V MI Peephole Optimization pass ; CHECK-NEXT: RISC-V Zcmp Push/Pop optimization pass ; CHECK-NEXT: RISC-V pseudo instruction expansion pass ; CHECK-NEXT: RISC-V atomic pseudo instruction expansion pass diff --git a/llvm/test/CodeGen/RISCV/peephole-branch.mir b/llvm/test/CodeGen/RISCV/peephole-branch.mir new file mode 100644 index 00000000000000..4c5d087269be39 --- /dev/null +++ b/llvm/test/CodeGen/RISCV/peephole-branch.mir @@ -0,0 +1,464 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4 +# RUN: llc -o - %s -mtriple=riscv32 -run-pass=riscv-mi-peephole-opt | FileCheck %s + +--- | + define void @peephole_bne(ptr %a, ptr %b, ptr %c) { + entry: + %e = load i32, ptr %a, align 4 + %p = icmp ne i32 %e, %e + br i1 %p, label %block1, label %block2 + + block1: ; preds = %entry + store i32 %e, ptr %b, align 4 + br label %end_block + + block2: ; preds = %entry + store i32 87, ptr %c, align 4 + br label %end_block + + end_block: ; preds = %block2, %block1 + ret void + } + + define void @peephole_beq(ptr %a, ptr %b, ptr %c) { + entry: + %e = load i32, ptr %a, align 4 + %p = icmp eq i32 %e, %e + br i1 %p, label %block1, label %block2 + + block1: ; preds = %entry + store i32 %e, ptr %b, align 4 + br label %end_block + + block2: ; preds = %entry + store i32 87, ptr %c, align 4 + br label %end_block + + end_block: ; preds = %block2, %block1 + ret void + } + + define void @peephole_blt(ptr %a, ptr %b, ptr %c) { + entry: + %e = load i32, ptr %a, align 4 + %p = icmp slt i32 %e, %e + br i1 %p, label %block1, label %block2 + + block1: ; preds = %entry + store i32 %e, ptr %b, align 4 + br label %end_block + + block2: ; preds = %entry + store i32 87, ptr %c, align 4 + br label %end_block + + end_block: ; preds = %block2, %block1 + ret void + } + + define void @peephole_bltu(ptr %a, ptr %b, ptr %c) { + entry: + %e = load i32, ptr %a, align 4 + %p = icmp ult i32 %e, %e + br i1 %p, label %block1, label %block2 + + block1: ; preds = %entry + store i32 %e, ptr %b, align 4 + br label %end_block + + block2: ; preds = %entry + store i32 87, ptr %c, align 4 + br label %end_block + + end_block: ; preds = %block2, %block1 + ret void + } + + define void @peephole_bge(ptr %a, ptr %b, ptr %c) { + entry: + %e = load i32, ptr %a, align 4 + %p = icmp sge i32 %e, %e + br i1 %p, label %block1, label %block2 + + block1: ; preds = %entry + store i32 %e, ptr %b, align 4 + br label %end_block + + block2: ; preds = %entry + store i32 87, ptr %c, align 4 + br label %end_block + + end_block: ; preds = %block2, %block1 + ret void + } + + define void @peephole_bgeu(ptr %a, ptr %b, ptr %c) { + entry: + %e = load i32, ptr %a, align 4 + %p = icmp uge i32 %e, %e + br i1 %p, label %block1, label %block2 + + block1: ; preds = %entry + store i32 %e, ptr %b, align 4 + br label %end_block + + block2: ; preds = %entry + store i32 87, ptr %c, align 4 + br label %end_block + + end_block: ; preds = %block2, %block1 + ret void + } + +... +--- +name: peephole_bne +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x10' } + - { reg: '$x11' } + - { reg: '$x12' } +body: | + ; CHECK-LABEL: name: peephole_bne + ; CHECK: bb.0.entry: + ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) + ; CHECK-NEXT: liveins: $x10, $x11, $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x13 = ADDI $x0, 1 + ; CHECK-NEXT: PseudoBR %bb.1 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.1.block1: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x10, $x11 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = LW killed renamable $x10, 0 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x11, 0 + ; CHECK-NEXT: PseudoBR %bb.3 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.2.block2: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = ADDI $x0, 87 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x12, 0 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.3.end_block: + ; CHECK-NEXT: PseudoRET + bb.0.entry: + successors: %bb.1, %bb.2 + liveins: $x10, $x11, $x12 + + renamable $x13 = ADDI $x0, 1 + BNE killed renamable $x13, $x13, %bb.2 + PseudoBR %bb.1 + + bb.1.block1: + liveins: $x10, $x11 + + renamable $x10 = LW killed renamable $x10, 0 + SW killed renamable $x10, killed renamable $x11, 0 + PseudoBR %bb.3 + + bb.2.block2: + liveins: $x12 + + renamable $x10 = ADDI $x0, 87 + SW killed renamable $x10, killed renamable $x12, 0 + + bb.3.end_block: + PseudoRET + +... +--- +name: peephole_beq +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x10' } + - { reg: '$x11' } + - { reg: '$x12' } +body: | + ; CHECK-LABEL: name: peephole_beq + ; CHECK: bb.0.entry: + ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) + ; CHECK-NEXT: liveins: $x10, $x11, $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: JAL $x0, %bb.2 + ; CHECK-NEXT: PseudoBR %bb.1 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.1.block1: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x10, $x11 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = LW killed renamable $x10, 0 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x11, 0 + ; CHECK-NEXT: PseudoBR %bb.3 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.2.block2: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = ADDI $x0, 87 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x12, 0 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.3.end_block: + ; CHECK-NEXT: PseudoRET + bb.0.entry: + successors: %bb.1, %bb.2 + liveins: $x10, $x11, $x12 + + BEQ $x0, $x0, %bb.2 + PseudoBR %bb.1 + + bb.1.block1: + liveins: $x10, $x11 + + renamable $x10 = LW killed renamable $x10, 0 + SW killed renamable $x10, killed renamable $x11, 0 + PseudoBR %bb.3 + + bb.2.block2: + liveins: $x12 + + renamable $x10 = ADDI $x0, 87 + SW killed renamable $x10, killed renamable $x12, 0 + + bb.3.end_block: + PseudoRET + +... +--- +name: peephole_blt +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x10' } + - { reg: '$x11' } + - { reg: '$x12' } +body: | + ; CHECK-LABEL: name: peephole_blt + ; CHECK: bb.0.entry: + ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) + ; CHECK-NEXT: liveins: $x10, $x11, $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x13 = ADDI $x0, 1 + ; CHECK-NEXT: PseudoBR %bb.1 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.1.block1: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x10, $x11 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = LW killed renamable $x10, 0 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x11, 0 + ; CHECK-NEXT: PseudoBR %bb.3 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.2.block2: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = ADDI $x0, 87 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x12, 0 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.3.end_block: + ; CHECK-NEXT: PseudoRET + bb.0.entry: + successors: %bb.1, %bb.2 + liveins: $x10, $x11, $x12 + + renamable $x13 = ADDI $x0, 1 + BLT killed renamable $x13, $x13, %bb.2 + PseudoBR %bb.1 + + bb.1.block1: + liveins: $x10, $x11 + + renamable $x10 = LW killed renamable $x10, 0 + SW killed renamable $x10, killed renamable $x11, 0 + PseudoBR %bb.3 + + bb.2.block2: + liveins: $x12 + + renamable $x10 = ADDI $x0, 87 + SW killed renamable $x10, killed renamable $x12, 0 + + bb.3.end_block: + PseudoRET + +... +--- +name: peephole_bltu +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x10' } + - { reg: '$x11' } + - { reg: '$x12' } +body: | + ; CHECK-LABEL: name: peephole_bltu + ; CHECK: bb.0.entry: + ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) + ; CHECK-NEXT: liveins: $x10, $x11, $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x13 = ADDI $x0, 1 + ; CHECK-NEXT: PseudoBR %bb.1 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.1.block1: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x10, $x11 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = LW killed renamable $x10, 0 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x11, 0 + ; CHECK-NEXT: PseudoBR %bb.3 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.2.block2: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = ADDI $x0, 87 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x12, 0 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.3.end_block: + ; CHECK-NEXT: PseudoRET + bb.0.entry: + successors: %bb.1, %bb.2 + liveins: $x10, $x11, $x12 + + renamable $x13 = ADDI $x0, 1 + BLTU killed renamable $x13, $x13, %bb.2 + PseudoBR %bb.1 + + bb.1.block1: + liveins: $x10, $x11 + + renamable $x10 = LW killed renamable $x10, 0 + SW killed renamable $x10, killed renamable $x11, 0 + PseudoBR %bb.3 + + bb.2.block2: + liveins: $x12 + + renamable $x10 = ADDI $x0, 87 + SW killed renamable $x10, killed renamable $x12, 0 + + bb.3.end_block: + PseudoRET + +... +--- +name: peephole_bge +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x10' } + - { reg: '$x11' } + - { reg: '$x12' } +body: | + ; CHECK-LABEL: name: peephole_bge + ; CHECK: bb.0.entry: + ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) + ; CHECK-NEXT: liveins: $x10, $x11, $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: JAL $x0, %bb.2 + ; CHECK-NEXT: PseudoBR %bb.1 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.1.block1: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x10, $x11 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = LW killed renamable $x10, 0 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x11, 0 + ; CHECK-NEXT: PseudoBR %bb.3 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.2.block2: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = ADDI $x0, 87 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x12, 0 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.3.end_block: + ; CHECK-NEXT: PseudoRET + bb.0.entry: + successors: %bb.1, %bb.2 + liveins: $x10, $x11, $x12 + + BGE $x0, $x0, %bb.2 + PseudoBR %bb.1 + + bb.1.block1: + liveins: $x10, $x11 + + renamable $x10 = LW killed renamable $x10, 0 + SW killed renamable $x10, killed renamable $x11, 0 + PseudoBR %bb.3 + + bb.2.block2: + liveins: $x12 + + renamable $x10 = ADDI $x0, 87 + SW killed renamable $x10, killed renamable $x12, 0 + + bb.3.end_block: + PseudoRET + +... +--- +name: peephole_bgeu +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x10' } + - { reg: '$x11' } + - { reg: '$x12' } +body: | + ; CHECK-LABEL: name: peephole_bgeu + ; CHECK: bb.0.entry: + ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) + ; CHECK-NEXT: liveins: $x10, $x11, $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: JAL $x0, %bb.2 + ; CHECK-NEXT: PseudoBR %bb.1 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.1.block1: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x10, $x11 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = LW killed renamable $x10, 0 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x11, 0 + ; CHECK-NEXT: PseudoBR %bb.3 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.2.block2: + ; CHECK-NEXT: successors: %bb.3(0x80000000) + ; CHECK-NEXT: liveins: $x12 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: renamable $x10 = ADDI $x0, 87 + ; CHECK-NEXT: SW killed renamable $x10, killed renamable $x12, 0 + ; CHECK-NEXT: {{ $}} + ; CHECK-NEXT: bb.3.end_block: + ; CHECK-NEXT: PseudoRET + bb.0.entry: + successors: %bb.1, %bb.2 + liveins: $x10, $x11, $x12 + + BGEU $x0, $x0, %bb.2 + PseudoBR %bb.1 + + bb.1.block1: + liveins: $x10, $x11 + + renamable $x10 = LW killed renamable $x10, 0 + SW killed renamable $x10, killed renamable $x11, 0 + PseudoBR %bb.3 + + bb.2.block2: + liveins: $x12 + + renamable $x10 = ADDI $x0, 87 + SW killed renamable $x10, killed renamable $x12, 0 + + bb.3.end_block: + PseudoRET + +... ``````````
wangpc-pp commented 2 weeks ago

Do you have an end-to-end C language example that will produce these instructions?

zengdage commented 2 weeks ago

Do you have an end-to-end C language example that will produce these instructions?

The following is the minimal code which can produce the bne a0, a0, %bb1 instruction. It's cropped from spec2006/400.perlbench/Base64.c/XS_MIME__QuotedPrint_encode_qp function which can produce bne instruction too.

#define qp_isplain(c) ((c) == '\t' || (((c) >= ' ' && (c) <= '~') && (c) != '='))
int test_bne(char *beg, char *end)
{
    char *p;
    char *p_beg;

    p = beg;
    while (1) {
            p_beg = p;

            while (p < end && qp_isplain(*p)) {
                p++;
            }
            if (p == end || *p == '\n') {
            while (p > p_beg && (*(p - 1) == '\t' || *(p - 1) == ' '))
               p--;
            }
    }
}

Build command: clang -S -fno-PIC -g -march=rv64imafdc_zba_zbb_zbc_zbs -mabi=lp64d -Ofast -o xxx.S xxx.c

topperc commented 2 weeks ago

Do you have an end-to-end C language example that will produce these instructions?

The following is the minimal code which can produce the bne a0, a0, %bb1 instruction. It's cropped from spec2006/400.perlbench/Base64.c/XS_MIME__QuotedPrint_encode_qp function which can produce bne instruction too.

#define qp_isplain(c) ((c) == '\t' || (((c) >= ' ' && (c) <= '~') && (c) != '='))
int test_bne(char *beg, char *end)
{
    char *p;
    char *p_beg;

    p = beg;
    while (1) {
            p_beg = p;

            while (p < end && qp_isplain(*p)) {
                p++;
            }
            if (p == end || *p == '\n') {
            while (p > p_beg && (*(p - 1) == '\t' || *(p - 1) == ' '))
               p--;
            }
    }
}

Build command: clang -S -fno-PIC -g -march=rv64imafdc_zba_zbb_zbc_zbs -mabi=lp64d -Ofast -o xxx.S xxx.c

That compiles to an empty function on compiler explorer https://godbolt.org/z/ss3fqYTo4

topperc commented 2 weeks ago

Is this a widespread problem? In my local testing I see 2 places in perlbench.

Do you expect a performance gain from fixing this on perlbench?

I'm hesitant to add a new pass for a problem that occurs twice if it doesn't improve performance.

zengdage commented 2 weeks ago

clang -S -fno-PIC -g -march=rv64imafdc_zba_zbb_zbc_zbs -mabi=lp64d -Ofast -o xxx.S xxx.c

Sorry, these are 13 places in my local SPEC2006 perlbench. Build command: clang -c -DSPEC_CPU -DNDEBUG -DPERL_CORE -fno-pic -DSPEC_CPU_LP64 -Wno-everything -DSPEC_CPU_LINUX_X64 -std=gnu89 -Ofast -march=rv64gc_zba_zbb_zbc_zbs -mabi=lp64d xxx.c -o xxx.o

A list of these instruction from the above build command. If add -mllvm -inline-threshold=3420 -mllvm -unroll-max-count=15 to build, I will get 21 places in my local SPEC2006 perlbench.

mg.c
MI:BEQ $x0, $x0, %bb.17

perlio.c
MI:BEQ $x0, $x0, %bb.18

pp_pack.c
MI:BGEU $x22, renamable $x22, %bb.73

regexec.c
MI:BEQ $x0, $x0, %bb.548
MI:BNE $x0, $x0, %bb.548
MI:BNE $x0, $x0, %bb.56
MI:BEQ $x0, $x0, %bb.102
MI:BEQ $x0, $x0, %bb.215
MI:BNE $x0, $x0, %bb.215

toke.c
MI:BLTU $x10, renamable $x10, %bb.1778

util.c
MI:BNE $x0, $x0, %bb.5

Base64.c
MI:BNE $x27, renamable $x27, %bb.33

And my llvm project based on 93e69abfc77b0bd90f3669e36e510dd4f45aab14, and I use the following commands to build it.

cmake -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_TARGETS_TO_BUILD=RISCV -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DCMAKE_INSTALL_PREFIX=/toolchain/gitlab_llvm_install/centos  -DLLVM_DEFAULT_TARGET_TRIPLE=riscv64-unknown-linux-gnu -DLLVM_USE_LINKER=gold -DLLVM_BINUTILS_INCDIR=/toolchain/gitlab_llvm_install/lto/install/include -G Ninja ../llvm
zengdage commented 2 weeks ago

Is this a widespread problem? In my local testing I see 2 places in perlbench.

Do you expect a performance gain from fixing this on perlbench?

I'm hesitant to add a new pass for a problem that occurs twice if it doesn't improve performance.

Negligible performance gain for perlbench, may be 0.1%.

topperc commented 2 weeks ago

clang -S -fno-PIC -g -march=rv64imafdc_zba_zbb_zbc_zbs -mabi=lp64d -Ofast -o xxx.S xxx.c

Sorry, these are 13 places in my local SPEC2006 perlbench. Build command: clang -c -DSPEC_CPU -DNDEBUG -DPERL_CORE -fno-pic -DSPEC_CPU_LP64 -Wno-everything -DSPEC_CPU_LINUX_X64 -std=gnu89 -Ofast -march=rv64gc_zba_zbb_zbc_zbs -mabi=lp64d xxx.c -o xxx.o

A list of these instruction from the above build command. If add -mllvm -inline-threshold=3420 -mllvm -unroll-max-count=15 to build, I will get 21 places in my local SPEC2006 perlbench.

mg.c
MI:BEQ $x0, $x0, %bb.17

perlio.c
MI:BEQ $x0, $x0, %bb.18

pp_pack.c
MI:BGEU $x22, renamable $x22, %bb.73

regexec.c
MI:BEQ $x0, $x0, %bb.548
MI:BNE $x0, $x0, %bb.548
MI:BNE $x0, $x0, %bb.56
MI:BEQ $x0, $x0, %bb.102
MI:BEQ $x0, $x0, %bb.215
MI:BNE $x0, $x0, %bb.215

toke.c
MI:BLTU $x10, renamable $x10, %bb.1778

util.c
MI:BNE $x0, $x0, %bb.5

Base64.c
MI:BNE $x27, renamable $x27, %bb.33

My grep of the disassembly missed the beq/bne x0, x0 cases because they are beqz/bnez with the compressed extension and I didn't check for that.

zengdage commented 1 week ago

Hello, is this pr worth merging? If not, should I close it?