llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[RISCV] Why doesn't RISCVRegisterInfo.td generate sub-registers for general-purpose registers? #106347

Closed: tianboh closed this issue 1 month ago

tianboh commented 1 month ago

I noticed that sub-registers are generated for the floating-point registers but not for the general-purpose registers. For example, you can use F10_H, but there is no X10_H.

I checked RISCVRegisterInfo.td and found that the floating-point registers support more fine-grained sub-registers; see below:

// Floating point registers
let RegAltNameIndices = [ABIRegAltName] in {
  def F0_H  : RISCVReg16<0, "f0", ["ft0"]>, DwarfRegNum<[32]>;
  def F1_H  : RISCVReg16<1, "f1", ["ft1"]>, DwarfRegNum<[33]>;
  def F2_H  : RISCVReg16<2, "f2", ["ft2"]>, DwarfRegNum<[34]>;
  def F3_H  : RISCVReg16<3, "f3", ["ft3"]>, DwarfRegNum<[35]>;
  def F4_H  : RISCVReg16<4, "f4", ["ft4"]>, DwarfRegNum<[36]>;
  def F5_H  : RISCVReg16<5, "f5", ["ft5"]>, DwarfRegNum<[37]>;
  def F6_H  : RISCVReg16<6, "f6", ["ft6"]>, DwarfRegNum<[38]>;
  def F7_H  : RISCVReg16<7, "f7", ["ft7"]>, DwarfRegNum<[39]>;
  def F8_H  : RISCVReg16<8, "f8", ["fs0"]>, DwarfRegNum<[40]>;
  def F9_H  : RISCVReg16<9, "f9", ["fs1"]>, DwarfRegNum<[41]>;
  def F10_H : RISCVReg16<10,"f10", ["fa0"]>, DwarfRegNum<[42]>;
  def F11_H : RISCVReg16<11,"f11", ["fa1"]>, DwarfRegNum<[43]>;
  def F12_H : RISCVReg16<12,"f12", ["fa2"]>, DwarfRegNum<[44]>;
  def F13_H : RISCVReg16<13,"f13", ["fa3"]>, DwarfRegNum<[45]>;
  def F14_H : RISCVReg16<14,"f14", ["fa4"]>, DwarfRegNum<[46]>;
  def F15_H : RISCVReg16<15,"f15", ["fa5"]>, DwarfRegNum<[47]>;
  def F16_H : RISCVReg16<16,"f16", ["fa6"]>, DwarfRegNum<[48]>;
  def F17_H : RISCVReg16<17,"f17", ["fa7"]>, DwarfRegNum<[49]>;
  def F18_H : RISCVReg16<18,"f18", ["fs2"]>, DwarfRegNum<[50]>;
  def F19_H : RISCVReg16<19,"f19", ["fs3"]>, DwarfRegNum<[51]>;
  def F20_H : RISCVReg16<20,"f20", ["fs4"]>, DwarfRegNum<[52]>;
  def F21_H : RISCVReg16<21,"f21", ["fs5"]>, DwarfRegNum<[53]>;
  def F22_H : RISCVReg16<22,"f22", ["fs6"]>, DwarfRegNum<[54]>;
  def F23_H : RISCVReg16<23,"f23", ["fs7"]>, DwarfRegNum<[55]>;
  def F24_H : RISCVReg16<24,"f24", ["fs8"]>, DwarfRegNum<[56]>;
  def F25_H : RISCVReg16<25,"f25", ["fs9"]>, DwarfRegNum<[57]>;
  def F26_H : RISCVReg16<26,"f26", ["fs10"]>, DwarfRegNum<[58]>;
  def F27_H : RISCVReg16<27,"f27", ["fs11"]>, DwarfRegNum<[59]>;
  def F28_H : RISCVReg16<28,"f28", ["ft8"]>, DwarfRegNum<[60]>;
  def F29_H : RISCVReg16<29,"f29", ["ft9"]>, DwarfRegNum<[61]>;
  def F30_H : RISCVReg16<30,"f30", ["ft10"]>, DwarfRegNum<[62]>;
  def F31_H : RISCVReg16<31,"f31", ["ft11"]>, DwarfRegNum<[63]>;

  foreach Index = 0-31 in {
    def F#Index#_F : RISCVReg32<!cast<RISCVReg16>("F"#Index#"_H")>,
      DwarfRegNum<[!add(Index, 32)]>;
  }

  foreach Index = 0-31 in {
    def F#Index#_D : RISCVReg64<!cast<RISCVReg32>("F"#Index#"_F")>,
      DwarfRegNum<[!add(Index, 32)]>;
  }
}

You can access the 16-bit, 32-bit, or 64-bit view of the same floating-point register by using the suffix _H, _F, or _D.
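The nesting that the two foreach loops create can be pictured with a small Python model. This is only an illustration, not LLVM code: the `FPR` class and the `build_fprs`/`views` helpers are hypothetical, and the ABI alternate names are omitted. Each F<n>_F contains F<n>_H, each F<n>_D contains F<n>_F, and all three views of f<n> share DWARF number 32 + n, i.e. they name the same architectural register.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FPR:
    name: str              # e.g. "F10_F"
    bits: int              # width of this view: 16, 32 or 64
    dwarf: int             # DwarfRegNum, shared by every view of f<n>
    sub: Optional["FPR"]   # the narrower register this one contains, if any

def build_fprs():
    """Mirror the TableGen above: F<n>_H, then F<n>_F wrapping F<n>_H,
    then F<n>_D wrapping F<n>_F, all with DWARF number 32 + n."""
    regs = {}
    for n in range(32):
        h = FPR(f"F{n}_H", 16, 32 + n, None)
        f = FPR(f"F{n}_F", 32, 32 + n, h)
        d = FPR(f"F{n}_D", 64, 32 + n, f)
        for r in (h, f, d):
            regs[r.name] = r
    return regs

def views(reg):
    """Names from reg down to its narrowest sub-register."""
    chain = []
    while reg is not None:
        chain.append(reg.name)
        reg = reg.sub
    return chain

regs = build_fprs()
print(views(regs["F10_D"]))                      # ['F10_D', 'F10_F', 'F10_H']
print(regs["F10_H"].dwarf, regs["F10_D"].dwarf)  # 42 42
```

The GPR definitions have no equivalent chain: each X<n> is a single record with no sub-register.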

However, general-purpose registers are different: you can only refer to the whole register, even if you only want to access part of it. See the definitions below:

let RegAltNameIndices = [ABIRegAltName] in {
  let isConstant = true in
  def X0  : RISCVReg<0, "x0", ["zero"]>, DwarfRegNum<[0]>;
  let CostPerUse = [0, 1] in {
  def X1  : RISCVReg<1, "x1", ["ra"]>, DwarfRegNum<[1]>;
  def X2  : RISCVReg<2, "x2", ["sp"]>, DwarfRegNum<[2]>;
  def X3  : RISCVReg<3, "x3", ["gp"]>, DwarfRegNum<[3]>;
  def X4  : RISCVReg<4, "x4", ["tp"]>, DwarfRegNum<[4]>;
  def X5  : RISCVReg<5, "x5", ["t0"]>, DwarfRegNum<[5]>;
  def X6  : RISCVReg<6, "x6", ["t1"]>, DwarfRegNum<[6]>;
  def X7  : RISCVReg<7, "x7", ["t2"]>, DwarfRegNum<[7]>;
  }
  def X8  : RISCVReg<8, "x8", ["s0", "fp"]>, DwarfRegNum<[8]>;
  def X9  : RISCVReg<9, "x9", ["s1"]>, DwarfRegNum<[9]>;
  def X10 : RISCVReg<10,"x10", ["a0"]>, DwarfRegNum<[10]>;
  def X11 : RISCVReg<11,"x11", ["a1"]>, DwarfRegNum<[11]>;
  def X12 : RISCVReg<12,"x12", ["a2"]>, DwarfRegNum<[12]>;
  def X13 : RISCVReg<13,"x13", ["a3"]>, DwarfRegNum<[13]>;
  def X14 : RISCVReg<14,"x14", ["a4"]>, DwarfRegNum<[14]>;
  def X15 : RISCVReg<15,"x15", ["a5"]>, DwarfRegNum<[15]>;
  let CostPerUse = [0, 1] in {
  def X16 : RISCVReg<16,"x16", ["a6"]>, DwarfRegNum<[16]>;
  def X17 : RISCVReg<17,"x17", ["a7"]>, DwarfRegNum<[17]>;
  def X18 : RISCVReg<18,"x18", ["s2"]>, DwarfRegNum<[18]>;
  def X19 : RISCVReg<19,"x19", ["s3"]>, DwarfRegNum<[19]>;
  def X20 : RISCVReg<20,"x20", ["s4"]>, DwarfRegNum<[20]>;
  def X21 : RISCVReg<21,"x21", ["s5"]>, DwarfRegNum<[21]>;
  def X22 : RISCVReg<22,"x22", ["s6"]>, DwarfRegNum<[22]>;
  def X23 : RISCVReg<23,"x23", ["s7"]>, DwarfRegNum<[23]>;
  def X24 : RISCVReg<24,"x24", ["s8"]>, DwarfRegNum<[24]>;
  def X25 : RISCVReg<25,"x25", ["s9"]>, DwarfRegNum<[25]>;
  def X26 : RISCVReg<26,"x26", ["s10"]>, DwarfRegNum<[26]>;
  def X27 : RISCVReg<27,"x27", ["s11"]>, DwarfRegNum<[27]>;
  def X28 : RISCVReg<28,"x28", ["t3"]>, DwarfRegNum<[28]>;
  def X29 : RISCVReg<29,"x29", ["t4"]>, DwarfRegNum<[29]>;
  def X30 : RISCVReg<30,"x30", ["t5"]>, DwarfRegNum<[30]>;
  def X31 : RISCVReg<31,"x31", ["t6"]>, DwarfRegNum<[31]>;
  }
}

My question is: why do we support finer-grained access for floating-point registers but not for general-purpose registers?

topperc commented 1 month ago

Is this a question about Zfinx/Zhinx? It might make sense to have GPR sub registers there.

For integer, we only have 1 legal type in SelectionDAG, XLenVT. All integer instructions write the whole GPR. We keep track of the upper bits so we can remove unnecessary sign extends using ComputeNumSignBits.
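As a rough illustration of "all integer instructions write the whole GPR", here is a Python model of RV64 ADDW (the function names are illustrative, not LLVM APIs). The 32-bit sum is sign-extended into all 64 bits, which is why the upper 32 bits always end up all 0s or all 1s, and why a sext.w (an alias for `addiw rd, rs, 0`) applied to such a result changes nothing.

```python
MASK64 = (1 << 64) - 1

def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def addw(rs1, rs2):
    """RV64 ADDW: add the low 32 bits, then sign-extend the 32-bit result
    into the full 64-bit destination register."""
    return to_signed(rs1 + rs2, 32) & MASK64

def sext_w(rs):
    """sext.w rd, rs is addiw rd, rs, 0."""
    return addw(rs, 0)

r = addw(0x7FFFFFFF, 1)   # 32-bit signed overflow
print(hex(r))             # 0xffffffff80000000: upper 32 bits are all 1s
print(sext_w(r) == r)     # True: a sext.w after ADDW would be redundant
```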

For FP registers, we don't care about the nan-boxing behavior of the upper bits.

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-risc-v

Author: Tianbo (tianboh)

tianboh commented 1 month ago

@topperc Thank you for your reply! 😄

> Is this a question about Zfinx/Zhinx?

Actually, no. I am just curious about the difference in design philosophy between GPRs and FPRs here.

> For integer, we only have 1 legal type in SelectionDAG, XLenVT. All integer instructions write the whole GPR. We keep track of the upper bits so we can remove unnecessary sign extends using ComputeNumSignBits.

So it seems that we write the whole GPR even when the instruction is of the OP-32 or OP-IMM-32 type, like ADDW and ADDIW. However, I don't understand the benefit of removing unnecessary sign extensions here, since we are writing the whole GPR anyway. I am assuming we write either all 1s or all 0s to the upper 32 bits.

> For FP registers, we don't care about the nan-boxing behavior of the upper bits.

Why don't we need to worry about the NaN-boxing behavior? Is it because the instruction (fadd.s, fadd.d) tells us the floating-point type?

I am also curious whether the hardware REALLY accesses (reads/writes) only a sub-register of the FPR. Is this a decision left to the hardware manufacturer? For example, an implementation could either read/write the whole FPR or just a subset of it and still provide the same fadd.s behavior.

jrtc27 commented 1 month ago

The basic idea is that floating-point instructions have the full set of arithmetic operations available for each width, for which the upper nan-boxed bits aren't read. Contrast that with integer instructions where only a select few arithmetic operations exist for non-XLEN width, and so any less-than-XLEN operations which don't have a dedicated instruction must use the full XLEN ones with the inputs suitably sign/zero-extended and the output truncated. That's why they're modelled differently in LLVM.
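A minimal Python sketch of that point, modelling RV64 registers as 64-bit integers (the helper names are invented for illustration). When the upper bits of a narrow value are not known to be zero, for example an i8 promoted to XLenVT without extension, an 8-bit add can still use the full-width ADD with a truncated result, but an 8-bit unsigned compare must zero-extend its inputs before SLTU, because SLTU reads the whole register.

```python
MASK64 = (1 << 64) - 1

def add(rs1, rs2):
    """RV64 ADD: full-width add that wraps at 64 bits."""
    return (rs1 + rs2) & MASK64

def sltu(rs1, rs2):
    """RV64 SLTU: unsigned less-than over all 64 register bits."""
    return int((rs1 & MASK64) < (rs2 & MASK64))

def zext(x, bits):
    """Zero-extend the low `bits` bits of x (e.g. andi rd, rs, 255 for i8)."""
    return x & ((1 << bits) - 1)

# Two i8 values whose upper register bits are undefined (here: junk).
a = 0xDEAD_0000_0000_0001   # i8 value 1
b = 0x0000_0000_0000_00FF   # i8 value 255

# i8 add: full-width ADD, then only the low 8 bits of the result are used.
print(zext(add(a, b), 8))            # 0, i.e. (1 + 255) mod 256

# i8 unsigned compare: SLTU reads all 64 bits, so extend the inputs first.
print(sltu(a, b))                    # 0 (wrong: junk in a's upper bits)
print(sltu(zext(a, 8), zext(b, 8)))  # 1 (correct: 1 < 255)
```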

topperc commented 1 month ago

> For integer, we only have 1 legal type in SelectionDAG, XLenVT. All integer instructions write the whole GPR. We keep track of the upper bits so we can remove unnecessary sign extends using ComputeNumSignBits.

> So it seems that we write the whole GPR even when the instruction is of the OP-32 or OP-IMM-32 type, like ADDW and ADDIW. However, I don't understand the benefit of removing unnecessary sign extensions here, since we are writing the whole GPR anyway. I am assuming we write either all 1s or all 0s to the upper 32 bits.

RV64 doesn't have compare instructions that use only the lower 32 bits, and there are no W forms of AND/OR/XOR. Compares use the whole GPR. We try very hard to use W instructions and to track the number of sign bits across AND/OR/XOR and other instructions so that we don't unnecessarily put a sext.w in front of a compare. This is just one example.
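Here is a rough Python model of that sign-bit bookkeeping. `num_sign_bits` computes, for a concrete value, the quantity that ComputeNumSignBits conservatively bounds (LLVM reasons about SelectionDAG nodes, not concrete values, so this is only an illustration); `sext_w_is_redundant` is a hypothetical helper encoding the rule that sext.w is a no-op once there are at least 33 sign bits. For AND/OR/XOR, the result keeps at least the smaller of the two inputs' sign-bit counts, so sign-extended W results can flow through them into a full-width compare.

```python
from operator import and_, or_, xor

XLEN = 64
MASK64 = (1 << XLEN) - 1

def num_sign_bits(x):
    """Count how many of the top bits of the 64-bit value x equal its sign bit."""
    x &= MASK64
    sign = x >> (XLEN - 1)
    n = 0
    for i in range(XLEN - 1, -1, -1):
        if ((x >> i) & 1) != sign:
            break
        n += 1
    return n

def sext_w_is_redundant(x):
    """sext.w only changes x if bits 63..32 differ from bit 31,
    i.e. it is a no-op whenever x has at least 33 sign bits."""
    return num_sign_bits(x) >= 33

a = 0xFFFFFFFF80000000   # e.g. an ADDW result: 33 sign bits
b = 0x0000000012345678   # e.g. another W result: 35 sign bits

# AND/OR/XOR keep at least min(33, 35) sign bits, so no sext.w is needed
# before feeding the result to a full-width compare.
print([num_sign_bits(op(a, b)) for op in (and_, or_, xor)])           # [64, 33, 33]
print(all(sext_w_is_redundant(op(a, b)) for op in (and_, or_, xor)))  # True

c = 0x0000000080000000   # bit 31 set but upper bits zero (e.g. from lwu)
print(sext_w_is_redundant(c))   # False: a signed compare would need sext.w first
```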

> For FP registers, we don't care about the nan-boxing behavior of the upper bits.

> Why don't we need to worry about the NaN-boxing behavior? Is it because the instruction (fadd.s, fadd.d) tells us the floating-point type?

I mean the compiler doesn't do any optimizations that rely on knowing that the upper bits are nan-boxed.

> I am also curious whether the hardware REALLY accesses (reads/writes) only a sub-register of the FPR. Is this a decision left to the hardware manufacturer? For example, an implementation could either read/write the whole FPR or just a subset of it and still provide the same fadd.s behavior.

The hardware writes the whole register. If the CPU implements the D extension, the fadd.s instruction must set the upper 32 bits of the 64-bit FPR to all 1s to nan-box the value. If the inputs to the fadd.s don't have all 1s in the upper 32 bits, the input is treated as a NaN regardless of what the lower 32 bits are.
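The NaN-boxing rule described here can be modelled in a few lines of Python. This is a simplified sketch, assuming FLEN = 64, round-to-nearest only, and no exception flags; the helper names are invented for illustration. It shows that fadd.s writes all 64 bits of the destination, and that an input whose upper 32 bits are not all 1s is read as the canonical NaN (0x7fc00000), whatever its lower 32 bits contain.

```python
import math
import struct

CANONICAL_NAN_F32 = 0x7FC00000   # RISC-V canonical single-precision NaN
BOX = 0xFFFFFFFF << 32           # upper half of a 64-bit FPR, all 1s

def f32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def nan_box(bits32):
    """Write a 32-bit value into a 64-bit FPR: upper 32 bits set to 1s."""
    return BOX | (bits32 & 0xFFFFFFFF)

def read_f32(fpr64):
    """Read a single-precision operand; a value that is not properly
    NaN-boxed is treated as the canonical NaN."""
    if fpr64 >> 32 != 0xFFFFFFFF:
        return CANONICAL_NAN_F32
    return fpr64 & 0xFFFFFFFF

def fadd_s(fpr1, fpr2):
    """Simplified fadd.s: always writes a NaN-boxed 64-bit result."""
    s = bits_to_f32(read_f32(fpr1)) + bits_to_f32(read_f32(fpr2))
    if math.isnan(s):
        return nan_box(CANONICAL_NAN_F32)   # NaN results are canonical
    try:
        return nan_box(f32_to_bits(s))
    except OverflowError:                   # too large for binary32: +/-inf
        return nan_box(f32_to_bits(math.copysign(math.inf, s)))

one_and_half = nan_box(f32_to_bits(1.5))
print(hex(one_and_half))                        # 0xffffffff3fc00000
print(hex(fadd_s(one_and_half, one_and_half)))  # 0xffffffff40400000 (3.0)

raw_double = 0x3FF8000000000000   # 1.5 as a double: upper bits not all 1s
print(hex(fadd_s(raw_double, one_and_half)))    # 0xffffffff7fc00000 (NaN)
```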

tianboh commented 1 month ago

@topperc @jrtc27 Thank you all for the excellent answers!

To briefly sum up, it's important to distinguish how registers are defined and represented in the RISC-V spec, in LLVM, and in hardware.

In the RISC-V spec, there are no sub-registers; you only have x0-x31 and f0-f31.

In LLVM, as @jrtc27 explains, the floating-point extensions provide a full set of arithmetic operations at each width (half, single, and double precision), so LLVM models FPRs with finer-grained sub-registers. In contrast, only a few integer operations have instructions narrower than XLEN, so LLVM only models whole GPRs.

In hardware, the two end up the same: as @topperc says, instructions write the whole FPR or GPR even when they operate on a narrower value, because of sign extension and NaN-boxing.