Closed tianboh closed 1 month ago
Is this a question about Zfinx/Zhinx? It might make sense to have GPR sub registers there.
For integer, we only have 1 legal type in SelectionDAG, XLenVT. All integer instructions write the whole GPR. We keep track of the upper bits so we can remove unnecessary sign extends using ComputeNumSignBits.
For FP registers, we don't care about the nan-boxing behavior of the upper bits.
@llvm/issue-subscribers-backend-risc-v
Author: Tianbo (tianboh)
@topperc Thank you for your reply! 😄
Is this a question about Zfinx/Zhinx?
Actually not, I am just curious about the design philosophy between GPR and FPR here.
For integer, we only have 1 legal type in SelectionDAG, XLenVT. All integer instructions write the whole GPR. We keep track of the upper bits so we can remove unnecessary sign extends using ComputeNumSignBits.
So it seems that we need to write the whole GPR even though the instruction is of OP-32 or OP-IMM-32 type, like ADDW and ADDIW. However, I don't understand the benefit of removing unnecessary sign extends here, since we are writing the whole GPR anyway. I am assuming we are writing either all 1s or all 0s to the upper 32 bits.
For FP registers, we don't care about the nan-boxing behavior of the upper bits.
Why don't we need to worry about the nan-boxing behavior? Is it because the instruction (fadd.s, fadd.d) can tell us the floating point type?
I am also curious whether we are REALLY accessing/writing/reading a sub register of the FPR in hardware. Is this a decision made by the hardware manufacturer? For example, you could either read/write the whole FPR or a subset of it to get the same functionality as fadd.s.
The basic idea is that floating-point instructions have the full set of arithmetic operations available for each width, for which the upper nan-boxed bits aren't read. Contrast that with integer instructions where only a select few arithmetic operations exist for non-XLEN width, and so any less-than-XLEN operations which don't have a dedicated instruction must use the full XLEN ones with the inputs suitably sign/zero-extended and the output truncated. That's why they're modelled differently in LLVM.
For integer, we only have 1 legal type in SelectionDAG, XLenVT. All integer instructions write the whole GPR. We keep track of the upper bits so we can remove unnecessary sign extends using ComputeNumSignBits.
So it seems that we need to write the whole GPR even though the instruction is of OP-32 or OP-IMM-32 type, like ADDW and ADDIW. However, I don't understand the benefit of removing unnecessary sign extends here, since we are writing the whole GPR anyway. I am assuming we are writing either all 1s or all 0s to the upper 32 bits.
RV64 doesn't have compare instructions that use only the lower 32 bits. We also don't have W forms of AND/OR/XOR. Compare uses the whole GPR. We try very hard to use W instructions and to track the number of sign bits across AND/OR/XOR and other instructions so that we don't unnecessarily put a sext.w in front of a compare. This is just one example.
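To make the sign-bit argument concrete, here is a minimal Python sketch that models 64-bit GPR values as integers. The helper names (`sext32`, `addw`, `num_sign_bits`, `is_sext32`) are made up for illustration and are not LLVM APIs. The real ComputeNumSignBits is a conservative static analysis over DAG nodes; this sketch only counts sign bits of concrete values to show why a sext.w after W instructions and XOR would be redundant.

```python
MASK64 = (1 << 64) - 1

def sext32(x):
    """Model of sext.w: sign-extend the low 32 bits into a 64-bit register value."""
    x &= 0xFFFFFFFF
    return (x | 0xFFFFFFFF00000000) if (x & 0x80000000) else x

def addw(a, b):
    """Model of ADDW: 32-bit add whose result is sign-extended to 64 bits."""
    return sext32(a + b)

def num_sign_bits(x):
    """Number of leading bits (including the sign bit) equal to bit 63."""
    x &= MASK64
    sign = x >> 63
    count = 0
    for i in range(63, -1, -1):
        if (x >> i) & 1 != sign:
            break
        count += 1
    return count

def is_sext32(x):
    """A 64-bit value is a sign-extended 32-bit value iff it has >= 33 sign bits."""
    return num_sign_bits(x) >= 33

a = addw(0x7FFFFFFF, 1)   # wraps to the most negative 32-bit value, sign-extended
b = addw(5, 7)
c = a ^ b                 # full-width XOR of two sign-extended values

# c is still sign-extended, so applying sext.w before a compare changes nothing.
print(hex(c), is_sext32(c), sext32(c) == c)
```

By contrast, a zero-extended value such as `0x80000000` has only 32 sign bits in this model, so a full-width compare on it would need the sext.w first.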
For FP registers, we don't care about the nan-boxing behavior of the upper bits.
Why don't we need to worry about the nan-boxing behavior? Is it because the instruction (fadd.s, fadd.d) can tell us the floating point type?
I mean the compiler doesn't do any optimizations that rely on knowing that the upper bits are nan-boxed.
I am also curious whether we are REALLY accessing/writing/reading a sub register of the FPR in hardware. Is this a decision made by the hardware manufacturer? For example, you could either read/write the whole FPR or a subset of it to get the same functionality as fadd.s.
The hardware writes the whole register. If the CPU implements the D extension, the fadd.s instruction must set the upper 32 bits of the 64-bit FPR to all 1s to nan-box the value. If the inputs to the fadd.s don't have all 1s in the upper 32 bits, the input is treated as a NaN regardless of what the lower 32 bits are.
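As an illustration of the nan-boxing rule described above, here is a small Python sketch that models a 64-bit FPR as an integer. The function names are hypothetical and exist only for this example. One detail comes from the RISC-V spec rather than this thread: an improperly boxed input is read as the canonical single-precision NaN, `0x7FC00000`.

```python
import struct

CANONICAL_NAN_F32 = 0x7FC00000      # canonical single-precision quiet NaN
UPPER_ONES = 0xFFFFFFFF00000000     # upper 32 bits of a 64-bit FPR, all 1s

def f32_bits(value):
    """Return the IEEE 754 single-precision bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def nan_box(bits32):
    """What a single-precision result looks like in a 64-bit FPR (D implemented)."""
    return UPPER_ONES | (bits32 & 0xFFFFFFFF)

def read_f32(reg64):
    """How a single-precision instruction interprets a 64-bit FPR input."""
    if (reg64 & UPPER_ONES) == UPPER_ONES:
        return reg64 & 0xFFFFFFFF   # properly boxed: use the low 32 bits
    return CANONICAL_NAN_F32        # not boxed: treated as NaN

boxed = nan_box(f32_bits(1.5))
print(hex(boxed))                         # 0xffffffff3fc00000
print(hex(read_f32(boxed)))               # 0x3fc00000
print(hex(read_f32(f32_bits(1.5))))       # 0x7fc00000, upper bits are not all 1s
```

The last line shows the case from the comment above: the low 32 bits hold a perfectly valid 1.5, but because the upper half is zero the value is read as a NaN.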
@topperc @jrtc27 Thank you all for the excellent answers!
To briefly sum up, it's important to distinguish how registers are defined and represented in the RISC-V spec, in LLVM, and in hardware.
In the RISC-V spec, there are no sub registers; you only have x0-x31 and f0-f31.
In LLVM, as @jrtc27 explains, the floating-point extensions provide the full set of arithmetic operations at each width, so LLVM uses a more fine-grained register representation for FPRs. In contrast, only a few integer operations have narrower-than-XLEN variants, so LLVM models only the whole GPR.
In hardware the two cases look the same: as @topperc says, instructions write the whole FPR or GPR even when they operate on a narrower value, because of sign-extension and nan-boxing.
I find that floating point registers have sub registers but general purpose registers do not. For example, you can use F10_H but there is no X10_H.
I checked RISCVRegisterInfo.td and found that floating point registers support more fine-grained sub registers, check below
You can access 16 bits, 32 bits or 64 bits of the same floating point register by adding the suffix _H, _F or _D.
However, the general purpose register is different: you can only have the whole register even if you just want to access a sub register of it. Check its definition below
My question is, why do we support more fine-grained access for floating point registers but not for general purpose registers?