riscv / riscv-bitmanip

Working draft of the proposed RISC-V Bitmanipulation extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

Proposal: func7 "Quadrants" for OP[32/128][+/-IM] opcode family #42

Closed xphung closed 4 years ago

xphung commented 4 years ago

RISC V context for this proposal

This "OP Quadrant" proposal below has global implications for RISC V instruction encoding, but I propose it here in BitManip, as this is the first extension (other than 'M') to need such organisation within func7. (func7 & func3 have the usual meaning for the R-type instruction format).
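For readers less familiar with the field layout, here is a minimal sketch of how the R-type fields are pulled out of a 32-bit instruction word. The function name `decode_r_type` is hypothetical and used only for illustration; the bit positions are those of the standard R-type format.

```python
def decode_r_type(insn: int) -> dict:
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "opcode": insn & 0x7F,          # insn[6:0]
        "rd":     (insn >> 7) & 0x1F,   # insn[11:7]
        "func3":  (insn >> 12) & 0x7,   # insn[14:12]
        "rs1":    (insn >> 15) & 0x1F,  # insn[19:15]
        "rs2":    (insn >> 20) & 0x1F,  # insn[24:20]
        "func7":  (insn >> 25) & 0x7F,  # insn[31:25]
    }

# add x3, x1, x2 and sub x3, x1, x2 differ only in func7.
add_fields = decode_r_type(0x002081B3)
sub_fields = decode_r_type(0x402081B3)
print(bin(add_fields["func7"]), bin(sub_fields["func7"]))
```

The two words decode to the same opcode, func3 and register fields; only func7 distinguishes ADD (0b0000000) from SUB (0b0100000).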

Note: the RISC V user ISA spec explicitly states that RV128 may introduce new 128 bit instructions into an OP128 major opcode, which is the reverse of what happened for RV64. I assume this will be the case in the discussion below.

Why Quadrants are needed

"Contiguous" reserved opcode space is a precious resource. RISC V has only three reserved major opcodes left for future standard extensions.

Up to now, the only values of func7 for instructions within the OP-INT family of major opcodes are 0b0000000, 0b0000001 (MUL/DIV), and 0b0100000 (SUB/SRA). Bitmanip will substantially expand the usage of func7 values. It is important this is done in a rational way, as func7 values chosen within OP will also have major side effects on OP-IM, OP32[IM], and OP128[IM].

Within OP32 and OP128, up to 50% of these major opcodes are available as contiguous reserved space: the func3 values of the form 0bX1X (where X = 0 or 1), i.e. those that do not correspond to any "Q" or "W" instruction. Care needs to be taken not to punch "holes" into this space. (Unfortunately, two "M" instructions break this rule in OP32, slightly reducing the contiguous free opcode space in OP32.)
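As a rough illustration of the 0bX1X rule, the sketch below lists the func3 values of the existing OP32 instructions and checks which ones land in the otherwise reserved half. The func3 values are taken from the RV64I and RV64M opcode listings; the helper name `in_reserved_half` is hypothetical.

```python
# func3 values of the existing OP32 (opcode 0111011) instructions.
OP32_FUNC3 = {
    "ADDW": 0b000, "SUBW": 0b000, "SLLW": 0b001,
    "SRLW": 0b101, "SRAW": 0b101,
    "MULW": 0b000, "DIVW": 0b100, "DIVUW": 0b101,
    "REMW": 0b110, "REMUW": 0b111,
}

def in_reserved_half(func3: int) -> bool:
    """True when func3 matches 0bX1X, i.e. its middle bit is set."""
    return bool(func3 & 0b010)

hole_punchers = sorted(
    name for name, func3 in OP32_FUNC3.items() if in_reserved_half(func3)
)
print(hole_punchers)  # ['REMUW', 'REMW']
```

Under this check, a func3 value of 0b010 also falls inside the reserved half.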

Problems with proposed v0.90 BitManip encoding

The current BitManip v0.90 encoding proposal is a bit problematic in this regard, as it punches "holes" into the "non-W" sections of OP32. These non-W sections otherwise form part of the unused 50% of OP32/OP128, and scattered holes within them will limit their long term usefulness for other future extensions. (An example of a "hole" created in OP32 is BDEPW, which has a func3 value of 0b010).

BitManip v0.90 also unnecessarily introduces a new two source R-type format specifically for one instruction, FSRI, which moves the rs2 register field to a new position. This will complicate implementation of superscalar out-of-order microarchitectures, and breaks the existing RISC V approach of keeping rs1 and rs2 in the same positions for every relevant instruction.

Why a Quadrant division is intrinsically imposed onto func7 organisation

The choice of 4x32 value Quadrants is not an arbitrary choice. It is in fact fundamental to the organisation of RV32, RV64 and RV128.

I-type shift instructions with a 7 bit (RV128) immediate field leave only 5 of the 12 immediate bits free, which imposes a 32 x value func7 constraint. For RV64, I-type shift instructions have a 6 bit immediate field and can encode 64 values in their remaining instruction bits, hence translating into a 64 x value func7 constraint. (Hence Quadrants A & B need to be created for these distinct 2x32 value subsets of func7).
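The arithmetic behind these constraints can be sketched as follows, assuming the shift amount occupies the low log2(XLEN) bits of the 12 bit I-type immediate and everything above it is free for function encoding. The name `free_func_values` is hypothetical.

```python
IMM12_BITS = 12  # width of the I-type immediate field, insn[31:20]

def free_func_values(xlen: int) -> int:
    """Number of distinct func values left above the shift amount."""
    shamt_bits = xlen.bit_length() - 1  # log2(xlen) for a power of two
    return 1 << (IMM12_BITS - shamt_bits)

for xlen in (32, 64, 128):
    print(xlen, free_func_values(xlen))
# 32 128
# 64 64
# 128 32
```

So a shift immediate leaves the full 128 func7 values on RV32, 64 on RV64 and 32 on RV128.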

Also, dividing up func7 into Quadrants is natural for ternary instructions, as blocks of 32 x func7 values are needed to introduce an "rs3" instruction format (hence Quadrant "D" needs to be created for such rs3-type instructions).

Quadrants in detail

Below is an outline of how func7 should be structured into Quadrants A-D, based on the last two bit values of func7 (shown below as ' | 00' to ' | 11' ):

Quadrant A1 (n=1): instructions with func7 = 0b00000 | 00

Quadrant A2 (n=29): instructions with func7 in range 0b00001 | 00 to 0b11101 | 00

Quadrant A3 (n=2, but could grow if needed): instructions with func7 = 0b1111X | 00

Quadrant B (n=32): func7 value in range 0b00000 | 10 to 0b11111 | 10

Quadrant C (n=32): func7 value in range 0b00000 | 01 to 0b11111 | 01

Quadrant D (n=32): func7 value in range 0b00000 | 11 to 0b11111 | 11
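The quadrant outline above can be expressed as a small classifier. This is only a sketch of the proposed split, keyed on the last two bits of func7 with Quadrant A subdivided by the upper five bits; the function name `quadrant` is hypothetical.

```python
from collections import Counter

def quadrant(func7: int) -> str:
    """Map a 7-bit func7 value to its proposed quadrant label."""
    if not 0 <= func7 < 128:
        raise ValueError("func7 must fit in 7 bits")
    low2 = func7 & 0b11   # the two bits shown after ' | '
    high5 = func7 >> 2    # the five bits shown before ' | '
    if low2 == 0b10:
        return "B"
    if low2 == 0b01:
        return "C"
    if low2 == 0b11:
        return "D"
    # low2 == 0b00: Quadrant A, split into A1/A2/A3
    if high5 == 0b00000:
        return "A1"
    if high5 >= 0b11110:  # 0b1111X
        return "A3"
    return "A2"

sizes = Counter(quadrant(f) for f in range(128))
print(quadrant(0b0100000))  # A2 (SUB/SRA)
print(quadrant(0b0000001))  # C  (MUL/DIV)
```

Counting over all 128 func7 values with `sizes` gives the A1, A2 and A3 sizes of 1, 29 and 2 respectively.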

Example BitManip encoding using Quadrants

Below is an example of how the above quadrants can be used to organise the BitManip proposed instructions:

func7 rs2 rs1 rd opcode func3►
000 100 001 101 010 011 110 111
Group A1 00000.00 rs2 rs1 rd 0110011 ADD XOR SLL SRL SLT SLTU OR AND
Group A2 01000.00 rs2 rs1 rd 0110011 SUB XNOR SBINV SRA ORN ANDN
00001.00 rs2 rs1 rd 0110011 ADDU.W PACK SBSET GREV MIN MINU
01001.00 rs2 rs1 rd 0110011 SUBU.W SBCLR SBEXT MAX MAXU
00010.00 rs2 rs1 rd 0110011 ROL ROR SLO SRO
01010.00 rs2 rs1 rd 0110011 BDEP BEXT SHFL UNSHFL
Group A3 11111.00 rs2 rs1 rd 0110011 CLMUL CLMULR CLMULH
Group B xxxxx.10
Group C 00000.01 rs2 rs1 rd 0110011 MUL DIV MULH DIVU MULHSU MULHU REM REMU
Group D rs3/imm5.11 rs2 rs1 rd 0110011 FSLI FSRI FSL FSR CMOVI CMOV CMIX

Note 1: OP-IM, OP32 and OP-32IM are not shown as these are automatically implied by the quadrant in which each instruction is added.

Note 2: RORI not included as can be replaced by FSRL/FSLI, and bitmatrix instructions not shown as these are RV64I only and best placed in OP32 with func3=0bX1X.

Note 3: Unary instructions not shown, are placed into OP-IM in the slot occupied by CLMULH (ie: Group A3 with func3=0bX01).

cliffordwolf commented 4 years ago

I think completely re-doing the encodings at this time is unpractical. There have been several calls for feedback on the encodings a few months ago. Now it's a bit late for that.

Regarding some of your points:

RISC V user ISA spec explicitly states that RV128 may introduce new 128 bit instructions into an OP128 major opcode, which is the reverse of what happened for RV64.

That's not true. In fact, it explicitly states the opposite:

image

Regarding your concern that the bitmanip encoding clutters funct7: The minor opcodes 001 and 101 are special in that the immediate field in OP-IMM is shorter for those two minor opcodes.

If you decode OP into funct3[1:0]=01 and funct3[1:0]!=01 first, you will find that each of those groups are encoded in a very compact manner with very conservative use of funct7 bits.

BitManip v0.90 also unnecessarily introduces a new two source R-type format specifically for one instruction, FSRI, which moves the rs2 register field to a new position.

No, the format doesn't move rs2 to another position. Like with all immediate instructions, rs2 is replaced by the immediate. The new field is rs3 and it is in the usual rs3 position.

Using FSRL/FSLI for RORI is impractical, because not all implementations will want to support ternary operations.

Also, there is no CMOVI and I don't think moving some immediate instructions from OP-IMM to OP is a good idea (and apparently that's what your FSRI/FSLI/Quadrant-D encoding does).

xphung commented 4 years ago

I think completely re-doing the encodings at this time is unpractical. There have been several calls for feedback on the encodings a few months ago. Now it's a bit late for that.

The example was done to illustrate the concept as simply as possible. Let me see if I can redo the example to retain as much of the existing encoding as possible. Unavoidable changes however will include changing BDEP/BEXT, MIN/MAX, SBINV and FSR/FSL/FSRI and the associated "W" instructions. BDEPW/BEXTW are particularly problematic in v0.90 not because of excessive use of func7 in OP, but due to side effects on OP-32.

RISC V user ISA spec explicitly states that RV128 may introduce new 128 bit instructions into an OP128 major opcode, which is the reverse of what happened for RV64.

That's not true. In fact, it explicitly states the opposite:

image

See quote below from Page 42 of the latest draft of the ISA manual. Although I assumed (for discussion purposes) that OP128 instead of OP64 will happen, this assumption only has relevance to Quadrant B of my proposal, which is not used in my example BitManip encoding.

To improve compatibility with RV64, in a reverse of how RV32 to RV64 was handled, we might change the decoding around to rename RV64I ADD as a 64-bit ADDD, and add a 128-bit ADDQ in what was previously the OP-64 major opcode (now renamed the OP-128 major opcode).

Regarding your concern that the bitmanip encoding clutters funct7: The minor opcodes 001 and 101 are special in that the immediate field in OP-IMM is shorter for those two minor opcodes.

If you decode OP into funct3[1:0]=01 and funct3[1:0]!=01 first, you will find that each of those groups are encoded in a very compact manner with very conservative use of funct7 bits.

Yes for the case of OP I agree v0.90 is efficient in use of func7 bits.

For OP-IM[32] however, v0.90 uses 6 func bits (not counting ADDIU[W], which unavoidably uses all imm12 bits), when it's possible to only use 3 func bits (again, not counting ADDIU[W]). (Spare func bits in OP-IM[32] are much scarcer than in OP and need to be prioritised for conservation).

BitManip v0.90 also unnecessarily introduces a new two source R-type format specifically for one instruction, FSRI, which moves the rs2 register field to a new position.

No, the format doesn't move rs2 to another position. Like with all immediate instructions, rs2 is replaced by the immediate. The new field is rs3 and it is in the usual rs3 position.

FSRI is not a ternary instruction (from the source register point of view). It is a two input instruction, using what is essentially the rs1 & rs2 source fields (ignoring the renaming of rs2 to rs3). Its register port usage & register side effect/conflict profile (in speculative execution architectures) is closest to the two register input R-type instruction, not the R4-type 3 register input instructions.

Using FSRL/FSLI for RORI is impractical, because not all implementations will want to support ternary operations.

See above, when considered from register port point of view FSRI/FSLI is only a 2 input instruction (not 3 input instruction). It uses an immediate field as a 3rd input but so do the SW/SB/SH instructions... any implementation with these instructions can already handle two register inputs + one immediate input.

Also, there is no CMOVI and I don't think moving some immediate instructions from OP-IMM to OP is a good idea (and apparently that's what your FSRI/FSLI/Quadrant-D encoding does).

I just put CMOVI in there for fun, it's not serious :)

However, see above on prioritising conservation of OP-IMM over OP.

cliffordwolf commented 4 years ago

Let me see if I can redo the example to retain as much of the existing encoding as possible.

Please don't. Obviously changing 30% of it would be as bad as changing all of it.

Yes for the case of OP I agree v0.90 is efficient in use of func7 bits. For OP-IM[32] however, v0.90 uses 6 func bits.

I don't know where you get these ideas. Obviously this is not true.

We are using 3 of 5 available funct7 bits in OP-IMM: insn[30], insn[29], and insn[27]. (The lowest two bits of funct7 are part of the immediate and are therefore not available for encoding instructions. The exception is of course 32-bit and 64-bit FSRI. There is no 128-bit FSRI.)

Please just look at the existing opcode encodings before wasting my time like this.

xphung commented 4 years ago

We are using 3 of 5 available funct7 bits in OP-IMM: insn[30], insn[29], and insn[27]. (The lowest two bits of funct7 are part of the immediate and are therefore not available for encoding instructions. The exception is of course 32-bit and 64-bit FSRI. There is no 128-bit FSRI.)

I am well aware of which immediate bits FSRI uses, and the 6 vs 7 bit shift requirements of RV64 vs RV128 (that's the whole point of distinguishing Quadrant A from Quadrant B within func7).

I count FSRI's use of func7 as 6 bits, as my discussion takes into account the possibility that RV128 goes down the OP128 path instead of the OP64 path (as per page 42 of the latest draft of the RISC V ISA manual). A key advantage of doing OP128 instead of OP64 (aside from binary compatibility) is that the second last bit of func7 becomes useable in immediate I-type instructions where func3=0bX01.

Bitmanip with FSRI will thus (1) use up an otherwise useable 6 bits of OP-IM in the above scenario, (2) complicate the use of the matching 6 func7 bits in OP itself (can only put instructions which don't have corresponding I-type instructions in the 5 bits used by FSRI), (3) introduce a new two source register instruction format just for a single instruction.

Also on such a path for RV128, leaving out FSRI won't be viable as the key advantage of the OP128 path (binary compatibility between RV64 and RV128) gets lost.

I think FSRI is a great instruction and think it should be core to the BitManip base by the way.... I just don't like its encoding.

xphung commented 4 years ago

(accidental duplicate comment deleted) In relation to FSRI comment (2) above, FSL is technically an example of instruction without corresponding I type but I concede it is a minor point and key issues with FSRI are (1) and (3) above

cliffordwolf commented 4 years ago

This is now devolving into an argument about RISC-V design decisions beyond the scope of the bitmanip task group (should RV128 add OP128 instead of OP64). I'd direct you to the RV128 task group for that.

I've sent an email to the mailing list about this GitHub issue two days ago and so far nobody felt it was interesting enough to join this discussion. Therefore I'm now closing this issue.