Proposal: func7 "Quadrants" for OP[32/128][+/-IM] opcode family

RISC V context for this proposal

This "OP Quadrant" proposal below has global implications for RISC V instruction encoding, but I propose it here in BitManip, as this is the first extension (other than 'M') to need such organisation within func7. (func7 & func3 have the usual meaning for the R-type instruction format).

Note: the RISC V user ISA spec explicitly states that RV128 may introduce new 128 bit instructions into an OP128 major opcode, which is the reverse of what happened for RV64. I assume this will be the case in the discussion below.

Why Quadrants are needed

"Contiguous" reserved opcode space is a precious resource. RISC V has only three reserved major opcodes left for future standard extensions.

Up to now, the only values of func7 for instructions within the OP-INT family of major opcodes are 0b0000000, 0b0000001 (MUL/DIV), and 0b0100000 (SUB/SRA). Bitmanip will substantially expand the usage of func7 values. It is important this is done in a rational way, as func7 values chosen within OP will also have major side effects on OP-IM, OP32[IM], and OP128[IM].

Within OP32 and OP128, up to 50% of these major opcodes are available as contiguous reserved space (for func3 values = 0bX1X, where X = 0 or 1, ie: values which do not correspond to any "Q" or "W" instruction). Care needs to be taken not to punch "holes" into this space. (Unfortunately, two "M" instructions break this rule in OP32, reducing OP32 contiguous free opcode space slightly.)
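To make the 0bX1X rule concrete, here is a small Python sketch (not part of the proposal; the helper name in_non_w_half is made up for illustration). It lists the func3 values that fall into the "non-W" half, and checks which of the RV64M "W" instructions land there, using the func3 values from the published RV64M encoding.

```python
def in_non_w_half(func3: int) -> bool:
    """True if func3 matches 0bX1X, i.e. bit 1 of func3 is set."""
    return ((func3 >> 1) & 1) == 1

# func3 values of the RV64M "W" instructions in OP-32 (from the ISA manual).
RV64M_W = {
    "MULW": 0b000,
    "DIVW": 0b100,
    "DIVUW": 0b101,
    "REMW": 0b110,
    "REMUW": 0b111,
}

# The four func3 values (half of the eight) that match 0bX1X.
non_w_slots = [f for f in range(8) if in_non_w_half(f)]

# "W" instructions that sit inside the otherwise free half of OP-32.
hole_punchers = {name for name, f3 in RV64M_W.items() if in_non_w_half(f3)}

print([format(f, "03b") for f in non_w_slots])
print(sorted(hole_punchers))
```

Under this check, BDEPW (func3 = 0b010 in v0.90) also falls inside the 0bX1X half.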

Problems with proposed v0.90 BitManip encoding

The current BitManip v0.90 encoding proposal is a bit problematic in this regard, as it punches "holes" into "non-W" sections of OP32. These non-W sections otherwise form part of an unused 50% of OP32/OP128, and scattered holes within them will limit the long term usefulness for other future extensions. (An example of a "hole" created in OP32 is BDEPW, which has a func3 value of 0b010).

BitManip v0.90 also unnecessarily introduces a new two source R-type format specifically for one instruction, FSRI, which moves the rs2 register field to a new position. This will complicate implementation of superscalar out-of-order microarchitectures, and breaks the existing RISC V approach of keeping rs1 and rs2 in the same positions for every relevant instruction.

Why a Quadrant division is intrinsically imposed onto func7 organisation

The choice of 4x32 value Quadrants is not an arbitrary choice. It is in fact fundamental to the organisation of RV32, RV64 and RV128.

There is a 32 x value func7 constraint for I-type shift instructions with a 7 bit (RV128) immediate field. For RV64, I-type shift instructions have a 6 bit immediate field and can encode 64 values in their remaining instruction bits, hence translating into a 64 x value func7 constraint. (Hence Quadrants A & B need to be created for these distinct 2x32 value subsets of func7).
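The arithmetic behind these two constraints can be sketched as follows. This is only an illustration, assuming the shift amount sits in the low bits of the 12 bit I-type immediate and everything above it is free to select a function; the names IMM12_BITS and free_func_values are hypothetical.

```python
IMM12_BITS = 12  # width of the I-type immediate field

def free_func_values(shamt_bits: int) -> int:
    """Number of distinct function codes left in imm12 above the shift amount."""
    if not 0 <= shamt_bits <= IMM12_BITS:
        raise ValueError("shift amount must fit inside imm12")
    return 1 << (IMM12_BITS - shamt_bits)

for name, shamt_bits in (("RV32", 5), ("RV64", 6), ("RV128", 7)):
    print(name, shamt_bits, free_func_values(shamt_bits))
```

With a 7 bit shift amount only the upper 5 bits of func7 remain (32 values); with a 6 bit shift amount the second-lowest func7 bit is also free (64 values), which is the distinction between Quadrants A and B.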

Also, dividing up func7 into Quadrants is natural for ternary instructions, as blocks of 32 x func7 values are needed to introduce an "rs3" instruction format (hence Quadrant "D" needs to be created for such rs3-type instructions).

Quadrants in detail

Below is an outline of how func7 should be structured into Quadrants A-D, based on the last two bit values of func7 (shown below as ' | 00' to ' | 11' ):

Quadrant A1 (n=1): instructions with func7 = 0b00000 | 00

have matching I-type instruction in OP-IM/OP-32IM/OP-128IM
have matching "W" or "Q" instruction in OP-32/OP-128 if func3 = 0bX0X
does NOT have matching "W" or "Q" instruction in OP-32/OP-128 if func3 = 0bX1X

Quadrant A2 (n=29): instructions with func7 in range 0b00001 | 00 to 0b11101 | 00

(for func3=0bX01 only) have matching I-type instruction in OP-IM/OP-32IM/OP-128IM
have matching "W" or "Q" instruction in OP-32/OP-128 if func3 = 0bX0X
does NOT have matching "W" or "Q" instruction in OP-32/OP-128 if func3 = 0bX1X

Quadrant A3 (n=2, but could grow if needed): instructions with func7 = 0b1111X | 00

(if func3=0bX01) unary instructions within OP[32/64/128]IM, ie: the lower 5 bits of the imm12 field are replaced by a func5 operand, which specifies a unary function operating on rs1 and storing the result in rd. The func5 operand can specify 32 unary functions, of which func5 values 0b00000 & 0b00001 are reserved for functions which are derived from taking the corresponding two input Quadrant A3 OP function, and applying the value of zero or one to one of the two inputs (to yield a unary function). The remaining 30 unary functions can be arbitrary unary functions.
otherwise the rules for Quadrant A2 also apply to Quadrant A3

Quadrant B (n=32): func7 value in range 0b00000 | 10 to 0b11111 | 10

same as Quadrant A, except does not have corresponding I-type instructions for OP128IM.

Quadrant C (n=32): func7 value in range 0b00000 | 01 to 0b11111 | 01

currently used only by MUL/DIV, which (unfortunately) punches a hole in unused OP32 opcode space, by putting W version instructions into func3=0bX1X. (Maybe OP128 can avoid doing this in future, to reserve a fully contiguous half of the OP128 major for non-"Q" instruction uses).
does not have matching I-type instruction
can have matching "W" or "Q" instruction in OP-32/OP-128 for any func3 value

Quadrant D (n=32): func7 value in range 0b00000 | 11 to 0b11111 | 11

reserved for ternary functions (ie: instructions with an additional rs3 field, or with an additional 5 bit immediate operand). In this case, the FSRI instruction (with 64 shift range) can be replaced with FSLI and FSRI instructions, each with a 32 shift range.
rs3 field exists if func3=0bXX1 otherwise imm5 exists if func3=0bXX0 (note the instruction is still placed within OP, and not OP-IM despite the existence of imm5 as there are two source register inputs)
can have matching "W" or "Q" instruction in OP-32/OP-128 for func3 values = 0bX0X
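As a rough illustration of the rules above, the quadrant of any func7 value can be derived from its last two bits, with Quadrant A further split by the upper five bits. The function name quadrant is hypothetical and the sketch covers only the func7 partitioning, not the func3 rules.

```python
from collections import Counter

def quadrant(func7: int) -> str:
    """Map a 7 bit func7 value to its quadrant label as outlined above."""
    if not 0 <= func7 < 128:
        raise ValueError("func7 is a 7 bit field")
    low2 = func7 & 0b11   # the last two bits select the quadrant
    upper5 = func7 >> 2   # the remaining five bits index within it
    if low2 == 0b10:
        return "B"
    if low2 == 0b01:
        return "C"
    if low2 == 0b11:
        return "D"
    # low2 == 0b00: Quadrant A, split into A1/A2/A3
    if upper5 == 0b00000:
        return "A1"
    if upper5 >= 0b11110:  # 0b1111X
        return "A3"
    return "A2"

# How many func7 values land in each group.
sizes = Counter(quadrant(f) for f in range(128))
print(dict(sizes))
```

For example, the existing SUB/SRA value 0b0100000 falls into A2 and the MUL/DIV value 0b0000001 falls into C.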

Example BitManip encoding using Quadrants

Below is an example of how the above quadrants can be used to organise the BitManip proposed instructions:

	func7	rs2	rs1	rd	opcode	func3►
	▼				▼	000	100	001	101	010	011	110	111
Group A1	00000.00	rs2	rs1	rd	0110011	ADD	XOR	SLL	SRL	SLT	SLTU	OR	AND
Group A2	01000.00	rs2	rs1	rd	0110011	SUB	XNOR	SBINV	SRA			ORN	ANDN
	00001.00	rs2	rs1	rd	0110011	ADDU.W	PACK	SBSET	GREV	MIN	MINU
	01001.00	rs2	rs1	rd	0110011	SUBU.W		SBCLR	SBEXT	MAX	MAXU
	00010.00	rs2	rs1	rd	0110011	ROL	ROR	SLO	SRO
	01010.00	rs2	rs1	rd	0110011	BDEP	BEXT	SHFL	UNSHFL
Group A3	11111.00	rs2	rs1	rd	0110011	CLMUL	CLMULR	CLMULH
Group B	xxxxx.10
Group C	00000.01	rs2	rs1	rd	0110011	MUL	DIV	MULH	DIVU	MULHSU	MULHU	REM	REMU
Group D	rs3/imm5.11	rs2	rs1	rd	0110011	FSLI	FSRI	FSL	FSR	CMOVI	CMOV		CMIX

Note 1: OP-IM, OP32 and OP-32IM are not shown as these are automatically implied by the quadrant in which each instruction is added.

Note 2: RORI not included as can be replaced by FSRL/FSLI, and bitmatrix instructions not shown as these are RV64I only and best placed in OP32 with func3=0bX1X.

Note 3: Unary instructions not shown, are placed into OP-IM in the slot occupied by CLMULH (ie: Group A3 with func3=0bX01).
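To show how a row of the table translates into an instruction word, here is a minimal sketch that packs the standard R-type fields (func7, rs2, rs1, func3, rd, opcode). The helpers parse_func7 and encode_r are made up for illustration, and the dotted func7 notation is just the one used in the table above.

```python
OP = 0b0110011  # OP major opcode, as in the table

def parse_func7(text: str) -> int:
    """Turn the dotted table notation, e.g. '01000.00', into a 7 bit value."""
    bits = text.replace(".", "")
    if len(bits) != 7 or set(bits) - {"0", "1"}:
        raise ValueError("expected 7 binary digits, e.g. '01000.00'")
    return int(bits, 2)

def encode_r(func7: int, rs2: int, rs1: int, func3: int, rd: int,
             opcode: int = OP) -> int:
    """Pack the fields into a 32 bit R-type instruction word."""
    for value, width in ((func7, 7), (rs2, 5), (rs1, 5),
                         (func3, 3), (rd, 5), (opcode, 7)):
        if not 0 <= value < (1 << width):
            raise ValueError("field out of range")
    return ((func7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (func3 << 12) | (rd << 7) | opcode)

# ADD x1, x2, x3 (Group A1, func3 = 000)
add_word = encode_r(parse_func7("00000.00"), rs2=3, rs1=2, func3=0b000, rd=1)
# ANDN x1, x2, x3 (Group A2 row 01000.00, func3 = 111)
andn_word = encode_r(parse_func7("01000.00"), rs2=3, rs1=2, func3=0b111, rd=1)

print(hex(add_word), hex(andn_word))
```

Note that rs1 and rs2 stay at bits 19:15 and 24:20 in every row, which is the property the proposal is trying to preserve.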

I think completely re-doing the encodings at this time is unpractical. There have been several calls for feedback on the encodings a few months ago. Now it's a bit late for that.

Regarding some of your points:

RISC V user ISA spec explicitly states that RV128 may introduce new 128 bit instructions into an OP128 major opcode, which is the reverse of what happened for RV64.

That's not true. In fact, it explicitly states the opposite:

Regarding your concern that the bitmanip encoding clutters funct7: The minor opcodes 001 and 101 are special in that the immediate field in OP-IMM is shorter for those two minor opcodes.

If you decode OP into funct3[1:0]=01 and funct3[1:0]!=01 first, you will find that each of those groups are encoded in a very compact manner with very conservative use of funct7 bits.

BitManip v0.90 also unnecessarily introduces a new two source R-type format specifically for one instruction, FSRI, which moves the rs2 register field to a new position.

No, the format doesn't move rs2 to another position. Like with all immediate instructions, rs2 is replaced by the immediate. The new field is rs3 and it is in the usual rs3 position.

Using FSRL/FSLI for RORI is impractical, because not all implementations will want to support ternary operations.

Also, there is no CMOVI and I don't think moving some immediate instructions from OP-IMM to OP is a good idea (and apparently that's what your FSRI/FSLI/Quadrant-D encoding does).

I think completely re-doing the encodings at this time is unpractical. There have been several calls for feedback on the encodings a few months ago. Now it's a bit late for that.

The example was done to illustrate the concept as simply as possible. Let me see if I can redo the example to retain as much of the existing encoding as possible. Unavoidable changes however will include changing BDEP/BEXT, MIN/MAX, SBINV and FSR/FSL/FSRI and the associated "W" instructions. BDEPW/BEXTW are particularly problematic in v0.90 not because of excessive use of func7 in OP, but due to side effects on OP-32.

RISC V user ISA spec explicitly states that RV128 may introduce new 128 bit instructions into an OP128 major opcode, which is the reverse of what happened for RV64.

That's not true. In fact, it explicitly states the opposite:

See the quote below from page 42 of the latest draft of the ISA manual. Although I assumed (for discussion purposes) that OP128 instead of OP64 will happen, this assumption only has relevance to Quadrant B of my proposal, which is not used in my example BitManip encoding.

To improve compatibility with RV64, in a reverse of how RV32 to RV64 was handled, we might change the decoding around to rename RV64I ADD as a 64-bit ADDD, and add a 128-bit ADDQ in what was previously the OP-64 major opcode (now renamed the OP-128 major opcode).

Regarding your concern that the bitmanip encoding clutters funct7: The minor opcodes 001 and 101 are special in that the immediate field in OP-IMM is shorter for those two minor opcodes.

If you decode OP into funct3[1:0]=01 and funct3[1:0]!=01 first, you will find that each of those groups are encoded in a very compact manner with very conservative use of funct7 bits.

Yes for the case of OP I agree v0.90 is efficient in use of func7 bits.

For OP-IM[32] however, v0.90 uses 6 func bits (not counting ADDIU[W], which unavoidably uses all imm12 bits), when it's possible to only use 3 func bits (again, not counting ADDIU[W]). (Spare func bits in OP-IM[32] are much scarcer than in OP and need to be prioritised for conservation).

BitManip v0.90 also unnecessarily introduces a new two source R-type format specifically for one instruction, FSRI, which moves the rs2 register field to a new position.

No, the format doesn't move rs2 to another position. Like with all immediate instructions, rs2 is replaced by the immediate. The new field is rs3 and it is in the usual rs3 position.

FSRI is not a ternary instruction (from the source register point of view). It is a two input instruction, using what are essentially the rs1 & rs2 source fields (ignoring the renaming of rs2 to rs3). Its register port usage & register side effect/conflict profile (in speculative execution architectures) is closest to the two register input R-type instruction, not the R4-type 3 register input instructions.

Using FSRL/FSLI for RORI is impractical, because not all implementations will want to support ternary operations.

See above, when considered from register port point of view FSRI/FSLI is only a 2 input instruction (not 3 input instruction). It uses an immediate field as a 3rd input but so do the SW/SB/SH instructions... any implementation with these instructions can already handle two register inputs + one immediate input.

Also, there is no CMOVI and I don't think moving some immediate instructions from OP-IMM to OP is a good idea (and apparently that's what your FSRI/FSLI/Quadrant-D encoding does).

I just put CMOVI in there for fun, it's not serious :)

However, see above on prioritising conservation of OP-IMM over OP.

Let me see if I can redo the example to retain as much of the existing encoding as possible.

Please don't. Obviously changing 30% of it would be as bad as changing all of it.

Yes for the case of OP I agree v0.90 is efficient in use of func7 bits. For OP-IM[32] however, v0.90 uses 6 func bits.

I don't know where you get these ideas. Obviously this is not true.

We are using 3 of 5 available funct7 bits in OP-IMM: insn[30], insn[29], and insn[27]. (The lowest two bits of funct7 are part of the immediate and are therefore not available for encoding instructions. The exception is of course 32-bit and 64-bit FSRI. There is no 128-bit FSRI.)

Please just look at the existing opcode encodings before wasting my time like this.

We are using 3 of 5 available funct7 bits in OP-IMM: insn[30], insn[29], and insn[27]. (The lowest two bits of funct7 are part of the immediate and are therefore not available for encoding instructions. The exception is of course 32-bit and 64-bit FSRI. There is no 128-bit FSRI.)

I am well aware of which immediate bits FSRI uses, and the 6 vs 7 bit shift requirements of RV64 vs RV128 (that's the whole point of distinguishing Quadrant A from Quadrant B within func7).

I count FSRI's use of func7 as 6 bits, as my discussion takes into account the possibility that RV128 goes down the OP128 path instead of the OP64 path (as per page 42 of the latest draft of the RISC V ISA manual). A key advantage of doing OP128 instead of OP64 (aside from binary compatibility) is that the second last bit of func7 becomes useable in immediate I-type instructions where func3=0bX01.

Bitmanip with FSRI will thus (1) use up an otherwise useable 6 bits of OP-IM in the above scenario, (2) complicate the use of the matching 6 func7 bits in OP itself (can only put instructions which don't have corresponding I-type instructions in the 5 bits used by FSRI), (3) introduce a new two source register instruction format just for a single instruction.

Also on such a path for RV128, leaving out FSRI won't be viable as the key advantage of the OP128 path (binary compatibility between RV64 and RV128) gets lost.

I think FSRI is a great instruction and think it should be core to the BitManip base by the way... I just don't like its encoding.

In relation to FSRI comment (2) above, FSL is technically an example of an instruction without a corresponding I-type, but I concede it is a minor point and the key issues with FSRI are (1) and (3) above.

This is now devolving into an argument about RISC-V design decisions beyond the scope of the bitmanip task group (should RV128 add OP128 instead of OP64). I'd direct you to the RV128 task group for that.

I've sent an email to the mailing list about this GitHub issue two days ago and so far nobody felt it was interesting enough to join this discussion. Therefore I'm now closing this issue.
