Closed grayresearch closed 6 months ago
896 of 4096 CSRs are custom CSRs:
| csr[11:10] | csr[9:8] | csr[7:4] | Range | Count | Access
| Unprivileged and User-Level CSRs
| 10 | 00 | XXXX | 0x800-0x8FF | 256 | U RW
| 11 | 00 | 11XX | 0xCC0-0xCFF | 64 | U RO
| Supervisor-Level CSRs
| 01 | 01 | 11XX | 0x5C0-0x5FF | 64 | S RW
| 10 | 01 | 11XX | 0x9C0-0x9FF | 64 | S RW
| 11 | 01 | 11XX | 0xDC0-0xDFF | 64 | S RO
| Hypervisor and VS CSRs
| 01 | 10 | 11XX | 0x6C0-0x6FF | 64 | H RW
| 10 | 10 | 11XX | 0xAC0-0xAFF | 64 | H RW
| 11 | 10 | 11XX | 0xEC0-0xEFF | 64 | H RO
| Machine-Level CSRs
| 01 | 11 | 11XX | 0x7C0-0x7FF | 64 | M RW
| 10 | 11 | 11XX | 0xBC0-0xBFF | 64 | M RW
| 11 | 11 | 11XX | 0xFC0-0xFFF | 64 | M RO
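The 896 figure can be cross-checked by decoding the address bits the same way the table does. The following is a minimal sketch; the function name `is_custom_csr` is made up for illustration and is not part of any spec.

```python
def is_custom_csr(addr: int) -> bool:
    """Return True if a 12-bit CSR address falls in a custom range of the table."""
    rw = (addr >> 10) & 0b11    # csr[11:10]; 0b11 marks a read-only range
    priv = (addr >> 8) & 0b11   # csr[9:8]; lowest privilege level with access
    top2 = (addr >> 6) & 0b11   # csr[7:6]
    if priv == 0b00:
        # Unprivileged: all of 0x800-0x8FF, plus 0xCC0-0xCFF.
        return rw == 0b10 or (rw == 0b11 and top2 == 0b11)
    # S, H, and M levels: csr[7:6] == 0b11 and csr[11:10] != 0b00.
    return rw != 0b00 and top2 == 0b11


custom = [a for a in range(4096) if is_custom_csr(a)]
print(len(custom))  # 896
```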
In extending composable extension multiplexing to also multiplex custom CSR accesses, the CPU, not each CXU, shall first check the privilege level and the read/write permission of the CSR access before forwarding the CSRR[WSC] to the selected extension and state context. If the hart lacks sufficient privilege, or if the access writes a read-only CSR, the CSR access shall raise an illegal instruction exception per the priv spec.
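The CPU-side check described above might look like the sketch below. It assumes privilege can be treated as a simple numeric level matching csr[9:8] (hypervisor-specific access rules are not modelled), and that the caller has already decided whether the instruction actually writes the CSR. All names here are hypothetical.

```python
# Numeric privilege levels, matching the csr[9:8] encoding (a simplification).
PRIV_U, PRIV_S, PRIV_H, PRIV_M = 0, 1, 2, 3


def csr_access_ok(addr: int, hart_priv: int, writes: bool) -> bool:
    """Return True if the access may be forwarded to the selected CXU.

    False means the CPU should raise an illegal instruction exception
    instead of forwarding the request.
    """
    required_priv = (addr >> 8) & 0b11
    read_only = ((addr >> 10) & 0b11) == 0b11
    if hart_priv < required_priv:
        return False            # insufficient privilege
    if writes and read_only:
        return False            # write to a read-only CSR
    return True
```

For example, `csr_access_ok(0xCC0, PRIV_U, writes=False)` is allowed, while any write to 0xCC0 is rejected regardless of privilege.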
It is now time to study the nitty-gritty LUT overhead of adding CX CSRs to CXU-LI and pick something.
First, let's compare and contrast cx_reg, cx_imm, addi, csrrw, and csrrwi:
cf_id[9:3] rs2[4:0] rs1[4:0] cf_id[2:0] rd[4:0] custom0[6:0] cx_reg
imm[7:0] cf_id[3:0] rs1[4:0] 000[2:0] rd[4:0] custom1[6:0] cx_imm
imm[11:0] rs1[4:0] func3[2:0] rd[4:0] op_imm_[6:0] addi (I-type)
csr[11:0] rs1[4:0] func3[2:0] rd[4:0] special[6:0] csrrw
csr[11:0] uimm[4:0] func3[2:0] rd[4:0] special[6:0] csrrwi
It certainly seems the present cx_imm custom1 opcode design is a mistake. It is irregular vs. addi, and all the irregularity buys is a four-bit cf_id[3:0] instead of the three bits that addi/I-type's func3 would provide. The comments in the spec note: "This new, irregular immediate field encoding may have a disproportionate impact on area and critical path delay in the decode or execute pipeline stages of a RISC-V processor core." Also, "Seven-eighths of the custom-1 encoding space is reserved for future custom function instruction encodings." That is less of a concern now that mcx_selector.version exists to gracefully allow future custom instruction encodings.
There is nothing special about the current cx_imm encoding. It was decided arbitrarily during a 2020-21 design meeting, without full consideration of the cost of the irregularity or of the new requirement to HW-frugally support CX CSR accesses.
An I-type cx_imm supports only 8 CF_IDs per CX, whereas the present irregular cx_imm supports 16. Either way, you must resort to cx_reg when you need more than a 3b or 4b CF_ID.
So for starters, let's assume we change cx_imm so it follows the same layout as addi, with the cf_id[2:0] supplied by func3[2:0]. Now let's recap our table:
cf_id[9:3] rs2[4:0] rs1[4:0] cf_id[2:0] rd[4:0] custom0[6:0] cx_reg
imm[11:0] rs1[4:0] cf_id[2:0] rd[4:0] custom1[6:0] cx_imm (I-type)
csr[11:0] rs1[4:0] func3[2:0] rd[4:0] special[6:0] csrrw
csr[11:0] uimm[4:0] func3[2:0] rd[4:0] special[6:0] csrrwi
Here we see that like addi, this new I-type cx_imm takes the 12b immediate in insn[31:20] and sign-extends and muxes it into the second ALU operand register. This same value becomes the CXU-LI req_data1[] operand. Since we're already doing that, and since the CSR address csr[11:0] of CSRR[WSC][I] instructions is also at insn[31:20], it follows we can convey the CSR address to the CPU's CXU-LI request port req_data1[] for zero additional LUTs. (!!!)
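To make the field overlap concrete, here is a small sketch showing that the I-type immediate and the CSR address both occupy insn[31:20], so a single extraction path serves both. The helper names are illustrative only; the opcode values are the standard RISC-V OP-IMM and SYSTEM major opcodes (the latter is what the tables above label "special").

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def req_data1_from_insn(insn: int, xlen: int = 32) -> int:
    """Model the existing I-type path: sign-extend insn[31:20] to XLEN bits."""
    field = (insn >> 20) & 0xFFF
    return sign_extend(field, 12) & ((1 << xlen) - 1)


def encode_itype(field12: int, rs1: int, func3: int, rd: int, opcode: int) -> int:
    """Assemble an I-type-shaped instruction word (illustrative helper)."""
    return ((field12 & 0xFFF) << 20) | (rs1 << 15) | (func3 << 12) | (rd << 7) | opcode


OP_IMM = 0x13   # addi and friends
SYSTEM = 0x73   # csrrw, csrrs, csrrc and their immediate forms

addi = encode_itype(-1, rs1=6, func3=0b000, rd=5, opcode=OP_IMM)       # addi x5, x6, -1
csrrw = encode_itype(0x7C0, rs1=6, func3=0b001, rd=5, opcode=SYSTEM)   # csrrw x5, 0x7C0, x6

# The same extraction yields the immediate for addi and, in its low 12 bits,
# the CSR address for csrrw.
print(hex(req_data1_from_insn(addi)))
print(hex(req_data1_from_insn(csrrw) & 0xFFF))
```

Sign extension does not disturb the low 12 bits, so a CXU that only looks at req_data1[11:0] sees the CSR address unchanged even for addresses with bit 11 set.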
The second thing the CPU must convey to the CXU request is whether the request is a CSR access (W, S, or C) or a custom function instruction. With four possible request types (CF, CSRW, CSRS, CSRC), a 2b request type port suffices.
This is a satisfactory encoding. By redefining cx_imm encoding to be more uniform, like addi (I-type), we simplify cx_imm's implementation cost, and support CX CSRs by adding just one CXU request port (two signal bits).
On further consideration, we redefine the req_func port's FUNC_ID type to have width CXU_FUNC_ID_W = 1 + CF_ID_W.
When the MSB is 0, req_func conveys a CF_ID.
When the MSB is 1, req_func[1:0] conveys a 2b CSR access type (CSRR, CSRRW, CSRRS, CSRRC). This enables CXUs to implement read-only vs. read-write/set/clear CX CSR access semantics.
In all, this adds only one bit of new control signal, minimizing impact across CXU interconnects, etc.
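A minimal sketch of this req_func packing follows. CF_ID_W = 10 is taken from cx_reg's cf_id[9:0]; the numeric values assigned to the four CSR access types are an assumption made purely for illustration, since the text above does not fix them.

```python
CF_ID_W = 10                      # from cf_id[9:0] in cx_reg
CXU_FUNC_ID_W = 1 + CF_ID_W
CSR_FLAG = 1 << CF_ID_W           # MSB of req_func

# Hypothetical 2b encoding of the CSR access type; not specified in the text.
CSR_OPS = {"CSRR": 0, "CSRRW": 1, "CSRRS": 2, "CSRRC": 3}
CSR_OP_NAMES = {code: name for name, code in CSR_OPS.items()}


def encode_cf(cf_id: int) -> int:
    """req_func for a custom function instruction (MSB = 0)."""
    if not 0 <= cf_id < (1 << CF_ID_W):
        raise ValueError("cf_id out of range")
    return cf_id


def encode_csr(op: str) -> int:
    """req_func for a CX CSR access (MSB = 1, access type in bits [1:0])."""
    return CSR_FLAG | CSR_OPS[op]


def decode_req_func(req_func: int):
    """Return ("csr", op_name) or ("cf", cf_id)."""
    if req_func & CSR_FLAG:
        return ("csr", CSR_OP_NAMES[req_func & 0b11])
    return ("cf", req_func)
```

A CXU that implements no CX CSRs only needs to test the MSB; everything below it keeps its existing CF_ID meaning.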
Background
The spec requires CX multiplexing for conflict-free composition of independently authored composable custom extensions. Here "conflict-free" means each extension may use any custom opcode instructions. With CX multiplexing, we select the hart's current CX and state context prior to issuing custom instructions to that CX. Thus even if two composable extensions use the same custom opcodes for different custom instructions, the fact that the hart's current CX and state context is always selected ensures that the correct custom instruction is performed, by the selected CX, in response to that custom opcode.
CXs may be stateful. Each CX state context is private (isolated) and is accessible only via custom instructions, via CX multiplexing. Also, the spec defines four mandatory custom instructions, cf{read,write}{status,state}, together called IStateContext, that enable uniform CX state context save/restore for any stateful CX.
One of the spec's design tenets is uniformity. A uniform programming model will help the RISC-V custom computing community achieve an ecosystem of reusable CX library software and CX unit hardware.
Some current spec shortcomings with respect to CSRs for CXs
CX scoped custom CSRs
To address these shortcomings, this Issue proposes adding “CX scoped custom CSRs” (CX CSRs) to the spec.
Here are two different ways to do this.
This fixes shortcomings #1 (uniform access) and #3 (conflict-free CX CSRs), but not #2 (privileged CX CSRs) nor #4 (undesirable mandatory stateful CX custom instructions).
Pros: it adds next to zero gates or LUTs to extensible processors that already implement the CX spec. CX CSRs are just more (processor-uninterpreted) custom function instructions forwarded to some (selected) CXU. It preserves unchanged the definition of a CX as a set of stateful custom instructions (only).
Cons: it introduces a new, redundant set of CSR access instructions that are used only to access CX CSRs. This will add unfortunate downstream work, e.g. in developer tools, compilers, debuggers, and program analysis tools.
This fixes all the shortcomings listed above.
Pros: it extends the use of custom CSRs to CX CSRs in a clean way. It retains the existing CSR access instructions without introducing another set of them for CXs.
Cons: The definition of a CX must change to be a set of stateful custom instructions and also a set of custom CSRs. (*) It may require changing the CXU-LI to convey not only custom function instruction requests/responses, but now also custom CSR access requests/responses. It may require additional wide multiplexers in the processor datapath to route CSR access instruction fields (e.g. 12b CSR address) into CXU-LI ports.
(*) This is analogous to defining a software interface abstraction as a set of methods / member functions (only), vs. defining it as a set of methods/member functions plus a set of data members.
Note the spec requires that "Attempts to access a non-existent CSR raise an illegal instruction exception." This may be challenging to achieve in the current spec, which does not signal an exception but rather sets an error flag for the analogous error of issuing a custom instruction that is not implemented by a CX.
Also note, any CX CSR access must follow the CSR access ordering rules per the priv spec.
Taking stock of the two options, the clear winner from a clean HW-SW ISA perspective is the second one. Adding uniform, privilege-checked CX CSRs, via existing CSR access instructions, and providing unlimited conflict-free CX CSRs, is a significant improvement over "no uniform support for CX CSRs, roll your own" in the current spec. However, we must take care that this approach does not inevitably introduce expensive new multiplexers into processor datapaths.
Impact of adding CX CSRs upon CXU-LI
Presently CXU-LI provides no means to convey a CSR access to a selected CXU. Here are two different ways to do this.
Don't change CXU Requests and Responses: Here the processor must express the CSR R/W access using existing CXU Request signaling. It could do this using certain reserved CF_IDs corresponding to the various CSR accesses. In other words, even if we adopt option #2, "multiplexed custom CSR access", as the ISA mechanism, the processor could nevertheless map the CX CSR access into a CX custom function instruction.
Change CXU Requests and Responses: Extend CXU-LI signaling to explicitly represent (signal) CX CSR accesses, distinct from other CX custom function instructions. Rather than add several expensive new ports, we might try to share the existing CXU request ports where that makes sense. The CSR address might be a new 12b port, or it might reuse req_data1[11:0] or extend-and-use req_func[9:0]. The new CSR value, already sourced on X[rs1], might as well arrive on req_data0[]. The 3 CSR access operations csrrw/csrrs/csrrc might be encoded and conveyed via a new 2-bit field req_type (or req_cmd); the fourth value might signal "NOT CSR access", i.e. that this is a custom function instruction, not a CSR access instruction.
If we adopt this encoding, the hardware cost of extending CXU-LI to carry CX CSR accesses is +2 signal bits per request, plus one 12-bit 2-1 mux in the CPU to route the CSR address field into req_data1[11:0]. Note that besides the extra LUTs for this mux, the extra LUT delay and wiring near req_data1[] is painful. For that reason it might indeed be better to convey the CSR address on req_func[], which is after all not anywhere near the critical EX stage register operands, muxes, and ALU.
We must also convey CX CSR address errors. We might reuse cxu_status = CFU_ERROR_FUNC to signal an error that the addressed CSR is not implemented by this CX.
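As an illustration of the "change CXU Requests and Responses" option, the sketch below models a CXU that receives the access type on a 2-bit req_type, the new value on req_data0, and the CSR address on req_data1[11:0]. The req_type values, the class name, and the "OK" status are assumptions for illustration; only CFU_ERROR_FUNC comes from the discussion above. The write/set/clear behaviour follows the standard RISC-V CSR instruction semantics, returning the old value.

```python
# Hypothetical req_type encoding; the fourth value means "not a CSR access".
REQ_CF, REQ_CSRW, REQ_CSRS, REQ_CSRC = 0, 1, 2, 3

STATUS_OK = "OK"                      # hypothetical name for the success status
STATUS_ERROR_FUNC = "CFU_ERROR_FUNC"  # reused to flag an unimplemented CX CSR


class ToyCxu:
    """Toy model of a CXU holding a few CX CSRs for one state context."""

    def __init__(self, csrs, xlen: int = 32):
        self.csrs = dict(csrs)
        self.mask = (1 << xlen) - 1

    def csr_request(self, req_type: int, req_data0: int, req_data1: int):
        """Perform one CSR access; return (status, old CSR value)."""
        addr = req_data1 & 0xFFF          # CSR address rides on req_data1[11:0]
        if addr not in self.csrs:
            return STATUS_ERROR_FUNC, 0   # CSR not implemented by this CX
        old = self.csrs[addr]
        if req_type == REQ_CSRW:
            new = req_data0               # write
        elif req_type == REQ_CSRS:
            new = old | req_data0         # set bits
        elif req_type == REQ_CSRC:
            new = old & ~req_data0        # clear bits
        else:
            raise ValueError("not a CSR access request")
        self.csrs[addr] = new & self.mask
        return STATUS_OK, old
```

How the CPU turns the error status into an illegal instruction exception, as the priv spec requires for non-existent CSRs, is outside the scope of this sketch.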
In all, the expected hardware cost of adding CX CSRs to CXU-LI is unfortunate but manageable.
Summary
We recommend adding CX scoped custom CSRs to the CX spec. This should be done by extending CX multiplexing to also multiplex CSR access instructions to custom CSR addresses when a CX state context is selected.
CXU-LI should be extended to explicitly represent and distinguish between CX custom instructions and CX custom CSR accesses, with care, so as to minimize the expected area and frequency impact of the new signaling.