grayresearch / CX

Proposed RISC-V Composable Custom Extensions Specification
Apache License 2.0

replace IStateContext instructions with CX State Context CX CSRs #28

Open grayresearch opened 6 months ago

grayresearch commented 6 months ago

Background

CX State Contexts: A composable extension may be stateful. Abstractly that means the behavior of a CX instruction may depend upon the history of CX instructions previously issued. More concretely, a CX may have one or more state contexts, each encompassing state machines, registers, register files, RAMs, channels, etc. For correct composition, CX state is private and isolated to a CX. It is not manifest as shared memory. The only way to access, observe, or modify CX state is to issue CX instructions. When CX multiplexing is enabled, the current CX selector identifies the hart's current CX and CX state context. This determines the CX that receives CX (custom) instructions and which state context is accessed.

Stateful CXs are common and therefore merit uniform programming models, uniform resource management, etc.

IStateContext: The Spec (§1.4.4, §2.1, §2.5, §2.6) defines a composable extension basetype IStateContext with four mandatory custom instructions: cf_{read,write}_{status,state}. Every stateful CX must implement these four instructions. They are used by a resource manager (RM) such as an operating system to uniformly manage, initialize, save, and restore a CX's state context. With this interface an RM can manage and context-switch any CX state context; in other words, it enables CX-agnostic context switching. This works even for CXs that have yet to be designed. This is in contrast to systems where every new bit of architectural state requires OS updates or device drivers.

The spec'd cf_read_status and cf_write_status instructions read and write the CX state context status word. This word's fields:

error:      [31:24]
reserved:   [23:12]
state_size: [11:2]
cs:         [1:0]

provide:

  1. cs: context status, enabling the CX state context to be { off, initial, clean, dirty }, mirroring mstatus.XS (priv spec p.26).
  2. state_size: current size (words) of the serialization of this state context
  3. error: a stateful custom errno (valid when cx_status.CU).
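
As a minimal sketch of the field layout above (not spec text), the status word can be packed and unpacked as follows. The names StatusWord, pack_status, unpack_status, and CS_NAMES are mine, and the numeric cs encoding 0..3 is an assumption taken from the order { off, initial, clean, dirty }.

```python
from typing import NamedTuple

# Assumed encoding of the 2-bit cs field, in the order the text lists the states.
CS_NAMES = ("off", "initial", "clean", "dirty")


class StatusWord(NamedTuple):
    error: int       # bits [31:24], custom errno
    state_size: int  # bits [11:2], size of the serialized context in words
    cs: int          # bits [1:0], context status


def unpack_status(word: int) -> StatusWord:
    """Split a 32-bit status word into its fields; reserved bits [23:12] are dropped."""
    return StatusWord(
        error=(word >> 24) & 0xFF,
        state_size=(word >> 2) & 0x3FF,
        cs=word & 0x3,
    )


def pack_status(status: StatusWord) -> int:
    """Build a 32-bit status word with the reserved bits left at zero."""
    return (
        ((status.error & 0xFF) << 24)
        | ((status.state_size & 0x3FF) << 2)
        | (status.cs & 0x3)
    )
```

For example, a dirty context of 16 words with errno 5 would pack to 0x05000043 under these assumptions.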

The cf_read_state and cf_write_state instructions read and write specified words of the state context.

As described in §2.6 Resource management and context switching, an RM uses read_status to determine how large the CX's context save record must be; it then uses read_state to read each word of state out of the CX state context. Later, on state context reload (restore), the RM uses write_status to restore the state context status word, and write_state to restore the state context data words. The RM will also use write_status to reinitialize a CX state context to its initial state.
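
The save and reload sequence just described can be sketched in a few lines. This is a toy behavioral model, not the spec: StatefulCX, rm_save, and rm_restore are hypothetical names, the blob layout (status word first, then data words) is my assumption, and the model assumes the cs encoding initial=1.

```python
class StatefulCX:
    """Toy stand-in for a stateful CX exposing the four IStateContext functions."""

    def __init__(self, nwords):
        self.words = [0] * nwords
        self.cs = 1  # assumed encoding: 1 == initial

    def read_status(self):
        # state_size in bits [11:2], cs in bits [1:0]; error left at 0 here
        return (len(self.words) << 2) | self.cs

    def write_status(self, word):
        self.cs = word & 0x3
        if self.cs == 1:  # writing cs=initial resets the whole context
            self.words = [0] * len(self.words)

    def read_state(self, index):
        return self.words[index]

    def write_state(self, index, value):
        self.words[index] = value & 0xFFFFFFFF


def rm_save(cx):
    """CX-agnostic save: status word first, then state_size data words."""
    status = cx.read_status()
    size = (status >> 2) & 0x3FF
    return [status] + [cx.read_state(i) for i in range(size)]


def rm_restore(cx, blob):
    """CX-agnostic reload of a blob produced by rm_save."""
    cx.write_status(blob[0])
    for i, word in enumerate(blob[1:]):
        cx.write_state(i, word)
```

Note that rm_save and rm_restore never interpret the data words, which is the property that makes the RM CX-agnostic.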

Some current shortcomings of the IStateContext instructions

  1. Unpriv access to CX state context management: There is no privileged access control on CX custom instructions. Once you have selected the hart's current CX / state context, you can invoke any custom instruction, including cf_{read,write}_{status,state}. In a privileged U/M/S system, e.g., Linux, these instructions are meant for use by the operating system, to perform a context save, and then later, a disciplined context restore from data that was previously context saved. They are not meant for use by buggy or malicious user code to poke random data into a CX state context "to see what happens". We would not want to require that all stateful CXs/CXUs defend themselves against such misuse. Instead we must deny access to these instructions from U-mode.
  2. The context save/restore interface might be too austere: The current CX-agnostic context save sequence (§2.6) read_status; read_state, and context reload sequence write_status; write_state, may be too austere. For example, these instructions do not express context management events/epochs such as "prepare to save", "save complete", "prepare to restore", "restore complete". We can infer some of these events: a read_state at index 0 might also mean "prepare to save", and a read_state of the last state word might also mean "save complete." Also there is no way to indicate an error such as accessing context save data words out of bounds.
  3. The context save/restore interface might be too expensive: The current interface affords random access to the state context data words. Depending upon the specific CX state context, a streaming-only interface (no indexed accesses) might afford a less expensive (fewer circuit resources) implementation.

CX State Context CX CSRs

Assuming we adopt #27, we might replace the four mandatory IStateContext instructions with a few mandatory CX CSRs. For example, the cf_read_status and cf_write_status instructions, which R/W the CX state context status word, might be directly replaced by a new S-priv accessible R/W CX CSR, scxct_status. Similarly, cf_read_state and cf_write_state might be replaced by a new S-priv accessible R/W CX CSR, scxct_data.

scxct_status: this CX CSR would carry forward the .cs, .state_size, and .error fields of the §2.5.1 state context status word, and retain the various behaviors and semantics detailed in §2.5.2 cf_read_status and §2.5.3 cf_write_status.

scxct_data: not so straightforward! Both cf_read_state and cf_write_state instructions have an index operand specifying which word of the context data to access. This affords random access to CX state context save data words.

However, there is no way to specify that index with a CSR access instruction. Compare:

# random access               # sequential access?
cf_write_state a0,a1          csrw scxct_data,a1
cf_write_state a2,a3          csrw scxct_data,a3

As discussed above ("too expensive"), perhaps random access to state context save data words is overkill. Instead, imagine an implicit index counter, or other state machine, is kept by each CX state context. That "index" might be reset to 0 / first word by reading or writing scxct_status. Then each successful read or write of scxct_data might advance the "index".

Then you can save a CX state context via:

csrr a0,scxct_status       # read state context status word
for each word in a0.state_size:
  csrr a1,scxct_data       # read next word of state context
  sw a1,(a2) ...           # save it
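
To make the implicit index behavior concrete, here is a small behavioral sketch of it. It is only an illustration: StreamingContext and save are hypothetical names, and the choice that a status read resets the hidden index is the one floated above, not settled spec behavior.

```python
class StreamingContext:
    """Toy model of a state context with a hidden, auto-advancing data index."""

    def __init__(self, words):
        self.words = list(words)
        self.cs = 3          # assumed encoding: 3 == dirty
        self._index = 0      # implicit index, not software visible

    def read_status(self):   # models: csrr rd,scxct_status
        self._index = 0      # status access rewinds to the first word
        return (len(self.words) << 2) | self.cs

    def read_data(self):     # models: csrr rd,scxct_data
        word = self.words[self._index]
        self._index += 1     # each successful read advances the index
        return word


def save(ctx):
    """Sequential save: one status read, then state_size data reads."""
    status = ctx.read_status()
    size = (status >> 2) & 0x3FF
    return status, [ctx.read_data() for _ in range(size)]
```

Because the status read rewinds the index, calling save twice in a row yields the same blob both times.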

Note this new design does not allow you to:

That may be unacceptable. These use cases can be addressed by reifying the state context's internal data index counter, i.e., add an scxct_index counter CX CSR that determines the next state context data word to be accessed. So e.g., to restore the 10th, 11th, 12th words of a CX state context, issue:

csrwi scxct_index,10    # immediate form; csrw takes a register operand
csrw scxct_data,a0      # context[index++] = a0
csrw scxct_data,a1      # context[index++] = a1
csrw scxct_data,a2      # context[index++] = a2
csrr a3,scxct_index     # a3 == 13
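
The same partial restore can be traced in a toy model. IndexedContext is a hypothetical name, and I am assuming scxct_index is a plain zero-based word index with no bounds handling.

```python
class IndexedContext:
    """Toy model of a state context with a software-visible index CSR."""

    def __init__(self, nwords):
        self.words = [0] * nwords
        self.index = 0                    # models scxct_index

    def write_index(self, value):         # csrw scxct_index,rs
        self.index = value

    def read_index(self):                 # csrr rd,scxct_index
        return self.index

    def write_data(self, value):          # csrw scxct_data,rs
        self.words[self.index] = value    # context[index++] = value
        self.index += 1


def partial_restore(ctx, start, values):
    """Restore a run of words beginning at `start`; return the final index."""
    ctx.write_index(start)
    for value in values:
        ctx.write_data(value)
    return ctx.read_index()
```

With a 16-word context, partial_restore(ctx, 10, [a0, a1, a2]) touches only words 10..12 and leaves the index at 13, matching the sequence above.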

It is still a bit fragile. Please comment on better ideas for the specific CX State Context CX CSRs.

We also note that since scxct_status is not U-mode accessible, it can no longer be used to access custom error state. The .error field will have to be moved to a new user-mode CX CSR.

CX state front door vs back door

A brief aside to note that there are two access paths to CX state, and how these relate.

The front door, if you will, is the set of user-mode CX CSRs provided by the CX. A CX might provide CX CSRs to set up operating modes, parameters, vector widths, etc. In addition there are the CX custom instructions themselves, which can also be used to obtain various facets of a CX's state. These front door access paths are specific to that CX.

The back door, in contrast, is the CX State Context CX CSRs. These are used by RMs to initialize, save, restore, and otherwise manage CX state contexts. This back door access path is generic to any CX. The RM must not interpret the context save data it obtains and maintains. Each context save data array is just a blob of bits. (*) The RM should not use CX-specific CX CSRs.

(*) Indeed, as §2.5.1 notes "

The CXU that implements the CX provides CX-specific CX CSRs that change its internal state, but that state, as well as other CX state, is nevertheless serialized and deserialized by RMs using the CX-agnostic CX State Context CX CSRs.

It goes in both forwards and backwards directions:

  1. Start: User code sets its CX properties "MyCX-props" with a CX CSR write.
  2. Context save: The RM serializes the CX state into a blob, using csrr scxct_status and csrr scxct_data. Somewhere in the blob are the properties the user set. The RM doesn't know or care.
  3. ... context switch ... context switch ... etc ...
  4. Context restore: The RM deserializes the CX state from the blob, using csrw scxct_status and csrw scxct_data.
  5. User code retrieves its "MyCX-props" with a CX CSR read. User code observes the property settings it previously set at 'Start'.
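
Here is a rough sketch of that round trip, with the front door and back door modeled as separate methods on one object. Everything named here (MyCX, accumulate, rm_save, rm_restore, the two-word layout) is hypothetical and only illustrates the idea that the RM moves an opaque blob.

```python
class MyCX:
    """Toy CX with one property word and one accumulator word of state."""

    def __init__(self):
        self.props = 0
        self.acc = 0
        self._index = 0

    # Front door: CX-specific, user mode.
    def write_props(self, value):
        self.props = value

    def read_props(self):
        return self.props

    def accumulate(self, value):
        self.acc += value

    # Back door: CX-agnostic, used only by the RM.
    def read_status(self):
        self._index = 0
        return (2 << 2) | 3            # state_size=2 words, cs=dirty (assumed)

    def write_status(self, word):
        self._index = 0

    def read_data(self):
        word = (self.props, self.acc)[self._index]
        self._index += 1
        return word

    def write_data(self, word):
        if self._index == 0:
            self.props = word
        else:
            self.acc = word
        self._index += 1


def rm_save(cx):
    status = cx.read_status()
    return [status] + [cx.read_data() for _ in range((status >> 2) & 0x3FF)]


def rm_restore(cx, blob):
    cx.write_status(blob[0])
    for word in blob[1:]:
        cx.write_data(word)
```

The RM code would work unchanged for any other CX that serializes itself through the same status and data accesses.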

No more reserved CX custom instructions?

Assuming we replace IStateContext, it is possible there is no further need for the reserved CX custom instruction CF_IDs.

§2.5: "CF_IDs 1008-1023 (0x3F0-0x3FF) are reserved for standard custom functions. It is recommended, not mandatory, that these CF_IDs not be used for another purpose. ... Any CF instruction with CF_ID=1023 must be side effect free, i.e., never modify any CXU state."

We have no present use for these. But if we delete this reservation it will be more difficult to add mandatory CX custom instructions later.

Also note the spec uses cf_read_status to provide a specific way to probe that a selected CXU is actually present:

§2.5.2: "cx_read_status may be used as a probe after a mcx_selector write, to check whether the selector addresses a valid CXU and state context:

csrw mcx_selector,x1 ; select some CXU and state context
csrw cx_status,x0 ; clear cx_status
cx_read_status x0 ; probe, discarding state status word
csrr x2,cx_status ; retrieve cx_status
... ; cx_status.ci => invalid CXU_ID
... ; cx_status.si => invalid STATE_ID

But by eliminating cf_read_status, we no longer have a CX-agnostic way to do this probe. The analogous sequence would be

csrw mcx_selector,x1
csrw cx_status,x0
csrr x0,scxct_status ; probe, discarding state status word
csrr x2,cx_status
... ; cx_status.ci => invalid CXU_ID
... ; cx_status.si => invalid STATE_ID

except that stateless CXs need not implement the scxct_status CX CSR, and scxct_status is not accessible to user code!

Oops. Not sure what to do here. I think the cleanest thing to do is split scxct_status into two CX CSRs. One would be a user-accessible R/W custom CX CSR that carries e.g. the custom error number. The second would be a system-accessible R/W custom CX CSR used for CX context management. Hmm.

Summary

We recommend replacing IStateContext instructions with CX State Context CX CSRs. The CX State Context CX CSRs are mandatory for all stateful CXs. These CSR addresses can be chosen so as to deny state context management access to user mode code.

The specific CX State Context CX CSRs should include a state context status CSR and a state context data CSR. We may require additional CX CSRs to specify phases of context save or restore, or to specify random access into state context data. TBD.

We recommend eliminating the 16 reserved CX custom instructions. It is unclear whether scxct_status is mandatory for stateless CXs.

grayresearch commented 5 months ago

On further consideration, I propose these replacements for IStateContext custom functions:

Stateful CX CSRs

  1. cxs_error -- user read/write 32b errno
  2. scxs_status -- system read/write status word (like cf_read_status but now sans cxs_error)
  3. scxs_index -- system read/write index to state context data words
  4. scxs_data -- system read/write state context data words at scxs_index
  5. scxs_data_incr -- like scxs_data but access also increments scxs_index

Guy requests a new scxs_status bit to indicate whether the selected CX state context is online and able to receive custom instructions, or is paused or in the process of being saved or reloaded during a CX state context switch.

CSR reads / read-writes of scxs_data are idempotent. Not so scxs_data_incr. The latter speeds up save and reload of multi-word state context save data.
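
To illustrate the difference between the two data CSRs, here is a small behavioral sketch. ScxsContext and its method names are hypothetical, and out-of-range index handling is deliberately left out since it is not yet defined.

```python
class ScxsContext:
    """Toy model of scxs_index, scxs_data, and scxs_data_incr."""

    def __init__(self, words):
        self.words = list(words)
        self.index = 0                      # models scxs_index

    def read_data(self):                    # csrr rd,scxs_data
        return self.words[self.index]       # idempotent: index unchanged

    def read_data_incr(self):               # csrr rd,scxs_data_incr
        word = self.words[self.index]
        self.index += 1                     # side effect: index advances
        return word

    def write_data_incr(self, value):       # csrw scxs_data_incr,rs
        self.words[self.index] = value
        self.index += 1


def save_all(ctx, nwords):
    """Save nwords using only the auto-incrementing CSR."""
    ctx.index = 0                           # csrw scxs_index,x0
    return [ctx.read_data_incr() for _ in range(nwords)]


def reload_all(ctx, blob):
    """Reload a blob using only the auto-incrementing CSR."""
    ctx.index = 0
    for word in blob:
        ctx.write_data_incr(word)
```

A save loop written against scxs_data alone would need an extra scxs_index write per word, which is the cost that scxs_data_incr avoids.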

In addition, Guy has requested a uniform optional means to interrogate the current CX's CX_GUID and CXU_GUID. I propose:

Stateless or stateful CX CSRs

All of these are read-only user custom CSRs:

  1. cx_guid0 -- first 32b of CX_GUID -- IDs the CX contract/version of the selected CX
  2. cx_guid1 -- next 32b of CX_GUID
  3. cx_guid2 -- next 32b of CX_GUID
  4. cx_guid3 -- last 32b of CX_GUID
  5. cxu_guid0 -- first 32b of CXU_GUID -- IDs the specific vendor/version of CXU that implements the selected CX
  6. cxu_guid1 -- next 32b of CXU_GUID
  7. cxu_guid2 -- next 32b of CXU_GUID
  8. cxu_guid3 -- last 32b of CXU_GUID
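
As a sketch of how software might reassemble a 128-bit GUID from the four 32-bit reads, assuming (my assumption, not decided above) that guid0 holds the most significant 32 bits:

```python
import uuid


def guid_from_words(words):
    """Combine four 32-bit CSR values [guid0, guid1, guid2, guid3] into a UUID.

    Assumption: guid0 is the most significant word. The proposal only says
    "first 32b", so the opposite order is equally plausible.
    """
    if len(words) != 4:
        raise ValueError("expected exactly four 32-bit words")
    value = 0
    for word in words:
        value = (value << 32) | (word & 0xFFFFFFFF)
    return uuid.UUID(int=value)
```

An all-zero result would then correspond to a CX or CXU that declines to identify itself.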

In the discussion above, we note we need a replacement for a uniform "probe" sequence to determine if mcx_selector's CXU_ID and STATE_ID are valid. If the cx_guid0 custom CSR is compulsory, then csrr x0,cx_guid0 is a fine CX selector probe instruction.

If compulsory it should be OK to return 0.0.0.0 for cx_guid and for cxu_guid.

I'm not sure we need to XLEN-ize/32b/64b these.

grayresearch commented 5 months ago

"Simple, frugal, fast" is an important CX design tenet. How does this redesign of IStateContext now with 5 32b custom CSRs, and then adding another 8 read-only CSRs comport with that?

  1. An implementation may implement as many or as few bits of cxs_error as it requires, down to 0 bits.
  2. An implementation may implement as many or as few bits of scxs_index as it requires, down to 0 bits.
  3. A stateful CX with one word of state (a dot product accumulator, for example) may implement a 0-bit scxs_index and may implement scxs_data and scxs_data_incr reads (or writes) to respond with (or, resp., to reload) the current accumulator register.
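
The one-word case in item 3 might look like this in a toy model. DotProductCX and mac are hypothetical names; a 0-bit scxs_index is modeled as a CSR that ignores writes and always reads 0.

```python
class DotProductCX:
    """Toy stateful CX whose entire state context is one 32-bit accumulator."""

    def __init__(self):
        self.acc = 0

    def mac(self, a, b):
        """Front-door custom function: acc += a * b (wrapping at 32 bits)."""
        self.acc = (self.acc + a * b) & 0xFFFFFFFF

    # 0-bit scxs_index: no storage at all.
    def read_index(self):
        return 0

    def write_index(self, value):
        pass

    # scxs_data and scxs_data_incr collapse to the same register access.
    def read_data(self):
        return self.acc

    def write_data(self, value):
        self.acc = value & 0xFFFFFFFF

    read_data_incr = read_data
    write_data_incr = write_data
```

The only storage is the accumulator the CX already needs, so the extra CSRs cost essentially nothing here.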

As for the read-only CSRs, if cx_guid and cxu_guid are mandatory CX CSRs, but may read as 0, we just need a way for any CXU to respond with zero without requiring that every CXU contain a separate copy of logic to generate a 0 value on resp_data.

If cx_guid* is not mandatory you can still do an mcx_selector probe with this sequence:

csrw cx_status,x0
csrr x0,cx_guid0    ; probe!
csrr x1,cx_status
; if x1==0 or x1.IF, the selector is valid.
; if x1.IC or x1.IS, the selector is invalid.

Not as efficient as just testing x1 == 0 though.
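
The classification in those two comment lines can be written out as a tiny model. The bit positions of IC, IS, and IF below are placeholders (hypothetical, not the spec's cx_status layout), as are probe, selector_valid, and the dict that describes the configured CXUs. I am also reading IF as the flag a present CX raises when it does not implement the accessed CSR.

```python
# Placeholder bit positions; the real cx_status layout is defined by the spec.
IC, IS, IF = 0x1, 0x2, 0x4


def probe(cxus, cxu_id, state_id):
    """Model the cx_status value left behind by `csrr x0,cx_guid0`."""
    status = 0                              # csrw cx_status,x0
    cxu = cxus.get(cxu_id)
    if cxu is None:
        status |= IC                        # invalid CXU_ID
    elif state_id >= cxu["n_states"]:
        status |= IS                        # invalid STATE_ID
    elif not cxu["has_guid"]:
        status |= IF                        # CX present, cx_guid0 not implemented
    return status


def selector_valid(status):
    """Valid if cx_status is 0 or only IF is set; invalid on IC or IS."""
    return not (status & (IC | IS))
```

This shows why the check needs a mask rather than a compare against zero when cx_guid* is optional.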

grayresearch commented 4 months ago

Should there be a way for a stateful CX to indicate "I am NOT serializable" or even "I am NOT serializable AT THIS MOMENT" in the read-only system composition information (i.e. CXU Map) or in the CX state context CSRs?

grayresearch commented 4 months ago

In this ongoing redesign with CX CSRs, let's discuss some issues of CX state context initialization and reinitialization.

Background

(First note that since each CX state context is isolated, the state of a CX is usually determined solely by the prior series of CX operations issued since initialization. In some cases, but not always, a CX is deterministic in that the same series of CX operations since initialization always produces the same series of results.)

The spec proposes CX-agnostic CX state context management. The acid test is that a CX-aware OS can manage the CX state context of any CX without change, and without any CX-specific code. This requires the OS to access configured-CX-specific data and behavior, perhaps provided statically in the CXU Map, perhaps provided dynamically by the CXU hardware.

Context management includes context save and context restore, but it also includes context initialization / reinitialization.

Presently the spec provides CX-agnostic initialization via the .CS field of the CX state context status word, accessed by the IStateContext custom function instructions cf_{read,write}_status. Initialization: "On system reset, each state context of a serializable stateful extension CXU is in the initial state." Reinitialization: "A write .cs=1 has the side effect of resetting the entire current state context to its initial (power up) state."

This provides simple CX-agnostic initialization, but there are several problems:

  1. This is not how the FS, XS, VS fields of mstatus work when set to Initial. For example, the priv spec notes "nor does setting FS=Initial clear the contents." Rather (per priv spec) software must perform an extension-specific set of instructions to re-initialize the extension state context. But this (extension-specific code) is not what we want for CX.
  2. Setting .cs=1 requires the CXU implementing the CX to initialize a potentially large CX state context. In general this may take 1, 10, 100, 1000, ... clock cycles. Whether writing .cs=1 blocks for 1000 cycles, or whether it completes while initialization is still in progress but subsequent operations against that CX state context first wait for the initialization to complete, a large latency may be problematic (interrupts get lost, for example).

(TODO: We need a new Issue tracking configurable maximum latency of any CX operation.)

Proposals

  1. Init as just another context reload: Initialization could be provided as a context-reload of a distinguished state context blob. For example, context-reload of all 0s, or all ~0s, could be defined to be the initial state of any CX state context. This would make context initialization latency always equivalent to context reload latency, proportional to the size of the context save blob.

Note that for every CXU that requires O(size) cycles for initialization, there will be others with "flash clear" in O(1) cycles, using flop resets, seqnum-tagged structures, or other methods. For a large CX state context, flash clear initialization may be the difference between a CX code speedup and a code slowdown. The spec should support both O(size) and O(1) type initialization CXUs.

  2. Init status + busy bits: As with the present spec's state context status word .cs=1, initialization could be provided, CX-agnostically, by writing scxs_status.cs=init=1. CSR-writing this would start initialization but would not block on its completion. A new read-only scxs_status.busy status bit could signal to software that CX state context initialization is underway but has not completed. Software could spin-wait until !scxs_status.busy. This would allow software to request a (hopefully O(1)) CX state context flash clear init, but also cope with an O(size) initialization when necessary, and perhaps do something else useful while it waits for !busy.

By keeping initialization as the sole responsibility of a CXU, operating autonomously and asynchronously to the CPU, it spares the CPU from wasting energy sending O(size) CXU requests and receiving O(size) CXU responses, as a context reload would. (It is not perfect as there is still the energy of !scxs_status.busy spin-waiting. Perhaps a new wait instruction could sleep pending any CXU operation becoming unbusy. (Ugh!) This would require a small change to CXU-LI.)
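
A rough cycle-level sketch of the init + busy idea, covering both an O(size) and a flash-clear CXU. InitContext, tick, and init_and_wait are hypothetical names, and the per-cycle clearing rate is an invented knob for illustration only.

```python
class InitContext:
    """Toy state context whose initialization proceeds in the background."""

    def __init__(self, nwords, words_per_cycle):
        self.words = [0xDEADBEEF] * nwords   # stale contents before init
        self.rate = words_per_cycle          # nwords => flash clear in 1 cycle
        self._todo = 0                       # words still waiting to be cleared

    def write_status_init(self):
        """Models a CSR write of scxs_status.cs=init; returns immediately."""
        self._todo = len(self.words)

    @property
    def busy(self):
        """Models the proposed read-only scxs_status.busy bit."""
        return self._todo > 0

    def tick(self):
        """Advance the CXU by one clock cycle."""
        count = min(self.rate, self._todo)
        start = len(self.words) - self._todo
        for i in range(start, start + count):
            self.words[i] = 0
        self._todo -= count


def init_and_wait(ctx):
    """Request init, then spin until !busy; return the cycles spent waiting."""
    ctx.write_status_init()
    cycles = 0
    while ctx.busy:
        ctx.tick()       # an RM could do other useful work here instead
        cycles += 1
    return cycles
```

The same RM loop serves both kinds of CXU: it exits after one cycle for a flash-clear context and after size/rate cycles otherwise.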