grayresearch / CX

Proposed RISC-V Composable Custom Extensions Specification
Apache License 2.0

a new CX ABI for composability, compatibility, security, performance #29

Closed grayresearch closed 6 months ago

grayresearch commented 6 months ago

Background

The CX ABI ensures correct composition of independently authored CX libraries under CX multiplexing.

CX multiplexing operates by setting the hart's CX selector (mcx_selector write), or setting it indirectly (cx_index write), to identify a specific CX and CX state context, prior to issuing custom instructions or accessing custom CSRs (together, "custom operations"). The hart's current CX selector determines which CX, or "legacy custom", will perform the custom operation.

Per §2.2.1, mcx_selector.version determines whether CX multiplexing is enabled: "When version=0, disable composable extension multiplexing. The rest of mcx_selector is ignored. No CXU is selected. Custom-[0123] instructions execute the CPU’s built-in custom instructions. / When version=1, enable version-1 composable extension multiplexing. The cxu_id and state_id fields select the current CXU and state context. Custom-0/-1/-2 instructions issue CXU requests to the CXU identified by cxu_id and to the state context identified by state_id."

So while running CX software on a hart, software will write the CX selector and issue custom operations to it. If software needs to use legacy custom operations, the CX selector must be in legacy mode (CX multiplexing disabled)!

Since application software is composed of dozens or hundreds of separately authored, sometimes separately versioned libraries, we require an application binary interface (ABI) that ensures dependable, disciplined use of the thread's shared CX selector, so that whenever software performs custom operations, these are performed against exactly the expected CX or legacy custom.

The currently spec'd provisional CX ABI (§1.4.5) manages the CX selector with a callee-save discipline. The hart is initialized with an mcx_selector of 0 ("All CXU CSRs are initialized to zero on reset."), so that legacy custom mode is in effect. Then any code that writes the CX selector must save the old value and restore it upon return (or any stack unwind!) from that code. This discipline has the advantages that:

  1. It provides correct nested composition of CX libraries. If CX A lib selects CX A, performs A custom operations then calls lib B, which selects CX B and performs B custom operations, B must re-select the previous selection (CX A) prior to returning to lib A, which can then happily perform more A custom operations.
  2. Code which does not change selectors need not save or restore selectors.
  3. Compared to caller-save, wherein a CX library defensively re-selects its CX after every function call out, the CX library trusts the transitive callees to restore its CX selection. So arguably it minimizes the number of CX selector writes in a given code path.
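
A minimal host-side sketch of the callee-save discipline described above. Everything here is an assumption made for illustration: the selector is modeled as a plain variable rather than a CSR, and the names sim_selector, read_selector, write_selector, CX_A and CX_B are hypothetical, not part of the spec.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for the hart's CX selector; 0 models legacy mode.
// On real hardware these would be CSR accesses, not a variable.
static std::uint32_t sim_selector = 0;
static int sim_writes = 0;   // counts simulated selector writes

static std::uint32_t read_selector() { return sim_selector; }
static void write_selector(std::uint32_t s) { sim_selector = s; ++sim_writes; }

constexpr std::uint32_t CX_A = 1, CX_B = 2;   // hypothetical selector values

// Callee-save: lib B saves the caller's selection and restores it on return.
static void lib_b() {
    std::uint32_t saved = read_selector();
    write_selector(CX_B);
    // ... B custom operations would be issued here ...
    write_selector(saved);
}

// lib A trusts lib_b() to hand back the selection it was called with.
static std::uint32_t lib_a() {
    std::uint32_t saved = read_selector();
    write_selector(CX_A);
    // ... A custom operations ...
    lib_b();
    std::uint32_t seen_after_call = read_selector();
    assert(seen_after_call == CX_A);   // holds only if every callee obeys the ABI
    // ... more A custom operations, with no re-selection ...
    write_selector(saved);
    return seen_after_call;
}
```

In this model lib A writes the selector twice and lib B twice, and lib A never re-selects after the call. That is the advantage listed in item 3, and it rests entirely on lib_b() behaving.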

Issues with the provisional callee-save CX ABI

  1. Breaks legacy custom code. Under callee-save you must always select your CX (or legacy custom) prior to issuing custom operations. However, preexisting legacy custom code and legacy compilers, both predating CX, include neither the CX selector write required to select legacy custom behavior nor a matching CX selector write to restore the caller's selection. So any explicit or implicit use of a legacy custom operation may instead forward to some other selected CX -- a disaster! In summary, callee-save discipline is incompatible in general with legacy libraries and compilers that use legacy custom operations at will. Since legacy custom code may appear almost anywhere, it follows that the usual, default, ambient CX selection should be "CX mux = disabled / legacy custom operation".
  2. Wrong trust model. Callee-save CX selection means you trust code you do not control to preserve your current CX selection. In general any ABI that supports C function calls assumes that a callee will not corrupt the stack or otherwise invoke undefined behavior that corrupts the caller, but CX multiplexing is a very sharp knife: if the callee violates the callee-save ABI and returns with a different CX selector in place, the caller may then trigger unboundedly undefined behavior by issuing custom operations to the wrong CX or CX state context. In applications comprising separately authored, separately versioned libraries, it is not possible to inspect the transitive call graph from a CX library to ensure that callees preserve the CX selection as required. An ABI that is more secure, more defensive, and more amenable to program analysis will require a greater degree of paranoia in each CX library, assuming that callees do not preserve the current CX selection.

A new CX ABI for composability, compatibility, security, performance

Tenets / competing goals:

  1. Support composition of libraries and nested composition of libraries.
  2. Support legacy custom code that does not pre-select legacy mode.
  3. Minimize the CX selection "trust surface" to that of the current function (or perhaps, current library).
  4. Minimize the number of CX selector writes.

Here is the proposed new CX ABI.

  1. On reset, the thread's CX selection is legacy mode.
  2. On entry to a function, and after each function call, a thread's CX selection is UNDEFINED.
  3. Code MUST select its CX prior to issuing its CX custom operations.
  4. Such code (that selects a CX) MUST select legacy mode prior to calling a function, returning, or stack unwinding.
  5. Code SHOULD select legacy mode prior to issuing legacy custom operations.

This ABI endeavors to maintain an ambient "legacy mode" CX selection when not actively issuing CX custom operations. This ensures to the greatest extent possible that legacy custom code, unaware of CX multiplexing, and lacking the code to select legacy custom mode, nevertheless always operates in legacy custom mode.

For CX libraries, this ABI supports composition and nested composition. Composition works because (rule 3) each library selects its CX prior to issuing its custom operations. Nested composition also works because, after a function call (rule 2), the caller must re-select its CX (rule 3) prior to issuing additional custom operations:

  1. CX A lib sets CX selection to CX A, issues A operations
  2. CX A lib sets CX selection to legacy mode, calls CX B lib
  3. CX B lib sets CX selection to CX B, issues B operations
  4. CX B lib sets CX selection to legacy mode, returns
  5. CX A lib sets CX selection to CX A, issues more A operations
  6. CX A lib sets CX selection to legacy mode, returns.
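
The six steps above can be sketched as a host-side simulation that records every selector write. This is only a sketch under assumptions: the selector is a plain variable, and cx_select_sim, sim_trace, LEGACY, CX_A and CX_B are hypothetical names, not the CX API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulated CX selector plus a log of every value written to it.
static std::uint32_t sim_selector = 0;
static std::vector<std::uint32_t> sim_trace;

static void cx_select_sim(std::uint32_t s) {
    sim_selector = s;
    sim_trace.push_back(s);
}

constexpr std::uint32_t LEGACY = 0, CX_A = 1, CX_B = 2;   // hypothetical values

static void lib_b() {
    cx_select_sim(CX_B);             // step 3: select CX B
    assert(sim_selector == CX_B);    // B operations would be issued here
    cx_select_sim(LEGACY);           // step 4: legacy mode before returning
}

static void lib_a() {
    cx_select_sim(CX_A);             // step 1: select CX A, issue A operations
    cx_select_sim(LEGACY);           // step 2: legacy mode before calling out
    lib_b();
    cx_select_sim(CX_A);             // step 5: selection is undefined, re-select
    cx_select_sim(LEGACY);           // step 6: legacy mode before returning
}
```

Calling lib_a() in this model leaves the trace A, legacy, B, legacy, A, legacy: six writes in total, and the ambient selection is legacy mode whenever control crosses a function boundary.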

Also, all is well when a CX A lib calls legacy custom code:

  1. CX A lib sets CX selection to CX A, issues A operations
  2. CX A lib sets CX selection to legacy mode, calls legacy lib
  3. legacy lib issues its legacy custom operations
  4. legacy lib returns
  5. CX A lib sets CX selection to CX A, issues more A operations
  6. CX A lib sets CX selection to legacy mode, returns.

Rule 4 helps ensure that following a brief excursion into a CX lib which changes the CX selection, we immediately return to legacy mode in case we encounter selection-less legacy custom code.
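
Since rule 4 also covers stack unwinding, one possible way to express it in C++ is a scope guard whose destructor selects legacy mode. This is a hedged sketch, not something the spec or CX API provides: CxScope, cx_select_sim and sim_selector are hypothetical, and the selector is simulated with a variable.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

static std::uint32_t sim_selector = 0;   // simulated CX selector, 0 = legacy mode
static void cx_select_sim(std::uint32_t s) { sim_selector = s; }

// Selects a CX on construction (rule 3) and legacy mode on destruction (rule 4),
// so both a normal return and an exception leave legacy mode in place.
class CxScope {
public:
    explicit CxScope(std::uint32_t cx) { cx_select_sim(cx); }
    ~CxScope() { cx_select_sim(0); }
    CxScope(const CxScope&) = delete;
    CxScope& operator=(const CxScope&) = delete;
};

static void do_cx_work(bool fail) {
    CxScope scope(7);                    // 7 is an arbitrary selector index
    assert(sim_selector == 7);           // CX operations would be issued here
    if (fail) throw std::runtime_error("simulated failure");
}

// Returns true if the selector is back in legacy mode after an exception unwinds.
static bool unwinds_to_legacy() {
    try {
        do_cx_work(true);
    } catch (const std::runtime_error&) {
        // swallowed: only the selector state matters for this sketch
    }
    return sim_selector == 0;
}
```

The guard handles returns and unwinding only. A call made from inside the guarded scope would still need an explicit legacy-mode selection before the call and a re-selection afterwards.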

There is still an attack surface: malicious code can violate the ABI by selecting some CX (i.e. not legacy custom mode) and then calling selection-less legacy custom code. That code then issues custom operations that are not the legacy custom operations it intends, causing unboundedly undefined behavior.

Rule 5 helps defend against this. To the extent practical or necessary, legacy custom code should be compiled defensively to set legacy custom mode on entry and after function calls (rule 2) prior to issuing its custom operations.
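
A small simulation of the difference rule 5 makes. It assumes a toy routing model in which selector 0 sends a custom-0 instruction to the built-in legacy custom logic and any other value sends it to a CX; issue_custom0_sim, legacy_unaware and legacy_defensive are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

static std::uint32_t sim_selector = 0;   // simulated CX selector, 0 = legacy mode
static void cx_select_sim(std::uint32_t s) { sim_selector = s; }

enum class Target { LegacyCustom, SomeCx };

// Models where a custom-0 instruction is routed under the current selector.
static Target issue_custom0_sim() {
    return sim_selector == 0 ? Target::LegacyCustom : Target::SomeCx;
}

// Legacy code built without any knowledge of CX: no selector writes at all.
static Target legacy_unaware() {
    return issue_custom0_sim();
}

// Legacy code compiled defensively: select legacy mode first (rule 5).
static Target legacy_defensive() {
    cx_select_sim(0);
    assert(sim_selector == 0);
    return issue_custom0_sim();
}
```

If a misbehaving caller leaves, say, selector 3 in place, legacy_unaware() is routed to a CX while legacy_defensive() still reaches the legacy custom logic, at the cost of one extra selector write per entry.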

Unlike the provisional callee-save ABI, these rules will incur unnecessary CX selector writes and will give up a little bit of performance (which after all may be the reason for using that CX in the first place). For example, in our CX A lib + CX B lib nested example above, the CX selector writes at steps 2 and 4 (at least) are unnecessary. It is certainly possible for a CX-enlightened compiler+linker to analyze control flow within a monolithic CX library and optimize the generated code by eliding provably unnecessary defensive CX selector writes.

Impact on the CX API / CX Programming Model

Absent compiler support, CX multiplexing requires explicit CX selection calls and, unfortunately, requires programmers to explicitly follow this CX ABI.

Consider the following sketch of a Linux user-mode CX API:

mycx::open() {
  cx = cx_open(MYCX_GUID, …);   // discover if CX present, allocate a state context
}
mycx::close() {
  cx_close(cx);         // release it
}
mycx::doit() {
  if (cx < 0) return sw(…); // pure software if CX absent
  cx_select(cx);        // csrw cx_index,cx // (Rule 3) select MYCX
  for (…)
    c[i] = cx_reg(FN,a[i],b[i]); // custom-0 rd,FN,rs1,rs2
  cx_select(0);         // csrw cx_index,zero // (Rule 4) select legacy mode
  ...
}

Here the CX API guarantees that CX selector index 0 designates legacy mode, and this makes it straightforward for the CX lib doit() function to obey the CX ABI.

If CX multiplexing is combined with nested call outs, the caller must repeatedly re-select its CX:

mycx::doit2() {
  if (cx < 0) return sw2(…);    // pure software if CX absent
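  for (…) {
    cx_select(cx);      // csrw cx_index,cx // (Rule 3) select MYCX
    c[i] = cx_reg(FN,a[i],b[i]); // custom-0 rd,FN,rs1,rs2
    cx_select(0);       // csrw cx_index,zero // (Rule 4) select legacy mode before the call
    d[i] = func(c[i]);      // (Rule 2) CX selection is undefined after the call
  }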
  cx_select(0);         // csrw cx_index,zero // (Rule 4) select legacy mode
  ...
}
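
One possible way to keep the call-out pattern of doit2() from being forgotten is a small wrapper that selects legacy mode before the call and re-selects the CX afterwards. This is a sketch under assumptions, not part of the CX API: call_out, cx_select_sim, func_sim and doit2_sim are hypothetical, and the selector is simulated on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

static std::uint32_t sim_selector = 0;   // simulated CX selector, 0 = legacy mode
static int sim_writes = 0;               // counts ABI-mandated selector writes
static void cx_select_sim(std::uint32_t s) { sim_selector = s; ++sim_writes; }

// Calls f(args...) in legacy mode (rule 4), then re-selects cx (rules 2 and 3).
// If f throws, the selector is already in legacy mode while the stack unwinds.
// Sketch limitation: f must return a value.
template <typename F, typename... Args>
static auto call_out(std::uint32_t cx, F&& f, Args&&... args)
    -> decltype(std::forward<F>(f)(std::forward<Args>(args)...)) {
    cx_select_sim(0);
    auto result = std::forward<F>(f)(std::forward<Args>(args)...);
    cx_select_sim(cx);
    return result;
}

// A callee that trashes the selection, as rule 2 says any callee may.
static int func_sim(int x) { sim_selector = 99; return x + 1; }

static int doit2_sim(std::uint32_t cx, int n) {
    int sum = 0;
    cx_select_sim(cx);                       // rule 3
    for (int i = 0; i < n; ++i) {
        assert(sim_selector == cx);          // a CX operation would be issued here
        sum += call_out(cx, func_sim, i);
    }
    cx_select_sim(0);                        // rule 4, before returning
    return sum;
}
```

With this shape the loop body never issues a CX operation under a stale selection, though the re-select after the final call-out is wasted. That is the kind of write a CX-aware compiler might elide.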

ubc-guy commented 6 months ago

In the last code example, it should be clear that func(..) is actually mycx::func(..) rather than legacy::func(..).

grayresearch commented 6 months ago

That is not my intention. In the last example, func() is any function. Might be in the class, might be in the CX library, might be elsewhere. Might use legacy custom instructions, might use MYCX custom instructions, might do neither. It doesn't matter. Rules 2 and 4 say we always reset cx_index to 0 (legacy mode) before calling any function and assume it is trash after returning from any function.

In general with separately versioned dynamic libraries typical in real world Linux code, even if you think you know what func() is or does, func() (transitively) may change after you test and ship your code.

Although as noted above, you may be able to do some CX ABI interprocedural optimizations within your monolithic library.

BrandonFrei commented 6 months ago

Regarding rule 2, is it possible to add a note clarifying that it reads: "On entry to a function, and after each function call that is not a cx_reg / cx_imm / cx_flex, a thread's CX selection is UNDEFINED."? Otherwise, without the example, I would think it necessary to do something like this:


for (…) {
    cx_select(cx);      // csrw cx_index,cx // (Rule 3) select MYCX
    c[i] = cx_reg(FN,a[i],b[i]); // custom-0 rd,FN,rs1,rs2
}

I think the confusion comes from the fact that in the spec we call these (cx_reg / cx_imm / cx_flex) custom functions - so in my mind, each time we call them we're calling a function and need to perform cx_selection.

Maybe I'm alone in my interpretation of this.

grayresearch commented 6 months ago

LOL! Well each call of certain CX APIs (including cx_select, cx_reg, cx_imm, cx_flex, cx_status) should be inlined into the caller and become a single CSR[RW] or a single custom instruction. Once this happens there is no actual function call -- and if there is no concrete function call there is no need to restore basis (legacy) mode, and no need to assume the callee has killed the caller's CX selection (because there is no callee).

Alternately and equivalently we could amend/complicate the CX ABI rules to read: "function calls (except CX API function calls)". But I am happy to leave the rules as they are, simple.

Note the terminology "custom function instruction" in the spec was introduced early on as a way to distinguish an ordinary custom instruction (which can do anything) from the very narrowly scoped and constrained subset of custom instructions that can be used by composable extensions and their Custom Function Units (CFUs). See custom function in §1.3 and §2.1. That is, CFUs are modules that implement custom function instructions. However, for reasons, in 2023 we renamed CFUs to CXUs, without retiring the name "custom function". We also have CF_IDs (custom function IDs), etc. in the CXU-LI.

If we wish to retire the spec term "custom function instruction", we still need a good name for the constrained custom instruction(s) of a composable extension. Sometimes I write "CX custom instructions" but that's a tiresome mouthful. Perhaps "CX instructions" is the way. If they are CX instructions, they are custom instructions.

BrandonFrei commented 6 months ago

We propose new rules (TO BE EXPANDED): 6: a cx identifier that marks a function as not needing to call cx_select after entry; the function is assumed to be called with a valid selector, and may use custom instructions.