Closed grayresearch closed 6 months ago
In the last code example, it should be clear that func(..) is actually mycx::func(..) rather than legacy::func(..).
That is not my intention. In the last example, func() is any function. Might be in the class, might be in the CX library, might be elsewhere. Might use legacy custom instructions, might use MYCX custom instructions, might do neither. It doesn't matter. Rule 2 says we always reset cx_index to 0 (legacy mode) before calling any function and assume it is trash after returning from any function.
In general, with the separately versioned dynamic libraries typical of real-world Linux code, even if you think you know what func() is or does, func() (transitively) may change after you test and ship your code.
Although as noted above, you may be able to do some CX ABI interprocedural optimizations within your monolithic library.
Regarding rule 2, is it possible to add a note that makes this explicit, e.g.: "On entry to a function, and after each function call that is not a cx_reg / cx_imm / cx_flex, a thread's CX selection is UNDEFINED."? Otherwise, without the example, I would think that it is necessary to do something like this:
for (…) {
    cx_select(cx);                // csrw cx_index,cx // (Rule 3) select MYCX
    c[i] = cx_reg(FN,a[i],b[i]);  // custom-0 rd,FN,rs1,rs2
}
I think the confusion comes from the fact that in the spec we call these (cx_reg / cx_imm / cx_flex) custom functions - so in my mind, each time we call them we're calling a function and need to perform cx_selection.
Maybe I'm alone in my interpretation of this.
LOL! Well each call of certain CX APIs (including cx_select, cx_reg, cx_imm, cx_flex, cx_status) should be inlined into the caller and become a single CSR[RW] or a single custom instruction. Once this happens there is no actual function call -- and if there is no concrete function call there is no need to restore basis (legacy) mode, and no need to assume the callee has killed the caller's CX selection (because there is no callee).
Alternately and equivalently, we could amend/complicate the CX ABI rules to read: "function calls (except CX API function calls)". But I am happy to leave the rules as they are, simple.
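To illustrate the inlining argument, here is a minimal host-side sketch, not the actual CX API implementation. The `model_*` variables, the `MYCX` index, the `FN_ADD` function id, and `vec_op()` are all hypothetical, and the "custom instruction" is simulated as an add so the sketch can run on any machine. On a real hart the two inline functions would each collapse to a single CSR write or a single custom instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host model of the thread's cx_index CSR; 0 means legacy mode (assumption). */
static uint32_t model_cx_index = 0;
static unsigned model_select_writes = 0;

enum { MYCX = 1, FN_ADD = 0 };  /* hypothetical selector index and function id */

/* On a real hart this would inline to one `csrw cx_index, cx`. */
static inline void cx_select(uint32_t cx) {
    model_cx_index = cx;
    model_select_writes++;
}

/* On a real hart this would inline to one `custom-0 rd, fn, rs1, rs2`.
   The model pretends that MYCX's FN_ADD adds its operands. */
static inline uint32_t cx_reg(uint32_t fn, uint32_t a, uint32_t b) {
    assert(model_cx_index == MYCX && fn == FN_ADD);
    return a + b;
}

/* One selection before the loop suffices: the loop body contains only
   inlined CX API calls, so there is no callee that could change cx_index. */
void vec_op(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n) {
    cx_select(MYCX);                        /* rule 3: select before custom ops */
    for (size_t i = 0; i < n; i++)
        c[i] = cx_reg(FN_ADD, a[i], b[i]);
    cx_select(0);                           /* leave legacy mode in effect */
}
```

In this model the selector is written twice per `vec_op()` call regardless of `n`, rather than once per element.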
Note the terminology "custom function instruction" in the spec was introduced early on as a way to distinguish an ordinary custom instruction (which can do anything) from the very narrowly scoped and constrained subset of custom instructions that can be used by composable extensions and their Custom Function Units. See custom function in §1.3 and §2.1. That is, CFUs are modules that implement custom function instructions. However, for reasons, in 2023 we renamed CFUs to CXUs, without retiring the name "custom function". We also have CF_IDs (custom function IDs), etc. in the CXU-LI.
If we wish to retire the spec term "custom function instruction", we still need a good name for the constrained custom instruction(s) of a composable extension. Sometimes I write "CX custom instructions" but that's a tiresome mouthful. Perhaps "CX instructions" is the way. If they are CX instructions, they are custom instructions.
We propose new rules (TO BE EXPANDED): 6: a cx identifier that marks a function as not needing to call cx_select to set the CX selector after entry; such a function is assumed to be called with a valid selector, and may use custom instructions.
Background
The CX ABI ensures correct composition of independently authored CX libraries under CX multiplexing.
CX multiplexing operates by setting the hart's CX selector (mcx_selector write), or setting it indirectly (cx_index write), to identify a specific CX and CX state context, prior to issuing custom instructions or accessing custom CSRs (together, "custom operations"). The hart's current CX selector determines which CX, or "legacy custom", will perform the custom operation.
Per §2.2.1, mcx_selector.version determines whether CX multiplexing is enabled: "When version=0, disable composable extension multiplexing. The rest of mcx_selector is ignored. No CXU is selected. Custom-[0123] instructions execute the CPU’s built-in custom instructions. / When version=1, enable version-1 composable extension multiplexing. The cxu_id and state_id fields select the current CXU and state context. Custom-0/-1/-2 instructions issue CXU requests to the CXU identified by cxu_id and to the state context identified by state_id."
So while running CX software on a hart, software will write the CX selector and issue custom operations to it. If software needs to use legacy custom operations, the CX selector must be in legacy mode (CX multiplexing disabled)!
Since application software is composed of dozens or hundreds of separately authored, sometimes separately versioned libraries, we require an application binary interface (ABI) that ensures dependable, disciplined use of the thread's shared CX selector, so that whenever software performs custom operations, these are performed against exactly the expected CX or legacy custom.
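The routing described in that quote can be sketched as a small decision function. This is only an illustrative model: the struct is a decoded view with field names taken from the quoted text, not the real CSR bit layout, and `route_custom()`, `target_t`, and `TARGET_UNSPECIFIED` are hypothetical names. Cases the quote does not cover are reported as unspecified rather than guessed.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of mcx_selector. Field names follow the quoted spec text;
   the struct itself is an assumption and says nothing about bit positions. */
typedef struct {
    unsigned version;
    unsigned cxu_id;
    unsigned state_id;
} mcx_selector_t;

typedef enum { TARGET_BUILTIN, TARGET_CXU, TARGET_UNSPECIFIED } target_kind_t;

typedef struct {
    target_kind_t kind;
    unsigned cxu_id;    /* meaningful only when kind == TARGET_CXU */
    unsigned state_id;  /* meaningful only when kind == TARGET_CXU */
} target_t;

/* custom_n is 0..3 for a custom-0 .. custom-3 instruction. */
target_t route_custom(mcx_selector_t sel, unsigned custom_n) {
    target_t t = { TARGET_UNSPECIFIED, 0, 0 };
    if (sel.version == 0) {
        /* Multiplexing disabled: rest of the selector is ignored. */
        t.kind = TARGET_BUILTIN;
    } else if (sel.version == 1 && custom_n <= 2) {
        /* Version-1 multiplexing: custom-0/-1/-2 go to the selected CXU. */
        t.kind = TARGET_CXU;
        t.cxu_id = sel.cxu_id;
        t.state_id = sel.state_id;
    }
    /* Anything else is not covered by the quoted text. */
    return t;
}
```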
The currently spec'd provisional CX ABI (§1.4.5) manages the CX selector with a callee-save discipline. The hart is initialized with an mcx_selector of 0 ("All CXU CSRs are initialized to zero on reset."), so that legacy custom mode is in effect. Then any code that writes the CX selector must save the old value and restore it upon return (or any stack unwind!) from that code. This discipline has the advantages that:
Issues with the provisional callee-save CX ABI
A new CX ABI for composability, compatibility, security, performance
Tenets / competing goals:
Here is the proposed new CX ABI.
This ABI endeavors to maintain an ambient "legacy mode" CX selection when not actively issuing CX custom operations. This ensures to the greatest extent possible that legacy custom code, unaware of CX multiplexing, and lacking the code to select legacy custom mode, nevertheless always operates in legacy custom mode.
For CX libraries, this code supports composition and nested composition. Composition works because (rule 3) each library selects its CX prior to issuing its custom operations. Nested composition also works, because, following a function call (rule 2), the caller must re-select its CX (rule 3) prior to issuing additional custom operations:
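A minimal host-side sketch of such nesting follows. Everything in it is an assumption made for illustration: the selector indices `CX_A` and `CX_B`, the functions `a_func()` and `b_func()`, and the `custom_op()` stand-in, which only records which CX would have executed the operation. It does not reproduce any step numbering used elsewhere in this issue.

```c
#include <assert.h>
#include <stdint.h>

enum { LEGACY = 0, CX_A = 1, CX_B = 2 };  /* hypothetical selector indices */

static uint32_t model_cx_index = LEGACY;   /* host model of cx_index */
static uint32_t trace[8];                  /* which CX ran each custom op */
static unsigned n_ops = 0;

static void cx_select(uint32_t cx) { model_cx_index = cx; }

/* Stand-in for a custom instruction: logs the currently selected CX. */
static void custom_op(void) {
    if (n_ops < 8)
        trace[n_ops++] = model_cx_index;
}

/* A function in the (hypothetical) CX B library. */
void b_func(void) {
    cx_select(CX_B);     /* rule 3: select own CX before custom ops */
    custom_op();
    cx_select(LEGACY);   /* leave legacy mode in effect on return */
}

/* A function in the (hypothetical) CX A library that calls into B. */
void a_func(void) {
    cx_select(CX_A);     /* rule 3 */
    custom_op();
    cx_select(LEGACY);   /* rule 2: legacy mode before calling any function */
    b_func();
    cx_select(CX_A);     /* selection is undefined after the call: re-select */
    custom_op();
    cx_select(LEGACY);
}
```

After `a_func()` runs, the trace reads A, B, A: each custom operation reached the CX its author intended, even though neither library knows about the other's selector.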
Also, all is well when a CX A lib calls legacy custom code:
Rule 4 helps ensure that following a brief excursion into a CX lib which changes the CX selection, we immediately return to legacy mode in case we encounter selection-less legacy custom code.
There is still an attack surface: malicious code may violate the ABI by selecting some CX (i.e. not legacy custom mode) and then calling selection-less legacy custom code, which then issues custom operations that are not the legacy custom operations it intends, causing unboundedly undefined behavior.
Rule 5 helps defend against this. To the extent practical or necessary, legacy custom code should be compiled defensively to set legacy custom mode on entry and after function calls (rule 2) prior to issuing its custom operations.
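The difference between selection-less and defensively compiled legacy code can be sketched with a host-side model. All names here are hypothetical (`SOME_CX`, `legacy_plain()`, `legacy_defensive()`, `untrusted_callee()`), and `legacy_custom_op()` merely reports whether it would have been executed in legacy mode.

```c
#include <assert.h>
#include <stdint.h>

enum { LEGACY = 0, SOME_CX = 3 };          /* hypothetical selector indices */

static uint32_t model_cx_index = LEGACY;   /* host model of cx_index */

static void cx_select(uint32_t cx) { model_cx_index = cx; }

/* Stand-in for a legacy custom instruction: returns 1 if it would execute
   as the intended built-in instruction, 0 if it would be misrouted to a CX. */
static int legacy_custom_op(void) {
    return model_cx_index == LEGACY;
}

/* Some callee that leaves an arbitrary selection behind. */
static void untrusted_callee(void) { model_cx_index = SOME_CX; }

/* Selection-less legacy code: relies on ambient legacy mode. */
int legacy_plain(void) {
    return legacy_custom_op();
}

/* Defensively compiled legacy code (rule 5): sets legacy mode on entry
   and again after every function call, before issuing custom operations. */
int legacy_defensive(void) {
    cx_select(LEGACY);                 /* on entry */
    int ok = legacy_custom_op();
    untrusted_callee();
    cx_select(LEGACY);                 /* after the call (rule 2) */
    return ok && legacy_custom_op();
}
```

If a misbehaving caller selects `SOME_CX` first, `legacy_plain()` is misrouted in this model while `legacy_defensive()` is not.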
Unlike the provisional callee-save ABI, these rules will incur unnecessary CX selection writes and will give up a little bit of performance (which, after all, may be the reason for using that CX in the first place). For example, in our CX A lib + CX B lib nested example above, the CX selector writes at steps 2 and 4 (at least) are unnecessary. It is certainly possible for a CX-enlightened compiler+linker to analyze control flow within a monolithic CX library and optimize the generated code by eliding provably unnecessary defensive CX selector writes.
Impact on the CX API / CX Programming Model
In lieu of compiler support, CX multiplexing requires explicit CX selection settings and unfortunately requires programmers to explicitly follow this CX ABI.
Consider the following sketch of a Linux user-mode CX API:
Here the CX API guarantees that CX selector index 0 designates legacy mode, and this makes it straightforward for the CX lib doit() function to obey the CX ABI.
If CX multiplexing is combined with nested call outs, the caller must repeatedly re-select its CX:
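A minimal host-side sketch of such a loop, under stated assumptions: `cx_select` and `cx_reg` are modeled as inline functions over a simulated `cx_index`, the custom function is pretended to be an add, and `MYCX`, `FN`, `func()`, and `loop()` are hypothetical names. Here `func()` stands for any function at all, and it deliberately leaves a different selection behind to show why the caller cannot rely on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LEGACY = 0, MYCX = 1, FN = 0 };     /* hypothetical index and function id */

static uint32_t model_cx_index = LEGACY;   /* host model of cx_index */
static unsigned model_select_writes = 0;

/* Would inline to one `csrw cx_index, cx` on a real hart. */
static inline void cx_select(uint32_t cx) {
    model_cx_index = cx;
    model_select_writes++;
}

/* Would inline to one custom-0 instruction; modeled here as an add. */
static inline uint32_t cx_reg(uint32_t fn, uint32_t a, uint32_t b) {
    assert(model_cx_index == MYCX);        /* otherwise it would be misrouted */
    (void)fn;
    return a + b;
}

/* Any function: the caller must assume nothing about the selection it leaves. */
static uint32_t func(uint32_t x) {
    model_cx_index = 7;                    /* arbitrary value, for illustration */
    return x + 1;
}

void loop(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        cx_select(LEGACY);                 /* rule 2: legacy mode before the call */
        uint32_t x = func(a[i]);
        cx_select(MYCX);                   /* rule 3: re-select after the call */
        c[i] = cx_reg(FN, x, b[i]);
    }
    cx_select(LEGACY);                     /* leave legacy mode in effect */
}
```

In this model the loop performs two selector writes per iteration plus one on exit, which is the cost the rules accept in exchange for not having to know what `func()` does.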