grayresearch / CX

Proposed RISC-V Composable Custom Extensions Specification

secure deallocations from cx_table #20

Closed · ubc-guy closed this issue 5 months ago

ubc-guy commented 7 months ago

Adding a new allocation to cx_table is clear and secure. Returning the index of the new allocation is fine.

When an entry in the cx_table is to be de-allocated, we have problems. It seems we must mark the entry as invalid, and trap on use to detect errors. However, this means the cx_table will quickly fill if software is expected to allocate and deallocate at all.

Further, if we ever mark a cx_table entry as valid again (reusing the same entry contents, i.e. same context id), do we need to make sure that any past cx_index values cannot be reused again? Or does the OS simply clean the context state (for security reasons)? Can an application allocate cx_values and then deallocate them, only to have them allocated by another process, causing a security leak when the original application attempts to reuse the deallocated cx_values?

This part of the security model is flawed.

grayresearch commented 7 months ago

I disagree that this issue identifies a security flaw in the spec's privileged access control design.

The spec defines bare mechanism, but not policy, nor user programming models, for privileged access control to CX state contexts. It also briefly covers CX resource managers and what happens at context save/restore with respect to CX state contexts. Absent much more detail on these topics, it is easy to conclude there is some security flaw. But whether there is one depends on how software programming models apply the bare mechanism, and how operating systems implement those programming models.

First, what must happen when a privileged resource manager (e.g., an OS) needs to securely revoke access from a user process or thread or hart to a CX state context that it had previously granted and provided indirectly to the user software as a CX selector index?

  1. The OS writes an invalid selector value into the corresponding entry in the hart's CX selector table (or possibly tables plural if access to the CX state context has been granted across multiple harts).
  2. If the hart's (or any harts') current CX selection (i.e., cx_index) is being revoked, the OS also writes an invalid selector value into such hart's mcx_selector CSR.

If the CX state context access revocation is done synchronously with respect to the user thread (e.g., in a system call or interrupt), this is easy. But if the revocation originates asynchronously to the hart, e.g. from another thread, the hart must be interrupted to perform step 2. Indeed, revoking access to a CX state context is similar to changing page table entries, and in the worst case may require the analog of a TLB shootdown -- depending upon whether the CX programming model allows a CX state context to be shared by multiple threads within one process, and/or allows a thread that is using CX state context(s) to be rescheduled to a different hart.
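As a concrete sketch, the two steps might look like the following C pseudocode. The table type, invalid-selector encoding, and all helper functions here are assumptions for illustration, not spec'd interfaces; only the mcx_table / mcx_selector / cx_index concepts come from the spec.

```c
#include <stdint.h>

#define CX_INVALID_SELECTOR 0u          /* assumed encoding of an invalid selector */

typedef struct { uint32_t entry[1024]; } cx_table_t;  /* per-hart CX selector table */

/* Assumed platform/OS primitives: */
extern unsigned current_hart(void);
extern uint32_t csr_read_cx_index(void);
extern void     csr_write_mcx_selector(uint32_t sel);
extern void     ipi_invalidate_selector(unsigned hart, unsigned index);

void revoke_cx_grant(cx_table_t *table, unsigned index, unsigned hart) {
    /* Step 1: invalidate the granted entry in the hart's CX selector table. */
    table->entry[index] = CX_INVALID_SELECTOR;

    /* Step 2: if that hart's current CX selection (cx_index) names the
     * revoked entry, its mcx_selector CSR must be invalidated too.
     * Synchronous to the user thread (syscall/interrupt): do it here.
     * Asynchronous (revocation from another hart): interrupt the remote
     * hart -- the analog of a TLB shootdown. */
    if (hart == current_hart()) {
        if (csr_read_cx_index() == index)
            csr_write_mcx_selector(CX_INVALID_SELECTOR);
    } else {
        ipi_invalidate_selector(hart, index);
    }
}
```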

Anyway, once the mcx_table entry (or entries) and the hart's (or harts') mcx_selector CSRs are set to the invalid selector, as described, any subsequent attempt to access the revoked CX state context, or to reselect it (csrw cx_index, index) and then access it, fails, setting an error bit in cx_status as desired. Once access is revoked, no access is possible. No security flaw here.

Now, the issue above seeks "trap on use". The current spec never traps on any CX error. The CX error mechanism is currently to set cx_status error bits. Setting status bits instead of taking an exception was intentionally adopted to keep the CX hardware simple and frugal. That said, please see issue #19, where this design is being reconsidered.

If we ever do trap on CX error, the revocation process described above will also trap on first use (first custom instruction, first custom CSR access) after revocation, as desired.

The next part of the issue pertains to reclamation and reuse of CX selector index values. Again, this is a policy issue rather than a mechanism issue.

First note that the spec'd mechanism is per-hart cx_tables. Not per system, per process, or per thread. (You can share them across threads in a process or across processes in a system, but you don't have to. Mechanism vs. policy.) So we have to be careful and precise when we talk about "allocated to another process" because each process will (best practice) have its own distinct per-thread per-hart mcx_tables and thus its own disjoint scope of CX selector indices.

Moving on, in one reasonable programming model, OS managed CX selector indices are analogous to OS managed file descriptors. The programming model for these is: open a file, receiving a file descriptor. Perform file operations, specifying the file descriptor. Deal with errors. Close the file, releasing the file descriptor. Be not surprised when a subsequent open file call returns a previously released file descriptor value. And if you never close file descriptors you opened, eventually you will run out of file descriptors.

Similarly, for CX state contexts, the programming model might be: open (request) a CX or a particular CX ID, receiving a CX selector index that identifies that CX and its state context. Perform CX custom instructions (first selecting that CX by CSR-writing cx_index). Deal with errors (CSR-read cx_status). Close (deallocate) the CX, releasing its CX selector index. Be not surprised when a subsequent request to open some CX returns a previously released CX selector index. And if you never close CX selector indices you opened, eventually you will run out of CX selector indices.
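A minimal user-side sketch of that flow, assuming hypothetical cx_open()/cx_close() library wrappers around the OS interface and an assumed 128-bit CX ID type; only the cx_index and cx_status CSRs and the custom instructions come from the spec:

```c
#include <stdint.h>

typedef struct { uint8_t b[16]; } cx_guid_t;   /* assumed 128-bit CX ID type */
extern const cx_guid_t CX_GUID_FOO;            /* hypothetical CX of interest */

extern int  cx_open(const cx_guid_t *guid);    /* returns a CX selector index, or <0 */
extern void cx_close(int cx);                  /* releases the selector index */
extern void     csr_write_cx_index(uint32_t idx);
extern uint32_t csr_read_cx_status(void);

void use_cx(void) {
    int cx = cx_open(&CX_GUID_FOO);     /* like open(): obtain a selector index */
    if (cx < 0) return;                 /* deal with grant failure */

    csr_write_cx_index((uint32_t)cx);   /* select the CX before custom insns */
    /* ... perform CX custom instructions ... */

    if (csr_read_cx_status() != 0) {    /* like checking errno: CX error bits */
        /* deal with errors */
    }

    cx_close(cx);   /* a later cx_open() may return this same index value */
}
```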

I do not believe it matters whether the programming model for resource management is cooperative or adversarial ("I have it until I release it" vs. "the resource manager (OS) can take it away from me at any moment"); neither changes any of this. You can lose access to a CX state context and be left holding a tombstoned CX selector index (with no access to the CX state context you used to be able to access), an index that is nonetheless yours until you return it to the OS. And just as with recycled OS file descriptor values, there is no surprise, no problem, and definitely no security flaw when a subsequent request for a CX returns a recycled CX selector index.

In summary, I believe the right programming models and the right OS implementations can use the spec'd mcx_table / cx_index mechanism to provide the desired privileged access control including secure revocation of access.

grayresearch commented 7 months ago

The issue also talks of reuse of cx_values (CX selector values?). Per-hart mcx_table entry indices can be reused, as discussed above. But CX state contexts themselves (and hence their CX selectors) can also be reused and time-multiplexed. Consider a simple system with one CX/CXU with one state context, and one hart, running a multiprogrammed OS that schedules two software threads onto the one hart. It is possible for the two threads to each enjoy multiprogrammed access to the one CX state context.

The first thread is scheduled to the hart and requests access to the CX. The OS grants this, initializes (resets) its CX state context # 0 (IStateContext::cf_write_status), sets up the hart's mcx_table to address the first thread's CX selector table with entry # 0 for that cxu_id=0 and state_id=0, and returns CX selector index # 0. The thread selects that CX (csrw cx_index, index) and performs various CX custom instructions.

The timeslice expires. The OS saves the first thread's context, including its CXs' state contexts' state (IStateContext::cf_read_{status,state}) and its current cx_index and cx_status.

The second thread is scheduled to the hart and requests access to the same CX. The OS grants this, initializes (resets) its CX state context # 0 (IStateContext::cf_write_status), sets up the hart's mcx_table to address the second thread's CX selector table with entry # 0 for that cxu_id=0 and state_id=0, and returns CX selector index # 0. The thread selects that CX (csrw cx_index, index) and performs various CX custom instructions.

The timeslice expires. The OS saves the second thread's context, including its CXs' state contexts' state (IStateContext::cf_read_{status,state}) and its current cx_index and cx_status.

The OS reschedules the first thread to the hart. It reloads the thread context from the save area, including reloading the thread's CXs' state contexts' state (IStateContext::cf_write_{status,state}). It sets the hart's mcx_table to once again address the first thread's CX selector table. It reloads the hart's cx_index (and thus mcx_selector) and cx_status. Context reload complete, it resumes execution of the first thread.

(The fact that both threads' mcx_tables happen to dynamically address the same CX state context with the same valid CX selector value, and using the same CX selector index, is a coincidence, and at any rate is benign due to the way the OS manages CX state contexts across context switches.)
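A sketch of that save/restore sequence, with an assumed per-thread save-area layout and assumed C wrappers for the IStateContext operations and CSR accesses (only the cf_* operations and the cx_index/cx_status/mcx_table names come from the spec):

```c
#include <stdint.h>

#define MAX_CX_STATE_WORDS 1024             /* assumed upper bound */

struct cx_thread_ctx {
    uint32_t cx_index, cx_status;           /* per-thread user CSR values */
    uint32_t ctx_status;                    /* CX state context status word */
    uint32_t state[MAX_CX_STATE_WORDS];     /* serialized state context data */
};

/* Assumed wrappers over IStateContext operations and CSR accesses: */
extern uint32_t cf_read_status(void);
extern void     cf_write_status(uint32_t s);
extern uint32_t cf_read_state(unsigned i);
extern void     cf_write_state(unsigned i, uint32_t w);
extern unsigned cx_state_words(uint32_t ctx_status);  /* words of state data */
extern uint32_t csr_read_cx_index(void), csr_read_cx_status(void);
extern void     csr_write_cx_index(uint32_t), csr_write_cx_status(uint32_t);

void cx_ctx_save(struct cx_thread_ctx *c) {
    c->cx_index   = csr_read_cx_index();
    c->cx_status  = csr_read_cx_status();
    c->ctx_status = cf_read_status();       /* capture state context status */
    for (unsigned i = 0; i < cx_state_words(c->ctx_status); i++)
        c->state[i] = cf_read_state(i);     /* serialize state context data */
}

void cx_ctx_restore(struct cx_thread_ctx *c) {
    /* (Assumes M-mode has already pointed mcx_table at this thread's
     * selector table and selected the target state context for cf_* use.) */
    for (unsigned i = 0; i < cx_state_words(c->ctx_status); i++)
        cf_write_state(i, c->state[i]);
    cf_write_status(c->ctx_status);
    csr_write_cx_index(c->cx_index);        /* also reloads mcx_selector */
    csr_write_cx_status(c->cx_status);
}
```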

grayresearch commented 7 months ago

The previous comment described dynamic reuse/sharing of a CX state context as part of OS thread multiprogramming. But different resource managers may perform CX state context reuse and multiprogramming in other situations.

For example, if a given CX provides fewer state contexts than harts, and threads running on every hart all request that CX, the OS may decide to fail requests when CX state contexts are oversubscribed (a more cooperative resource management policy). Or it may grant a CX state context to each new request by unilaterally revoking some previous grant -- a more adversarial resource management policy. (Either way, when a CX state context is reused by a different thread it must be initialized (reset) or reloaded from a previously saved context record, so there is no potential leak of CX state context across trust domains.) Both cooperative and adversarial resource management models have been used previously. The CX privileged access control mechanism must support both.

At the other extreme, when a system is configured with a CX that itself is configured with more state contexts than the system has harts, this allows, for example, more efficient context switching. In a system with one CX with two state contexts, one hart, and two software threads using the CX, it may not be necessary to laboriously save and restore the CX state context data when context switching from one thread to another on the hart. Instead, each software thread might just use a separate CX state context, with only the mcx_table and mcx_selector values switched on a context switch.
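A sketch of that cheaper switch, with an assumed thread structure and helpers (no cf_read_/cf_write_ state copying is needed because each thread owns a dedicated CX state context):

```c
#include <stdint.h>

struct cx_sel_table;                        /* per-thread CX selector table */
struct thread {
    struct cx_sel_table *cx_table;
    uint32_t cx_index, cx_status;
};

/* Assumed helpers; set_mcx_table() stands for the M-mode mcx_table write. */
extern void set_mcx_table(struct cx_sel_table *t);
extern uint32_t csr_read_cx_index(void), csr_read_cx_status(void);
extern void csr_write_cx_index(uint32_t), csr_write_cx_status(uint32_t);

void cx_fast_switch(struct thread *prev, struct thread *next) {
    prev->cx_index  = csr_read_cx_index();  /* only tiny CSR state to save */
    prev->cx_status = csr_read_cx_status();

    set_mcx_table(next->cx_table);          /* entries name next's own
                                               dedicated CX state context */
    csr_write_cx_index(next->cx_index);     /* also reloads mcx_selector */
    csr_write_cx_status(next->cx_status);
}
```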

grayresearch commented 7 months ago

Another comment that may be helpful. Depending upon policy, the CX user programming model on Linux-class systems might be that the CX selector indices obtained from the OS are thread-scoped or process-scoped.

If thread-scoped, the indices retrieved by one thread are meaningless/wrong on another thread in the process. In this case, each thread scheduled to a hart would use a unique, disjoint CX selector table.

If process-scoped, indices retrieved by one thread would be usable by other threads in the process. In this case, each thread of a process, scheduled to a hart, would share the same per-process CX selector table. (Process-scoped CX selector indices behave more analogously to OS file descriptors, which are process-scoped: if one thread opens a file and shares the file descriptor with another thread in the process, either or both of the threads can perform file I/O using the file descriptor.)

Process-scoped CX selector indices make it easier for software to intentionally (or accidentally!) share a CX state context. This may be good or bad. Thread-scoped CX selector indices require a programming model that mandates finer grained CX state ownership (per-client or per-thread state data structures), whereas process-scoped CX selector indices may be kept and shared more simply in "global" data structures.

Thread-scoped CX selector indices do not preclude sharing a CX state context if that is the desired use case. (Example: a CX that provides event counter state that we wish to share across threads of a process.) It just requires that each thread request the specific CX state context from the OS; each is granted its own CX selector index to the same CX state context. (Behind the scenes, the same specific CX selector value for the specific shared CX state context will be found in both threads' harts' CX selector tables and (at times) in both harts' mcx_selector CSRs.)

The spec'd limited size of the CX selector tables (=1024 entries) reflects past design assumptions that CX selector indices would be thread-scoped. If process-scoped, 1024 entries per process could be limiting in high-hart-count systems with many CXs times many threads.
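An illustrative data-structure sketch of the two scopes; the structs and set_mcx_table() helper are assumptions, not spec'd. Either way, a hart's mcx_table CSR names exactly one CX selector table at a time:

```c
struct cx_sel_table;            /* opaque CX selector table */

struct process {                /* process-scoped: one table shared by all
                                   threads, like the file descriptor table */
    struct cx_sel_table *cx_table;
};

struct thread {
    struct process *proc;
    struct cx_sel_table *cx_table;  /* thread-scoped: disjoint per-thread table */
};

extern void set_mcx_table(struct cx_sel_table *t);  /* assumed M-mode helper */

void on_schedule(struct thread *t, int thread_scoped) {
    /* Point the hart's mcx_table at whichever table the policy dictates. */
    set_mcx_table(thread_scoped ? t->cx_table : t->proc->cx_table);
}
```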

grayresearch commented 5 months ago

There is much to learn from the comments in this Issue thread, but TL;DR: there is no flaw in the CX security model, although previously the spec had no clean way to force a trap on use of an OS-reclaimed CX selector index.

Now PR #32 adds mcx_selector.cte, a custom operation trap enable bit, to CX selectors, so that an OS may tombstone a closed/released selector index by setting cte=1 on the corresponding selector value in mcx_table[]. If the user thread subsequently reselects this tombstoned CX selector index and then issues a custom operation, the hart raises an illegal instruction exception. The exception handler can exit the process, or take some other action, as appropriate.
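A sketch of how an OS might apply the new bit; the bit position and helper name here are assumptions, while the cte field itself is what PR #32 adds:

```c
#include <stdint.h>

#define MCX_SELECTOR_CTE (1u << 31)   /* assumed bit position of cte */

typedef struct { uint32_t entry[1024]; } cx_table_t;

/* Close/release a selector index: leave the selector value in place but
 * set cte=1, so any later custom operation via this index traps. */
void cx_tombstone(cx_table_t *table, unsigned index) {
    table->entry[index] |= MCX_SELECTOR_CTE;
}

/* If the user thread then does `csrw cx_index, index` and issues a custom
 * operation, the hart raises an illegal instruction exception; the
 * handler may exit the process or take other action, as appropriate. */
```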