grayresearch / CX

Proposed RISC-V Composable Custom Extensions Specification
Apache License 2.0
66 stars · 12 forks

waiting until context status changes from xxx -> initial are ready #25

Open ubc-guy opened 8 months ago

ubc-guy commented 8 months ago

When the runtime/OS allocates a new context, it must set the context's state to Initial.

Also, upon reset, all contexts must start off in the Initial state.

The CX spec currently states this must initialize all of the underlying hardware state. This could be a multi-cycle operation, during which the CX+context is unavailable. If multiple contexts for the same CX are initialized at the same time, initialization might proceed serially if they share hardware. Upon hardware reset, this may likewise be done serially.

This presents an opaque delay to using CX instructions. Presumably, after an application gets its context allocated, it is ready for use right away. Applications will not be "composable" if some implementations of a CX contract are ready immediately after initialization, while others have lengthy delays.

The proper way to handle this is to make these lengthy delays visible to software.

So, some method needs to be added to the CX spec for software to add purposeful "wait until it is ready" delays. This might be done synchronously as part of a cx_open() runtime call, for example. Or, if cx_open() is done asynchronously, there might be an explicit cx_sync() or cx_wait() or similar operation that waits until the CX is ready.
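For illustration only, a runtime might expose the delay along these lines. The type and function names (cx_context_t, cx_open, cx_is_ready, cx_wait) are hypothetical, not defined by the spec, and hardware initialization is modeled as a simple countdown of remaining cycles:

```c
#include <stdbool.h>

/* Hypothetical sketch: cx_context_t, cx_open, cx_is_ready, and cx_wait are
 * illustrative names only. Hardware initialization is modeled as a countdown
 * of remaining init cycles. */
typedef struct {
    int init_cycles_remaining;   /* > 0 while the context is still initializing */
} cx_context_t;

/* Asynchronous open: returns immediately; the context may not be ready yet. */
static void cx_open(cx_context_t *ctx, int init_latency_cycles) {
    ctx->init_cycles_remaining = init_latency_cycles;
}

static bool cx_is_ready(const cx_context_t *ctx) {
    return ctx->init_cycles_remaining == 0;
}

/* Explicit "wait until ready" call, as the Issue proposes. Each iteration
 * stands in for one poll of the hardware's readiness indication. */
static void cx_wait(cx_context_t *ctx) {
    while (!cx_is_ready(ctx))
        ctx->init_cycles_remaining--;   /* simulated hardware progress */
}
```

A synchronous cx_open() would simply call cx_wait() before returning, hiding the delay inside the runtime call instead of exposing it to the application.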

grayresearch commented 8 months ago

This Issue pertains to the latency of CX custom instructions in general, and the latency of initializing one CX state context in particular.

Let's review latency in the CX design at present. The CX mux model (select the CX, issue its custom instruction(s)) may be implemented in hardware in diverse ways. The spec proposes a CPU-CXU DAG and the CXU-LI for how this works. CXU-LI accommodates a diversity of CPUs and CXUs, from combinational to fixed latency to variable latency. The fixed latency levels limit the domain of CXs that are implementable. The variable latency levels allow greater flexibility, so some operations may sometimes take dozens or hundreds of cycles.

At present there is no upper bound on the latency of a CX custom instruction, and no timeout mechanism. Indeed, a broken or malicious CXU could hang the CPU complex by withholding a CXU response of some instruction (so it never commits in the CPU pipeline).

I recall our Honeywell 66/60 at Waterloo had a ZOPFAULT trap raised on an instruction taking over 2 ms (typically you built a cycle in your pointer-linked indirect addressing mode (!) data structure).

Perhaps CXU-LI should impose an upper DESIGN bound on latency. (It's not clear how to recover from a dynamic stuck CXU request/response pipeline of work in flight.)

Also, as noted in the Rationale talk, https://www.youtube.com/watch?v=7daY_E2itpo&t=979s stateful CXs for long-latency custom functions may wish to adopt an async design pattern, in a begin/complete or begin/test-complete*/complete style, to separately launch, poll, and later complete such functions.
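A sketch of that begin/test-complete/complete pattern, with all names (cx_op_t, cx_begin, cx_test_complete, cx_complete) hypothetical and the function's latency modeled as a cycle count:

```c
#include <stdbool.h>

/* Hypothetical begin/test-complete/complete pattern for a long-latency
 * stateful CX function. Latency is modeled as a remaining-cycle count. */
typedef struct {
    int cycles_left;   /* remaining latency */
    int result;        /* result, valid once cycles_left reaches 0 */
} cx_op_t;

/* Launch the long-latency function; returns immediately. */
static void cx_begin(cx_op_t *op, int latency, int eventual_result) {
    op->cycles_left = latency;
    op->result = eventual_result;
}

/* Poll: advances the simulated hardware one cycle, reports completion. */
static bool cx_test_complete(cx_op_t *op) {
    if (op->cycles_left > 0)
        op->cycles_left--;
    return op->cycles_left == 0;
}

/* Retrieve the result; only valid after cx_test_complete() returns true. */
static int cx_complete(const cx_op_t *op) {
    return op->result;
}
```

Between polls the application is free to do other work, which is the point of the pattern: the long latency is visible and schedulable rather than an opaque stall.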

OK. Back to the Issue.

I do not agree that different operation latencies necessarily mean CXs are not composable, unless latency is part of their CX contract, which would be possible but unusual.

The Issue requests some affordance in the spec's HW-SW interface to determine that the selected CX is "ready".

Here are three ways to achieve this:

1. Add a mandatory stateful CX instruction, get_CX_readiness, or the like -- too expensive.
2. Add a busy or ready indicator field to the CX state context status word, so that software can spin wait on the indicator.
3. Define the behavior of IStateContext::cf_read_status() to not complete until the selected CX state context is "ready".

Either option 2 or 3 might be used by a CX runtime that values differentiating "logically initialized" from "actually initialized".
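Option 2 might look like the following spin wait. The bit position and the read_cx_status() accessor are assumptions for illustration (the spec would have to define the actual status-word layout); here the simulated status register becomes ready after a few reads, standing in for a multi-cycle initialization:

```c
#include <stdint.h>

/* Hypothetical: assume the status word reserves bit 31 as a "ready" flag.
 * The real bit assignment would have to be defined by the CX spec. */
#define CX_STATUS_READY (1u << 31)

/* Simulated status register: reports busy for a few reads, standing in
 * for a multi-cycle hardware initialization. */
static int reads_until_ready = 4;

static uint32_t read_cx_status(void) {
    if (reads_until_ready > 0) {
        reads_until_ready--;
        return 0;                 /* busy: ready bit clear */
    }
    return CX_STATUS_READY;       /* initialization finished */
}

/* Spin until the selected CX state context reports ready (option 2).
 * Returns the number of busy reads observed. */
static unsigned cx_spin_until_ready(void) {
    unsigned spins = 0;
    while (!(read_cx_status() & CX_STATUS_READY))
        spins++;
    return spins;
}
```

Option 3 needs no loop at all in software: the same effect is obtained by a single cf_read_status() call that the hardware stalls until the context is ready.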