riscv / riscv-fast-interrupt

Proposal for a RISC-V Core-Local Interrupt Controller (CLIC)
https://jira.riscv.org/browse/RVG-63
Creative Commons Attribution 4.0 International

Enhance support for co-routines and thus allow a level change to reuse already-saved registers. #106

Open David-Horner opened 3 years ago

David-Horner commented 3 years ago

Enhance support for co-routines and thus allow a level change to reuse already-saved registers:
  a) can be used horizontally;
  b) the application can check that the co-routine did not clobber the registers reserved for itself;
  c) one halt loop suffices for multiple levels;
  d) the current code assumes level 0 and pause for horizontal interrupt?
riscv/riscv-fast-interrupt@2fc1965

Scenarios:

1) Define a code section that can be re-run even if an interrupt clobbers a specified set of registers.

   << co-routine code
        set redoable flag on register set q1
    1l: reset redo flag for q1
        ....
        execute re-runnable code segment
        ....
        if redo flag for q1 is set, branch to 1l
        reset redoable flag for q1 >>

   << interrupt co-routine handler code
        save non-q1 registers that are used in handler
        if redoable flag clear, save q1 registers
        ....
        execute some interrupt handler code
        ....
        if redoable q1 flag was set on entry and q1 register set modified, set redo flag for q1
        if redoable q1 flag not set on entry, restore q1 saved registers
        restore non-q1 saved registers
        return >>
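A small C model can make the scenario 1 handshake concrete. It is a sketch under stated assumptions: the two flags are plain struct fields rather than hardware or CSR state, a single `long` stands in for the whole q1 register set, and the names `q1_state`, `irq_handler` and `run_redoable_segment` are hypothetical. The handler only saves q1 when the segment is not redoable, and otherwise reports the clobber so the segment re-runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical software model of the scenario 1 flags. */
typedef struct {
    bool q1_redoable; /* set by the co-routine: q1 may be clobbered, I will redo */
    bool q1_redo;     /* set by the handler: q1 was clobbered, redo the segment */
    long q1_reg;      /* stands in for the whole q1 register set */
} q1_state;

/* Interrupt co-routine handler. */
static void irq_handler(q1_state *s)
{
    bool redoable_on_entry = s->q1_redoable;
    long saved_q1 = 0;

    if (!redoable_on_entry)
        saved_q1 = s->q1_reg;   /* save q1 only if the segment cannot redo */

    s->q1_reg = -1;             /* handler body uses, and so clobbers, q1 */

    if (redoable_on_entry)
        s->q1_redo = true;      /* q1 modified: tell the co-routine to redo */
    else
        s->q1_reg = saved_q1;   /* restore q1 saved registers */
}

/* Re-runnable segment that sums 1..n in q1. One interrupt is injected after
   iteration irq_at (pass a negative value for none). Returns the number of
   passes made through the segment; the final sum is stored in *result. */
static int run_redoable_segment(q1_state *s, int n, int irq_at, long *result)
{
    bool irq_pending = irq_at >= 0;
    int passes = 0;

    assert(n >= 0);
    s->q1_redoable = true;                 /* set redoable flag on q1 */
    do {
        s->q1_redo = false;                /* 1l: reset redo flag for q1 */
        passes++;
        s->q1_reg = 0;
        for (int i = 1; i <= n; i++) {
            s->q1_reg += i;
            if (irq_pending && i == irq_at) {
                irq_pending = false;
                irq_handler(s);            /* interrupt lands mid-segment */
            }
        }
    } while (s->q1_redo);                  /* redo flag set: branch to 1l */
    s->q1_redoable = false;                /* reset redoable flag for q1 */

    *result = s->q1_reg;
    return passes;
}
```

With an interrupt injected mid-segment the model makes two passes and still produces the correct sum. The model only injects interrupts inside the loop body; an interrupt landing between the final redo check and the reset of the redoable flag would not be caught by this simple sequence.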

Uses include a) reducing the code segments that need interrupts disabled, and b) reducing interrupt save/restores.

Specifically, this code is applicable to nested interrupt handlers.

2) On return from an interrupt handler, allow the saved register set to be reused if the return is immediately followed by an interrupt. In the examples below, the necessary modifications to the standard process are defined.

Note: The exact order of these steps, relative to each other and to the existing functionality, is sub-optimal and would be incorrect for some implementations. For example, in 2a the reset of the reusable q1 flag in the handler prolog could be performed coincident with the read of its state.

Note: In practice, the discrete functionality will need to be reordered to avoid special cases in the hardware design. For example,

This functionality requires hardware support.

 A) This functionality allows reuse of the register sets saved in the trap handler
         by the next interrupt at the same level if:
           i) xret does not complete, or
           ii) no instruction is executed before the next interrupt is actioned.

      For 2a, hardware support is required but is minimal:
        i)   two bits in CSRs, a reusable q1 flag for each privilege level;
        ii)  if an interrupt is imminent (pending and enabled),
             no instruction may execute after MRET until the interrupt is engaged;
        iii) the reusable q1 flag is reset if, after return, any instruction is executed.
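The three 2a hardware requirements can be sketched as a tiny state model. The bit positions `REUSE_Q1_S` and `REUSE_Q1_M`, the `hart_model` struct and `modified_xret` are hypothetical names chosen for illustration; the only behaviour modelled is that the flag survives when an imminent interrupt stops the return target from executing, and is reset otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for the two reusable-q1 flag bits. */
#define REUSE_Q1_S (1u << 0)
#define REUSE_Q1_M (1u << 1)

typedef struct {
    uint32_t reuse_flags;  /* i) one reusable q1 flag per privilege level */
    bool irq_imminent;     /* an interrupt is pending and enabled */
} hart_model;

typedef enum { XRET_TAKES_INTERRUPT, XRET_RESUMES } xret_outcome;

/* Model of the modified xret for the privilege level whose flag is mode_bit. */
static xret_outcome modified_xret(hart_model *h, uint32_t mode_bit)
{
    assert(mode_bit == REUSE_Q1_S || mode_bit == REUSE_Q1_M);
    if (h->irq_imminent) {
        /* ii) no instruction at the return target executes before the
           interrupt is engaged, so the saved q1 frame is still valid */
        return XRET_TAKES_INTERRUPT;
    }
    /* iii) an instruction is about to execute after return, so the saved
       q1 frame can no longer be trusted */
    h->reuse_flags &= ~mode_bit;
    return XRET_RESUMES;
}
```

Only the flag of the returning privilege level is touched; the other level's bit is left as it was.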

B) The functionality in 2b is enhanced over 2a to allow M-mode and S-mode to share each other's saved register sets.

  2b requires the same hardware support as 2a, and
    additionally requires S-mode to clear
    M-mode's reusable q1 flag.

2a) horizontal interrupts with minimal hardware flag reset.

<<modified interrupt handler epilog
  a) leave x2 in xswap pointing to saved q1 register set
  b) set reusable q1 flag for current privilege level x
  c) execute modified xret >>

<<modified xret
  at end of standard xret, if interrupt pending:
    a) do not execute target instruction
    b) start interrupt process
  otherwise:
    a) reset reusable q1 flag
    b) resume interrupted code >>

<<modified interrupt prolog
  <<perform standard preliminary sp setup, except do not advance sp by -FRAMESIZE>>
  save non-q1 registers
  if reusable q1 flag clear:
    save q1 set registers
  otherwise:
    a) leave previously saved q1 registers unchanged
    b) [[ possibly reset reusable q1 flag ]] >>

This 2a variant is comparable to other architectures' hardware stacking feature that avoids writing registers if they were just saved (specifically, when an interrupt arises during return).
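To see what 2a saves, a sketch can count q1 register stores across a burst of back-to-back interrupts, each already pending when the previous handler's xret runs. `Q1_REGS = 16` is an assumed set size, and `level_model`, `modified_prolog`, `modified_epilog_and_xret` and `run_burst` are hypothetical names; non-q1 saves are not counted.

```c
#include <assert.h>
#include <stdbool.h>

#define Q1_REGS 16   /* hypothetical size of the q1 register set */

typedef struct {
    bool reuse_q1;   /* reusable q1 flag for this privilege level */
    int q1_stores;   /* q1 registers written to the frame by prologs */
    int q1_loads;    /* q1 registers read back by epilogs */
} level_model;

/* Modified interrupt prolog (non-q1 saves are not counted). */
static void modified_prolog(level_model *l)
{
    if (!l->reuse_q1)
        l->q1_stores += Q1_REGS;  /* save q1 set registers */
    /* otherwise: leave previously saved q1 registers unchanged */
    l->reuse_q1 = false;          /* the optional reset the text allows */
}

/* Modified epilog followed by the modified xret. */
static void modified_epilog_and_xret(level_model *l, bool irq_pending)
{
    l->q1_loads += Q1_REGS;  /* restore q1; frame stays in memory, x2 kept in xswap */
    l->reuse_q1 = true;      /* set reusable q1 flag */
    if (!irq_pending)
        l->reuse_q1 = false; /* xret resumes interrupted code: frame now stale */
    /* else: target instruction not executed, interrupt process starts */
}

/* n back-to-back interrupts, each pending when the previous xret runs. */
static void run_burst(level_model *l, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        modified_prolog(l);
        /* ... handler body ... */
        modified_epilog_and_xret(l, i + 1 < n);
    }
}
```

For a burst of five interrupts the model stores the q1 set once instead of five times, while the epilog still reloads it each time; the saving is on the store side only.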

2a) vertical interrupts with minimal hardware flag reset.

<<modified S-mode interrupt handler epilog
  a) leave x2 in sswap pointing to saved q1 register set
  b) set S-mode reusable q1 flag
  c) execute modified sret >>

<<modified sret
  at end of standard sret, if S-mode or M-mode interrupt pending:
    a) do not execute target instruction
    b) start interrupt process (S-mode or M-mode)
  otherwise:
    a) reset S-mode reusable q1 flag
    b) resume interrupted code >>

<<modified M-mode interrupt prolog
  <<perform standard preliminary sp setup, except do not advance sp by -FRAMESIZE>>
  save non-q1 registers
  if both S-mode and M-mode reusable q1 flags are clear:
    a) save q1 set registers
  otherwise:
    a) leave previously saved q1 registers unchanged
    b) reset reusable q1 flags [Not technically needed?] >>

<<modified M-mode interrupt handler epilog
  a) restore registers from the M-mode saved register frame; if the S-mode
     reusable q1 flag was set on entry to the M-mode handler, do not load the
     q1 set of registers from the M-mode stack (otherwise the frame will
     include the q1 register set)
  b) leave x2 in mswap pointing to saved register frame
  c) set M-mode reusable q1 flag
  d) execute modified mret >>

<<modified mret
  at end of standard mret, if S-mode or M-mode interrupt pending:
    a) do not execute target instruction
    b) start interrupt process (S-mode or M-mode)
  otherwise:
    a) reset S-mode reusable q1 flag
    b) resume interrupted code >>
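The M-mode side of the vertical case reduces to a small decision: where the current copy of q1 lives on entry, and whether the epilog loads it back. This sketch assumes the prolog condition means "save q1 only when both the S-mode and M-mode flags are clear", and it checks the S-mode flag first when both are set; that precedence is my assumption, not something the text states. `q1_source`, `m_entry_q1_source` and `m_epilog_loads_q1` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the current copy of q1 lives on entry to the M-mode handler. */
typedef enum { Q1_SAVE_FRESH, Q1_REUSE_M_FRAME, Q1_IN_S_FRAME } q1_source;

static q1_source m_entry_q1_source(bool reuse_s, bool reuse_m)
{
    if (reuse_s)
        return Q1_IN_S_FRAME;     /* assumption: S-mode flag checked first */
    if (reuse_m)
        return Q1_REUSE_M_FRAME;  /* previous M-mode frame still current */
    return Q1_SAVE_FRESH;         /* both clear: prolog saves q1 */
}

/* The M-mode epilog loads q1 from the M-mode stack unless the S-mode flag
   was set on entry (then the M-mode frame holds no q1 copy). */
static bool m_epilog_loads_q1(bool reuse_s_on_entry)
{
    return !reuse_s_on_entry;
}
```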

3) Allow reuse of saved registers by interrupt handlers if the saved data is still current.

3b) horizontal interrupts with comprehensive hardware flag reset.

<<as in 2a: modified interrupt handler epilog
  a) leave x2 in xswap pointing to saved q1 register set
  b) set reusable q1 flag for current privilege level x
  c) execute modified xret >>

<<standard xret, revised from 2a
  resume interrupted code >>

<<modified resumed code behaviour, new for 2b
  a) execute instructions as normal
  b) however, if an executed instruction modifies a q1 register, reset the reusable q1 flag
  Note: reset of q1 flag occurs in all privilege modes >>

<<modified interrupt prolog
  <<perform standard preliminary sp setup, except do not advance sp by -FRAMESIZE>>
  save non-q1 registers
  if reusable q1 flag clear:
    save q1 set registers
  otherwise:
    a) leave previously saved q1 registers unchanged
    b) [[ possibly reset reusable q1 flag ]] >>
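In 3b the flag outlives the return and is cleared only when resumed code actually writes a q1 register. The sketch below treats a resumed code segment as the list of its destination registers. Membership of q1 as x16..x31 (`Q1_MASK`) is an assumption for illustration, as are the helper names `q1_flag_after_write` and `q1_reusable_after`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical q1 membership (bit i set => xi is in q1): x16..x31. */
#define Q1_MASK 0xFFFF0000u

/* Applied to every retired instruction that writes integer register rd,
   whatever privilege mode executes it. */
static bool q1_flag_after_write(bool reuse_q1, unsigned rd)
{
    assert(rd < 32);
    if ((Q1_MASK >> rd) & 1u)
        return false;   /* a q1 register changed: saved frame is stale */
    return reuse_q1;    /* other registers (and x0) leave the flag alone */
}

/* Runs a resumed code segment, given as its destination registers, and
   reports whether the next interrupt's prolog may skip saving q1. */
static bool q1_reusable_after(const unsigned *rds, int count)
{
    bool reuse_q1 = true;   /* set by the epilog before the standard xret */
    for (int i = 0; i < count; i++)
        reuse_q1 = q1_flag_after_write(reuse_q1, rds[i]);
    return reuse_q1;
}
```

Code that confines itself to non-q1 registers keeps the saved frame valid for the next interrupt, which is the progress argument made for this variant.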

This 2b variant goes beyond other architectures' hardware stacking feature that avoids writing registers if they were just saved. Interruptible routines that are aware of the q1 register set can avoid using it in frequently executed code segments in an interrupt-heavy environment, and so make considerably more progress than without this feature.

2c) horizontal interrupts with multiple hardware flag reset.

<<as in 2b: modified interrupt handler epilog
  a) leave x2 in xswap pointing to saved qx register sets
  b) set reusable qx flags for current privilege level x
  c) execute modified xret >>

<<standard xret, revised from 2a
  resume interrupted code >>

<<modified resumed code behaviour, parallels 2b
  a) execute instructions as normal
  b) however, if an executed instruction modifies a qx register, reset that reusable qx flag in all privilege modes >>

<<modified interrupt prolog
  <<perform standard preliminary sp setup, except do not advance sp by -FRAMESIZE>>
  save non-qx registers
  for each reusable qx flag that is clear:
    save qx set registers
  otherwise:
    a) leave previously saved qx registers unchanged
    b) [[ possibly reset reusable qx flags ]] >>

This 2c variant provides further granularity of register sets. As before, interruptible routines that are aware of the qx register sets can tailor register use in frequently executed code segments, leveraging the additional flexibility that the multiple qx flags provide. I believe the sweet spot may be two qx sets that together encompass all the registers other than x0 and x2: {x16...x31} and {x1,x3...x15}. Note that sp (x2 by convention) will, also by convention, be saved in xswap. As such, x2 can be the sole register modified in such co-routine sections, avoiding any register saving to memory.
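The two suggested qx sets can be written as bitmasks over x0..x31, which makes the per-set flag reset and the prolog's save cost easy to compute. The set ordering (index 0 for {x16...x31}, index 1 for {x1,x3...x15}) and the function names `qx_on_reg_write` and `qx_prolog_stores` are hypothetical; the masks themselves follow the sets in the text.

```c
#include <assert.h>
#include <stdint.h>

/* The two qx sets suggested in the text, as bitmasks over x0..x31. */
static const uint32_t QX_SET[2] = {
    0xFFFF0000u, /* set 0: x16..x31 */
    0x0000FFFAu, /* set 1: x1, x3..x15 (x0 and sp/x2 excluded) */
};
#define NUM_QX 2

static unsigned popcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;   /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Hardware rule: writing rd resets the reusable flag of the set holding rd. */
static uint32_t qx_on_reg_write(uint32_t reuse_flags, unsigned rd)
{
    assert(rd < 32);
    for (unsigned q = 0; q < NUM_QX; q++)
        if ((QX_SET[q] >> rd) & 1u)
            reuse_flags &= ~(1u << q);
    return reuse_flags;
}

/* Modified prolog: number of qx registers that must be stored to memory. */
static unsigned qx_prolog_stores(uint32_t reuse_flags)
{
    unsigned stores = 0;
    for (unsigned q = 0; q < NUM_QX; q++)
        if (!(reuse_flags & (1u << q)))
            stores += popcount32(QX_SET[q]);
    return stores;
}
```

Writing only x2 leaves both flags set, so the next prolog stores no qx registers, matching the remark that x2 can be the sole register modified without any saving to memory. Touching one set costs only that set's stores (14 or 16 registers).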
2d) vertical interrupts with hardware flag reset (S to M).

<<as in 2c: modified supervisor interrupt handler epilog
  a) leave x2 in sswap pointing to saved S-mode qx register sets
  b) set reusable qx flags for supervisor privilege level
  c) execute standard sret >>

<<modified resumed code behaviour, same as 2c except
  a) execute instructions as normal
  b) however, if an executed instruction modifies a qx register, reset both the M-mode and S-mode reusable qx flag
  Note: initially only U-mode is affected >>

<<modified M-mode interrupt prolog
  <<perform standard M-mode sp setup, except do not advance sp by -FRAMESIZE>>
  save non-qx registers
  for each M-mode reusable qx flag that is clear:
    save qx set registers
  otherwise:
    a) leave previously saved qx registers unchanged
    b) [[ possibly reset reusable qx flag ]] >>

<<modified M-mode epilog
  a) restore all registers saved by prolog:
       i) all non-qx registers
       ii) all qx registers with reusable qx flag set on entry
  b) if no reusable qx flag set on entry:
       i) continue with normal epilog (mepc etc. and mret) including setting M-mode
  c) if any reusable qx flag set on entry, perform special M-mode trampoline return >>

<<special M-mode trampoline return

 a) select first of the qx set that had 
      reusable qx flag set on entry (needs 3+ registers).
 b) store saved x2 in first register of that set.
 c) store saved mepc
     (interrupt return address, to be used by sret)
     in second register of selected qx space.
 d) store saved mstatus (to be used by sret)
     in third register of selected qx space.
 e) set x2 to identify the qx sets to be recovered.
 f) set mepc to <trampoline code in S-mode space>
 g) set MPP to S-mode
 h) set MPEI to disable S-mode interrupts.
 i) execute (standard) mret to <trampoline code in S-mode space> >>
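Steps a) to g) of the trampoline return can be sketched against a minimal trap context. Assumptions, all hypothetical: the qx sets are the two from the 2c discussion, the "first register" of a set is its lowest-numbered register, step c) stores the saved mepc (the interrupt return address), and x2 is set to the bitmask of flagged sets. The MPP field position (mstatus bits 12:11, S-mode encoded as 1) follows the RISC-V privileged specification; h) and i) are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* The two qx sets from the 2c discussion, as bitmasks over x0..x31. */
static const uint32_t QX_SET[2] = { 0xFFFF0000u, 0x0000FFFAu };

/* Minimal M-mode trap context for the model. */
typedef struct {
    uint64_t x[32];
    uint64_t mepc;
    uint64_t mstatus;
} trap_ctx;

#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  ((uint64_t)3 << MSTATUS_MPP_SHIFT)
#define PRV_S             ((uint64_t)1)

/* n-th (0-based) register number in a set, lowest-numbered first; -1 if none. */
static int nth_reg(uint32_t set, unsigned n)
{
    for (int r = 0; r < 32; r++) {
        if ((set >> r) & 1u) {
            if (n == 0)
                return r;
            n--;
        }
    }
    return -1;
}

/* Steps a) to g). Returns the selected set index, or -1 if no flagged set
   has at least 3 registers. */
static int trampoline_pack(trap_ctx *c, uint32_t flags_on_entry,
                           uint64_t saved_x2, uint64_t trampoline_pc)
{
    for (int q = 0; q < 2; q++) {
        if (!(flags_on_entry & (1u << q)) || nth_reg(QX_SET[q], 2) < 0)
            continue;                                 /* a) */
        c->x[nth_reg(QX_SET[q], 0)] = saved_x2;       /* b) */
        c->x[nth_reg(QX_SET[q], 1)] = c->mepc;        /* c) */
        c->x[nth_reg(QX_SET[q], 2)] = c->mstatus;     /* d) */
        c->x[2] = flags_on_entry;                     /* e) */
        c->mepc = trampoline_pc;                      /* f) */
        c->mstatus = (c->mstatus & ~MSTATUS_MPP_MASK)
                   | (PRV_S << MSTATUS_MPP_SHIFT);    /* g) */
        /* h) interrupt-enable handling and i) mret are not modelled */
        return q;
    }
    return -1;
}
```

With only the {x1,x3...x15} flag set, the saved x2, mepc and mstatus land in x1, x3 and x4, which the S-mode trampoline then unpacks.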

<<trampoline code in S-mode space
  a) check x2 for first qx set to use, and branch to the appropriate code to do the following:
  b) recover x2 from sswap
  c) recover all or most of set qx
     SSIE and SPP from register containing mstatus data ii)

Current thought: enhance xnxti, using its low bits to define a co-routine flag. csrrci and csrrsi can provide different polarities, especially as the flag is coupled with interrupt disable/enable (xie clear/set). Use the low 8 bits of xintstatus to track co-routine / interrupt saved-stack status. This status is a declaration by a privilege level that its x2 register is a stack pointer to saved data which, if restored, will allow correct ongoing program behaviour. In addition to being readable there, the state is set through xnxti instructions, and the state is aggregated to provide status in the low bits of the vector returned in rd by the xnxti instructions. Aggregated status is also provided in the low bits of xscratchcsw[l]. Perhaps unexpectedly, the plan is that xintthresh will not affect these bits.

xintstatus bit recovqq:

qq is a set of saved registers on the stack pointed to by x2. As a result, x2 is not considered part of the recoverable set; it must be re-established before return to the co-routine. A code section sets this bit with a csrrsi xnxti when setrecovqq is 1 (not sure of this yet), and clears it when the bit is 0. When set, it informs the co-routine (including interrupts) that the partner process is able to fully restore qq state. The co-routine is thus able to use qq state with impunity iff it signals dorecovqq to the co-routine. The partner routine must, at the end of the recoverable qq section, check dorecovqq and recover qq state. Further, it will clear statrecqq to terminate the shared qq state code region.

Plan: a single csrrci xnxti instruction does both: the current dorecovqq state is placed in the setrecovqq bit location, and recovqq is cleared if setrecovqq is set in the immediate field.
Bit dorecovqq needs to be set on return from the co-routine if the co-routine modified qq state.

dorecovqq could be placed at an offset from bit 0 such that C.JR can return right back to the process.

dorecovqq is cleared when recovqq is set by csrrsi.

+ New branch instruction on low bits set: generally available functionality, but especially valuable in a minimal-register use case where either no other register is available (without a spill) to mask specific bits, or reloading the value is costly (as a CSR read may be).
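Packing status into the low bits of the xnxti result means software must split the value before using it, which is exactly the masking cost a branch-on-low-bits instruction would avoid. This sketch assumes RV32 with 4-byte-aligned vector table entries, so two low bits are free; `STATUS_BITS` and the helper names are hypothetical. As I understand the CLIC draft, xnxti returns 0 when there is nothing to service, so with status bits present that test has to look at the address part only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: two low status bits, free because table entries are
   assumed 4-byte aligned (RV32). */
#define STATUS_BITS 2u
#define STATUS_MASK ((1u << STATUS_BITS) - 1u)

/* Address of the vector table entry, with the status bits stripped. */
static uint32_t nxti_entry(uint32_t rd) { return rd & ~STATUS_MASK; }

/* Aggregated co-routine status carried in the low bits. */
static uint32_t nxti_status(uint32_t rd) { return rd & STATUS_MASK; }

/* Nothing to service: only the address part is zero. */
static bool nxti_nothing_pending(uint32_t rd) { return nxti_entry(rd) == 0; }
```

Each helper costs an AND with a mask in software; in the minimal-register case the mask itself may need a spare register, which is the motivation given for a dedicated branch.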

Note: for 2a, some implementations may already have this functionality. Specifically, if the implementation already detects a pending, enabled interrupt at xret and a) vectors to the start of the interrupt routine and b) leaves xepc unchanged, then the low bit of xepc can be used as the hardware signal for M and S modes.
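The xepc low-bit signal works because valid return addresses are always at least 2-byte aligned, so bit 0 of xepc carries no address information (the privileged specification keeps mepc[0] zero today, so using it this way needs the hardware behaviour described in the note). The sketch below is hypothetical: `XEPC_REUSE_BIT` and the three helpers only illustrate setting the bit when an interrupt is taken at xret and having the prolog read and strip it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of xepc carries no address information, so it can hold the signal. */
#define XEPC_REUSE_BIT ((uint64_t)1)

/* Interrupt taken at xret before any instruction executed:
   xepc is left unchanged and the signal bit is set. */
static uint64_t xepc_on_irq_at_xret(uint64_t xepc)
{
    return xepc | XEPC_REUSE_BIT;
}

/* Interrupt taken in the ordinary way: xepc is the interrupted pc. */
static uint64_t xepc_on_irq_normal(uint64_t pc)
{
    return pc & ~XEPC_REUSE_BIT;
}

/* Prolog: read and strip the signal; true means the saved q1 frame is reusable. */
static bool prolog_check_reuse(uint64_t *xepc)
{
    bool reuse = (*xepc & XEPC_REUSE_BIT) != 0;
    *xepc &= ~XEPC_REUSE_BIT;
    return reuse;
}
```

Stripping the bit in the prolog keeps the eventual xret target a clean address in either case.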

Kevin-Andes commented 3 years ago

Attached the detailed description and diagram that David sent out to the TG discussion mailing list:

Tracking updated registers.pdf

ill#1c.pdf

Kevin-Andes commented 3 years ago

In the discussion at our Task Group meeting on 20210119, some members pointed out that the worst-case scenario is not improved by this approach. This is because, even though the "window" that avoids repeated saving/restoring of interrupt context has been extended, the worst case occurs when an interrupt arrives right after this extended window.

Also, some members were concerned about the extra HW cost. Alternatively, a purely software approach may achieve a similar goal by having the compiler reserve a fixed group of registers for interrupts, or by saving only the registers that are actually used.

Lastly, the fastest approach would be purely HW-assisted context saving/restoring, and it is not clear how the proposed scheme can work with a purely HW approach. We may also need to examine whether there are any potential security issues in the usage model.