openhwgroup / cv32e40p

CV32E40P is an in-order 4-stage RISC-V RV32IMFCXpulp CPU based on RI5CY from PULP-Platform
https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest
Other
967 stars 424 forks source link

Add CSRs for checking stack overflow on RTL simulation #161

Open haugoug opened 5 years ago

haugoug commented 5 years ago

Would be nice to add custom CSR for checking that any access using the SP as base register is falling into the area declared through these CSRs. The software could declare this area when a stack is activated and the platform could raise a simulation error (like branch decision is X) if SP-based access is done outside. The goal is to target only simulation platforms, so we don't need to care if the solution is good for final chip on silicon. I've used it for a while on virtual platform using these CSRs and it turned out to be very useful: 0x7D0 : 0=disabled, 1=enabled 0x7D1: stack base address 0x7D2: stack base address + stack size

davideschiavone commented 5 years ago

Do we really want to enable or disable it? Referring to 0x7D0 : 0=disabled, 1=enabled

haugoug commented 5 years ago

Yes because some runtime will not support this feature and then needs it to be inactive.

gautschimi commented 5 years ago

I think this can be done with the pmp unit. in a context switch you can change the pmp_cfg to allow read/write access to the current stack. all other accesses will result in a pmp error.

haugoug commented 5 years ago

Maybe but we have some systems where we don't have any PMP. Also doing it in the core allows monitoring exactly the stack, even if you have other data after the stack, while it is not feasible with the PMP, you need to waste some memory before and after to catch the wrong access, and will not work when the wrong offset is bigger than this empty area.

Silabs-ArjanB commented 5 years ago

Hi Haugoug,

Thank you for bringing up the topic of stack overflow checking and for proposing a solution for that. We also think that stack overflow checking would be a nice addition to RI5CY (or even RISC-V in general). In your proposal it is not completely clear to me when (i.e. for which instructions) the checking is actually performed and what the consequences are of a failing check.

Please allow me to share our opinion on how this feature could be specified:

- We would make key assumptions / simplifications:

- Stack limit checking only needs to check for **stack overflow during stack frame creation**
    - Only overflow needs to be checked, i.e. there is no need to perform checks during unstacking.
    - Only 1 limit CSR is therefore needed
- Stack limit checking only needs to work **compiler-generated code** that modifies (i.e. decrements) the stack pointer before performing any stores to the reserved space when performing a stack frame creation, e.g. as follows:

    addi sp,sp,-framesize                     # Create stack frame 
    sw a1,OFFSET(sp)                           # Push register onto stack
    sw a2,OFFSET(sp)                           # Push register onto stack
    …
- Stack frame creation will always start with an ‘addi sp,sp,-framesize’ (with sp == r2) and we propose that the stack overflow check is **only** performed on the result of such an ‘addi’ instruction.

  So, for example the following instructions/values will not be used in the stack overflow check:
     - Access address of store instructions
     - Access address of load instructions
     - Instructions (other than ‘addi sp,sp,-framesize’) writing to sp (so also not for RI5CY’s post-increment load/store)

- CSR usage

- Only **1 CSR** is needed, splimit
    - If resulting sp of ‘addi sp,sp,framesize’ < splimit, then a stack overflow is detected.
- Stack overflow detection can be disabled by setting splimit to 0 (default value) (e.g. by ‘csrwi splimit, 0’)

- Upon a stack overflow detection:

- Cause a synchronous ‘stack overflow’ exception for the addi instruction (mepc would point to the address of the addi instruction; mtval would contain the offending stack address)
    - Exception code 24 in mcause can be used for this (or another free exception code that is reserved for future custom use) (the priority of this exception versus other exceptions is not relevant as addi cannot cause other synchronous exceptions).
- Automatically disable further stack overflow checking (so splimit=0)

Reason for this is to not further complicate the exception handler code (which itself would normally start with ‘addi sp,sp,framesize’ which would then also cause an exception). (The actual splimit therefore needs to be set a little above the actual lower boundary of the stack such that this exception itself can do its work.)

It is the duty of the exception handler to re-initialize splimit once it has taken its other corrective actions.

- ‘addi sp,sp,framesize’ which caused the overflow actually performs ‘sp = splimit’ (this gives the exception handler a predictable stack pointer)

- TBD:

- Should stack overflow checking be automatically disabled for other scenarios (e.g. during NMI)?
- Are any special provisions needed for cores that have Machine mode + User mode (my initial reaction would be that we only need a Machine-mode CSR)?
- Above proposal limits the stack overflow checking to only the 'addi sp, sp, framesize' instruction. Should this be extended to check every write to sp (e.g. also my MUL, XOR, load with post increment, etc.)? (If so, what are the implications on critical path and how would we prioritize instructions causing multiple synchronous exceptions (e.g. stack overflow exception plus store access violation)?)

I am very interested in hearing your opinion on this.

Best regards, Arjan

haugoug commented 5 years ago

Hi Arjan, thanks a lot for the detailed proposal. Your proposal is targeting final silicon where we care about properly handling such an error as an exception, and my proposal was more like a simulation assert to help developers identifying bugs with stack. I think it would worth having 2 different proposals because it will bring different constraints. Targeting the silicon will imply having a lightweight-solution to limit power consumption while in simulation, we don't care and we just want to catch every possible error. Could you please create a new tracker for your proposal which specifically targets final chip ? I will update the title of this one to focus on simulation. For what concerns simulation I think it is better to track the memory accesses, because otherwise you could miss some errors. For example, if the frame has been updated, an interrupt occurs and the interrupt handler is corrupting the sp register, you would not see the incorrect access with your proposal. The same with a compiler bug. In case such an error occurs, just having a simulation error and stopping the simulation would be fine, just like the assert that we already have "branch decision is X" which is very useful to check accesses to uninitialized variables. Best regards, Germain

Silabs-ArjanB commented 5 years ago

Hi Haugoug,

as you requested I moved the stack overflow checking enhancement request (aimed at silicon as opposed to simulation-only) into a separate ticket https://github.com/pulp-platform/riscv/issues/183, so you can further ignore my remarks above (or even delete them if you think that helps the clarity of your ticket).

Best regards, Arjan