kuznia-rdzeni / coreblocks

RISC-V out-of-order core for education and research purposes
https://kuznia-rdzeni.github.io/coreblocks/
BSD 3-Clause "New" or "Revised" License
37 stars 16 forks source link

Internal Interrupt Controller #654

Closed piotro888 closed 5 months ago

piotro888 commented 7 months ago

Closes #599

Implements priv spec core internal interrupt controller operating on CSRs. Details in docstrings.

Depends on #652 , needs rebase to reflect changes and clean-up from it after it is approved.

piotro888 commented 7 months ago

The cause of the problem is Forwarder inside FetchResume Unifier key. It can illegally store resume call to next fetcher stall.

I'm still not sure how it happened that call with resume_from_exception=0 was stored in forwarder instead of resume_from_exception=1 that should be always be later (called from Retirement), and what is the best way to fix it.

EDIT: Unifer key collects only entries form FUs (with resume from exception=0) and Retirement calls fetch directly, so stored value makes sense, but still Retirement call coming first to block the forwader is weird

xThaid commented 7 months ago

I am not sure if I see where exactly the problem is. I looked at the execution of TestCoreInterrupt_0_interrupt_asm and it seems that the core works fine, at least from the perspective of the instruction stream it is given. I didn't dig into what it exactly executes and the test itself

Can you provide cycle at which there is a bug? And what is the expected behavior there?

piotro888 commented 7 months ago

@xThaid Ok, I found it. The problem:

  1. Core is stalled due to CSR instruction
  2. Misprediction happens
  3. CSRUnit calls FetchResume with resume_from_exception=0 that normally should be ignored, but Fetch is still doing something and resume method is not yet ready, so this call waits inside Unifier in Forwarder.
  4. Core completes flushing, and Retirement tries to call resume with resume_from_excpetion=1, but the method blocks because fetcher is still not ready
  5. Fetcher becomes ready, due to random priority resume call from Retirement wins with Forwarder.
  6. Fetch is unstalled, and resume method is not ready again
  7. Another unsafe instruction arrives
  8. Forwader in Unifier still holds the old call value!!! When resume becomes ready, fetch is resumed to the old PC.

Setting priority between Retirement.CORE_RESUME and Unifier, to Unifier.Forwarder should fix the issue, so all calls with resume_form_excpetion=0 arrive before call from the Retirement (with resume_form_excpetion=1). I need to find a way how to declare it cleanly

piotro888 commented 7 months ago

Time ~707200ns

piotro888 commented 7 months ago

I think that best solution would be to split resume into resume_from_exception and resume_from_unsafe with conflict priority explicitly defined in Fetch

xThaid commented 7 months ago

OK, thanks, now I can see it.

I wonder though, why do we even allow for speculatively resuming the fetch unit? If we are precommiting a CSR instruction without side effects, (in my opinion) it must not call fetch resume - this is clearly a side effect.

This fixes the failing tests:

diff --git a/coreblocks/func_blocks/csr/csr.py b/coreblocks/func_blocks/csr/csr.py
index 3e203caf..5191ea37 100644
--- a/coreblocks/func_blocks/csr/csr.py
+++ b/coreblocks/func_blocks/csr/csr.py
@@ -103,6 +103,7 @@ class CSRUnit(FuncBlock, Elaboratable):
         done = Signal()
         accepted = Signal()
         exception = Signal()
+        can_resume = Signal()
         precommitting = Signal()

         current_result = Signal(self.gen_params.isa.xlen)
@@ -205,7 +206,7 @@ class CSRUnit(FuncBlock, Elaboratable):

         @def_method(m, self.get_result, done)
         def _():
-            m.d.comb += accepted.eq(1)
+            m.d.comb += accepted.eq(can_resume)
             m.d.sync += reserved.eq(0)
             m.d.sync += instr.valid.eq(0)
             m.d.sync += done.eq(0)
@@ -249,6 +250,7 @@ class CSRUnit(FuncBlock, Elaboratable):
         @def_method(m, self.precommit)
         def _(rob_id: Value, side_fx: Value):
             with m.If(instr.rob_id == rob_id):
+                m.d.sync += can_resume.eq(side_fx)
                 m.d.comb += precommitting.eq(1)
                 m.d.comb += exe_side_fx.eq(side_fx)

Current setup seems to be broken anyway. csr unit sets the ready signal for method fetch_resume whenever get_result is called. However, if the unifier forwarder is not ready (another func unit inserted something there), fetch_resume won't run and its ready signal will be gone.

edit: it seems that the priv unit also tries to resume the fetch unit even when the instruction was precommited without side effects.

piotro888 commented 7 months ago

In both cases besides side_fx, it also needs to check if exception is not going to be reported on that instruction, otherwise the same conflict could happen if resume is blocked.

Current setup seems to be broken anyway. csr unit sets the ready signal for method

It would work, because there is at most one unsafe instruction at time in the core (that call resume via Forwarder)

But it would still break when we would introduce interrupt delay enforcing via insertion of extra instruction at end of the pipeline. Unsafe instruction on precommit stage with exception clear, could still conflict with that interrupt. So I think that solution with separate resume methods would fit better in the terms of cleaner logic later.

piotro888 commented 6 months ago

Should be finally ready to merge EDIT: wfi test fail, I will fix it tomorrow, rest should be ready to review

piotro888 commented 5 months ago

Solid work! Just a small question: why are custom_report_level and custom_report_edge separated as two signals? As far as I understand it, their bits are mutually exclusive anyway.

merged to one signal, thanks!

should be ready to merge now