microsoft / cheriot-ibex

cheriot-ibex is a RTL implementation of CHERIoT ISA based on LowRISC's Ibex core.
Apache License 2.0
73 stars 14 forks source link

Fetch FIFO clearing twice on CJALR permit execute error #28

Closed marnovandermaas closed 5 months ago

marnovandermaas commented 5 months ago

The behavior I am seeing is that the first 4 instruction bytes of the trap handler being dropped when a CJALR error happened.

You can see from this wave form that fifo_clear in the prefetch buffer gets asserted twice in succession. It causes the instr_rdata_i that corresponds to the PC of the trap handler 12FD42C1 to be dropped. image

I think this happens because the CJALR clears an entry in the FIFO through branch_req_spec_o. However, the taking of the interrupt also causes an entry to be cleared. This extra entry causes the fetch FIFO to become out of sync. The solution is to not clear the FIFO in the CHERI execute stage when an instruction fault happens. This means using the behavior of branch_req_o (no spec) for example.

This is what I would consider the correct behavior to be after applying the patch from this PR: image

Here is the dump that causes the error:


error.elf:  file format elf32-littleriscv

Disassembly of section .rom_loader:

00100000 <_start>:
        ...

00100080 <start>:
  100080: 5b 05 d0 03   cspecialr   ca0, mtdc
  100084: 97 05 00 00   auipcc  ca1, 0
  100088: 5b 06 e0 03   cspecialr   ca2, mscratchc
  10008c: b7 02 10 00   lui t0, 256
  100090: 93 82 82 0b   addi    t0, t0, 184
  100094: db 82 55 20   csetaddr    ct0, ca1, t0
  100098: 5b 80 c2 03   cspecialw   mtcc, ct0
  10009c: 85 42         addi    t0, zero, 1
  10009e: 73 90 42 bc   csrw    3012, t0
  1000a2: 21 44         addi    s0, zero, 8

001000a4 <permitexecute>:
  1000a4: 02 95         cjalr   ca0
  1000a6: 6f 00 40 02   j   0x1000ca <fail>
  1000aa: 13 00 00 00   nop
  1000ae: 01 00         nop

001000b0 <compartment_key>:
        ...

001000b8 <trap>:
  1000b8: c1 42         addi    t0, zero, 16

001000ba <delayloop>:
  1000ba: fd 12         addi    t0, t0, -1
  1000bc: e3 9f 02 fe   bnez    t0, 0x1000ba <delayloop>
  1000c0: 73 50 30 34   csrwi   mtval, 0
  1000c4: 73 50 20 34   csrwi   mcause, 0

001000c8 <pass>:
  1000c8: 01 a0         j   0x1000c8 <pass>

001000ca <fail>:
  1000ca: 01 a0         j   0x1000ca <fail>
        ...

0010014c <end>:
  10014c: 01 00         nop
  10014e: 01 00         nop
  100150: 01 00         nop
  100152: 01 00         nop
  100154: 01 00         nop
  100156: 00 00         unimp   

In case its useful here are the wave files that I used to generated the images above: wavefiles.zip

kliuMsft commented 5 months ago

Looking at your failure case waveform, I noticed something interesting at the fetch interface. Basically there is a fetch request at 0x0000_0000 due to the speculative fetch. The fetch is granted however the rvalid for that never come (you can see there is a rvalid for the fetch at address 0x001_00a8 and then a rvalid for 0x001_00b8, but nothing in the middle). That's probably why the fetch_fifo went out of sync. Can you check the FPGA memory design?

image

marnovandermaas commented 5 months ago

Yes, it seems that currently the memory system grants the access to address zero but never returns anything. However, if I don't grant the access the ibex will just hang and retry fetching from that address. Either way, I think it's better not to do this speculative fetch for a capability that you know will fail.

kliuMsft commented 5 months ago

I guess we are still not on the same page. There are 2 separate issues, 1) The sonata memory system behavior is clearly a bug. It's ok to return an instr_err together with instr_rvalid, but granting access without issuing a instr_rvalid is NOT acceptable. This condition causes CPU to go into an unrecoverable failure state (same consequence if this happens on the data interface 2) whether we want to issue a speculative fetch. To me it is similar to branch prediction and thus the fetching stage should be able to handle. The intention for introducing the speculative fetch is to improve timing on the instr interface (to remove bound checking from the critical path). While this is now mitigated by the ISA spec change, I'd like to defer it until we can fully evaluate the impact of the change (syntheis/timing analysis and regression simulation).

marnovandermaas commented 5 months ago

Agreed with point number 1. I have opened an issue in the Sonata repository to track this while I investigate: https://github.com/lowRISC/sonata-system/issues/28

In terms of the speculative fetch, I am curious about the ISA change that you mention. Is this published anywhere?

kliuMsft commented 5 months ago

That's in the cheriot-sail PR https://github.com/microsoft/cheriot-sail/pull/37

marnovandermaas commented 5 months ago

Thanks that very useful. I'm going to close this issue as discussing the speculative branching should be done separately.