Closed marnovandermaas closed 5 months ago
Looking at your failure-case waveform, I noticed something interesting at the fetch interface. There is a fetch request at 0x0000_0000 due to the speculative fetch. The fetch is granted, but the rvalid for it never comes (you can see an rvalid for the fetch at address 0x001_00a8 and then an rvalid for 0x001_00b8, but nothing in between). That is probably why the fetch_fifo went out of sync. Can you check the FPGA memory design?
Yes, it seems that the memory system currently grants the access to address zero but never returns anything. However, if I don't grant the access, Ibex will just hang and keep retrying the fetch from that address. Either way, I think it's better not to do this speculative fetch for a capability that you know will fail.
I guess we are still not on the same page. There are 2 separate issues: 1) The Sonata memory system behavior is clearly a bug. It's OK to return an instr_err together with instr_rvalid, but granting access without ever issuing an instr_rvalid is NOT acceptable. This condition causes the CPU to go into an unrecoverable failure state (with the same consequence if it happens on the data interface). 2) Whether we want to issue a speculative fetch. To me it is similar to branch prediction, and thus the fetch stage should be able to handle it. The intention behind introducing the speculative fetch was to improve timing on the instr interface (to remove bounds checking from the critical path). While this is now mitigated by the ISA spec change, I'd like to defer that change until we can fully evaluate its impact (synthesis/timing analysis and regression simulation).
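The rule in point 1 (every granted fetch must eventually get an rvalid, with or without an error flag) can be checked mechanically on a recorded trace. The following is a minimal sketch, not part of Ibex or Sonata: the per-cycle dictionary format and the function name `unanswered_grants` are assumptions made for illustration only.

```python
def unanswered_grants(trace):
    """Count granted fetches that never received an rvalid.

    `trace` is a list of per-cycle samples in a hypothetical format:
    each sample is a dict with optional boolean keys "req", "gnt",
    "rvalid" and "err".
    """
    outstanding = 0
    for cycle in trace:
        # An rvalid answers a request granted in an earlier cycle, so
        # retire it before counting a grant seen in this cycle.
        if cycle.get("rvalid"):
            if outstanding == 0:
                raise ValueError("rvalid without an outstanding request")
            outstanding -= 1  # an error response still counts as a response
        if cycle.get("req") and cycle.get("gnt"):
            outstanding += 1
    return outstanding


# Illustrative traces (made up, not taken from the waveform).
# Every grant is answered; the second response carries an error.
answered = [
    {"req": True, "gnt": True},
    {"req": True, "gnt": True, "rvalid": True},
    {"req": True, "gnt": True, "rvalid": True, "err": True},
    {"rvalid": True},
]

# One grant is never answered, as with the fetch from address zero.
dropped = [
    {"req": True, "gnt": True},
    {"req": True, "gnt": True, "rvalid": True},
    {"req": True, "gnt": True},
    {"rvalid": True},
]

print(unanswered_grants(answered))
print(unanswered_grants(dropped))
```

A non-zero result at the end of a quiescent trace means the memory side granted a request and then stayed silent, which is the condition described as not acceptable above.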
Agreed with point number 1. I have opened an issue in the Sonata repository to track this while I investigate: https://github.com/lowRISC/sonata-system/issues/28
In terms of the speculative fetch, I am curious about the ISA change that you mention. Is this published anywhere?
That's in the cheriot-sail PR https://github.com/microsoft/cheriot-sail/pull/37
Thanks, that's very useful. I'm going to close this issue, as the discussion about speculative branching should be held separately.
The behavior I am seeing is that the first 4 instruction bytes of the trap handler are dropped when a CJALR error happens.
You can see from this waveform that `fifo_clear` in the prefetch buffer gets asserted twice in succession. It causes the `instr_rdata_i` that corresponds to the PC of the trap handler, `12FD42C1`, to be dropped. I think this happens because the CJALR clears an entry in the FIFO through `branch_req_spec_o`. However, the taking of the interrupt also causes an entry to be cleared. This extra entry causes the fetch FIFO to become out of sync. The solution is to not clear the FIFO in the CHERI execute stage when an instruction fault happens. This means using the behavior of `branch_req_o` (no spec), for example.
This is what I would consider the correct behavior to be after applying the patch from this PR:
Here is the dump that causes the error:
In case it's useful, here are the wave files that I used to generate the images above: wavefiles.zip
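To make the double clear described above easier to reason about, here is a toy model of how an extra cleared entry can swallow the first word of the trap handler. It is a sketch under loud assumptions and not the Ibex prefetch buffer RTL: the class `ToyPrefetcher`, its counters, and the idea that each clear marks all in-flight requests as stale are hypothetical simplifications chosen for illustration.

```python
class ToyPrefetcher:
    """Toy model only; NOT the Ibex prefetch buffer implementation."""

    def __init__(self):
        self.in_flight = 0   # granted requests still waiting for rvalid
        self.to_discard = 0  # responses to drop because a clear overtook them
        self.fifo = []       # instruction words handed on to the core

    def grant(self):
        self.in_flight += 1

    def clear(self):
        # Assumption: a clear (branch or trap) empties the FIFO and marks
        # every in-flight request as stale.
        self.fifo.clear()
        self.to_discard = self.in_flight

    def rvalid(self, rdata):
        self.in_flight -= 1
        if self.to_discard:
            self.to_discard -= 1  # treated as a stale response and dropped
        else:
            self.fifo.append(rdata)


def run(speculative_fetch_answered):
    p = ToyPrefetcher()
    p.clear()  # first clear: speculative branch of the faulting CJALR
    p.grant()  # the speculative fetch is granted
    p.clear()  # second clear: the trap is taken
    p.grant()  # fetch of trap handler word 0
    p.grant()  # fetch of trap handler word 1
    if speculative_fetch_answered:
        p.rvalid(None)  # stale response, e.g. returned with an error
    p.rvalid("handler_word_0")
    p.rvalid("handler_word_1")
    return p.fifo


print(run(speculative_fetch_answered=True))
print(run(speculative_fetch_answered=False))
```

In this model, the stale entry is harmless as long as its response eventually arrives. If it never does, the discard is applied to the next response instead, so the first trap handler word is lost. Avoiding the first clear for a CJALR that is known to fault, as proposed above, removes the stale entry altogether.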