stnolting / neorv32

:desktop_computer: A small, customizable and extensible MCU-class 32-bit RISC-V soft-core CPU and microcontroller-like SoC written in platform-independent VHDL.
https://neorv32.org
BSD 3-Clause "New" or "Revised" License

Bus_keeper module incorrectly generates bus errors #314

Closed GideonZ closed 2 years ago

GideonZ commented 2 years ago

Observed behavior: NeoRV32 crashes at random when executing from external DDR2 memory.

Cause: The bus_keeper module incorrectly generates an internal bus error on an external cycle when the Wishbone acknowledge arrives in exactly the cycle in which the internal bus cycle timeout counter is at zero. Not before, not after; only exactly at zero. This seems to happen because the bus_xip_i signal is zero in this cycle, so the cycle is incorrectly marked as internal.
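To make the timing of this race easier to follow, here is a toy cycle-level model in Python. It is only a sketch of the behavior described above, not the actual VHDL of the bus keeper: the function name `bus_error_raised`, the `ack_has_priority` switch, and the assumption that the "external" flag drops in the same cycle the acknowledge arrives are all hypothetical and chosen purely for illustration. The timeout value of 15 is taken from the latency mentioned later in this thread.

```python
def bus_error_raised(ack_latency, timeout=15, ack_has_priority=False):
    """Toy model of one external bus access (NOT the real bus keeper logic).

    ack_latency      -- cycle in which the external acknowledge arrives
    timeout          -- internal response timeout in cycles
    ack_has_priority -- hypothetical mitigation: an acknowledge in the
                        current cycle suppresses the timeout error
    Returns True if the model flags a bus error for this access.
    """
    for cycle in range(1, ack_latency + 1):
        ack = (cycle == ack_latency)
        # Assumption: the "this is an external access" flag is already
        # deasserted in the very cycle in which the acknowledge arrives.
        is_external = not ack
        # The internal timeout counter hits zero exactly at `timeout`.
        counter = timeout - cycle
        timed_out = (counter == 0)

        if ack_has_priority and ack:
            return False  # access completed, no error
        if timed_out and not is_external:
            return True   # access wrongly treated as internal and timed out
        if ack:
            return False  # access completed before or after the critical cycle
    return False


if __name__ == "__main__":
    for latency in (14, 15, 16):
        print(latency, bus_error_raised(latency))
```

In this model only a latency equal to the timeout triggers the error; one cycle earlier or later the access completes normally. Setting `ack_has_priority=True` shows one conceivable way to avoid the false error, which may or may not match what an actual fix does.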

Screenshots: [image]

stnolting commented 2 years ago

I have heard about problems where the external memory access latency is close to the internal timeout value, but I thought these problems were caused by some improper handling of the Wishbone handshake... Seems like this is my fault 😅

Thanks for the great problem description! :+1: I think this can be easily fixed without additional logic. I will do some tests and after that I'll propose a PR to (hopefully) fix this.

stnolting commented 2 years ago

I am trying to reproduce the error. I can see that the BUSKEEPER asserts its ERROR signal, but only one cycle after the ACK has been sent to the CPU, so the CPU does not raise an exception. Can you confirm this?

However, for debugging the internal bus this is bad. I have made a simple update in #315 that seems to fix this issue.

GideonZ commented 2 years ago

I merged the changes from the branch 'hotfix_buskeeper' into my branch and built the FPGA. I can indeed see that the error signal still fires, but the CPU no longer traps. My FreeRTOS test program now runs continuously. (Another test program that runs fine on Nios II still crashes on RISC-V, so I am not 'there' yet.) Thank you for the quick turnaround on this bug! You fixed it faster than I could find it!

stnolting commented 2 years ago

> I merged the changes from the branch 'hotfix_buskeeper' into my branch and built the FPGA. I can indeed see that the error signal still fires, but the CPU no longer traps.

The CPU does not raise an exception if ERR is asserted right after ACK has been asserted. This is because of some recent changes in the CPU's bus unit (#303).

However, the BUSKEEPER should no longer assert its ERROR signal when using the latest changes from #315. This is what my test looks like, with (I think) the same scenario as in your setup: access to external memory via Wishbone, where the external memory has an access latency of 15 cycles (= max_proc_int_response_time_c):

[image: buskeeper]