openhwgroup / cv32e40s

4 stage, in-order, secure RISC-V core based on the CV32E40P
https://docs.openhwgroup.org/projects/cv32e40s-user-manual/en/latest/
Other
126 stars 21 forks source link

Core starts fetching instruction from different address than specified #526

Closed rdgarce closed 5 months ago

rdgarce commented 5 months ago

Hi, I have implemented a SoC (like the IBEX demo system) using the CV32E40S core and simulating the execution of a RV32 binary with the help of Verilator.

I found that the core is starting fetching instruction from a different address than the boot_addr_i. Inspecting the traces, I found that the first instruction is never fetched (instr_req_o = 0) and an Instruction access fault is raised (mcause = 0x1) and so the very first instruction fetched happens to be the one located at mtvec_addr_i.

Looking deeply in the traces I presume the error is generated by the PMP in the IF stage not finding any match for the Instruction address and so setting access_fault = 1. That signals, I presume, propagates into the Instruction access fault exception being raised.

If this is the case, I would not expect this to happens because I setted the core like follows:

PMP_NUM_REGIONS = 0
PMP_PMPNCFG_RV = '{default:PMPNCFG_DEFAULT}

Am I getting something wrong?

As a reference I drop some images.

The configuration of the Top entity. CORE_* signals are feeded without any modification to the CV32E40S. param_config

Inside the Top entity I define core_mtvec_addr = MEM_START = 0x100000 and core_boot_addr = MEM_START + 32'h80 = 0x100080. These two signals are assigned to mtvec_addr_i and boot_addr_i. boot_mtvec_addr

A snapshot of the Vector table of the executed file in the simulation located in memory starting from address MEM_START = 0x100000. Notice how there is a jump to the reset_handler both at 0x100000 and 0x100080. startup

The traces of the execution. traces

Thanks

Silabs-ArjanB commented 5 months ago

Hi @rdgarce ,

It looks weird that priv_lvl_i starts out as 0.

Could you please expand your trace so that it starts at time 0 and such that all cv32e40score instr signals are visible as well as the clk_i and rst_ni signals and the PMP parameters at that level? Also please add core_trans and prefetch_priv_lvl from within cv32e40s_if_stage.

Best regards, Arjan

rdgarce commented 5 months ago

Hi @Silabs-ArjanB,

This is the trace extended with the requested signals. Thanks, Raffaele

trace_2_1 trace_2_2

Silabs-ArjanB commented 5 months ago

Hi @rdgarce , thank you very much for your additional figures. The first error I see in these waves is that priv_lvl_i is 0 (user mode) at the time that the transaction with address 0x100080 goes into the PMP. This then causes a PMP error as it is not allowed to do user mode accesses not matching any PMP region. The real question is now why priv_lvl_i is 0 at 4ns in your waveform.

This is either an RTL bug or a Verilator bug. If you provide me a full VCD file for all signals within the IF stage and deeper then I can have a look. Otherwise it would already help if you at least also trace the following signals which feed into priv_lvl_i (and ideally study yourself if they should indeed lead to priv_lvl_i is 0):

cv32e40s_alignment_buffer.sv:

fetch_priv_lvl_o priv_lvl_ctrl_i.priv_lvl_set priv_lvl_ctrl_i.priv_lvl instr_priv_lvl_q
instr_priv_lvl_o fetch_priv_lvl_i

cv32e40s_prefetcher.sv:

state_q
fetch_priv_lvl_access_o
fetch_branch_i
fetch_priv_lvl_access_i
trans_priv_lvl_q

Best regards, Arjan

rdgarce commented 5 months ago

Hi @Silabs-ArjanB , Thanks a lot for your time and for the hint about the signals to check. I will do that and follow up if I discover something.

In the meantime this is the trace of the simulation. (I changed extension to .txt to load the plain text directly here). trace.txt

Regards, Raffaele

rdgarce commented 5 months ago

Hi @Silabs-ArjanB, I found the problem and the solution.

The root of the problem is in the cv32e40s_sleep_unit and in the clock gating together with the reset signal feeded into the components of the core. The rst_ni signal is passed from the core top module directly to every submodule but the clk_i is gated by the sleep unit and the gated version clk is passed to them.

What I expected (maybe I am wrong) is that I could simulate N (at least one) clock cycle with the rst_ni = 0 and be sure that the core (and every part of it) is resetted and after that set rst_ni = 1 and expect the core to start it's execution on the next positive edge of clk_i. This to me seems congruent with the physical way of how a digital machine is resetted on power on (I'm not an expert so please correct me if I'm wrong).

This found not to be true because (at least the prefetcher unit) wants a negedge of rst_n OR rst_n = 0 while posedge of clk to reset. This was not happening because clk is low until the sleep unit is in reset state (caused by rst_ni = 0) and there was no negedge of rst_n because it starts low, so the module was not resetted and resulted in the very first "bug" (if we can call it like that): trans_priv_lvl_q != PRIV_LVL_M at initialization.

After rst_ni = 1 the core starts running with an wrong internal state. Given that it followed that the PMP raised an error.

I applied the following "simulation" solution: Before the simulation starts (and with clk_i still low) do a verilator model evaluation with rst_ni = 1 followed by another evaluation with rst_ni = 0. In this way I can simulate a negedge and trigger an async reset. After that I permanently set rst_ni = 1 and start the simulation.

Please let me know if there is some better way of handling this situation and if I am missing some key information.

Thanks a lot for the help, Raffaele

Silabs-ArjanB commented 5 months ago

Hi @rdgarce ,

Thank you for finding the issue.

This found not to be true because (at least the prefetcher unit) wants a negedge of rst_n OR rst_n = 0 while posedge of clk to reset.

This then looks like a Verilator issue to me. The trans_priv_lvl_q flop is modeled to have an asynchronous reset, so the simulator should not be waiting for a negedge on the reset or a posedge of the flop. It should just asynchronously reset these flops whenver the reset signal is low.

Best regards, Arjan