openhwgroup / cva6

The CORE-V CVA6 is an Application class 6-stage RISC-V CPU capable of booting Linux
https://docs.openhwgroup.org/projects/cva6-user-manual/
Other
2.2k stars 670 forks source link

[BUG] std_cache_subsystem: AXI w_valid waits for aw_ready #1029

Open TobiasKaiser opened 1 year ago

TobiasKaiser commented 1 year ago

Is there an existing CVA6 bug for this?

Bug Description

On page A3-45 (AXI3 write transaction dependencies), the AXI spec says: "The Manager must not wait for the Subordinate to assert AWREADY or WREADY before asserting AWVALID or WVALID."

In core/cache_subsystem/std_cache_subsystem.sv, this seems to be violated: A new W channel element is pushed to i_fifo_w_channel only when aw_valid and aw_ready. As long as aw_ready is not asserted, no element is pushed to the FIFO. w_select_arbiter remains zero and i_stream_mux_w drives axi_req_o.w_valid low.

When an AXI subordinate waits for both aw_valid and w_valid before asserting aw_ready and w_ready, as it is allowed to, the AXI bus deadlocks. To reproduce: Wire up axi_to_mem directly to the cva6 module configured with std_cache_subsystem. Deadlock occurs on first store. Inserting an axi_cut instance between cva6 and axi_to_mem resolves the deadlock.

The condition .push_i ( axi_req_o.aw_valid & axi_resp_i.aw_ready ) seems logically important for how std_cache_subsystem works. As I have little knowlege about the module, I am not sure how the problem could be resolved or whether this is a user problem.

MikeOpenHWGroup commented 1 year ago

Hi @JeanRochCoulon, I am not sure who is best positioned to address this issue, so I assigned it to you. Please find the appropriate victim.

JeanRochCoulon commented 1 year ago

Hi @AEzzejjari It seems you are the best positioned to answer this GitHub issue. I will assign it to you.

AEzzejjari commented 1 year ago

Hello, In this case, the FIFO is not an intermediate between the CVA6 and the Subordinate. It is a back up to store all the requests sent by the CVA6 and accepted by the Subordinate, in order to check that the response sent by the Subordinate in the write response channel is compliant with the request.

We can test the case when CVA6 performs a request and aw_ready/w_ready = 0 with the AXI agent. You just have to change the attribute drv_slv_mode in cfg class with UVMA_AXI_DRV_SLV_MODE_RANDOM_LATENCY.

TobiasKaiser commented 1 year ago

Hello, In this case, the FIFO is not an intermediate between the CVA6 and the Subordinate. It is a back up to store all the requests sent by the CVA6 and accepted by the Subordinate, in order to check that the response sent by the Subordinate in the write response channel is compliant with the request.

Is this about i_fifo_w_channel? It generates the two-bit w_select_fifo signal, which to me does not seem related to protocol compliance checking.

We can test the case when CVA6 performs a request and aw_ready/w_ready = 0 with the AXI agent. You just have to change the attribute drv_slv_mode in cfg class with UVMA_AXI_DRV_SLV_MODE_RANDOM_LATENCY.

I guess this is about test coverage, can I help there somehow?

AEzzejjari commented 1 year ago

Is this about i_fifo_w_channel? It generates the two-bit w_select_fifo signal, which to me does not seem related to protocol compliance checking.

Yes, it's not about checking protocol compliance or when we can assert AWVALID or WVALID. It's just about having a traceability of the accepted transaction if you see the input data and the output data is just an ID. This FIFO allows us to verify that we have the right response to a request.

I guess this is about test coverage, can I help there somehow?

Thanks, but if you want to make sure that there is no error and the manager does not wait for the subordinate to assert AWREADY or WREADY before asserting AWVALID or WVALID. You can use the AXI agent, which can randomize ready-to-valid latency. If you want to do this, send me an email so I can show you how.

TobiasKaiser commented 1 year ago

I am not sure whether an agent with randomized latency triggers the problem, since it is about dependencies between the channels. Unfortunately, I currently do not have the time to try it out or look deeper into this. I guess it is up to others to decide whether this issue is serious enough to warrant more work.