lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[prim] Add sync clear input to `prim_fifo_async` #15729

Closed eunchan closed 4 months ago

eunchan commented 1 year ago

Related Issue: https://github.com/lowRISC/opentitan/pull/15727

It is generally recommended to use the same reset source (synchronized to the port clock if needed) for both the write and read ports of the asynchronous FIFO. When a reset occurs, the read and write depths are cleared abruptly because the reset is asynchronous. This may trigger metastability errors in Reset Domain Crossing (RDC) checker tools.

One solution is to create a reset synchronizer with synchronous assertion and synchronous de-assertion.
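
A minimal sketch of what such a synchronizer could look like (illustrative only; the module and signal names are not from the OpenTitan tree):

```systemverilog
// Illustrative sketch only (not OpenTitan code): a reset synchronizer that
// both asserts and de-asserts the reset synchronously to the destination
// clock, so the FIFO pointers in that domain are cleared on a clock edge
// rather than asynchronously.
module reset_sync_assert_deassert (
  input  logic clk_i,       // destination (port) clock
  input  logic rst_src_ni,  // asynchronous source reset, active low
  output logic rst_sync_no  // synchronized reset, active low
);
  logic [1:0] sync_q;

  // Two-flop synchronizer sampling the source reset. Because rst_src_ni is
  // only sampled (never used as an async reset term), both edges of
  // rst_sync_no are aligned to clk_i. A real implementation would also need
  // to define the flops' power-on value.
  always_ff @(posedge clk_i) begin
    sync_q <= {sync_q[0], rst_src_ni};
  end

  assign rst_sync_no = sync_q[1];
endmodule
```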

andreaskurth commented 6 months ago

@vogelpi: Feels like a can of worms; I wouldn't address this for PROD. We know and control the reset sources, so we can control the clearing of both sides. I would investigate whether there are scenarios where we don't reset both sides. There are not many of those FIFOs.

@moidx: We did a similar check in the RTL for ES.

a-will commented 6 months ago

For what it's worth, I don't think a synchronous clear mixes well with an async FIFO built on circular buffers. In addition, the original motivating module has changed so that it no longer uses an async FIFO: the TPM RdFIFO is now a linear buffer in a two-port SRAM, where only one transaction is ever present at a time and the data always begins at offset 0.
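
To make that concrete, here is a hypothetical, heavily simplified write-side sketch of a Gray-pointer async FIFO (not OpenTitan code). If only the write pointer is cleared synchronously, the read pointer synchronized into the write domain still holds its pre-clear value for a couple of write-clock cycles, so the full comparison is evaluated against inconsistent pointers:

```systemverilog
// Hypothetical, simplified write side of a Gray-pointer async FIFO,
// illustrating the hazard of a write-domain-only synchronous clear.
module fifo_wr_side #(parameter int unsigned PtrW = 4) (
  input  logic            wclk_i,
  input  logic            wrst_ni,          // async reset, write domain
  input  logic            wclr_i,           // synchronous clear, write side only
  input  logic            wvalid_i,
  input  logic [PtrW-1:0] rptr_gray_wsync,  // read pointer, 2-flop synced into wclk_i
  output logic            full_o
);
  logic [PtrW-1:0] wptr_bin, wptr_gray;

  function automatic logic [PtrW-1:0] bin2gray(input logic [PtrW-1:0] b);
    return b ^ (b >> 1);
  endfunction

  always_ff @(posedge wclk_i or negedge wrst_ni) begin
    if (!wrst_ni) begin
      wptr_bin  <= '0;
      wptr_gray <= '0;
    end else if (wclr_i) begin
      // Clears only this side; rptr_gray_wsync (and the read domain) lag behind.
      wptr_bin  <= '0;
      wptr_gray <= '0;
    end else if (wvalid_i && !full_o) begin
      wptr_bin  <= wptr_bin + 1'b1;
      wptr_gray <= bin2gray(wptr_bin + 1'b1);
    end
  end

  // Classic Gray-code full check: the FIFO looks full when the pointers
  // differ only in their two MSBs. Right after wclr_i, wptr_gray is 0 but
  // rptr_gray_wsync may still hold its pre-clear value, so full_o can be
  // transiently wrong.
  assign full_o = (wptr_gray == {~rptr_gray_wsync[PtrW-1:PtrW-2],
                                  rptr_gray_wsync[PtrW-3:0]});
endmodule
```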

I'd recommend closing the issue.

cdgori commented 5 months ago

It's a can of worms, for sure, and the TPM RdFIFO was the genesis of this issue, so the point seems to be moot there now.

Could we enumerate the places in the design where `prim_fifo_async` is used, here or in a separate issue (e.g. with check boxes)? And then do the suggested investigation to see whether there are cases where we don't reset both sides?

andreaskurth commented 5 months ago

Response WIP

`prim_fifo_async` is currently instantiated in the following places:

- https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/ip/spi_device/rtl/spi_tpm.sv#L1314-L1332 --> same reset signal for read and write port
- https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/ip/spi_device/rtl/spid_status.sv#L263-L282 --> same reset signal for read and write port
- https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/ip/tlul/rtl/tlul_fifo_async.sv#L28-L59 and https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/ip/tlul/rtl/tlul_fifo_async.sv#L65-L96, where `tlul_fifo_async` is in turn instantiated in:
  - https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/top_earlgrey/ip/xbar_main/rtl/autogen/xbar_main.sv#L1020-L1032
  - https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/top_earlgrey/ip/xbar_main/rtl/autogen/xbar_main.sv#L1047-L1059
  - https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/top_earlgrey/ip/xbar_main/rtl/autogen/xbar_main.sv#L1074-L1086
  - https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/top_earlgrey/ip/xbar_main/rtl/autogen/xbar_main.sv#L1101-L1113

There's a related module, `prim_fifo_async_simple`, which is currently instantiated in:

- https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/vendor/pulp_riscv_dbg/src/dmi_cdc.sv#L85-L101
- https://github.com/lowRISC/opentitan/blob/46ac5a7210e31ff4f2027b822cc9c163d78b99b2/hw/vendor/pulp_riscv_dbg/src/dmi_cdc.sv#L103-L119

--> `combined_rstn` asynchronously resets together with `trst_ni` and `rst_ni`, and it synchronously clears with `jtag_dmi_cdc_clear_i`.
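
Sketched very roughly, that `combined_rstn` idea looks like this (an illustration of the described behaviour, not the vendored dmi_cdc.sv source):

```systemverilog
// Illustrative simplification only: one combined active-low reset that
// asserts asynchronously with the hardware resets (trst_ni, rst_ni) and is
// additionally held low while the synchronous clear request
// jtag_dmi_cdc_clear_i is asserted. Both prim_fifo_async_simple instances
// are then reset with combined_rstn instead of the raw resets.
logic combined_rstn;
assign combined_rstn = trst_ni & rst_ni & ~jtag_dmi_cdc_clear_i;
```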

moidx commented 4 months ago

@andreaskurth to decide on next steps and determine if this should be part of M4.

moidx commented 4 months ago

@vogelpi is auditing the RTL. TL-UL instances have different reset connections. Inspecting to see if the resets are derived from the same root.

Alex: It may also be OK if the reset sequencing causes both resets to assert asynchronously at the same time and the synchronous de-assertion prevents transactions from flowing.

a-will commented 4 months ago

Relevant resets for tlul_fifo_async are in xbar_main:

https://github.com/lowRISC/opentitan/blob/50c4e6f0764f0225479efe924e8177b220cd3ec2/hw/top_earlgrey/rtl/autogen/top_earlgrey.sv#L2829-L2833

These all reset together when the whole LC domain is reset:

https://github.com/lowRISC/opentitan/blob/50c4e6f0764f0225479efe924e8177b220cd3ec2/hw/top_earlgrey/ip_autogen/rstmgr/rtl/rstmgr.sv#L264-L266

There is no software control of these infrastructure resets, so the only source is that LC domain reset. In addition, all the downstream nodes in the TL-UL fabric are also reset with that LC domain reset.
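
As a purely illustrative aside (nothing like this exists in the tree), the invariant relied on here, namely that both sides of a `tlul_fifo_async` instance go into reset together, could be captured by a small bindable checker along these lines; the port names and the 4-cycle window are arbitrary:

```systemverilog
// Hypothetical audit checker (not part of the OpenTitan tree): if the
// host-side reset of an async TL-UL FIFO asserts, the device-side reset
// must also assert within a few cycles, since both derive from the same
// LC-domain root reset.
module rst_pair_audit (
  input logic clk_i,
  input logic rst_h_ni,
  input logic rst_d_ni
);
  property p_resets_track;
    @(posedge clk_i) $fell(rst_h_ni) |-> ##[0:4] !rst_d_ni;
  endproperty

  RstPairTrack_A : assert property (p_resets_track);
endmodule
```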

vogelpi commented 4 months ago

Thanks @a-will for looking into this and reporting back! I started looking at the very same place but was then interrupted by something else before drawing a conclusion.

Based on your assessment that:

  1. These resets all reset together when the whole LC domain is reset, and
  2. There is no software control of these infrastructure resets,

I am thinking that we don't need to do anything here for the current designs. Theoretically, we could revisit this for future releases. However, if we really wanted to control these resets more independently, we would probably need a bigger overhaul of the entire reset architecture and would have to investigate many other RDC paths (e.g. EDN connections, RV_DM, etc.). I am not sure how likely that is.

My suggestion would thus be to simply close this as not planned.

moidx commented 4 months ago

Thanks @andreaskurth, @a-will and @vogelpi for your assessment.

Leaving this issue as part of M4 to document the auditing work.

Closing as not planned based on the analysis outcome.