Timing of DMI req/rsp signals

rswarbrick commented 2 weeks ago

I've spent a rather confusing couple of hours trying to diagnose behaviour that I'm seeing in OpenTitan. There, we have a frontdoor agent that does DMI transactions as follows:

1. Do a JTAG write to the dmi register with the operation required.
2. Wait a short time
3. Do a JTAG write to the dmi register with op set to zero (a "nop").
4. Check the response. If it is equal to 3 ("operation in progress") then write over JTAG to the dmireset field in dtmcs to clear the flag, then go back to the previous step.
5. If the response is not 3 then the response can be treated as the result of a DMI register read (if needed) and we are done.
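The procedure above can be sketched in Python against a toy model. Everything here is hypothetical and for illustration only: `FakeDtm`, `dmi_access`, `SCAN_TCK` and the latency numbers are not OpenTitan or riscv-dbg names, and the model is idealised (it reports busy whenever a response is still outstanding, so it does not reproduce the race described further down).

```python
# Toy model only: none of these names exist in OpenTitan or riscv-dbg.
DMI_OP_NOP = 0
DMI_OP_READ = 1
DMI_RESP_OK = 0
DMI_RESP_BUSY = 3  # "operation in progress"
SCAN_TCK = 5       # assumed cost of one JTAG scan, in TCK cycles


class FakeDtm:
    """Idealised DTM whose response is ready `latency` TCK cycles after a request."""

    def __init__(self, latency, read_data):
        self.latency = latency
        self.read_data = read_data
        self.remaining = 0        # TCK cycles until the outstanding response is ready
        self.sticky_busy = False  # stays set until dmireset is written
        self.dmireset_writes = 0

    def wait_tck(self, cycles):
        self.remaining = max(0, self.remaining - cycles)

    def write_dmi(self, op):
        """Scan the dmi register; return the (resp, data) captured by this scan."""
        if self.remaining > 0:
            self.sticky_busy = True
        busy = self.sticky_busy
        captured = (DMI_RESP_BUSY, 0) if busy else (DMI_RESP_OK, self.read_data)
        self.wait_tck(SCAN_TCK)  # TCK keeps toggling while the scan shifts
        if op != DMI_OP_NOP and not busy:
            self.remaining = self.latency  # a new request starts at the end of the scan
        return captured

    def write_dmireset(self):
        """Model a write to dtmcs.dmireset, which clears the sticky busy flag."""
        self.sticky_busy = False
        self.dmireset_writes += 1
        self.wait_tck(SCAN_TCK)


def dmi_access(dtm, op, wait_cycles=10, max_polls=8):
    dtm.write_dmi(op)          # write the requested operation
    dtm.wait_tck(wait_cycles)  # the "short time"
    for _ in range(max_polls):
        resp, data = dtm.write_dmi(DMI_OP_NOP)  # nop scan to read the response
        if resp != DMI_RESP_BUSY:
            return resp, data   # done: data is the read result, if one was needed
        dtm.write_dmireset()    # clear the busy flag, then repeat the nop
    raise TimeoutError("DMI operation still busy")
```

In this model, a latency of 13 cycles with the 10-cycle wait makes the first nop see "busy", so one dmireset write is needed before the result comes back; a 20-cycle wait avoids the retry.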

The "short time" that is currently in the agent is 10 TCK cycles, which is not really enough. Specifically, the CDC component (i_dmi_cdc in dap) takes a noticeable number of cycles to synchronise the dmi_resp_i signal to dmi_resp in dmi_jtag.sv. This timing depends on the implementation of prim_fifo_async_simple.

In an example test that I've just run, the time from the end of the first JTAG transaction to when dmi_resp_valid goes high in dmi_jtag is 13 TCK cycles. Here's a screenshot of waves that show this:

Unfortunately, the OpenTitan frontdoor didn't wait long enough (because of the 10 TCK cycles above). The result is that the "nop" JTAG transaction starts too early and the CaptureDr JTAG state happens a cycle before the response comes back. The end result is that our frontdoor agent thinks the DMI operation is complete, but then gets rather confused when it sees a "busy" response in the next JTAG transaction it sends.

There's a trivial short-term workaround: increase the "short time" that we wait in the DV agent (I'm going to do that). But I'm a bit confused: is there a minimum number of clk/TCK cycles between sending the request and reading the response? (And is it described anywhere?) If not, maybe it makes sense to make things a bit more robust by treating an operation as being in progress from when the req goes out to when the response comes back.

rswarbrick commented 2 weeks ago

Oh! I've just realised that the implausibly large delay is caused by the fact that TCK is not being toggled (so the CDC takes a long time!). So there's an easy workaround on the OpenTitan side: make sure that TCK is being toggled in that time.

The result ends up being a 4 cycle wait between request and response, which is much more sensible! But I think my question in the last paragraph still stands.

bluewww commented 2 weeks ago

> But I'm a bit confused: is there a minimum number of clk/TCK cycles between request and reading the response? (And is it described anywhere?)

I don't think we have documented anything like that. Usually, we have logic that does an exponential (or linear) backoff whenever we hit "busy" as is described in the spec.
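As a rough illustration of that kind of backoff (a sketch, not the logic of any particular debugger): `poll` and `idle` are hypothetical callbacks standing for "do a nop scan and return the response" and "run this many idle TCK cycles", and the starting delay, factor and limit are arbitrary.

```python
from itertools import islice

DMI_RESP_BUSY = 3  # "operation in progress"


def backoff_delays(initial=1, factor=2, limit=1024):
    """Yield an exponentially growing delay, capped at `limit`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, limit)


def poll_with_backoff(poll, idle, max_tries=16):
    """Idle for a growing number of TCK cycles before each poll until it is not busy."""
    for delay in islice(backoff_delays(), max_tries):
        idle(delay)    # hypothetical: keep TCK toggling for `delay` cycles
        resp = poll()  # hypothetical: nop scan (after any dmireset that is needed)
        if resp != DMI_RESP_BUSY:
            return resp
    raise TimeoutError("still busy after backoff")
```

Passing `factor=1` gives a constant delay; a linear schedule would need a different generator.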

> If not, maybe it makes sense to make things a bit more robust by treating an operation as being in progress from when the req goes out to when the response comes back.

I'm confused. Shouldn't that be the case already?

rswarbrick commented 2 weeks ago

I'm thinking about the following sequence:

1. Completion of JTAG transaction 0, which starts a DMI request
2. The request comes out of dmi_jtag and is sent to dm_top (going through a CDC).
3. The DMI request is handled in dm_top and a response is generated there and sent back towards dmi_jtag, entering the CDC.
4. JTAG transaction 1 starts. This will have op=0, so should be a "no-op". It gets as far as the capture phase.
5. The response captured for JTAG transaction 1 will say that the DMI operation has completed.
6. CDC completes and dmi_resp.resp happens to be dm::DTM_BUSY (say), so dmi_jtag sets error_dmi_busy.
7. JTAG transaction 1 completes.
8. JTAG transaction 2 runs. Regardless of the operation, it will result in a DMIBusy response.

From the outside agent's point of view, it sees:

1. Send JTAG transaction 0 to start DMI operation A.
2. Send a no-op (transaction 1) and discover that operation A completed.
3. Start DMI operation B (transaction 2), but discover that it was dropped because there is an operation in flight. Huh? What operation?
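To make the difference concrete, here is a toy Python state model of that sequence. It is not derived from the dmi_jtag RTL: `ToyDtm`, its methods and the `track_in_flight` switch are hypothetical names, and the switch stands for the suggestion of treating an operation as in progress from when the request goes out until the response comes back.

```python
OP_NOP, OP_READ = 0, 1
RESP_OK, RESP_BUSY = 0, 3


class ToyDtm:
    """Toy model of what the agent observes; not the real dmi_jtag logic."""

    def __init__(self, track_in_flight):
        self.track_in_flight = track_in_flight
        self.in_flight = False    # request sent, response not yet back
        self.sticky_busy = False

    def capture(self):
        """Capture phase of a JTAG transaction: return the response the agent will see."""
        if self.track_in_flight and self.in_flight:
            self.sticky_busy = True  # an outstanding request counts as busy
        return RESP_BUSY if self.sticky_busy else RESP_OK

    def update(self, op):
        """End of a JTAG transaction: a non-nop op starts a new DMI request."""
        if op != OP_NOP and not self.sticky_busy:
            self.in_flight = True

    def response_arrives(self, resp):
        """The response from dm_top emerges from the CDC."""
        self.in_flight = False
        if resp == RESP_BUSY:
            self.sticky_busy = True


def agent_view(dtm):
    """Replay the sequence and return the responses seen by transactions 1 and 2."""
    dtm.update(OP_READ)              # transaction 0 completes, request goes out
    seen = [dtm.capture()]           # transaction 1 (no-op) reaches capture
    dtm.response_arrives(RESP_BUSY)  # CDC completes after that capture
    dtm.update(OP_NOP)               # transaction 1 completes
    seen.append(dtm.capture())       # transaction 2 reaches capture
    return seen
```

Without tracking, the model returns OK for transaction 1 and busy for transaction 2, which is the confusing view above. With tracking, transaction 1 already sees busy, so the agent can attribute it to operation A and retry.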

pulp-platform / riscv-dbg

Timing of DMI req/rsp signals #166