xlsynth / bedrock-rtl

High quality and composable base RTL libraries in Verilog
Apache License 2.0

RDC-robust credit initialization #73

Open bgelb-openai opened 1 month ago

bgelb-openai commented 1 month ago

It would be useful to define a credit exchange protocol for credit/valid style channels that cross a reset domain boundary.

The baseline credit_stall based protocol is a good start, but really only handles initial reset release:

What is needed is to handle arbitrary re-entry into reset of one or both sides of the channel. The desired behavior is:

I haven't mapped this out in detail, but I think it would need something like a 3-4 state FSM on each side, with each side's state exported to its link partner.

mgottscho commented 1 month ago

Would the scope of this protocol just be for managing credits across the RDC, or would it be part of a larger reset architecture? It seems there is some desire for the latter, given the mention of other cleanup actions.

Additionally, since this would be for an RDC, is the assumption that each domain is completely independent of the other, i.e., domain A can enter and exit reset freely while domain B may always be out of reset?

bgelb-openai commented 1 month ago

I would not want this common component to be too opinionated about a larger reset architecture. Surely it could provide useful capabilities to that end, but I don't think that is necessarily the point.

I think the desired behavior here is good from a resilience POV generally. If two independent components are connected by a credited channel, and one of them encounters an error/problem that can only be recovered from via a reset, it is generally possible to reset just that component (perhaps in combination w/ some additional action to quiesce traffic at a higher level) and recover.

That is a good property that generally opens up a lot of options from a survivability POV.

mgottscho commented 1 hour ago

Summarizing some offline discussion. Not trying to design a general reset architecture here. The goal is to support credit/valid flow control where there can be a reset domain crossing or a reset skew boundary in between sender and receiver.

Currently the credit modules support only a reset skew boundary, i.e., both sender and receiver must enter and exit reset together (with some skew in entry and exit times). We want to generalize this so that sender and receiver can independently enter and exit reset without breaking flow control correctness, i.e., no credits should be permanently lost.

We think this can be done by changing credit_stall to mean sender_in_reset (wire going from sender to receiver) and then adding a second signal called receiver_in_reset (wire going from receiver to sender). These wires will control the reinitialization of credit counters on either end and also "transmission gate" the datapath as needed.
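A rough behavioral sketch of the idea, in Python rather than RTL, may help pin down the "no credits permanently lost" property. Everything in it is an assumption made for illustration and not the bedrock-rtl implementation: the names `CreditLink`, `DEPTH`, `push`, and `pop` are hypothetical, the two `*_in_reset` wires are modeled as zero-latency (no synchronizers, no in-flight flits or credits), the receiver returns at most one credit per cycle, and a receiver reset is assumed to discard buffered data. The invariant checked is that sender credits, receiver occupancy, and credits waiting to be returned always sum to the buffer depth.

```python
from dataclasses import dataclass

DEPTH = 4  # hypothetical receiver buffer depth, equal to the total credit pool


@dataclass
class CreditLink:
    """Zero-latency behavioral model of one credit/valid channel (illustrative only)."""
    sender_credits: int = 0        # credits the sender may currently spend
    rx_occupancy: int = 0          # flits held in the receiver buffer
    rx_pending_grant: int = DEPTH  # credits the receiver still has to return

    def step(self, sender_in_reset, receiver_in_reset, push=False, pop=False):
        # Sender credit counter is cleared by its own reset, and reinitialized
        # when it observes receiver_in_reset (assumed behavior).
        if sender_in_reset or receiver_in_reset:
            self.sender_credits = 0

        if receiver_in_reset:
            # Receiver reset: buffer contents are dropped, full pool is re-advertised.
            self.rx_occupancy = 0
            self.rx_pending_grant = DEPTH
        else:
            # The receiver can keep draining its buffer even if the sender is in reset.
            if pop and self.rx_occupancy > 0:
                self.rx_occupancy -= 1
                self.rx_pending_grant += 1
            if sender_in_reset:
                # Sender forgot its credits, so re-advertise every free slot.
                self.rx_pending_grant = DEPTH - self.rx_occupancy

        # The datapath and the credit return path are gated unless both sides are up.
        if not (sender_in_reset or receiver_in_reset):
            if push and self.sender_credits > 0:
                self.sender_credits -= 1
                self.rx_occupancy += 1
            if self.rx_pending_grant > 0:
                self.rx_pending_grant -= 1
                self.sender_credits += 1

    def total(self):
        return self.sender_credits + self.rx_occupancy + self.rx_pending_grant


link = CreditLink()
for _ in range(DEPTH):                  # both sides out of reset: credits flow to sender
    link.step(False, False)
link.step(False, False, push=True)      # sender spends two credits
link.step(False, False, push=True)
link.step(True, False)                  # sender alone re-enters reset
assert link.total() == DEPTH            # nothing lost while the sender is down
link.step(False, True)                  # receiver alone re-enters reset
assert link.total() == DEPTH
```

The hard part in real RTL is exactly what this model leaves out: flits and credits that are in flight when either `*_in_reset` wire changes, and the synchronization delay on those wires.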

This scheme won't actually quiesce traffic or implement a transaction-level reset protocol. But we feel that having a reset-robust flow control mechanism is a necessary (but insufficient) condition for a general reset architecture.