opensocdebug / osd-doc

Open Soc Debug Documentation, including the specification
http://www.opensocdebug.org

Debug Network Flow Control #2

wsong83 opened this issue 8 years ago (status: open)

wsong83 commented 8 years ago

One concern, already discussed by email with Stefan, is the possible deadlock caused by debug modules being unable to ensure that a packet can be fully accepted (i.e., that there is enough space in their local buffer).

Here is another one. The Debug Packet Datagram (DPD) used between the host and the physical transport has a length field in its header. However, debug packets have no length field. So the host interface needs to receive the whole packet to count its length before it can send the packet out in the DPD format. This requires the host interface to have enough space for at least one debug packet; however, as indicated in the spec, a debug packet can be arbitrarily long.
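To make the buffering requirement concrete, here is a minimal Python sketch. It assumes a flit is modeled as a `(data, last)` tuple where `last` marks the final flit of a debug packet, and that a DPD is simply the payload prefixed by its length in flits. The names `flits_to_dpd` and `MAX_PACKET_FLITS` are hypothetical and not taken from the spec. Because the length is only known once the `last` flit arrives, the whole packet has to sit in a buffer first; only a fixed size limit keeps that buffer finite.

```python
from typing import Iterable, List, Tuple

# Hypothetical limit on debug packet size in flits (not defined by the spec here).
MAX_PACKET_FLITS = 8

Flit = Tuple[int, bool]  # (data word, last-flit marker): assumed representation


def flits_to_dpd(flits: Iterable[Flit], max_flits: int = MAX_PACKET_FLITS) -> List[List[int]]:
    """Convert a stream of framed flits into length-prefixed datagrams.

    Each datagram is [length, word0, word1, ...]. The length is only known
    after the last flit has arrived, so each packet is buffered completely.
    A trailing packet without a last flit stays buffered and is not emitted.
    """
    datagrams: List[List[int]] = []
    buffer: List[int] = []
    for data, last in flits:
        if len(buffer) == max_flits:
            # Without a size limit this buffer could grow without bound.
            raise OverflowError("debug packet exceeds host interface buffer")
        buffer.append(data)
        if last:
            datagrams.append([len(buffer)] + buffer)
            buffer = []
    return datagrams


# Two packets: three flits, then a single flit.
stream = [(0xA, False), (0xB, False), (0xC, True), (0xD, True)]
print(flits_to_dpd(stream))  # [[3, 10, 11, 12], [1, 13]]
```

The point of the sketch is the `OverflowError` branch: without an agreed maximum packet size there is no buffer size for which that branch can be ruled out.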

Besides, let us consider the scenario in which the host interface is busy and a long debug packet is arriving for it. The host interface is supposed to ignore the packet (hold ready low) because it is busy. The packet will then continue cycling on the NoC. The number of available pipeline stages in the ring NoC is limited (it equals the number of ring routers). What happens when the packet header reaches the sending router while that router is still sending more flits? Will the header be lost, or must the router stop sending flits? If the router stops sending flits, is it supposed to remember where to resume? And could another router start sending a packet when it sees a gap (because the host interface has begun to accept), even though the previous packet is not finished yet? One solution could be to require all routers to track tail flits.

wallento commented 8 years ago

Hi Wei,

you are right, the spec should be updated to include a size limit for debug packets. This limit sets the minimum buffer size of the new Host Interface Module, which is derived from the lisnoc packet buffer (https://github.com/TUM-LIS/lisnoc/blob/master/rtl/infrastructure/lisnoc_packet_buffer.v). This should resolve the inconsistencies, right?

Regarding the second point, packets generally cannot pass their destination: if the destination is not ready, the ring blocks. Otherwise you run into serious deadlock problems.
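The blocking behavior can be illustrated with a small cycle-level Python model. This is a sketch under simplifying assumptions, not the OSD RTL: each ring stage holds at most one flit, flits move from stage `i` to stage `i + 1` (mod `n`), every flit is addressed to the host interface, which is assumed to sit at stage 0, and a stage only accepts a flit if it is empty (stage-to-stage backpressure). The function `step` and the `Ring` type are hypothetical names. A flit that reaches stage 0 while the host is not ready simply stays there, so the flits behind it stall instead of circulating or being lost.

```python
from typing import List, Optional

# Ring stages 0..n-1; flits move i -> i+1 (mod n).
# Assumption for this sketch: the host interface is attached at stage 0.
Ring = List[Optional[str]]


def step(ring: Ring, host_ready: bool) -> Optional[str]:
    """Advance the ring by one cycle in place; return the flit ejected to the host, if any."""
    n = len(ring)
    ejected = None
    # The host interface consumes the flit at its stage only when ready.
    # Otherwise the flit stays put: it is never forwarded past its destination.
    if host_ready and ring[0] is not None:
        ejected, ring[0] = ring[0], None
    # Walk upstream from the host so that a stage vacated this cycle can be
    # refilled in the same cycle. A flit only moves into an empty stage.
    for i in range(n - 1, 0, -1):
        nxt = (i + 1) % n
        if ring[i] is not None and ring[nxt] is None:
            ring[nxt], ring[i] = ring[i], None
    return ejected


ring: Ring = [None, "f2", "f1", "f0"]  # head flit f0 is closest to the host
for _ in range(5):
    step(ring, host_ready=False)  # host busy: the ring fills up and stalls
print(ring)  # ['f0', None, 'f2', 'f1']
print([step(ring, host_ready=True) for _ in range(3)])  # ['f0', 'f1', 'f2']
```

While the host is busy the flits stay in the ring in order; once it becomes ready they drain in their original order, so no router has to track where a packet was interrupted.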

But this brings me to my plan to open a long-term issue to discuss flow control in the debug network. I will hijack this one now :)

wsong83 commented 8 years ago

I am still not sure how the NoC can be blocked, given that it is distributed and has no backpressure. A global enable/block signal would be rather heavy. It also seems there could be clock domain issues: what happens if the routers are actually not in the same clock domain? Or maybe you could force them into one domain by placing clock-domain-crossing FIFOs inside the ring routers or even inside the debug modules. For the first test chip, I assume we can ignore the clock domain issue as there is no DVFS yet.

wallento commented 8 years ago

There is flit flow control from router to router for backpressure.

All routers must be in the same clock domain. Clock domain crossing should occur in each debug module between the debugged clock domain and the debug clock domain. There can be multiple different debugged clock domains, but only one debug clock domain.

Does that sound reasonable?

wallento commented 8 years ago

For reference:

wsong83 commented 8 years ago

Yes, router-to-router flow control should be fine. So far I am OK with the clock domain approach, but it may need more investigation once DVFS is actually implemented. My small concern is the number of FIFOs needed for clock domain crossing if the FIFOs are placed in the debug modules. Thanks for the clarification.

wallento commented 8 years ago

Debug Network

The debug network is a ring with the following properties:

Host Interface

The host interface has the same width as the debug network. Instead of the extra first and last bits that frame packets on the network, it uses the length-value format. The same applies in the other direction.
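For the host-to-network direction, the conversion can be sketched in Python as follows. This assumes a datagram is one length word (the number of payload words) followed by that many payload words, and that each flit carries `first` and `last` flags alongside one data word; `Flit` and `dpd_to_flits` are hypothetical names, not part of the spec. Each length-value datagram is split into flits, with the first flit flagged `first` and the final one flagged `last` (a single-flit packet carries both).

```python
from typing import List, NamedTuple


class Flit(NamedTuple):
    first: bool  # marks the first flit of a debug packet
    last: bool   # marks the last flit of a debug packet
    data: int


def dpd_to_flits(words: List[int]) -> List[Flit]:
    """Split a length-value word stream into flits framed with first/last bits.

    Assumed datagram layout: one length word, then that many payload words.
    """
    flits: List[Flit] = []
    pos = 0
    while pos < len(words):
        length = words[pos]
        payload = words[pos + 1 : pos + 1 + length]
        # Assumption: a debug packet has at least one flit, and a datagram
        # must not be truncated.
        if length == 0 or len(payload) != length:
            raise ValueError(f"malformed datagram at word {pos}")
        for i, word in enumerate(payload):
            flits.append(Flit(first=(i == 0), last=(i == length - 1), data=word))
        pos += 1 + length
    return flits


for flit in dpd_to_flits([2, 0x10, 0x11, 1, 0x20]):
    print(flit)
# Flit(first=True, last=False, data=16)
# Flit(first=False, last=True, data=17)
# Flit(first=True, last=True, data=32)
```

This direction needs no full-packet buffer, because the length word arrives before the payload; only the network-to-host direction has to wait for the last flit.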

Packets from the Host

Packets from the host go into the debug module. For the moment we can safely assume they are processed sufficiently fast.

Packets from the Debug Modules

The debug modules can produce packets at an arbitrary rate.

Backpressure from the Host Interface

The host interface can be blocked, so that backpressure propagates into the network. If the network can no longer compensate with its buffers, the debug modules are blocked in turn. Hence the debug modules should be able to buffer debug packets. When a debug module's buffer overflows, it has to raise an overflow signal.
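A debug module's output buffer with an overflow signal could be modeled roughly like this. It is a minimal Python sketch, assuming packets are stored whole (so a packet is never partially dropped), capacity is counted in flits, and the overflow flag stays set once raised; `DebugModuleBuffer`, `push`, `pop`, and `dropped` are hypothetical names, and this does not represent either of the planned overflow strategies.

```python
from collections import deque
from typing import Deque, List, Optional


class DebugModuleBuffer:
    """Bounded packet buffer inside a debug module (hypothetical model).

    Packets are stored whole; capacity is measured in flits.
    """

    def __init__(self, capacity_flits: int) -> None:
        self.capacity = capacity_flits
        self.used = 0
        self.packets: Deque[List[int]] = deque()
        self.overflow = False  # raised when a packet had to be dropped
        self.dropped = 0       # number of packets dropped so far

    def push(self, packet: List[int]) -> bool:
        """Accept a packet if it fits completely; otherwise drop it and flag overflow."""
        if self.used + len(packet) > self.capacity:
            self.overflow = True
            self.dropped += 1
            return False
        self.packets.append(packet)
        self.used += len(packet)
        return True

    def pop(self, network_ready: bool) -> Optional[List[int]]:
        """Hand the oldest packet to the network unless it applies backpressure."""
        if not network_ready or not self.packets:
            return None
        packet = self.packets.popleft()
        self.used -= len(packet)
        return packet


buf = DebugModuleBuffer(capacity_flits=4)
buf.push([1, 2, 3])            # accepted: 3 of 4 flits used
buf.push([4, 5])               # rejected: only 1 flit left, overflow raised
buf.pop(network_ready=False)   # None: the network is blocked
buf.pop(network_ready=True)    # returns [1, 2, 3] and frees its space
buf.push([4, 5])               # accepted now
print(buf.overflow, buf.dropped)  # True 1
```

Dropping whole packets keeps the framing on the network intact; the drop count is one possible piece of information an overflow signal could carry.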

Two strategies for overflow signals are currently planned:

Problem: How to implement proper flow control?

This leads to the issue of how to have proper flow control, namely how to avoid blockages of one debug module by another. I think there are many strategies out there. I will think about it, but maybe we can also get some real networking/NoC experts into the discussion.