Open litghost opened 5 years ago
After writing out https://github.com/verilog-to-routing/vtr-verilog-to-routing/issues/1027#issue-515780120, I realized a simple solution to fix the site pin "hump" behavior. There is still an open question of whether the modelling strategy should be changed to accommodate LUT rotation.
@vaughnbetz @kmurray @HackerFoo
@litghost What is the simple solution?
There is already a connection box annotation per IPIN node, so we can add the site pin delay to that annotation, and combine the generic delay matrix with the specific site pin delay.
During computation of the delay matrix, we can also subtract out the specific site pin delay, so that we have compatible data.
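A minimal sketch of this idea, under stated assumptions: the delay matrix is a plain dict keyed by a (dx, dy) offset, the per-IPIN annotation is a dict from node id to site pin delay, and the function names are hypothetical. None of this is VPR's actual lookahead code.

```python
def build_generic_matrix(samples, site_pin_delay):
    """Build a (dx, dy) -> delay matrix with site pin delays subtracted out.

    samples: iterable of (dx, dy, ipin_node, total_delay) taken from sample
             routes (hypothetical input format).
    site_pin_delay: dict mapping an IPIN node id to its intrinsic site pin
             delay in seconds (the per-IPIN annotation).
    """
    matrix = {}
    for dx, dy, ipin, total in samples:
        # Remove the pin-specific part so entries for different pins
        # at the same offset are comparable.
        generic = total - site_pin_delay.get(ipin, 0.0)
        key = (dx, dy)
        if key not in matrix or generic < matrix[key]:
            matrix[key] = generic
    return matrix


def lookahead_delay(matrix, site_pin_delay, dx, dy, ipin):
    """Generic delay for the offset plus the specific site pin delay."""
    return matrix[(dx, dy)] + site_pin_delay.get(ipin, 0.0)


# Two pins at the same offset whose total delays differ only by the pin delay.
pins = {"A1": 0.1e-9, "A6": 0.5e-9}
samples = [(3, 4, "A1", 1.1e-9), (3, 4, "A6", 1.5e-9)]
matrix = build_generic_matrix(samples, pins)
print(lookahead_delay(matrix, pins, 3, 4, "A6"))
```

Keeping the smallest generic delay per offset is one possible choice here; it keeps the estimate from overshooting the samples it was built from.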
Node 3061293 and the 3061293 -> 3061416 edge are the node and edge that model the site pin delay. When routing with an A* factor = 1.0, the router first pops node 3061293 after 4017 pops, but doesn't pop node 3061416 until after 1210952 pops. This is because the backward cost of moving from node 3061293 to 3061416 increases from 9.7148e-10 to 1.48348e-09 (i.e. an increase of 5.12e-10).
I'd classify this as a lookahead issue. If you aren't capturing these characteristics then the router will continue to explore since it doesn't realize there is no better path.
There is already a connection box annotation per IPIN node, so we can add the site pin delay to that annotation, and combine the generic delay matrix with the specific site pin delay. During computation of the delay matrix, we can also subtract out the specific site pin delay, so that we have compatible data.
Yes, fixing the lookahead to capture the effect of these different delays seems like the right approach.
I'll add that at the moment the packer router is not timing driven, so it only performs LUT rotations for routability purposes.
On the long-term roadmap, the plan is to unify the inter- and intra-block RR graphs, which would allow single-stage routing that performs LUT rotations correctly for delay.
Modeling the average input pin/site delay in the lookahead will help avoid back-tracking. You could at least model the smallest delay, which will lead to some backtracking, but less backtracking than if you model 0 ps in the lookahead for these elements. If the router really has no ability to choose a better input and get a faster result, then modeling the average delay in the lookahead and changing the rr-graph to match that also seems possible. I suggest we discuss this when I'm at Google tomorrow, since it will be easier to have an interactive discussion about the details.
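The effect of the lookahead estimate on backtracking can be illustrated with a toy A* search. This is only a sketch under stated assumptions: a unit-cost grid stands in for the routing graph, a single expensive edge into a hypothetical SINK node stands in for the site pin delay, and `astar_pops` is not VPR's router. It compares a lookahead that models 0 for the final edge, one that models a smaller (minimum-style) delay, and one that models the exact delay.

```python
import heapq
from itertools import count


def astar_pops(width, height, src, tgt, pin_delay, pin_estimate):
    """Route src -> tgt -> SINK on a unit-cost grid; return (pops, cost).

    The tgt -> SINK edge costs pin_delay (stand-in for the site pin delay).
    The lookahead is the Manhattan distance to tgt plus pin_estimate.
    """
    SINK = "SINK"

    def lookahead(node):
        if node == SINK:
            return 0
        return abs(node[0] - tgt[0]) + abs(node[1] - tgt[1]) + pin_estimate

    def neighbours(node):
        if node == SINK:
            return
        if node == tgt:
            yield SINK, pin_delay
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny), 1

    tie = count()  # FIFO tie-break for equal priorities
    heap = [(lookahead(src), next(tie), 0, src)]
    best = {src: 0}
    pops = 0
    while heap:
        _, _, g, node = heapq.heappop(heap)
        if g > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was found later
        pops += 1
        if node == SINK:
            return pops, g
        for nxt, cost in neighbours(node):
            ng = g + cost
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(heap, (ng + lookahead(nxt), next(tie), ng, nxt))
    raise ValueError("sink unreachable")


# Same route, three lookahead estimates for the final (site pin) edge.
for estimate in (0, 2, 5):
    pops, cost = astar_pops(21, 21, (5, 10), (15, 10),
                            pin_delay=5, pin_estimate=estimate)
    print(f"estimate={estimate} pops={pops} cost={cost}")
```

All three runs find the same path cost; only the number of heap pops before reaching the sink changes, and it shrinks as the estimate for the final edge approaches the real delay. An average-delay estimate could exceed the true delay for some pins, which this sketch does not exercise.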
Context
This question is a mix of modelling and router behavior. I'll start with a description of the 7-series rrgraph import behavior around "site pin delays", and then describe the observed effect this has on the router. Then comes a short discussion of potential alternatives.
Modelling of delay between the rrgraph and pb_type
The 7-series rrgraph part description has 3 principal parts:
Pips connect two nodes together. This is modelled in VPR as an edge between two CHANX / CHANY rr nodes.
Site pins represent the boundary between the routing graph (e.g. nodes and pips) and a site (BELs, routing BELs, site wires). In VPR terms, a "site" is roughly a pb_type. Site pins contain timing information, in particular:
switch
How the site pin is currently modeled
Because the site pin lives on the boundary between the rrgraph and the pb_type, there are some choices about where it is modelled. The output drive resistance and input capacitance pretty much have to live in the rrgraph to be modelled, so currently the 7-series graph import models the site pin as an additional CHANX/Y node and an edge between that node and the IPIN or OPIN node. Currently the 7-series routing import places the intrinsic delay on this edge.
Digression about BEL delays
The choice really comes down to where to model the intrinsic delay of the site pin. There is an additional point of information that is very relevant here, which is about the BEL timing of BELs that connect directly to an IPIN. The combinational timing information on those BELs can be expressed without loss of generality as:
therefore there is a free variable in how the delays are modeled. I'm explicitly mentioning this because of how the LUT6 delays are modeled. The BEL delays are the same for each pin of the LUT6, which I've pasted below:
What you can see is that the delay from BEL input to output shows 0 delay difference between the 6 inputs, which is surprising. The LUT structure should likely have some variation between the inputs, and it does. However, this delay variation is modelled not in the BEL delay but in the site pin delay. Example is below. The delays for each input of the LUT are A1 - A6 respectively.
As a speculation, this was done to allow the Vivado router the choice of LUT inputs at route time to perform LUT rotation, rather than doing LUT rotation during pack/place.
Implication of where the site pin intrinsic delay is modelled
There are two implications to modelling the intrinsic delay in the routing graph:
IPIN / SINK node
The first point is not as important at this minute, so let's focus on the implication of the second point. Let me lay out an example.
Router behavior example
The router wants to route from CLBLL_R_X17Y138/CLBLL_L_CMUX to CLBLM_R_X29Y18/CLBLM_M_A2, which is an on-graph distance of about (-28, -125). The final three nodes in the route will always be:
Node 3061293 and the 3061293 -> 3061416 edge are the node and edge that model the site pin delay. When routing with an A* factor = 1.0, the router first pops node 3061293 after 4017 pops, but doesn't pop node 3061416 until after 1210952 pops. This is because the backward cost of moving from node 3061293 to 3061416 increases from 9.7148e-10 to 1.48348e-09 (i.e. an increase of 5.12e-10). This increase in delay can be completely attributed to the site pin intrinsic delay:
The lookahead doesn't account for this sharp a delay at the end, because it varies dramatically from pin to pin. Consider, from the same tile:
). This increase in delay can be completely attributed to the site pin intrinsic delay:The lookahead doesn't account for this level of sharp delay at the end, because it varies dramatically from pin to pin, consider from the same tile: