verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org
Other
1k stars 385 forks source link

Router lookahead inefficiency for large blocks #1580

Open vaughnbetz opened 3 years ago

vaughnbetz commented 3 years ago

Currently the router lookahead takes two things:

For large blocks, VPR's rr-graph builder sets the (x,y) of all SINK nodes to the lower-left corner of the block. This may be far from the actual pins we're trying to get to, making the router slower and less effective.

Expected Behaviour

Router should route these connections efficiently.

Current Behaviour

Titan results suggest this is hurting us on large blocks; logically equivalent pins on those blocks don't help us as much as expected and we get some circuitous routes to large blocks both with and without logical equivalence.

Possible Solution

I think the cleanest solution would be for the rr-graph builder to assign a smarter (x,y) for SINKs in large blocks. It could be the average (x,y) of the IPINs that can reach that SINK, or the (x,y) of a randomly chosen IPIN that reaches the SINK.

vaughnbetz commented 3 years ago

@litghost @acomodi @tangxifan FYI. Looking for an owner ...

tangxifan commented 3 years ago

Hi @vaughnbetz , I have reading the router lookahead map codes.

https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/574525c9f14e4c2c8856ee992d37888eaa0b3526/vpr/src/route/router_lookahead_map.cpp#L1055

It seems that @acomodi has patched the issue as you said. Currently, the sink coordinate to be used in calculating the delta_xy() is the center coordinate of a multi-height and multi-width block. It may resolve the QoR problem. I am running the titan regression tests now.

Something we can try is to elaborate the adjust_rr_src_sink_posistion() function

Looking forward to your reply.

vaughnbetz commented 3 years ago

Thanks Xifan.

  1. The code averages the xlow,xhigh etc. coordinates. But I'm not sure that most rr-graphs actually set the SINK and SOURCE rr-nodes to have xlow != xhigh. It looks like the symbiflow graphs do (otherwise why have this code?) but I don't think the rr-graph builders in vpr do that (at least they used to just set the xlow and xhigh of SINK or SOURCE rr-node to be the lower-left (anchor) point of a large block.

    • Action: how do we set the xlow,xhigh,ylow,yhigh for SINKs and SOURCEs in the rr-graph builder? Should we change them to cover an entire large block (if they don't already)? Probably should change them if it doesn't break anything, as using a target location for a SINK/SOURCE of the middle of the block is probably better than using the lower-left corner.
  2. Using the center of a large block is a bit better, but still not as good as figuring out where we are really trying to get to for IPINs/SINKs. SINKs are the most important as they're at the end of a route, so having a good lookahead prediction is important. We should backtrack from the SINK to the IPINs that can reach it, and use the average of their location as the target location / SINK location.

  3. The x,y location error doesn't matter as much for SOURCEs, for routing as they are at the start of the routing path and hence the lookahead doesn't matter much for them -- any SOURCE->OPIN location mismatch will get fixed at the start of the router search for a net.

  4. This location offset will also matter for the placement, for both SOURCEs and SINKs, if we are simply looking at the center of the block (not the pin locations) for large blocks. This will affect both SOURCEs and SINKs (DRIVERs and RECEIVERs) during placement. I think that's a separate issue, but should be looked at after the router works well. I think the placer should calculate a pin location (location of pin connected to a signal in the netlist probably) on a block as it computes bounding boxes etc. if it doesn't already.

vaughnbetz commented 3 years ago

I agree with your plan (reproduced below), but fixed a typo: We can visit all the fan-out nodes of the SOURCE node, which are OPINs. The we average the x_low(), ylow() of these OPINs as the final coordinate to return. We can visit all the fan-in nodes of the SINK node, which are IPINs. The we average the x_low(), ylow() of these IPINs as the final coordinate to return.

tangxifan commented 3 years ago

@vaughnbetz Thanks for the insights. I double checked the rr_graph builder

https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/574525c9f14e4c2c8856ee992d37888eaa0b3526/vpr/src/route/rr_graph.cpp#L1437

You can see (xlow, ylow) is the left-bottom point of a grid, while (xhigh, yhigh) is the top-right point of a grid.

https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/574525c9f14e4c2c8856ee992d37888eaa0b3526/vpr/src/route/rr_graph2.cpp#L1120

Given the facts, I can focus on optimizing the coordinate adjustment in router lookahead map. There is no need to change the rr_graph builder.

vaughnbetz commented 3 years ago

Thanks @tangxifan. I agree.

tangxifan commented 3 years ago

@vaughnbetz I have started the code implementation. But I have realized a serious problem: There is no input edge storage for any rr_node in current data structure. Without the input edge list, I cannot find the IPIN nodes that drive a SINK node easily. As you said that SINK node position is very critical, it has been a critical roadblock then.

Therefore, I would like to learn any solution that can build the fan_in edges quickly.

I have two solutions in my mind:

litghost commented 3 years ago

@vaughnbetz I have started the code implementation. But I have realized a serious problem: There is no input edge storage for any rr_node in current data structure. Without the input edge list, I cannot find the IPIN nodes that drive a SINK node easily. As you said that SINK node position is very critical, it has been a critical roadblock then.

Therefore, I would like to learn any solution that can build the fan_in edges quickly.

I have two solutions in my mind:

  • I iterate over the rr_node. For each IPIN node, I check if its outgoing edge drives a SINK node. Then I can build a list.
  • Add the incoming edges to the rr_graph.

So one memory efficient way to do this is change how rr_graph_storage sorts the edge arrays, see https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/ce725b0f6b1b3fef1b880a5618b390cfd77a2552/vpr/src/route/rr_graph_storage.cpp#L406-L413 When you do this kind of sorting, you need to put it back once you are done.

Basically the process would look something like:

This operation is potentially CPU heavy, but won't increase the memory footprint.

vaughnbetz commented 3 years ago

Could we do this operation once (at rr_graph build or load time) by walking through an on-demand created fanin edge list (routines to create it exist), and updating the location of a SINK to be the average of the fan-in IPINs? Then this computation moves out of the CPU- and memory-critical router routines. I think it achieves the same results as computing the average of the pin locations during route time; we just pre-compute and store it as the SINK locations.

vaughnbetz commented 2 months ago

Sending this to @nedsels for consideration -- a bunch of this is brainstorming on large block sink locations for the router lookahead which should be relevant to the fix up you're contemplating.

vaughnbetz commented 2 months ago

Tagging @AlexandreSinger and @amin1377 as they may also be interested.