verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org

High fanout routing logic causes routing failure #479

Open litghost opened 5 years ago

litghost commented 5 years ago

Expected Behaviour

Router should route all nets.

Current Behaviour

Outputs the warning: Warning 815554: No routing path found in high-fanout mode for net connection (to sink_rr 125107), retrying with full route tree

and eventually fails to route.

Possible Solution

Disabling the high fanout mode with "--router_high_fanout_threshold -1" resolves the issue in the short term.

Steps to Reproduce

  1. Build VPR from https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/478
  2. Unzip failing_test.zip to a directory
  3. Run:
/usr/local/google/home/keithrothman/cat_x/vtr-verilog-to-routing/vpr/vpr arch.unique_pack.xml top.eblif --device xc7a50t-test --read_rr_graph rr_graph_xc7a50t_test.rr_graph.real.xml --min_route_chan_width_hint 100 --max_criticality 0.0 --max_router_iterations 500 --routing_failure_predictor off --constant_net_method route --route_chan_width 500 --clock_modeling route --place_algorithm bounding_box --enable_timing_computations off --allow_unrelated_clustering on --route

and it will fail.

Then run the same command with --router_high_fanout_threshold -1 appended:

/usr/local/google/home/keithrothman/cat_x/vtr-verilog-to-routing/vpr/vpr arch.unique_pack.xml top.eblif --device xc7a50t-test --read_rr_graph rr_graph_xc7a50t_test.rr_graph.real.xml --min_route_chan_width_hint 100 --max_criticality 0.0 --max_router_iterations 500 --routing_failure_predictor off --constant_net_method route --route_chan_width 500 --clock_modeling route --place_algorithm bounding_box --enable_timing_computations off --allow_unrelated_clustering on --route --router_high_fanout_threshold -1

and routing will succeed.
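
Since the two invocations differ only in the final flag, a small helper can build both from one shared argument list, which makes the A/B comparison harder to get wrong. This is only a sketch: the function name `vpr_command` and the bare `vpr` executable name are assumptions for illustration; the arguments themselves are copied from the commands above.

```python
import shlex

# Arguments shared by the failing and the passing run (copied from the commands above).
BASE_ARGS = [
    "arch.unique_pack.xml", "top.eblif",
    "--device", "xc7a50t-test",
    "--read_rr_graph", "rr_graph_xc7a50t_test.rr_graph.real.xml",
    "--min_route_chan_width_hint", "100",
    "--max_criticality", "0.0",
    "--max_router_iterations", "500",
    "--routing_failure_predictor", "off",
    "--constant_net_method", "route",
    "--route_chan_width", "500",
    "--clock_modeling", "route",
    "--place_algorithm", "bounding_box",
    "--enable_timing_computations", "off",
    "--allow_unrelated_clustering", "on",
    "--route",
]


def vpr_command(vpr="vpr", disable_high_fanout=False):
    """Build the argv list; `vpr` is an assumed path to the executable."""
    cmd = [vpr, *BASE_ARGS]
    if disable_high_fanout:
        # The short-term workaround from this issue.
        cmd += ["--router_high_fanout_threshold", "-1"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(vpr_command()))
    print(shlex.join(vpr_command(disable_high_fanout=True)))
```

Either list can be passed to `subprocess.run` as-is, or printed and pasted into a shell.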

litghost commented 5 years ago

Failing router output:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------

   1    7.0     0.0    0 3.4e+07     618    2415     866 ( 0.115%)   37029 ( 0.6%) 41914500000000.000 -6.694e+15 -41914500000000.000 -3.601e+14 -17096500000000.000      N/A
   2   11.5     0.5   20 4.6e+07     405    1857     275 ( 0.036%)   36918 ( 0.6%) 45776500000000.000 -6.743e+15 -45776500000000.000 -3.745e+14 -17096500000000.000      N/A
   7    6.4     1.9    3 2.7e+07     191    1004       2 ( 0.000%)   38239 ( 0.6%) 43017500000000.000
   8    6.1     2.4    2 2.5e+07     187     969       1 ( 0.000%)   38396 ( 0.6%) 43017500000000.000 
   9    6.5     3.1    1 2.7e+07     185     952       1 ( 0.000%)   38243 ( 0.6%) 45223500000000.000 -6.788e+15 -45223500000000.000 -3.993e+14 -17096500000000.000      N/A

In the failing case, the router gets close to a solution but, for whatever reason, cannot route the last net.

Router output with "--router_high_fanout_threshold -1": the router takes longer to reach a low overused count, but it does converge.

   1    7.2     0.0    0 3.7e+07     618    2415     842 ( 0.112%)   36907 ( 0.6%) 35848000000000.000 -5.424e+15 -35848000000000.000      0.000      0.000      N/A
   2   11.0     0.5   20 4.9e+07     401    1868     260 ( 0.034%)   36465 ( 0.6%) 37502500000000.000 -5.632e+15 -37502500000000.000      0.000      0.000      N/A
   3   11.1     0.6   11 4.4e+07     265    1316     134 ( 0.018%)   37092 ( 0.6%) 35848500000000.000 -5.391e+15 -35848500000000.000      0.000      0.000      N/A
   4    7.2     0.8    4 2.7e+07     114     783      52 ( 0.007%)   37560 ( 0.6%) 36950500000000.000 -5.612e+15 -36950500000000.000      0.000      0.000      N/A
   5    7.6     1.1    5 3.1e+07     192    1035       7 ( 0.001%)   37624 ( 0.6%) 36399000000000.000 -5.322e+15 -36399000000000.000      0.000      0.000      N/A
   6    6.5     1.4    6 2.7e+07     188     962       2 ( 0.000%)   37587 ( 0.6%) 38053500000000.000 -5.289e+15 -38053500000000.000      0.000      0.000      N/A
   7    6.5     1.9    5 2.6e+07     179     921       3 ( 0.000%)   37633 ( 0.6%) 38053500000000.000 -5.369e+15 -38053500000000.000      0.000      0.000      N/A
   8    5.9     2.4    3 2.4e+07     175     898       0 ( 0.000%)   37756 ( 0.6%) 37503000000000.000 -5.425e+15 -37503000000000.000      0.000      0.000      N/A

kmurray commented 5 years ago

From the router output you listed it looks like the router is still succeeding at finding a path, but is running into convergence issues when resolving routing congestion. That's not as bad as being unable to find any path.

If you bump up the number of routing iterations (e.g. beyond 9) does it eventually converge? It's not unusual in more difficult cases for this to take a while.
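
To check for this kind of stall without reading the log by eye, the overused-node column can be pulled out of the router table. The sketch below assumes the column layout shown in the output above (Iter, Time, pres fac, BBs Updt, Heap push, Re-Rtd Nets, Re-Rtd Conns, Overused RR Nodes); the function names and the `window`/`threshold` defaults are made up for illustration and are not part of VPR.

```python
import re

# Leading columns of one router iteration row, up to the overused-node count.
ROW = re.compile(
    r"^\s*(\d+)\s+[\d.]+\s+[\d.]+\s+\d+\s+[\d.e+]+\s+\d+\s+\d+\s+(\d+)\s+\("
)

# Rows from the failing run above, cut off after the overused-node column.
FAILING_ROWS = """\
   1    7.0     0.0    0 3.4e+07     618    2415     866 ( 0.115%)
   2   11.5     0.5   20 4.6e+07     405    1857     275 ( 0.036%)
   7    6.4     1.9    3 2.7e+07     191    1004       2 ( 0.000%)
   8    6.1     2.4    2 2.5e+07     187     969       1 ( 0.000%)
   9    6.5     3.1    1 2.7e+07     185     952       1 ( 0.000%)
"""


def overused_by_iteration(log_text):
    """Return {iteration: overused RR node count} for every row that parses."""
    counts = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def is_stalled(counts, window=3, threshold=5):
    """True if each of the last `window` iterations left 1..threshold overused nodes."""
    if len(counts) < window:
        return False
    recent = [counts[i] for i in sorted(counts)[-window:]]
    return all(0 < c <= threshold for c in recent)


if __name__ == "__main__":
    counts = overused_by_iteration(FAILING_ROWS)
    print(counts, "stalled:", is_stalled(counts))
```

A run that reaches zero overused nodes in its last iteration, like the passing run above, is not reported as stalled.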

litghost commented 5 years ago

> If you bump up the number of routing iterations (e.g. beyond 9) does it eventually converge? It's not unusual in more difficult cases for this to take a while.

It never converges; it always gets stuck with at least one congested node.

kmurray commented 5 years ago

OK. That probably means the bounding box or subset of the route tree used to route those congested connections is too restrictive in your architecture.

There is already code which tries to adjust the bounding box (for non-high-fanout nets) to avoid this issue. This likely indicates we need something similar for routing high-fanout sinks as well.

Another potential idea would be to fall back to the full route tree (giving the router more flexibility to detour around congestion) when only a handful of congested nodes remain and the router is having difficulty resolving them.
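
As a rough illustration of that fallback idea, the decision could look something like the sketch below. This is not VPR code: `RouterState`, `use_full_route_tree`, and every default value (including the threshold of 64 and the fanout comparison) are hypothetical placeholders, chosen only to show the shape of the heuristic.

```python
from dataclasses import dataclass


@dataclass
class RouterState:
    overused_nodes: int      # overused RR nodes after the last iteration
    stalled_iterations: int  # consecutive iterations without improvement


def use_full_route_tree(fanout, state, high_fanout_threshold=64,
                        max_overused=5, max_stalled=3):
    """Decide whether a net should be routed from its full route tree.

    All defaults are illustrative placeholders, not values from VPR.
    """
    # A negative threshold disables high-fanout mode entirely, as with
    # --router_high_fanout_threshold -1; low-fanout nets are unaffected.
    if high_fanout_threshold < 0 or fanout < high_fanout_threshold:
        return True
    # Hypothetical fallback: only a few congested nodes remain and the
    # router has stopped making progress on them.
    nearly_done = 0 < state.overused_nodes <= max_overused
    return nearly_done and state.stalled_iterations >= max_stalled


if __name__ == "__main__":
    early = RouterState(overused_nodes=866, stalled_iterations=0)
    stuck = RouterState(overused_nodes=1, stalled_iterations=3)
    print(use_full_route_tree(1000, early))  # keep the restricted subset
    print(use_full_route_tree(1000, stuck))  # fall back to the full tree
```

Keeping the restricted subset while congestion is still high preserves the runtime benefit of high-fanout mode; the fallback only triggers in the end game, where a single stuck node otherwise blocks convergence.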

kmurray commented 5 years ago

@litghost Looks like the .zip doesn't include the rr graph. Would it be possible to get the RR graph to reproduce the issue?