verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org

LUT rotation and congestion on 7-series graph #1046

Open litghost opened 5 years ago

litghost commented 5 years ago

@vaughnbetz suggested testing whether adding LUT equivalence would enable faster congestion avoidance on the 7-series graph. This issue is for tracking the router's behavior in this case.
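LUT equivalence lets the router land a net on any of a LUT's logically equivalent inputs. For the result to stay functionally correct, the LUT truth table (INIT) then has to be permuted to match the pins the router actually chose. The sketch below shows that permutation for a generic k-input LUT. It is a minimal illustration, not the Symbiflow or VTR implementation; the bit-ordering convention (bit j of the truth-table index is the value on input j) and the name `permute_lut_init` are assumptions.

```python
from __future__ import annotations


def permute_lut_init(init: int, perm: list[int]) -> int:
    """Permute a k-input LUT truth table (hypothetical helper).

    Assumed convention: bit i of `init` is the LUT output for input vector i,
    where bit j of i is the value driven on input pin j.
    perm[j] is the physical pin that logical input j was routed to.
    """
    k = len(perm)
    if sorted(perm) != list(range(k)):
        raise ValueError("perm must be a permutation of range(k)")
    new_init = 0
    for i in range(1 << k):
        if not (init >> i) & 1:
            continue  # only the 1-entries of the truth table need to move
        # Move each logical input bit j to its physical position perm[j].
        new_i = 0
        for j in range(k):
            if (i >> j) & 1:
                new_i |= 1 << perm[j]
        new_init |= 1 << new_i
    return new_init


if __name__ == "__main__":
    # 2-input LUT computing A AND NOT B, with A on pin 0 and B on pin 1:
    # only index 0b01 (A=1, B=0) is true, so INIT = 0b0010.
    # Swapping the pins (A -> pin 1, B -> pin 0) moves that entry to index 0b10.
    print(bin(permute_lut_init(0b0010, [1, 0])))  # 0b100
```

A fully symmetric function such as a 6-input AND (`INIT = 1 << 63`) comes out unchanged under any permutation, whereas asymmetric functions need the rewrite.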

litghost commented 5 years ago

I've started a route run, but the results don't look good from a router congestion standpoint. The first run starts as follows:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 109: No routing path for connection to sink_rr 1816053, retrying with full device bounding box
Warning 110: 90 timing endpoints were not constrained during timing analysis
   1    5.4     0.0    0 3.5e+07    5458   16533   18592 ( 0.545%)  617038 ( 7.3%)   17.930 -1.826e+04    -17.930   -0.07147     -0.036      N/A
   2   70.0     0.5   69 3.4e+08    4281   13447   15423 ( 0.452%)  441122 ( 5.2%)   18.333 -1.998e+04    -18.333    -0.2645     -0.132      N/A
   3  210.6     0.6   47 7.9e+08    3961   12841   14195 ( 0.416%)  449635 ( 5.3%)   17.925 -2.075e+04    -17.925    -0.2913     -0.146      N/A
   4  343.8     0.8   66 1.1e+09    3722   12422   12441 ( 0.365%)  457006 ( 5.4%)   17.704 -2.123e+04    -17.704      0.000      0.000      N/A

The number of overused RR nodes actually increased (compared with the same iterations without LUT rotation)! More iterations are needed to see how the run develops.
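To make the comparison concrete, the short script below puts the overused RR node counts from the first four iterations of this run next to those of the run without LUT rotation (the log posted further down in this thread). The numbers are copied verbatim from the two logs; the helper name `overuse_ratios` is only for illustration.

```python
from __future__ import annotations

# Overused RR node counts for router iterations 1-4, copied from the two
# router logs in this thread (with and without LUT rotation).
WITH_ROTATION = {1: 18592, 2: 15423, 3: 14195, 4: 12441}
WITHOUT_ROTATION = {1: 12195, 2: 10500, 3: 9616, 4: 8808}


def overuse_ratios(a: dict[int, int], b: dict[int, int]) -> dict[int, float]:
    """Per-iteration ratio a/b for the iterations present in both logs."""
    return {it: a[it] / b[it] for it in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    for it, ratio in overuse_ratios(WITH_ROTATION, WITHOUT_ROTATION).items():
        print(f"iter {it}: {WITH_ROTATION[it]:>6} vs {WITHOUT_ROTATION[it]:>6} overused ({ratio:.2f}x)")
```

Within the LUT-rotation run the count still falls every iteration (18592 down to 12441), but each iteration carries roughly 1.4 to 1.5 times as many overused nodes as the baseline.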

FYI, this is using the maximum site pin delay in the lookahead.
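Warning 109 in the log above says the router could not reach a sink from inside the net's bounding box and fell back to the whole device. The toy model below illustrates the idea on a plain grid rather than the real RR graph: the search region is the terminals' bounding box grown by `bb_factor` on each side and clipped to the device, and a failed search is retried over the full device. The names `BBox`, `net_bbox`, `reachable` and `route_with_fallback` are hypothetical; this is a sketch of the concept, not VPR's code.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class BBox:
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def net_bbox(terminals, bb_factor: int, width: int, height: int) -> BBox:
    """Terminal bounding box grown by bb_factor per side, clipped to the grid."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return BBox(max(0, min(xs) - bb_factor), max(0, min(ys) - bb_factor),
                min(width - 1, max(xs) + bb_factor), min(height - 1, max(ys) + bb_factor))


def reachable(src: Point, dst: Point, blocked, region: BBox) -> bool:
    """Breadth-first search from src to dst that never leaves `region`."""
    if not (region.contains(src) and region.contains(dst)):
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        x, y = queue.popleft()
        if (x, y) == dst:
            return True
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in seen and nxt not in blocked and region.contains(nxt):
                seen.add(nxt)
                queue.append(nxt)
    return False


def route_with_fallback(src, dst, blocked, bb_factor, width, height) -> str:
    if reachable(src, dst, blocked, net_bbox([src, dst], bb_factor, width, height)):
        return "net bbox"
    # Toy analogue of "retrying with full device bounding box".
    if reachable(src, dst, blocked, BBox(0, 0, width - 1, height - 1)):
        return "full device"
    return "unroutable"


if __name__ == "__main__":
    # A wall at x = 2 with a single gap in the top row (y = 9) of a 10 x 10 grid.
    wall = {(2, y) for y in range(9)}
    print(route_with_fallback((1, 5), (3, 5), wall, 1, 10, 10))  # full device
    print(route_with_fallback((1, 5), (3, 5), wall, 4, 10, 10))  # net bbox
```

With a small `bb_factor` the only detour lies outside the search region, so the fallback kicks in; a larger factor keeps the search local but makes every search region bigger.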

litghost commented 5 years ago

For comparison, here is the router log for the graph without LUT rotation support:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 109: 90 timing endpoints were not constrained during timing analysis
   1   13.7     0.0    0 7.1e+07    5615   16933   12195 ( 0.345%)  772474 ( 9.1%)   17.971 -1.959e+04    -17.971      0.000      0.000      N/A
   2  117.8     0.5  108 5.3e+08    4558   13917   10500 ( 0.297%)  506006 ( 6.0%)   18.251 -2.156e+04    -18.251      0.000      0.000      N/A
   3  266.2     0.6   41 1.1e+09    4335   13450    9616 ( 0.272%)  514247 ( 6.1%)   18.095 -2.240e+04    -18.095      0.000      0.000      N/A
   4  360.9     0.8   71 1.5e+09    4102   12972    8808 ( 0.249%)  522693 ( 6.2%)   18.208 -2.302e+04    -18.208      0.000      0.000      N/A
   5  372.9     1.1   72 1.6e+09    3810   12351    7008 ( 0.198%)  529215 ( 6.2%)   18.118 -2.333e+04    -18.118      0.000      0.000      N/A
   6  390.6     1.4   52 1.7e+09    3491   11444    5575 ( 0.158%)  538942 ( 6.3%)   18.113 -2.361e+04    -18.113      0.000      0.000      N/A
   7  375.3     1.9   43 1.6e+09    3037   10209    4317 ( 0.122%)  545791 ( 6.4%)   18.121 -2.397e+04    -18.121      0.000      0.000      N/A
   8  318.0     2.4   25 1.4e+09    2605    8787    3099 ( 0.088%)  553873 ( 6.5%)   18.125 -2.414e+04    -18.125      0.000      0.000      N/A
   9  261.2     3.1   18 1.1e+09    2143    7317    2100 ( 0.059%)  560288 ( 6.6%)   18.091 -2.423e+04    -18.091      0.000      0.000      N/A
  10  185.3     4.1   20 8.3e+08    1690    5650    1367 ( 0.039%)  565327 ( 6.7%)   18.054 -2.432e+04    -18.054      0.000      0.000       35
  11  127.1     5.3   11 5.6e+08    1260    4214     885 ( 0.025%)  568271 ( 6.7%)   18.073 -2.444e+04    -18.073      0.000      0.000       31
  12   74.4     6.9    7 3.3e+08     898    2913     605 ( 0.017%)  570946 ( 6.7%)   18.137 -2.447e+04    -18.137      0.000      0.000       29
  13   51.2     9.0    7 2.3e+08     678    2219     395 ( 0.011%)  572867 ( 6.7%)   18.135 -2.450e+04    -18.135      0.000      0.000       28
  14   36.4    11.6    6 1.5e+08     500    1645     260 ( 0.007%)  574420 ( 6.8%)   18.135 -2.451e+04    -18.135      0.000      0.000       28
  15   29.0    15.1    3 1.1e+08     380    1289     162 ( 0.005%)  575277 ( 6.8%)   18.135 -2.454e+04    -18.135      0.000      0.000       27
  16   21.2    19.7    3 7.8e+07     296    1034     118 ( 0.003%)  575974 ( 6.8%)   18.135 -2.454e+04    -18.135      0.000      0.000       27
  17   13.5    25.6    3 5.2e+07     253     865      71 ( 0.002%)  576680 ( 6.8%)   18.121 -2.455e+04    -18.121      0.000      0.000       27
  18   13.0    33.3    3 4.7e+07     208     762      56 ( 0.002%)  577071 ( 6.8%)   18.121 -2.455e+04    -18.121      0.000      0.000       27
  19   12.7    43.3    1 4.0e+07     196     711      34 ( 0.001%)  577711 ( 6.8%)   18.121 -2.456e+04    -18.121      0.000      0.000       28
  20    7.4    56.2    1 2.3e+07     172     641      22 ( 0.001%)  577900 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       28
  21    5.2    73.1    1 1.8e+07     167     613      18 ( 0.001%)  577818 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       28
  22    5.2    95.0    2 1.6e+07     158     584      11 ( 0.000%)  578063 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       28
  23    4.9   123.5    1 1.7e+07     157     585       4 ( 0.000%)  578290 ( 6.8%)   18.183 -2.462e+04    -18.183      0.000      0.000       28
  24    2.3   160.6    0 8605946     150     556       5 ( 0.000%)  578313 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       27
  25    3.3   208.8    0 1.1e+07     152     571       2 ( 0.000%)  578396 ( 6.8%)   18.183 -2.462e+04    -18.183      0.000      0.000       28
  26    1.5   271.4    0 6300157     149     556       3 ( 0.000%)  578496 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       27
  27    2.5   352.8    1 9001826     152     564       2 ( 0.000%)  578430 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       28
  28    1.2   458.7    0 5443741     151     563       1 ( 0.000%)  578520 ( 6.8%)   18.121 -2.459e+04    -18.121      0.000      0.000       28
  29    6.5   596.3    1 1.6e+07     155     581       2 ( 0.000%)  578549 ( 6.8%)   18.121 -2.457e+04    -18.121      0.000      0.000       28
  30    1.2   775.1    0 5478818     151     563       0 ( 0.000%)  578623 ( 6.8%)   18.686 -2.479e+04    -18.686      0.000      0.000       29
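
The pres fac column in this log follows a simple geometric schedule. Assuming the settings listed in the results table later in this thread (first_iter_pres_fac = 0, initial_pres_fac = 0.5, pres_fac_mult = 1.3), the sketch below reproduces the column up to display rounding: 0 in iteration 1, 0.5 in iteration 2, then multiplied by 1.3 each iteration up to about 775 in iteration 30. The function names are hypothetical, and VPR's actual schedule may include details not modelled here.

```python
def pres_fac_schedule(num_iters, first_iter_pres_fac=0.0, initial_pres_fac=0.5, pres_fac_mult=1.3):
    """Present-congestion factor used in router iterations 1..num_iters (sketch)."""
    facs = []
    pres_fac = first_iter_pres_fac
    for it in range(1, num_iters + 1):
        if it == 2:
            pres_fac = initial_pres_fac  # second iteration starts the real penalty
        elif it > 2:
            pres_fac *= pres_fac_mult  # geometric growth afterwards
        facs.append(pres_fac)
    return facs


def first_iter_reaching(target, **kwargs):
    """First iteration whose pres_fac is >= target (searches up to 200 iterations)."""
    for it, fac in enumerate(pres_fac_schedule(200, **kwargs), start=1):
        if fac >= target:
            return it
    return None


if __name__ == "__main__":
    for it, fac in enumerate(pres_fac_schedule(30), start=1):
        print(f"iter {it:2d}: pres_fac = {fac:.1f}")
    print(first_iter_reaching(100.0), first_iter_reaching(100.0, pres_fac_mult=2.0))  # 23 10
```

With pres_fac_mult = 2 the same penalty level is reached after 10 iterations instead of 23, which is consistent with (though not proof of) the lower iteration counts for pres_fac_mult = 2 in the results table.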

vaughnbetz commented 5 years ago

Having more congestion initially might be due to everything wanting the fastest LUT input. Hopefully it will still converge faster. You may also want to try a run with max_criticality clipped to something small (e.g. 0.1 or 0) so the router doesn't care that one input is faster.
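A simplified view of why clipping max_criticality should make the router indifferent to which LUT input is fastest: in a timing-driven router of this kind, a connection's cost blends delay and congestion as crit * delay + (1 - crit) * congestion_cost, with crit capped at max_criticality. The sketch below uses two hypothetical LUT input pins with made-up delay and congestion costs in arbitrary units; it is not VPR's exact cost function.

```python
# Hypothetical LUT input pins: name -> (delay, congestion_cost), arbitrary units.
# Pin "A" is the fast input that every net wants, so it is already overused.
PINS = {"A": (1.0, 5.0), "B": (3.0, 1.0)}


def clipped_criticality(raw_crit: float, max_criticality: float) -> float:
    return min(raw_crit, max_criticality)


def connection_cost(delay: float, congestion_cost: float, crit: float) -> float:
    # Simplified timing-driven cost: criticality weights delay against congestion.
    return crit * delay + (1.0 - crit) * congestion_cost


def pick_input_pin(pins, raw_crit: float, max_criticality: float) -> str:
    crit = clipped_criticality(raw_crit, max_criticality)
    return min(pins, key=lambda name: connection_cost(*pins[name], crit))


if __name__ == "__main__":
    for max_crit in (0.99, 0.1, 0.0):
        print(max_crit, pick_input_pin(PINS, raw_crit=0.99, max_criticality=max_crit))
```

At max_criticality = 0.99 a highly critical connection still piles onto the congested fast pin "A"; at 0.1 or 0 the congestion term dominates and the cheaper pin "B" wins.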

litghost commented 5 years ago

Some results are in.

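| LUT equivalence | pres_fac_mult | acc_fac | max_criticality | CPD (ns) | Runtime (sec) | A* factor | BB factor | first_iter_pres_fac | initial_pres_fac | Reconvergence count | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| On | 1.3 | 1 | 0.99 | 18.4141 | 2884.13 | 1.2 | 10 | 0 | 0.5 | 1 | 34 |
| On | 2 | 1 | 0.99 | 71.9663 | 1944.88 | 1.2 | 10 | 0 | 0.5 | 1 | 22 |
| On | 1.3 | 1 | 0.1 | 33.5886 | 2998.58 | 1.2 | 10 | 0 | 0.5 | 1 | 17 |
| On | 2 | 2 | 0.99 | 18.1179 | 1747.27 | 1.2 | 10 | 0 | 0.5 | 1 | 12 |
| Off | 1.3 | 1 | 0.99 | 20.1673 | 2310.6 | 1.2 | 10 | 0 | 0.5 | 1 | 30 |
| Off | 2 | 1 | 0.99 | 22.397 | 1403.21 | 1.2 | 10 | 0 | 0.5 | 1 | 16 |
| Off | 2 | 1 | 0.1 | 35.1067 | 1939.35 | 1.2 | 10 | 0 | 0.5 | 1 | 18 |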
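Only some router settings in this table were run with LUT equivalence both on and off. The short script below pairs those rows and computes the runtime ratio. The data are copied verbatim from the table; the columns that are constant across all rows (A* factor, BB factor, first_iter_pres_fac, initial_pres_fac, reconvergence count) are left out, and the helper name is only illustrative.

```python
# Rows copied from the results table:
# (LUT equivalence, pres_fac_mult, acc_fac, max_criticality, CPD ns, runtime s, iterations)
RESULTS = [
    ("On", 1.3, 1, 0.99, 18.4141, 2884.13, 34),
    ("On", 2, 1, 0.99, 71.9663, 1944.88, 22),
    ("On", 1.3, 1, 0.1, 33.5886, 2998.58, 17),
    ("On", 2, 2, 0.99, 18.1179, 1747.27, 12),
    ("Off", 1.3, 1, 0.99, 20.1673, 2310.6, 30),
    ("Off", 2, 1, 0.99, 22.397, 1403.21, 16),
    ("Off", 2, 1, 0.1, 35.1067, 1939.35, 18),
]


def runtime_ratio_on_vs_off(rows):
    """Runtime(On) / Runtime(Off) for router settings that were run both ways."""
    runtimes = {}
    for lut_eq, pres_fac_mult, acc_fac, max_crit, _cpd, runtime, _iters in rows:
        runtimes.setdefault((pres_fac_mult, acc_fac, max_crit), {})[lut_eq] = runtime
    return {cfg: rt["On"] / rt["Off"]
            for cfg, rt in runtimes.items() if "On" in rt and "Off" in rt}


if __name__ == "__main__":
    for cfg, ratio in runtime_ratio_on_vs_off(RESULTS).items():
        print(cfg, f"{ratio:.2f}x")
```

In both matched pairs the run with LUT equivalence on took longer (about 1.25x and 1.39x).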
vaughnbetz commented 5 years ago

Thanks. The fact that max_crit = 0.1 takes longer than max_crit = 0.99 is very strange. It implies something weird is happening; maybe the lookahead is not predicting the total base_cost of the resources expected on the path well?
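One way to picture the lookahead hypothesis: an A* search only prunes well when its estimate of the remaining cost is close to the real remaining cost. With max_criticality = 0.1 most of that cost is base/congestion cost, so a lookahead that mispredicts base_cost would leave the search closer to an uninformed, Dijkstra-like expansion. The toy below is an analogy on an open grid with unit edge costs, not the RR graph or VPR's lookahead; `astar_fac` loosely mirrors the table's "A* factor" column, and all names are hypothetical.

```python
import heapq


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def no_estimate(a, b):
    return 0


def astar(width, height, src, dst, heuristic, astar_fac=1.0):
    """A* on an open grid with unit edge costs. Returns (path_cost, nodes_expanded)."""
    best_g = {src: 0}
    # Heap entries: (f = g + astar_fac * h, -g, node); ties prefer deeper nodes.
    heap = [(astar_fac * heuristic(src, dst), 0, src)]
    expanded = set()
    while heap:
        _f, neg_g, node = heapq.heappop(heap)
        if node in expanded:
            continue  # stale heap entry
        expanded.add(node)
        g = -neg_g
        if node == dst:
            return g, len(expanded)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_grid = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if in_grid and g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(heap, (g + 1 + astar_fac * heuristic(nxt, dst), -(g + 1), nxt))
    return None, len(expanded)


if __name__ == "__main__":
    src, dst = (5, 15), (25, 15)
    for label, h, fac in (("no estimate", no_estimate, 1.0),
                          ("half of true distance", manhattan, 0.5),
                          ("exact distance", manhattan, 1.0)):
        cost, n = astar(30, 30, src, dst, h, fac)
        print(f"{label:>22}: cost {cost}, expanded {n} nodes")
```

All three searches find the same cost-20 path, but the exact estimate expands only the 21 nodes on the straight line while the zero estimate expands a large diamond around the source. A poorly calibrated base_cost estimate would sit somewhere in between, which is one way a longer runtime could show up.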

vaughnbetz commented 5 years ago

Adding @YFWang97 @xuqinziyue @cindyhou to the discussion as they're working on Symbiflow quality.