verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org
Other
1k stars 385 forks source link

Variable routing "effort" based on criticality #658

Open litghost opened 5 years ago

litghost commented 5 years ago

Current Behaviour

bb factor and A* are constant across nets.

Proposed Behaviour/Solution

bb factor and A variable vary based on slack. If the nets along the path meet timing, the just use default bb factor and A. Each iteration that a net is along the path does not meet timing, and the router chooses to "rip up", the bb factor and A* factor change to increase the effort the router searches for a route. The idea here is that critical path nets deserve more effort, rather than increasing the effort uniformly.

Context

I'm currently exploring making VPR produce "good" solutions on 7-series graph architecture, and have found that increase bb factor (effectively removing the bounding box) and lowering the A* factor (1.2 -> 0, 0.4), have been important for closing timing.

I'm currently working on making a demo using the Murax SoC close timing at a clock of 100 MHz. Vivado took the same synthesis and was "only" able to reach a slack of 1.434 ns (e.g. critical path delay of 8.566 ns).

Here is the current results I've been getting so far (with various A and bb factor values). These are all using the same placement (generated a delay matrix generated with A factor = 0 and bb factor = 10000):

A* factor bb factor Runtime (sec) CPD (ns) Freq (MHz)
1.6 Default 762.04 15.1719 65.9113
0 10000 16109.03 8.87276 112.704
1.4 10000 2629.47 14.2745 70.0552
0.4 10 4670.23 10.8469 92.1925
1 100 3147.56 11.1741 89.4929
1 50 3173.08 13.3147 75.1051
1 25 3037.93 15.0482 66.453
0.6 10 3470.07 11.0593 90.4215
0.6 20 5279.98 11.1071 90.0329
0.6 100 6545.01 10.4561 95.6375
1.2 100 2695.66 15.0492 66.4487
litghost commented 5 years ago

Example routing logs

A* factor = 0, bb factor = 10000:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 271068: 17 timing startpoints were not constrained during timing analysis
Warning 271069: 18 timing endpoints were not constrained during timing analysis
   1 6379.7     0.0    0 1.5e+10    2661    8648    4495 ( 0.415%)  222599 ( 5.4%)   19.604     -404.7     -9.604     -145.0    -12.912      N/A
   2 1006.9     0.5    0 3.5e+09    2099    6550    3858 ( 0.356%)  158068 ( 3.8%)    8.780      0.000      0.000     -32.69     -3.321      N/A
   3 1366.9     0.6    0 4.4e+09    2008    6112    3276 ( 0.302%)  160241 ( 3.9%)    8.521      0.000      0.000      0.000      0.000      N/A
   4  906.6     0.8    0 2.8e+09    1906    5760    3389 ( 0.313%)  159127 ( 3.9%)    8.517      0.000      0.000      0.000      0.000      N/A
   5 1003.2     1.1    0 2.7e+09    1833    5475    2737 ( 0.253%)  159883 ( 3.9%)    8.517      0.000      0.000      0.000      0.000      N/A
   6  882.6     1.4    0 2.3e+09    1672    4991    2204 ( 0.203%)  160852 ( 3.9%)    8.517      0.000      0.000      0.000      0.000      N/A
   7  744.7     1.9    0 2.0e+09    1472    4398    1695 ( 0.156%)  160975 ( 3.9%)    8.517      0.000      0.000      0.000      0.000      N/A
   8  657.4     2.4    0 1.7e+09    1321    3866    1368 ( 0.126%)  162065 ( 3.9%)    8.517      0.000      0.000      0.000      0.000      N/A
   9  551.0     3.1    0 1.5e+09    1156    3399    1030 ( 0.095%)  162674 ( 3.9%)    8.517      0.000      0.000      0.000      0.000      N/A
  10  488.2     4.1    0 1.2e+09     963    2816     768 ( 0.071%)  163623 ( 4.0%)    8.517      0.000      0.000      0.000      0.000       38
  11  429.2     5.3    0 1.1e+09     787    2347     553 ( 0.051%)  164564 ( 4.0%)    8.517      0.000      0.000      0.000      0.000       36
  12  362.7     6.9    0 8.9e+08     611    1840     360 ( 0.033%)  166068 ( 4.0%)    8.517      0.000      0.000      0.000      0.000       34
  13  265.5     9.0    0 6.4e+08     411    1199     201 ( 0.019%)  167344 ( 4.1%)    8.517      0.000      0.000      0.000      0.000       31
  14  215.2    11.6    0 5.0e+08     267     755     103 ( 0.010%)  168199 ( 4.1%)    8.517      0.000      0.000      0.000      0.000       29
  15  137.4    15.1    0 3.0e+08     152     415      52 ( 0.005%)  168953 ( 4.1%)    8.897      0.000      0.000      0.000      0.000       26
  16   76.4    19.7    0 1.6e+08      75     227      29 ( 0.003%)  169434 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       24
  17   47.2    25.6    0 9.8e+07      44     122      10 ( 0.001%)  169532 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       23
  18   24.9    33.3    0 5.0e+07      23      56       6 ( 0.001%)  169781 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       22
  19   15.2    43.3    0 3.0e+07      13      36       2 ( 0.000%)  169837 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       21
  20   10.3    56.2    0 1.9e+07       8      17       2 ( 0.000%)  169846 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       21
  21   11.7    73.1    0 2.2e+07       9      18       2 ( 0.000%)  169827 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       21
  22    8.4    95.0    0 1.6e+07       7      17       2 ( 0.000%)  169837 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       21
  23    9.7   123.5    0 1.9e+07       8      17       2 ( 0.000%)  169865 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       22
  24    8.8   160.6    0 1.8e+07       8      17       1 ( 0.000%)  169864 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       22
  25    5.0   208.8    0 1.0e+07       5      11       1 ( 0.000%)  169875 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       23
  26    8.1   271.4    0 1.6e+07       7      14       1 ( 0.000%)  169893 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       23
  27    6.5   352.8    0 1.3e+07       6      12       1 ( 0.000%)  169875 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       24
  28    5.1   458.7    0 1.0e+07       5      11       1 ( 0.000%)  169886 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       25
  29    6.7   596.3    0 1.3e+07       6      12       1 ( 0.000%)  169902 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       25
  30    5.8   775.1    0 1.1e+07       6      12       1 ( 0.000%)  169920 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       26
  31    4.2  1007.7    0 8544744       5      11       1 ( 0.000%)  169902 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       27
  32    7.5  1310.0    0 1.4e+07       7      13       1 ( 0.000%)  169889 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       27
  33    4.2  1703.0    0 8548004       5      11       1 ( 0.000%)  169902 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       28
  34    7.4  2213.9    0 1.4e+07       7      14       1 ( 0.000%)  169891 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       29
  35    6.1  2878.1    0 1.1e+07       6      12       1 ( 0.000%)  169889 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       30
  36    4.3  3741.5    0 8565407       5      11       1 ( 0.000%)  169902 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       31
  37    7.4  4863.9    0 1.4e+07       7      14       1 ( 0.000%)  169914 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       32
  38    5.8  6323.1    0 1.1e+07       6      12       1 ( 0.000%)  169948 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       32
  39    5.8  8220.0    0 1.1e+07       6      10       1 ( 0.000%)  169945 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       33
  40    5.8 10686.0    0 1.1e+07       6       9       1 ( 0.000%)  169951 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       34
  41    7.3 13891.9    0 1.4e+07       7      11       1 ( 0.000%)  169964 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       34
  42    4.2 18059.4    0 8450483       5       8       1 ( 0.000%)  169956 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       35
  43    5.8 23477.2    0 1.1e+07       6       9       1 ( 0.000%)  169940 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       36
  44    7.3 30520.4    0 1.4e+07       7      11       1 ( 0.000%)  169953 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       37
  45    4.2 39676.5    0 8451320       5       8       1 ( 0.000%)  169920 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       37
  46    5.9 51579.5    0 1.2e+07       6      13       1 ( 0.000%)  169891 ( 4.1%)    8.873      0.000      0.000      0.000      0.000       38
  47    5.9 67053.3    0 1.1e+07       6      12       1 ( 0.000%)  169889 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  48    4.2 87169.3    0 8582545       5      11       1 ( 0.000%)  169902 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  49    7.4 1.1e+05    0 1.4e+07       7      14       1 ( 0.000%)  169891 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  50    5.9 1.5e+05    0 1.1e+07       6      12       1 ( 0.000%)  169889 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  51    4.2 1.9e+05    0 8606773       5      11       1 ( 0.000%)  169902 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  52    9.9 2.5e+05    0 1.9e+07       8      17       1 ( 0.000%)  169864 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  53    5.0 3.2e+05    0 1.0e+07       5      11       1 ( 0.000%)  169875 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  54    8.2 4.2e+05    0 1.6e+07       7      14       1 ( 0.000%)  169893 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A
  55    5.2 5.5e+05    0 1.1e+07       6      12       0 ( 0.000%)  169962 ( 4.1%)    8.873      0.000      0.000      0.000      0.000      N/A

A* factor = 1.0, bb factor = 100:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 271068: No routing path for connection to sink_rr 109219, retrying with full device bounding box
Warning 271069: No routing path for connection to sink_rr 70660, retrying with full device bounding box
Warning 271070: 17 timing startpoints were not constrained during timing analysis
Warning 271071: 18 timing endpoints were not constrained during timing analysis
   1  675.5     0.0    0 1.9e+09    2661    8648    4801 ( 0.443%)  224892 ( 5.4%)   11.270     -31.72     -1.270     -334.7     -3.843      N/A
   2  170.0     0.5    2 6.3e+08    2138    6786    4112 ( 0.379%)  160649 ( 3.9%)   11.822     -37.19     -1.822     -819.5     -6.426      N/A
   3  163.9     0.6    2 6.1e+08    2047    6422    3611 ( 0.333%)  162065 ( 3.9%)   11.749     -26.90     -1.749     -716.0     -5.633      N/A
   4  164.9     0.8    2 6.2e+08    1954    6037    3268 ( 0.302%)  162297 ( 3.9%)   12.378     -42.45     -2.378     -763.1     -5.774      N/A
   5  145.6     1.1    2 5.5e+08    1813    5663    2755 ( 0.254%)  161927 ( 3.9%)   11.480     -17.22     -1.480     -792.5     -7.818      N/A
   6  133.8     1.4    2 5.1e+08    1699    5188    2270 ( 0.209%)  163149 ( 4.0%)   16.765     -1356.     -6.765     -1019.     -8.019      N/A
   7  127.0     1.9    2 4.7e+08    1533    4638    2015 ( 0.186%)  164052 ( 4.0%)   16.047     -1197.     -6.047     -781.7     -7.351      N/A
   8  114.0     2.4    2 4.2e+08    1388    4195    1614 ( 0.149%)  164677 ( 4.0%)   16.395     -1380.     -6.395     -960.9     -7.699      N/A
   9  103.2     3.1    2 3.8e+08    1214    3671    1239 ( 0.114%)  165857 ( 4.0%)   14.758     -710.9     -4.758     -826.2     -7.319      N/A
  10   95.4     4.1    1 3.4e+08    1087    3249     996 ( 0.092%)  166363 ( 4.0%)   14.766     -773.3     -4.766     -789.3     -7.327       46
  11   82.7     5.3    1 3.0e+08     924    2773     766 ( 0.071%)  167407 ( 4.1%)   11.738     -45.16     -1.738     -609.8     -6.337       42
  12   81.0     6.9    1 2.9e+08     800    2374     589 ( 0.054%)  168433 ( 4.1%)   14.032     -145.1     -4.032     -761.1     -6.845       41
  13   87.7     9.0    1 2.9e+08     614    1866     377 ( 0.035%)  169822 ( 4.1%)   14.523     -201.4     -4.523     -737.5     -7.363       38
  14   76.3    11.6    0 2.4e+08     433    1325     252 ( 0.023%)  171188 ( 4.1%)   12.789     -111.7     -2.789     -755.2     -5.813       35
  15   59.8    15.1    0 1.8e+08     304     944     174 ( 0.016%)  171418 ( 4.2%)   12.093     -28.06     -2.093     -489.4     -5.125       33
  16   52.1    19.7    0 1.5e+08     218     663     108 ( 0.010%)  172297 ( 4.2%)   14.547     -173.8     -4.547     -661.9     -7.193       32
  17   40.0    25.6    0 1.1e+08     125     435      46 ( 0.004%)  172942 ( 4.2%)   14.535     -185.6     -4.535     -585.4     -7.371       30
  18   20.0    33.3    0 5.4e+07      51     187      23 ( 0.002%)  173290 ( 4.2%)   14.365     -174.1     -4.365     -662.4     -7.201       28
  19   17.7    43.3    0 4.4e+07      31      99      20 ( 0.002%)  173398 ( 4.2%)   14.713     -215.9     -4.713     -643.9     -7.549       26
  20   15.3    56.2    0 3.9e+07      30     115      15 ( 0.001%)  173566 ( 4.2%)   14.713     -214.3     -4.713     -658.8     -7.549       26
  21   11.0    73.1    0 2.8e+07      19      76      10 ( 0.001%)  173542 ( 4.2%)   11.174     -4.723     -1.174     -373.5     -4.904       25
  22    8.0    95.0    0 1.8e+07      11      38       6 ( 0.001%)  173561 ( 4.2%)   11.174     -14.61     -1.174     -348.5     -4.904       26
  23    2.8   123.5    0 7999786       8      22       2 ( 0.000%)  173619 ( 4.2%)   11.174     -14.74     -1.174     -348.8     -4.904       26
  24    0.5   160.6    0 1837305       4       6       1 ( 0.000%)  173612 ( 4.2%)   11.174     -14.74     -1.174     -348.8     -4.904       25
  25    0.9   208.8    0 2324745       2       2       1 ( 0.000%)  173621 ( 4.2%)   11.174     -14.74     -1.174     -348.8     -4.904       25
  26    1.2   271.4    0 2826408       2       2       1 ( 0.000%)  173611 ( 4.2%)   11.174     -14.74     -1.174     -348.8     -4.904       25
  27    1.2   352.8    0 2803552       2       2       1 ( 0.000%)  173621 ( 4.2%)   11.174     -14.74     -1.174     -348.8     -4.904       25
  28    1.3   458.7    0 3833159       3       9       2 ( 0.000%)  173604 ( 4.2%)   11.174     -14.74     -1.174     -347.7     -4.904       25
  29    0.2   596.3    0 1033242       2       4       0 ( 0.000%)  173442 ( 4.2%)   11.174     -15.01     -1.174     -345.5     -4.904       26
vaughnbetz commented 5 years ago

Thanks for the data; it's interesting. I think the data shows there are still major lookahead issues though, as astar_fac = 1 is not performing well. So a per-net (or per-connection) astar could be added, but before doing that and trying to tune it I think we need to get the lookahead in a reasonable state.

bb_factor is normally a moderate runtime optimization (once you have a good lookahead and are running with astar_fac somewhere near 1). With astar_fac near 0, smaller bb_factors are much more important to control runtime, but it'll still be bad even with small bb_factor.

So I think it's too soon to start adding fine-tuning knobs; there are deeper lookahead issues that should be fixed first. Also, once you get into fine-tuning algorithms you'll need to use a larger benchmark suite as you only want to do debugging and very coarse tuning on a single circuit.

litghost commented 5 years ago

Thanks for the data; it's interesting. I think the data shows there are still major lookahead issues though, as astar_fac = 1 is not performing well. So a per-net (or per-connection) astar could be added, but before doing that and trying to tune it I think we need to get the lookahead in a reasonable state.

Thanks for the feedback. I'll gather some more data on lookahead behavior. It is worth noting that this new lookahead is doing better, but if you think based on this data that the lookahead is still not good enough, I'll focus my energy there.

The good outcome here is that I have a placement that is "good enough" to close timing, I just need to enable the router to reach it. This is important in my eyes because it means I can ignore problems in packing and placement and focus on routing behavior.

To be clear, there may be additional gains to be had, but the pack and place is good enough (by existance proof) to focus on the router behavior.

vaughnbetz commented 5 years ago

Yes, that is good news. In an uncongested design, I'd expect the router to quickly converge to a solution with near best-case routing delay, with astar somewhere around 1. Until that happens, modifying the fundamental algorithm is probably not a good idea, as you're working around a bad lookahead and more likely to detune than tune the algorithm for the case when the lookahead works well.