Open litghost opened 5 years ago
Example routing logs
A* factor = 0, bb factor = 10000:
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter Time pres BBs Heap Re-Rtd Re-Rtd Overused RR Nodes Wirelength CPD sTNS sWNS hTNS hWNS Est Succ
(sec) fac Updt push Nets Conns (ns) (ns) (ns) (ns) (ns) Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 271068: 17 timing startpoints were not constrained during timing analysis
Warning 271069: 18 timing endpoints were not constrained during timing analysis
1 6379.7 0.0 0 1.5e+10 2661 8648 4495 ( 0.415%) 222599 ( 5.4%) 19.604 -404.7 -9.604 -145.0 -12.912 N/A
2 1006.9 0.5 0 3.5e+09 2099 6550 3858 ( 0.356%) 158068 ( 3.8%) 8.780 0.000 0.000 -32.69 -3.321 N/A
3 1366.9 0.6 0 4.4e+09 2008 6112 3276 ( 0.302%) 160241 ( 3.9%) 8.521 0.000 0.000 0.000 0.000 N/A
4 906.6 0.8 0 2.8e+09 1906 5760 3389 ( 0.313%) 159127 ( 3.9%) 8.517 0.000 0.000 0.000 0.000 N/A
5 1003.2 1.1 0 2.7e+09 1833 5475 2737 ( 0.253%) 159883 ( 3.9%) 8.517 0.000 0.000 0.000 0.000 N/A
6 882.6 1.4 0 2.3e+09 1672 4991 2204 ( 0.203%) 160852 ( 3.9%) 8.517 0.000 0.000 0.000 0.000 N/A
7 744.7 1.9 0 2.0e+09 1472 4398 1695 ( 0.156%) 160975 ( 3.9%) 8.517 0.000 0.000 0.000 0.000 N/A
8 657.4 2.4 0 1.7e+09 1321 3866 1368 ( 0.126%) 162065 ( 3.9%) 8.517 0.000 0.000 0.000 0.000 N/A
9 551.0 3.1 0 1.5e+09 1156 3399 1030 ( 0.095%) 162674 ( 3.9%) 8.517 0.000 0.000 0.000 0.000 N/A
10 488.2 4.1 0 1.2e+09 963 2816 768 ( 0.071%) 163623 ( 4.0%) 8.517 0.000 0.000 0.000 0.000 38
11 429.2 5.3 0 1.1e+09 787 2347 553 ( 0.051%) 164564 ( 4.0%) 8.517 0.000 0.000 0.000 0.000 36
12 362.7 6.9 0 8.9e+08 611 1840 360 ( 0.033%) 166068 ( 4.0%) 8.517 0.000 0.000 0.000 0.000 34
13 265.5 9.0 0 6.4e+08 411 1199 201 ( 0.019%) 167344 ( 4.1%) 8.517 0.000 0.000 0.000 0.000 31
14 215.2 11.6 0 5.0e+08 267 755 103 ( 0.010%) 168199 ( 4.1%) 8.517 0.000 0.000 0.000 0.000 29
15 137.4 15.1 0 3.0e+08 152 415 52 ( 0.005%) 168953 ( 4.1%) 8.897 0.000 0.000 0.000 0.000 26
16 76.4 19.7 0 1.6e+08 75 227 29 ( 0.003%) 169434 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 24
17 47.2 25.6 0 9.8e+07 44 122 10 ( 0.001%) 169532 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 23
18 24.9 33.3 0 5.0e+07 23 56 6 ( 0.001%) 169781 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 22
19 15.2 43.3 0 3.0e+07 13 36 2 ( 0.000%) 169837 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 21
20 10.3 56.2 0 1.9e+07 8 17 2 ( 0.000%) 169846 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 21
21 11.7 73.1 0 2.2e+07 9 18 2 ( 0.000%) 169827 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 21
22 8.4 95.0 0 1.6e+07 7 17 2 ( 0.000%) 169837 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 21
23 9.7 123.5 0 1.9e+07 8 17 2 ( 0.000%) 169865 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 22
24 8.8 160.6 0 1.8e+07 8 17 1 ( 0.000%) 169864 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 22
25 5.0 208.8 0 1.0e+07 5 11 1 ( 0.000%) 169875 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 23
26 8.1 271.4 0 1.6e+07 7 14 1 ( 0.000%) 169893 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 23
27 6.5 352.8 0 1.3e+07 6 12 1 ( 0.000%) 169875 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 24
28 5.1 458.7 0 1.0e+07 5 11 1 ( 0.000%) 169886 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 25
29 6.7 596.3 0 1.3e+07 6 12 1 ( 0.000%) 169902 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 25
30 5.8 775.1 0 1.1e+07 6 12 1 ( 0.000%) 169920 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 26
31 4.2 1007.7 0 8544744 5 11 1 ( 0.000%) 169902 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 27
32 7.5 1310.0 0 1.4e+07 7 13 1 ( 0.000%) 169889 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 27
33 4.2 1703.0 0 8548004 5 11 1 ( 0.000%) 169902 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 28
34 7.4 2213.9 0 1.4e+07 7 14 1 ( 0.000%) 169891 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 29
35 6.1 2878.1 0 1.1e+07 6 12 1 ( 0.000%) 169889 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 30
36 4.3 3741.5 0 8565407 5 11 1 ( 0.000%) 169902 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 31
37 7.4 4863.9 0 1.4e+07 7 14 1 ( 0.000%) 169914 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 32
38 5.8 6323.1 0 1.1e+07 6 12 1 ( 0.000%) 169948 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 32
39 5.8 8220.0 0 1.1e+07 6 10 1 ( 0.000%) 169945 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 33
40 5.8 10686.0 0 1.1e+07 6 9 1 ( 0.000%) 169951 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 34
41 7.3 13891.9 0 1.4e+07 7 11 1 ( 0.000%) 169964 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 34
42 4.2 18059.4 0 8450483 5 8 1 ( 0.000%) 169956 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 35
43 5.8 23477.2 0 1.1e+07 6 9 1 ( 0.000%) 169940 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 36
44 7.3 30520.4 0 1.4e+07 7 11 1 ( 0.000%) 169953 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 37
45 4.2 39676.5 0 8451320 5 8 1 ( 0.000%) 169920 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 37
46 5.9 51579.5 0 1.2e+07 6 13 1 ( 0.000%) 169891 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 38
47 5.9 67053.3 0 1.1e+07 6 12 1 ( 0.000%) 169889 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
48 4.2 87169.3 0 8582545 5 11 1 ( 0.000%) 169902 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
49 7.4 1.1e+05 0 1.4e+07 7 14 1 ( 0.000%) 169891 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
50 5.9 1.5e+05 0 1.1e+07 6 12 1 ( 0.000%) 169889 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
51 4.2 1.9e+05 0 8606773 5 11 1 ( 0.000%) 169902 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
52 9.9 2.5e+05 0 1.9e+07 8 17 1 ( 0.000%) 169864 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
53 5.0 3.2e+05 0 1.0e+07 5 11 1 ( 0.000%) 169875 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
54 8.2 4.2e+05 0 1.6e+07 7 14 1 ( 0.000%) 169893 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
55 5.2 5.5e+05 0 1.1e+07 6 12 0 ( 0.000%) 169962 ( 4.1%) 8.873 0.000 0.000 0.000 0.000 N/A
A* factor = 1.0, bb factor = 100:
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter Time pres BBs Heap Re-Rtd Re-Rtd Overused RR Nodes Wirelength CPD sTNS sWNS hTNS hWNS Est Succ
(sec) fac Updt push Nets Conns (ns) (ns) (ns) (ns) (ns) Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 271068: No routing path for connection to sink_rr 109219, retrying with full device bounding box
Warning 271069: No routing path for connection to sink_rr 70660, retrying with full device bounding box
Warning 271070: 17 timing startpoints were not constrained during timing analysis
Warning 271071: 18 timing endpoints were not constrained during timing analysis
1 675.5 0.0 0 1.9e+09 2661 8648 4801 ( 0.443%) 224892 ( 5.4%) 11.270 -31.72 -1.270 -334.7 -3.843 N/A
2 170.0 0.5 2 6.3e+08 2138 6786 4112 ( 0.379%) 160649 ( 3.9%) 11.822 -37.19 -1.822 -819.5 -6.426 N/A
3 163.9 0.6 2 6.1e+08 2047 6422 3611 ( 0.333%) 162065 ( 3.9%) 11.749 -26.90 -1.749 -716.0 -5.633 N/A
4 164.9 0.8 2 6.2e+08 1954 6037 3268 ( 0.302%) 162297 ( 3.9%) 12.378 -42.45 -2.378 -763.1 -5.774 N/A
5 145.6 1.1 2 5.5e+08 1813 5663 2755 ( 0.254%) 161927 ( 3.9%) 11.480 -17.22 -1.480 -792.5 -7.818 N/A
6 133.8 1.4 2 5.1e+08 1699 5188 2270 ( 0.209%) 163149 ( 4.0%) 16.765 -1356. -6.765 -1019. -8.019 N/A
7 127.0 1.9 2 4.7e+08 1533 4638 2015 ( 0.186%) 164052 ( 4.0%) 16.047 -1197. -6.047 -781.7 -7.351 N/A
8 114.0 2.4 2 4.2e+08 1388 4195 1614 ( 0.149%) 164677 ( 4.0%) 16.395 -1380. -6.395 -960.9 -7.699 N/A
9 103.2 3.1 2 3.8e+08 1214 3671 1239 ( 0.114%) 165857 ( 4.0%) 14.758 -710.9 -4.758 -826.2 -7.319 N/A
10 95.4 4.1 1 3.4e+08 1087 3249 996 ( 0.092%) 166363 ( 4.0%) 14.766 -773.3 -4.766 -789.3 -7.327 46
11 82.7 5.3 1 3.0e+08 924 2773 766 ( 0.071%) 167407 ( 4.1%) 11.738 -45.16 -1.738 -609.8 -6.337 42
12 81.0 6.9 1 2.9e+08 800 2374 589 ( 0.054%) 168433 ( 4.1%) 14.032 -145.1 -4.032 -761.1 -6.845 41
13 87.7 9.0 1 2.9e+08 614 1866 377 ( 0.035%) 169822 ( 4.1%) 14.523 -201.4 -4.523 -737.5 -7.363 38
14 76.3 11.6 0 2.4e+08 433 1325 252 ( 0.023%) 171188 ( 4.1%) 12.789 -111.7 -2.789 -755.2 -5.813 35
15 59.8 15.1 0 1.8e+08 304 944 174 ( 0.016%) 171418 ( 4.2%) 12.093 -28.06 -2.093 -489.4 -5.125 33
16 52.1 19.7 0 1.5e+08 218 663 108 ( 0.010%) 172297 ( 4.2%) 14.547 -173.8 -4.547 -661.9 -7.193 32
17 40.0 25.6 0 1.1e+08 125 435 46 ( 0.004%) 172942 ( 4.2%) 14.535 -185.6 -4.535 -585.4 -7.371 30
18 20.0 33.3 0 5.4e+07 51 187 23 ( 0.002%) 173290 ( 4.2%) 14.365 -174.1 -4.365 -662.4 -7.201 28
19 17.7 43.3 0 4.4e+07 31 99 20 ( 0.002%) 173398 ( 4.2%) 14.713 -215.9 -4.713 -643.9 -7.549 26
20 15.3 56.2 0 3.9e+07 30 115 15 ( 0.001%) 173566 ( 4.2%) 14.713 -214.3 -4.713 -658.8 -7.549 26
21 11.0 73.1 0 2.8e+07 19 76 10 ( 0.001%) 173542 ( 4.2%) 11.174 -4.723 -1.174 -373.5 -4.904 25
22 8.0 95.0 0 1.8e+07 11 38 6 ( 0.001%) 173561 ( 4.2%) 11.174 -14.61 -1.174 -348.5 -4.904 26
23 2.8 123.5 0 7999786 8 22 2 ( 0.000%) 173619 ( 4.2%) 11.174 -14.74 -1.174 -348.8 -4.904 26
24 0.5 160.6 0 1837305 4 6 1 ( 0.000%) 173612 ( 4.2%) 11.174 -14.74 -1.174 -348.8 -4.904 25
25 0.9 208.8 0 2324745 2 2 1 ( 0.000%) 173621 ( 4.2%) 11.174 -14.74 -1.174 -348.8 -4.904 25
26 1.2 271.4 0 2826408 2 2 1 ( 0.000%) 173611 ( 4.2%) 11.174 -14.74 -1.174 -348.8 -4.904 25
27 1.2 352.8 0 2803552 2 2 1 ( 0.000%) 173621 ( 4.2%) 11.174 -14.74 -1.174 -348.8 -4.904 25
28 1.3 458.7 0 3833159 3 9 2 ( 0.000%) 173604 ( 4.2%) 11.174 -14.74 -1.174 -347.7 -4.904 25
29 0.2 596.3 0 1033242 2 4 0 ( 0.000%) 173442 ( 4.2%) 11.174 -15.01 -1.174 -345.5 -4.904 26
Thanks for the data; it's interesting. I think the data shows there are still major lookahead issues though, as astar_fac = 1 is not performing well. So a per-net (or per-connection) astar could be added, but before doing that and trying to tune it I think we need to get the lookahead in a reasonable state.
bb_factor is normally a moderate runtime optimization (once you have a good lookahead and are running with astar_fac somewhere near 1). With astar_fac near 0, smaller bb_factors are much more important to control runtime, but it'll still be bad even with small bb_factor.
So I think it's too soon to start adding fine-tuning knobs; there are deeper lookahead issues that should be fixed first. Also, once you get into fine-tuning algorithms you'll need to use a larger benchmark suite as you only want to do debugging and very coarse tuning on a single circuit.
Thanks for the data; it's interesting. I think the data shows there are still major lookahead issues though, as astar_fac = 1 is not performing well. So a per-net (or per-connection) astar could be added, but before doing that and trying to tune it I think we need to get the lookahead in a reasonable state.
Thanks for the feedback. I'll gather some more data on lookahead behavior. It is worth noting that this new lookahead is doing better, but if you think based on this data that the lookahead is still not good enough, I'll focus my energy there.
The good outcome here is that I have a placement that is "good enough" to close timing, I just need to enable the router to reach it. This is important in my eyes because it means I can ignore problems in packing and placement and focus on routing behavior.
To be clear, there may be additional gains to be had, but the pack and place is good enough (by existance proof) to focus on the router behavior.
Yes, that is good news. In an uncongested design, I'd expect the router to quickly converge to a solution with near best-case routing delay, with astar somewhere around 1. Until that happens, modifying the fundamental algorithm is probably not a good idea, as you're working around a bad lookahead and more likely to detune than tune the algorithm for the case when the lookahead works well.
Current Behaviour
bb factor and A* are constant across nets.
Proposed Behaviour/Solution
bb factor and A variable vary based on slack. If the nets along the path meet timing, the just use default bb factor and A. Each iteration that a net is along the path does not meet timing, and the router chooses to "rip up", the bb factor and A* factor change to increase the effort the router searches for a route. The idea here is that critical path nets deserve more effort, rather than increasing the effort uniformly.
Context
I'm currently exploring making VPR produce "good" solutions on 7-series graph architecture, and have found that increase bb factor (effectively removing the bounding box) and lowering the A* factor (1.2 -> 0, 0.4), have been important for closing timing.
I'm currently working on making a demo using the Murax SoC close timing at a clock of 100 MHz. Vivado took the same synthesis and was "only" able to reach a slack of 1.434 ns (e.g. critical path delay of 8.566 ns).
Here is the current results I've been getting so far (with various A and bb factor values). These are all using the same placement (generated a delay matrix generated with A factor = 0 and bb factor = 10000):