
Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org

Placer fails with final cost check error #485

Open litghost opened 5 years ago

litghost commented 5 years ago

Expected Behaviour

Current Behaviour

Error 1: timing_cost_check: 13.5642 and timing_cost: 12.5681 differ in check_place.
# Placement took 0.17 seconds (max_rss 58.5 MiB, delta_rss +0.0 MiB)
Error 2: 
Type: Placement
File: /usr/local/google/home/keithrothman/cat_x/vtr-verilog-to-routing/vpr/src/place/place.cpp
Line: 3254
Message: 
Completed placement consistency check, 1 errors found.
Aborting program.

Steps to Reproduce

  1. Build VPR including https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/483
  2. Unzip tolerance_issue.zip
  3. Execute:
vpr arch.unique_pack.xml top.eblif --read_rr_graph rr_graph_4x4_dummy.rr_graph.real.xml --device 4x4 --min_route_chan_width_hint 100 --max_criticality 0.0 --max_router_iterations 500 --routing_failure_predictor off --router_high_fanout_threshold -1 --constant_net_method route --route_chan_width 20 --pack --place
litghost commented 5 years ago
33a6c955d0f9a9f08ba0ff807c0fb655d977ace8 is the first bad commit
commit 33a6c955d0f9a9f08ba0ff807c0fb655d977ace8
Author: kmurray <k.murray@utoronto.ca>
Date:   Wed Aug 8 16:31:47 2018 -0400

    vpr: Fix default behaviour of --cluster_seed_type

    Previously the seed type incorrectly defaulted to TIMING unless
    manually specified, which disagreed with the option description.

    It now defaults to BLEND if timing driven, and MAX_INPUTS as described
    in the help message.

:040000 040000 370c45d2f4fcf55b42b0fba56ca1562287d50938 cdbc337bbf6e8b09a55b2cfaf883222c8f12049c M  vpr
litghost commented 5 years ago

Per the bisect, prior to 33a6c955d0f9a9f08ba0ff807c0fb655d977ace8 the default was

 --cluster_seed_type timing

and afterwards it is

--cluster_seed_type blend

With --cluster_seed_type timing, the consistency error does not arise.

kmurray commented 5 years ago

I suspect that's not the fundamental cause. The change in cluster seed just changes how the clusters are created (so a different packing result). It should have no effect on the timing calculation during placement.

(However, it may be that the clusters formed with a different seed have some characteristic which exacerbates the incremental vs from-scratch round-off issue that the failing check addresses during placement.)

Another thing to try would be to leave the cluster seed type unchanged and instead try a couple of different placement seeds (--seed). Changing the seed will result in a different optimization result during placement. If changing the seed causes the issue to disappear as well, it would indicate the issue is more general and not tied to the clustering seed change.

litghost commented 5 years ago

Changing the seed indeed changes the behavior. Both cluster_seed_type settings fail with different seed values.

kmurray commented 5 years ago

Thanks for confirming that!

That likely means a closer look into the full vs incremental cost calculation code is needed (although I don't think it's changed). More likely the round-off check should be re-considered, to see if it's set too sensitively or if we should be doing the full calculation more often to prevent the accumulation of round-off.
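As a toy illustration of how an incrementally updated cost can drift away from a from-scratch recompute (a sketch only: VPR's placer works in double precision, and this deliberately exaggerates the effect by keeping the running total in single precision):

```python
import random
import struct

def f32(x):
    # Round a Python float to IEEE single precision, mimicking a
    # lower-precision running total that accumulates round-off.
    return struct.unpack('f', struct.pack('f', x))[0]

random.seed(42)
costs = [random.uniform(0.0, 1.0) for _ in range(1000)]
running = f32(sum(costs))

# Apply many "moves": change one cost and update the total incrementally.
for _ in range(100000):
    i = random.randrange(len(costs))
    new = random.uniform(0.0, 1.0)
    running = f32(running + (new - costs[i]))  # incremental update
    costs[i] = new

from_scratch = sum(costs)  # full recompute in double precision
rel_err = abs(running - from_scratch) / from_scratch
print(f"relative drift after 100k moves: {rel_err:.2%}")
```

Recomputing from scratch every MAX_MOVES_PER_RECOMPUTE moves bounds how much of this drift can accumulate between checks.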

litghost commented 5 years ago

I can confirm that the code prior to 33a6c95 shares the seed behavior.

litghost commented 5 years ago

> Thanks for confirming that!
>
> That likely means a closer look into the full vs incremental cost calculation code is needed (although I don't think it's changed). More likely the round-off check should be re-considered, to see if it's set too sensitively or if we should be doing the full calculation more often to prevent the accumulation of round-off.

The question is how much is too much?

I've seen errors on the order of 7% to 15%, which seems "big".

litghost commented 5 years ago

The attached graph diverges very quickly, so I think it may indicate a numerical instability in the incremental cost calculations. Probably worth investigating if a restructuring can help.

kmurray commented 5 years ago

So I've taken a look and you seem to have some unrealistic timing-related values on your RR nodes.

For example most CHANX/CHANY wires are something like:

    <node capacity="1" direction="INC_DIR" id="416" type="CHANX">
      <loc ptc="0" xhigh="8" xlow="1" yhigh="0" ylow="0"/>
      <timing C="1" R="1"/>
      <segment segment_id="0"/>
    </node>

This specifies a 1 Farad capacitance (which is huge; typical values would be in the picofarads) and a 1 Ohm resistance.

As a result you get huge critical path delays. For example:

------- ------- ---------- ---------- ---------- ---------- ------- ---------- -------- ------- ------- ------- ------ --------- ------
      T    Cost Av BB Cost Av TD Cost Av Tot Del P to P Del     CPD       sTNS     sWNS Ac Rate Std Dev R limit    Exp Tot Moves  Alpha
------- ------- ---------- ---------- ---------- ---------- ------- ---------- -------- ------- ------- ------- ------ --------- ------
  0.293  0.9679     4.7485 32.515     38.294     1.3889     6000000000.000     -6e+09 -6000000000.000  0.9444  0.0221  9.0000  1.000       36  0.900
  0.264  1.0671     5.0496 32.296     37.25      1.3889     6000000000.000     -6e+09 -6000000000.000  1.0000  0.0259  9.0000  1.000       72  0.500
  0.132  1.0105     5.3049 29         36.765     1.2778     5000000000.000     -5e+09 -5000000000.000  0.9444  0.0202  9.0000  1.000      108  0.900
  0.119  0.9758     5.0298 33.213     38.621     1.3889     6000000000.000     -6e+09 -6000000000.000  0.9167  0.0263  9.0000  1.000      144  0.900
  0.107  1.0132     5.1083 33.272     37.176     1.3889     6000000000.000     -6e+09 -6000000000.000  0.9444  0.0360  9.0000  1.000      180  0.900
  0.096  0.9545     4.8848 30.443     36.531     1.3889     6000000000.000     -6e+09 -6000000000.000  0.8889  0.0301  9.0000  1.000      216  0.900
  0.087  1.0113     4.7086 29.464     36.094     1.3519     6000000000.000     -6e+09 -6000000000.000  0.8889  0.0255  9.0000  1.000      252  0.900
  0.078  1.0405     4.9132 29.176     36.618     1.2407     6000000000.000     -6e+09 -6000000000.000  0.9444  0.0173  9.0000  1.000      288  0.900
  0.070  0.9696     4.8756 32.424     37.235     1.3889     6000000000.000     -6e+09 -6000000000.000  0.9444  0.0211  9.0000  1.000      324  0.900
  0.063  0.9970     4.7331 27.575     34.3       1.3519     6000000000.000     -6e+09 -6000000000.000  0.9722  0.0268  9.0000  1.000      360  0.500
  0.032  1.0094     4.6616 26.353     33.532     1.2778     6000000000.000     -6e+09 -6000000000.000  0.8611  0.0242  9.0000  1.000      396  0.900

shows a Critical Path Delay (CPD) of 6 billion seconds.

Since these values are so far out of the standard range it's less surprising that the round-off checks fail.
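For scale, a back-of-the-envelope lumped-RC delay estimate shows why these values blow up (this is just the textbook approximation, not VPR's actual delay model, and the "realistic" numbers are assumed for illustration):

```python
def lumped_rc_delay(r_ohm, c_farad, c_load_farad=0.0):
    # Textbook lumped-RC approximation: delay ~ R * (C/2 + C_load)
    return r_ohm * (c_farad / 2.0 + c_load_farad)

# R = 1 ohm, C = 1 F as in the rr_graph above: half a second per wire segment.
bogus = lumped_rc_delay(1.0, 1.0)

# Assumed "typical" FPGA wire values, e.g. ~100 ohm and ~30 fF:
# on the order of picoseconds.
realistic = lumped_rc_delay(100.0, 30e-15)

print(bogus, realistic)
```

With per-wire delays roughly twelve orders of magnitude too large, absolute cost differences that would normally be negligible easily trip a fixed tolerance.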

Two potential workarounds:

  1. If you don't have a valid timing model, you may be better off running VPR with --timing_analysis off, which will make VPR do a non-timing-driven implementation (avoiding the incremental timing check that is causing you issues).

  2. Remove the unrealistic timing values:

    #Zero out the invalid R/C values
    $ sed 's/C="1" R="1"/C="0" R="0"/g' rr_graph_4x4_dummy.rr_graph.real.xml > rr_graph2.xml
    $ vpr arch.unique_pack.xml top.eblif --read_rr_graph rr_graph2.xml --device 4x4 --max_router_iterations 500 --route_chan_width 20 --place
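The sed substitution depends on the exact attribute order in the file; a slightly more robust variant (a sketch using Python's standard library, with a toy snippet standing in for the real rr_graph) rewrites every `<timing>` element regardless of formatting:

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the real rr_graph file; only the <timing> elements matter here.
xml_text = '''<rr_graph>
  <node capacity="1" direction="INC_DIR" id="416" type="CHANX">
    <timing C="1" R="1"/>
  </node>
</rr_graph>'''

root = ET.fromstring(xml_text)
for timing in root.iter("timing"):
    timing.set("C", "0")  # zero out the invalid capacitance
    timing.set("R", "0")  # zero out the invalid resistance
fixed = ET.tostring(root, encoding="unicode")
print(fixed)
```

For a real rr_graph you would parse the file with ET.parse() and write the tree back out with tree.write().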
litghost commented 5 years ago

Making the timing conform to better values actually doesn't help.

litghost commented 5 years ago

Here is a version of the same routing graph with more "reasonable" timing values, tolerance_issue_v2.zip.

vaughnbetz commented 5 years ago

Usually to debug issues like this, I turn MAX_MOVES_PER_RECOMPUTE way down (to 1, if you can tolerate the CPU time hit for a while). If you still fail the check then it definitely isn't round-off. Given the magnitude of the difference you're seeing, it doesn't look like round-off. Also note that this check is there to catch optimization flaws, not correctness issues, so if this is holding you up you can set ERROR_TOL to something big (e.g. 1 or 10) and nothing bad will happen.
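The failing check boils down to a relative-error comparison between the incrementally updated cost and a from-scratch recompute. A simplified model of that test (the default tolerance value here is assumed; the real constant lives in vpr/src/place/place.cpp):

```python
def costs_agree(incremental, from_scratch, error_tol=0.01):
    # Mimics the spirit of VPR's check_place consistency test:
    # the two costs must agree to within a relative tolerance.
    return abs(incremental - from_scratch) <= error_tol * abs(from_scratch)

# Values from the original error message (~7.9% relative disagreement):
print(costs_agree(13.5642, 12.5681))        # fails a 1% tolerance
print(costs_agree(13.5642, 12.5681, 0.10))  # passes a loosened 10% tolerance
```

This is why loosening ERROR_TOL hides the error without fixing whatever produces the 7-15% disagreement in the first place.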

Looking at the placement_inner_loop() function, there may be a problem: with the restructured code for checking the tolerance (put in 6 months ago) we might get a timing analysis between the incremental cost calculation and the recompute_from_scratch, which could change the costs. That could lead to false failures.

vaughnbetz commented 5 years ago

I looked at the code more carefully and I don't think it's possible for the timing analysis to occur without the timing costs being recomputed, so I don't think that's what leads to the placement cost recompute error. I suggest setting MAX_MOVES_PER_RECOMPUTE to 1 and seeing if the error fires; then take a look at that move.