Closed nedsels closed 3 months ago
@vaughnbetz I'm wondering about two things:
BinaryHeap
and FourAryHeap
?.h
and .cpp
files for BinaryHeap
and FourAryHeap
, but they are fairly small. Should I put these new classes in k_ary_heap.h
and .cpp
instead?You'll have to check out the CI results. I look at vtr_nightly_test1_odin, and it is OK. A few small circuits have QoR changes that are beyond the tolerances, but for these small circuits they are not concerning. One has too good a minimum channel width, while the others have ones that are a bit worse but not concerning for small designs. You update the golden results and check them in (see the developer guide) to make those pass. Below are the tests from vtr_nightly_test1_odin as an example.
Amin was also going to widen the error tolerance for some of these tests; I'll check on that PR as rebasing after it is merged may save you some work.
You should also attach your QoR spreadsheets so we can see the results (which you have summarized) on big circuits. We'd want to see a speedup and no overall critical path increase or wirelength increase (or sub-1% so it is within noise).
I'm fine with the current code and the small binary heap file -- more smaller files is fine.
2024-06-21T20:47:50.7138774Z [32;1m20:47:50[0m | Calculating QoR results... 2024-06-21T20:47:50.7139378Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7140262Z [32;1m20:47:50[0m | regression_tests/vtr_reg_nightly_test1_odin/vpr_reg_mcnc_equiv...[Fail] 2024-06-21T20:47:50.7141338Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7142816Z [32;1m20:47:50[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width relative value 0.17582417582417584 outside of range [0.25,1.3] and not equal to golden value: 182.0 2024-06-21T20:47:50.7144133Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7145671Z [32;1m20:47:50[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width_routing_area_total relative value 0.2212341391243292 outside of range [0.7,1.3] and not equal to golden value: 1595120.0 2024-06-21T20:47:50.7147087Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7148618Z [32;1m20:47:50[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width_routing_area_per_tile relative value 0.22123501890118832 outside of range [0.7,1.3] and not equal to golden value: 9438.56 2024-06-21T20:47:50.7150057Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7151603Z [32;1m20:47:50[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common crit_path_routing_area_total relative value 0.2147835400787127 outside of range [0.7,1.3] and not equal to golden value: 2007300.0 2024-06-21T20:47:50.7152993Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7154521Z [32;1m20:47:50[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common crit_path_routing_area_per_tile relative value 0.21478341401810147 outside of range [0.7,1.3] and not equal to golden value: 11877.5 2024-06-21T20:47:50.7156147Z [32;1m20:47:50[0m | regression_tests/vtr_reg_nightly_test1_odin/vpr_reg_mcnc...[Pass] 2024-06-21T20:47:50.7157289Z [32;1m20:47:50[0m | regression_tests/vtr_reg_nightly_test1_odin/vtr_reg_fpu_soft_logic_arch...[Pass] 2024-06-21T20:47:50.7158497Z [32;1m20:47:50[0m | regression_tests/vtr_reg_nightly_test1_odin/vtr_reg_fpu_hard_block_arch...[Pass] 2024-06-21T20:47:50.7159596Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/figure_8...[Pass] 2024-06-21T20:47:50.7160606Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/FIR_filters...[Pass] 2024-06-21T20:47:50.7161696Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/FIR_filters_frac...[Pass] 2024-06-21T20:47:50.7162512Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7163420Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail] 2024-06-21T20:47:50.7164302Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7165605Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width relative value 1.4666666666666666 outside of range [0.25,1.3] and not equal to golden value: 30.0 2024-06-21T20:47:50.7166939Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7168337Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width_routing_area_total relative value 1.4112435202627822 outside of range [0.7,1.3] and not equal to golden value: 526063.0 2024-06-21T20:47:50.7169764Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7171207Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width_routing_area_per_tile relative value 1.4112421647100186 outside of range [0.7,1.3] and not equal to golden value: 1820.29 2024-06-21T20:47:50.7172665Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7174032Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common crit_path_routing_area_total relative value 1.4061912035217121 outside of range [0.7,1.3] and not equal to golden value: 666494.0 2024-06-21T20:47:50.7175545Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7176930Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common crit_path_routing_area_per_tile relative value 1.406190242866001 outside of range [0.7,1.3] and not equal to golden value: 2306.21 2024-06-21T20:47:50.7178342Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7179070Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail] 2024-06-21T20:47:50.7179892Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7181245Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common min_chan_width relative value 1.3333333333333333 outside of range [0.25,1.3] and not equal to golden value: 30.0 2024-06-21T20:47:50.7182652Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7184024Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common crit_path_routing_area_total relative value 1.3088865016039155 outside of range [0.7,1.3] and not equal to golden value: 666494.0 2024-06-21T20:47:50.7185435Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7186818Z [32;1m20:47:50[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common crit_path_routing_area_per_tile relative value 1.3088834061078565 outside of range [0.7,1.3] and not equal to golden value: 2306.21 2024-06-21T20:47:50.7188231Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7188965Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail] 2024-06-21T20:47:50.7189788Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7191204Z [32;1m20:47:50[0m | fixed_k6_frac_2ripple_N8_22nm.xml/mult_007.v/common min_chan_width_routing_area_total relative value 1.4869164661785093 outside of range [0.7,1.3] and not equal to golden value: 706193.0 2024-06-21T20:47:50.7192700Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7194097Z [32;1m20:47:50[0m | fixed_k6_frac_2ripple_N8_22nm.xml/mult_007.v/common min_chan_width_routing_area_per_tile relative value 1.4869085522061893 outside of range [0.7,1.3] and not equal to golden value: 2443.58 2024-06-21T20:47:50.7195532Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7196265Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail] 2024-06-21T20:47:50.7197088Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7198333Z [32;1m20:47:50[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width relative value 1.3333333333333333 outside of range [0.25,1.3] and not equal to golden value: 36.0 2024-06-21T20:47:50.7199606Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7200944Z [32;1m20:47:50[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width_routing_area_total relative value 1.3335469993281848 outside of range [0.7,1.3] and not equal to golden value: 648988.0 2024-06-21T20:47:50.7202360Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7203829Z [32;1m20:47:50[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width_routing_area_per_tile relative value 1.3335500505426092 outside of range [0.7,1.3] and not equal to golden value: 2245.63 2024-06-21T20:47:50.7205459Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/open_cores...[Pass] 2024-06-21T20:47:50.7206495Z [32;1m20:47:50[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/open_cores_frac...[Pass] 2024-06-21T20:47:50.7207621Z [32;1m20:47:50[0m | regression_tests/vtr_reg_nightly_test1_odin/power_extended_arch_list...[Pass] 2024-06-21T20:47:50.7208488Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7209294Z [32;1m20:47:50[0m | regression_tests/vtr_reg_nightly_test1_odin/power_extended_circuit_list...[Fail] 2024-06-21T20:47:50.7210181Z [32;1m20:47:50[0m | [Fail] 2024-06-21T20:47:50.7211558Z [32;1m20:47:50[0m | k6_N10_I40_Fi6_L4_frac1_ff1_45nm.xml/mkDelayWorker32B.v/common total_power relative value 0.8973242065961418 outside of range [0.9,1.1] and not equal to golden value: 0.1607 2024-06-21T20:47:50.7212942Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7213746Z [32;1m20:47:50[0m | Test 'vtr_reg_nightly_test1_odin' had 19 qor test failures 2024-06-21T20:47:50.7214572Z [32;1m20:47:50[0m | 2024-06-21T20:47:50.7215303Z [32;1m20:47:50[0m | Test 'vtr_reg_nightly_test1_odin' had 0 run failures
Looks good. I suggest some more comments in various places, as listed in the detailed feedback.
These issues should now be fixed
I added the heap section to the API documentation. However, I think this instruction may be outdated or incorrect; I could not run it successfully (doc/conf.py
imports markdown_code_symlinks
; pip install markdown_code_symlinks
gives an error "ERROR: No matching distribution found for markdown_code_symlinks
").
@vaughnbetz All tests now passing
A new FourAryHeap has been added, and this has been made the default VPR heap implementation.
Description
A new abstract class
KAryHeap
has been created. This class is like the previousBinaryHeap
, except it doesn't implementis_valid()
,get_heap_head()
, orsift_down()
. Also, a new structheap_elem
has been added, which contains at_heap
pointer and a floatcost
for a certain node. This is now what theheap_
vector contains, instead of justt_heap
pointers. InKAryHeap
's methods, instead of dereferencing at_heap
pointer to get its cost for node comparisons, it is stored inheap_elem
.BinaryHeap
andFourAryHeap
now inherit fromKAryHeap
, and implement its pure virtual methods (all other methods are not specific to either of these two heap types).FOUR_ARY_HEAP
has been added to thee_heap_type
enum, and has been made the default heap implementation.Motivation and Context
heap_
vector containst_heap
pointers for each node in the heap. To get a node's cost, the respective pointer would be dereferenced, which increase runtime since this would occur whenever there was a comparison between two nodes. Now, instead ofheap_
being made of these pointers, it is of a structheap_elem
which contains both the pointer and the node's cost. This cost is attained by dereferencing the pointer once inadd_to_heap()
, eliminating the need for dereferencing in future comparisons.heap_elem
is 12 bytes (pointer and float), which means 5 can fit on a 64-byte cache line; so, using a 4-ary heap instead of a binary heap is more cache-friendly, and improves runtime even with more comparisons (between 4 nodes vs 2).For the future: if, instead of using
t_heap
pointers, a 4-byte ID can be used,heap_elem
can be reduced to 8 bytes. Then, an 8-ary heap might be even better, even with many more comparisons. An 8-ary heap is fastest when usingt_heap
pointers inheap_
(8 can fit on a cache line) and this heap implementation as VPR's heap.How Has This Been Tested?
Three changes were considered:
heap_elem
instead oft_heap
pointers in theheap_
vector to eliminate unnecessary dereferencingThe 8 combinations of these changes were tested on the
koios_medium
benchmarks, and some combinations ontitan
benchmarks, on desktop. For smaller circuits, the results are inconsistent, but ontitan
, the both changes (1.) and (2.) result in runtime improvements. Change (3.) seems to have a negligible effect on runtime. Using changes (1.) and (2.) together results in a 4.2% reduction in runtime.max_rss
)Full Results
Types of changes
Checklist: