Closed kmurray closed 5 years ago
@vaughnbetz @KennethKent @jeanlego @aarongraham9: Any thoughts?
It seems to me we have only two options:
i) Convert `.latch`es to black-boxes as described above.
ii) Synthesize each clock domain through ABC independently, and then re-stitch the results back together into one netlist.
Odin already has a built-in option to convert these into black boxes, thanks to @aarongraham9: https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/master/ODIN_II/SRC/odin_ii.cpp#L333:L337. He also wrote a conversion script to restore them afterwards. The principle is that Odin encodes the clock information in the latch name, and the conversion script simply reads it back out. Doing so does prevent ABC from performing some optimizations, and if memory serves we saw a slowdown in a couple of cases; Aaron can explain that in more detail.
As for splitting by clock domain, it is possible but would require a significant amount of effort. I will leave it to others to decide which route to take, but I think using Aaron's work makes the most sense to get things running again ASAP.
@jeanlego Thanks for the reminder that some of this is already implemented!
I agree getting Aaron's black-box code back running seems the best approach to fix this.
@aarongraham9 What would be involved in enabling the black-box latches in the VTR flow? Would you be able to look into re-evaluating the QoR/Run-time impact of converting latches to black-boxes? If I recall the last evaluation was done with the old version of ABC; it would be good to know what the impact is with the new version of ABC.
Looking into this @jeanlego and @kmurray.
I'd like to see whether the restoration script can be improved and we can fix it that way. I'd rather not black-box the latches if possible, since ABC is then barely able to optimize anything and the QoR values are very poor compared to VTR 7, let alone the updated ABC/VTR 8.
Aaron, can you post the QoR impact of using black box latches? It sounds like you have this data from a prior run; can you attach a spreadsheet and summary? If you don't have complete data, I think it would be good to rerun this so we have a precise QoR impact estimate, as the black box solution definitely seems easiest. If the QoR is not too bad, maybe it will be the winning candidate.
I think the options are:
A. black box the latches: simplest, but hurts QoR. Need to see by how much.
B. black box the latches only for circuits with more than one clock. This mitigates the QoR loss; all VTR circuits are single clock so they're OK. I think this is likely the best option: make a script that by default converts to black boxes only if there is more than one clock. Arguments to the script can let users override this (always black box, never black box, auto black box).
C. For circuits with more than one clock, split the circuit into multiple clock domains, and synthesize each separately in ABC. This is essentially a netlist covering/colouring problem I think. One sub-netlist per clock domain (all src and dst registers in same clock domain along with the logic between them) plus one netlist for all the inter-clock domain logic and logic reachable / reaching primary I/Os (in this netlist, all flops should be converted to primary inputs and outputs to prevent sequential optimizations, which are unsafe between clock domains).
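To make option C concrete, here's a toy sketch of the covering step (all names and the data layout are hypothetical; a real netlist would need transitive reachability analysis, not just direct latch adjacency):

```python
from collections import defaultdict

# Toy netlist: latches tagged with their clock, and combinational nodes
# listed with the latches they sit between. All names are made up.
latches = {"ff_a": "clk1", "ff_b": "clk1", "ff_x": "clk2"}
comb_nodes = {
    "lut1": {"ff_a", "ff_b"},   # entirely inside the clk1 domain
    "lut2": {"ff_b", "ff_x"},   # crosses from clk1 to clk2
}

def partition(latches, comb_nodes):
    """Assign each node to a per-clock sub-netlist, or to a shared
    'crossing' partition where flops must become primary I/Os so no
    unsafe sequential optimization is applied across domains."""
    parts = defaultdict(set)
    for latch, clk in latches.items():
        parts[clk].add(latch)
    for node, touched in comb_nodes.items():
        domains = {latches[l] for l in touched}
        target = domains.pop() if len(domains) == 1 else "crossing"
        parts[target].add(node)
    return dict(parts)

parts = partition(latches, comb_nodes)
```

Each per-clock partition could then be run through ABC separately, with the `crossing` partition restricted to combinational optimization only.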
D. Fix the current script. Kevin doesn't think this is doable, as the latches are being renamed in unpredictable ways; it's not clear how we can match them.
Kevin and I think B is probably the best mix of being robust and mitigating QoR. C is probably the best solution for QoR, but is complex and could delay the VTR 8 release.
We agree that the "fix the current script" solution is not viable. Given the unpredictable ways in which latches are being renamed, it is not feasible to recall the latches. I have proposed an iterative black-box approach to achieve the circuit-splitting solution. That is: select one clock domain to remain, black-box the rest, run through ABC, and repeat for each of the clock domains. This should achieve the reductions without the separation/stitching of clock domains.
Regardless, the approach above and the base solution of using black boxes for ALL latches need to be tested and documented for QoR impact. Both of these are currently under development and test.
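A rough sketch of one iteration of that loop (pure string rewriting; the `latch_<type>_<init>` black-box naming and the `I`/`O` port names are assumptions, not the actual script):

```python
def blackbox_other_domains(blif_lines, keep_clock):
    """Keep .latches driven by keep_clock optimizable; rewrite every
    other .latch into a black-box .subckt that ABC will not touch."""
    out = []
    for line in blif_lines:
        tok = line.split()
        # .latch <input> <output> <type> <control> <init_val>
        if tok and tok[0] == ".latch" and len(tok) >= 6 and tok[4] != keep_clock:
            inp, outp, typ, _ctrl, init = tok[1:6]
            out.append(f".subckt latch_{typ}_{init} I={inp} O={outp}")
        else:
            out.append(line)
    return out

# One ABC run per clock domain: black-box everything else, synthesize,
# restore the boxes to latches, then repeat for the next domain.
netlist = [".latch a b re clk1 2", ".latch c d re clk2 2"]
print(blackbox_other_domains(netlist, "clk1")[1])  # .subckt latch_re_2 I=c O=d
```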
Dr. Kenneth B. Kent Professor & Director, IBM Centre for Advanced Studies - Atlantic Faculty of Computer Science, University of New Brunswick 540 Windsor Street, Fredericton, New Brunswick, E3B 5A3 ph. (506) 451-6971 fax. (506) 453-3566
Honorary Professor, Department of Computer Science and Institute of Visual Computing Bonn-Rhein-Sieg University of Applied Sciences Grantham-Allee 20, 53757 Sankt Augustin, Germany
With the new approach you mention above (black box all but one domain at a time) it seems like you'll get duplicate synthesis of the logic between clock domains. How do you ensure you get only one copy in the end?
Interestingly, none of these approaches, except black-boxing all latches, is fully safe for cross-clock-domain paths -- the registers involved in those paths shouldn't be duplicated or retimed, as they can form synchronizers etc. Since ABC doesn't understand that, I think fixing it is beyond the scope of what we want to do, at least for the foreseeable future.
I've opened PR #408 with a way to test all three options; it also includes a simulation step in the flow, before and after ABC, to verify the BLIF pre vs. post ABC.
For 1 case, Odin could not simulate fpu_hard_block_arch: hard_fpu_arch_timing.xml/mm3.v/common (failed to include simulation) OK, but this is because the Odin simulator doesn't have that hard block implemented.
For 2 cases, Odin failed simulation, but that is because they have no input pins. I'll add a workaround so that it generates N cycles for the simulator, derived from the output vector file length.
Here are the culprits:
func_formal_flow: k6_frac_N10_40nm.xml/const_true.blif/common (failed: odin Simulation) OK
func_formal_flow: k6_frac_N10_40nm.xml/const_false.blif/common (failed: odin Simulation) OK
All other cases pass Odin simulation. The good news is that the iterative approach seems to yield better results in a lot of cases!
To run the flow with the iterative approach you can set the variables in run_vtr_flow as follows:
##########
# ABC flow modifiers
# type 1 is skipping ABC
# type 2 is using only vanilla latches
# type 3 is black box all
# type 4 is iterative black boxing of latches per clock domain
my $abc_flow_type = 4;
my $use_new_latches_restoration_script = 1;
my $odin_run_simulation = 1;
my $disable_simulation_failure = 1;
Also, with the iterative approach I think we could enable sequential optimization with ABC, which brings me to the following; I'll explore whether it does indeed work.
The `.latch <input> <output> <type> <control> <init_val>` directive in extended BLIF can list `<type>` as an integer number. This number represents the class of a given register. The register class can be used to limit transformations during sequential synthesis. For example, two registers that are proved equivalent cannot be merged if they belong to different classes. When computing equivalent registers, computation is limited to one clock domain at a time while the registers of other domains are treated as primary inputs. In the example below, the second register is listed as having class 15. *https://people.eecs.berkeley.edu/~alanmi/publications/other/boxes01.pdf
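If I read that right, the tagged netlist would look something like this (signal and clock names here are made up for illustration; I'm assuming the class integer simply replaces the `<type>` field):

```blif
# two latches in different clock domains, tagged with classes 1 and 15
.latch n10 n11 1  clk_a 3
.latch n20 n21 15 clk_b 3
```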
The thought is to use ABC in two passes. Before the first pass: register all the clock domains in a file and assign each a class; rewrite the BLIF file to use the classes for the latches; run a sequential-only optimization through ABC; and recover the clock domains from the classes file. The second pass then runs the iterative approach with combinational optimization only.
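A sketch of the first-pass tagging step (assuming, per the quoted description, that the class integer goes in the `<type>` field; the `(control, type, init)` triplet key is my reading, not the actual script):

```python
def tag_latch_classes(blif_lines):
    """Map each clock-domain triplet (control, edge type, init value)
    to an integer class, rewrite the <type> field of every .latch to
    that class, and return the mapping needed to recover the clock
    domains after ABC."""
    classes, tagged = {}, []
    for line in blif_lines:
        tok = line.split()
        # .latch <input> <output> <type> <control> <init_val>
        if tok and tok[0] == ".latch" and len(tok) >= 6:
            inp, outp, typ, ctrl, init = tok[1:6]
            key = (ctrl, typ, init)
            cls = classes.setdefault(key, len(classes) + 1)
            tagged.append(f".latch {inp} {outp} {cls} {ctrl} {init}")
        else:
            tagged.append(line)
    return tagged, classes
```

After ABC, the saved `classes` mapping would be inverted to rewrite each class integer back into its original type and control signal.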
With all that said, here are the results of vtr_reg_strong:
122 qor test failures
0 run failures
pass
pass
pass
pass
pass
pass
pass
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_frac_N10_40nm | always_true.blif | num_pre_packed_nets: | 1 | 7 |
k6_frac_N10_40nm | always_true.blif | num_post_packed_nets: | 1 | 7 |
k6_frac_N10_40nm | always_true.blif | placed_CPD_est: | nan | 0.734653 |
k6_frac_N10_40nm | always_true.blif | placed_setup_TNS_est: | 0 | -0.734653 |
k6_frac_N10_40nm | always_true.blif | placed_setup_WNS_est: | 0 | -0.734653 |
k6_frac_N10_40nm | always_true.blif | routed_wirelength: | 0 | 7 |
k6_frac_N10_40nm | always_true.blif | num_io: | 0 | 6 |
k6_frac_N10_40nm | always_false.blif | num_pre_packed_nets: | 1 | 7 |
k6_frac_N10_40nm | always_false.blif | num_post_packed_nets: | 1 | 7 |
k6_frac_N10_40nm | always_false.blif | placed_CPD_est: | nan | 0.734653 |
k6_frac_N10_40nm | always_false.blif | placed_setup_TNS_est: | 0 | -0.734653 |
k6_frac_N10_40nm | always_false.blif | placed_setup_WNS_est: | 0 | -0.734653 |
k6_frac_N10_40nm | always_false.blif | routed_wirelength: | 0 | 7 |
k6_frac_N10_40nm | always_false.blif | num_io: | 0 | 6 |
k6_frac_N10_40nm | false_path_mux.blif | num_pre_packed_nets: | 4 | 7 |
k6_frac_N10_40nm | false_path_mux.blif | num_pre_packed_blocks: | 6 | 8 |
k6_frac_N10_40nm | false_path_mux.blif | num_post_packed_nets: | 4 | 5 |
k6_frac_N10_40nm | false_path_mux.blif | placed_CPD_est: | 0.708653 | 1.37865 |
k6_frac_N10_40nm | false_path_mux.blif | placed_setup_TNS_est: | -0.708653 | -1.37865 |
k6_frac_N10_40nm | false_path_mux.blif | placed_setup_WNS_est: | -0.708653 | -1.37865 |
k6_frac_N10_40nm | false_path_mux.blif | num_io: | 3 | 4 |
k6_frac_N10_40nm | mult_4x4.blif | num_post_packed_nets: | 21 | 27 |
k6_frac_N10_40nm | mult_5x5.blif | num_pre_packed_nets: | 47 | 43 |
k6_frac_N10_40nm | mult_5x5.blif | num_pre_packed_blocks: | 57 | 53 |
k6_frac_N10_40nm | mult_5x5.blif | logic_block_area_used: | 215576 | 161682 |
k6_frac_N10_40nm | mult_5x5.blif | num_clb: | 4 | 3 |
k6_frac_N10_40nm | rca_3bit.blif | num_pre_packed_nets: | 12 | 13 |
k6_frac_N10_40nm | rca_3bit.blif | num_pre_packed_blocks: | 16 | 17 |
k6_frac_N10_40nm | rca_4bit.blif | num_pre_packed_nets: | 15 | 17 |
k6_frac_N10_40nm | rca_4bit.blif | num_pre_packed_blocks: | 20 | 22 |
k6_frac_N10_40nm | rca_4bit.blif | placed_CPD_est: | 1.04365 | 1.71365 |
k6_frac_N10_40nm | rca_4bit.blif | placed_setup_TNS_est: | -4.54826 | -6.55826 |
k6_frac_N10_40nm | rca_4bit.blif | placed_setup_WNS_est: | -1.04365 | -1.71365 |
pass
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm | stereovision3.v | min_chan_width_routing_area_total: | 69362.3 | 49980.0 |
k6_N10_mem32K_40nm | stereovision3.v | crit_path_routing_area_total: | 90639.5 | 65453.8 |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 262 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 292 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm | stereovision3.v | routing_area_total: | 289172. | 207176. |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | routed_wirelength: | 1110 | 650 |
k6_N10_mem32K_40nm | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm | stereovision3.v | min_chan_width_routing_area_total: | 63135.7 | 49980.0 |
k6_N10_mem32K_40nm | stereovision3.v | crit_path_routed_wirelength: | 796 | 465 |
k6_N10_mem32K_40nm | stereovision3.v | crit_path_routing_area_total: | 82288.3 | 65453.8 |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
pass
pass
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_40nm | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm | stereovision3.v | min_chan_width_routing_area_total: | 69362.3 | 49980.0 |
k6_N10_mem32K_40nm | stereovision3.v | crit_path_routing_area_total: | 90639.5 | 65453.8 |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm_nonuniform | stereovision3.v | num_clb: | 22 | 19 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | num_pre_packed_nets: | 287 | 262 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | num_pre_packed_blocks: | 317 | 292 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | min_chan_width: | 12 | 16 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm_pulse | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_frac_N10_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 262 |
k6_frac_N10_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 292 |
k6_frac_N10_40nm | stereovision3.v | num_clb: | 16 | 13 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_nets: | 287 | 267 |
k6_N10_mem32K_40nm | stereovision3.v | num_pre_packed_blocks: | 317 | 297 |
k6_N10_mem32K_40nm | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm | stereovision3.v | num_clb: | 22 | 19 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | num_pre_packed_nets: | 287 | 262 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | num_pre_packed_blocks: | 317 | 292 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | device_width: | 8 | 7 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | device_height: | 8 | 7 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | device_grid_tiles: | 64 | 49 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | logic_block_area_total: | 2.23746e+06 | 1.07788e+06 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | min_chan_width_routing_area_total: | 123323. | 88828.2 |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | crit_path_routing_area_total: | 144103. | 104221. |
k6_N10_mem32K_40nm_fc_abs | stereovision3.v | num_clb: | 22 | 19 |
pass
pass
pass
pass
pass
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_frac_N10_frac_chain_mem32K_40nm | stereovision3.v | logic_block_area_used: | 970092 | 700622 |
k6_frac_N10_frac_chain_mem32K_40nm | stereovision3.v | num_clb: | 18 | 13 |
arch | circuit | changes | golden | result |
---|---|---|---|---|
k6_frac_N10_frac_chain_mem32K_40nm | stereovision3.v | num_clb: | 18 | 15 |
pass
pass
pass
Thanks @jeanlego for putting this all together, it looks like a lot of work!
Regression Failures
> For 1 case, Odin could not simulate fpu_hard_block_arch: hard_fpu_arch_timing.xml/mm3.v/common (failed to include simulation) OK, but this is because the Odin simulator doesn't have that hard block implemented.
Seems like disabling simulation is the right approach here.
ABC Latch Classes
> The thought is to use ABC in two passes. Before the first pass: register all the clock domains in a file and assign each a class; rewrite the BLIF file to use the classes for the latches; run a sequential-only optimization through ABC; and recover the clock domains from the classes file. The second pass then runs the iterative approach with combinational optimization only.
Provided the class feature works correctly, why would we need to go through ABC in two passes? Wouldn't one pass be sufficient (we could then restore the clocks from the saved class information)?
I will note that at one point I did look into ABC's latch class feature, but when I tried it the class information was not included in ABC's output BLIF (i.e. it was dropped just like the regular clocks...).
QoR Failures
Taking a look at the QoR failures you've included, they all seem OK. They mostly look like improvements caused by fewer netlist primitives (fewer pre-packed blocks and pre-packed nets) for the `stereovision3` benchmark.
Failed to Include Simulation
I'm seeing a large number of `failed to include simulation OK` lines when I use run_vtr_flow on this branch. Can you clarify what this means? I think it means ODIN is not performing simulation to verify that the ODIN and ABC BLIFs are equivalent. If that is the case, and we don't want to run simulation, then we should probably only print `OK` (since skipping the simulation is expected). Any thoughts?
Verification
Can you clarify what testing this branch has gone through?
How have you verified that the different flow types are functionally correct?
How have you verified that the correct clocks are being restored?
Given

> The register class can be used to limit transformations during sequential synthesis
I understood that I would need to separate the sequential synthesis/optimization into another pass. I have very limited knowledge of ABC, so any clarification is welcome.
Also, if classes are indeed respected throughout ABC optimization, I think a good approach would be to not black-box the latches and simply let ABC handle this. The script included already provides a way to list all the clock domains and modify them; it would be trivial to tag the `.latch`es using it.
This means that trying to run the simulator with a given architecture & Verilog file exited with a failure, so we skip verification using the simulator. To know what exactly failed, the file is stored within the temp directory; this is a case-by-case issue, and I'd need to know which one failed to tell you why.
I'll look into it more deeply; I only had 2 failures to simulate for vtr_reg_strong and only 1 failed to be included at all.
The simulator does have good hard block support (it supports RAM, multipliers, etc.) but doesn't have the hard block logic for the arch hard_fpu_arch_timing.xml.
By default the run_vtr_flow script is set to run vanilla (the way we currently use it), but with the simulator now included.
I've tested the branch using vtr_reg_basic/strong. But again, I would not merge this yet, as I'd like to have more hand-written vector files and benchmarks for ODIN: I have regenerated all of the current vectors to now include the clock and exercise both edges of it. I'd like to have something ready to merge very soon.
I haven't encountered any problems and the simulator produces the expected output so far, but I'd like to be sure before overwriting the benchmark suite's vectors.
Thanks @jeanlego for pulling this together.
Some questions:
Here are the results I generated from the mulipass_abc_flow branch on the VTR benchmarks.
As mentioned, `iterative` seems to produce fewer primitives, lower channel width, and lower wirelength, but higher critical path delay. However, since all the VTR benchmarks (except the tiny `stereovision3` benchmark) are single clock, we would expect the results to be the same as `vanilla` -- in the single-clock case we'd expect `iterative` to reduce down to `vanilla`.
The difference could be that I force an initial value in ABC using the init command, and I now make Odin default to an initial value of 0 rather than X for `.latch` unless specified otherwise.
I had to do this so the simulator is assured an initial value that matches before and after ABC.
I defined a "clock domain" to be a triplet of driver, edge sensitivity, and initial value. Each run includes a report.clk file listing the different "clock domains" that are then used to iterate. Each individual iteration includes a report of which domains are vanilla and which are black-boxed, with the number of latches within each clock domain. Because of this, many single-clock-driver circuits can have several "clock domains" in my script. I did try using only the driver to define a clock domain, but it failed to give the right results.
I also had to modify Odin: it used to emit only rising-edge latches, but it now has support for all 5 types. Some result discrepancies could come from there if you compared with the old version.
Vanilla is running with the patched restoration script (a typo prevented the script from rewinding the file; that bug is in the original). The blanket approach is when we black-box all the latches, and iteration handles one "clock domain" triplet (described above) at a time. But the biggest takeaway here is that everything other than iteration fails to match initial vs. post-ABC simulation.
Should this be closed based on #411?
Yes, please :)
Closing, as PR #411 fixed the issue.
Expected Behaviour
The VTR flow should handle circuits with no, single, or multiple clocks correctly.
Current Behaviour
There are actually two (inter-related) bugs here:
A) Clock restoration fails in certain cases, due to a bug in the clock restoration script. This occurs when the .latches are re-ordered in the ABC output.
B) More seriously, fixing (A) reveals that clock restoration is currently incorrect in many cases, and no warning/error is generated. This has been passing regression tests because the script incorrectly uses the first clock for all latches, and we've only tested these code paths with single-clock circuits (where this coincidentally produces the correct results).
Steps to Reproduce
A) Clock restoration fails if `.latch`es are re-ordered (thanks to Dipika Badri for the initial report!). Observe the following error:
However, the .latch top^FF_NODE~50 does in fact exist in both the reference and post-ABC netlists.
B) Fix the file seeking behaviour in restore_multiclock_latch_information.pl. Run `vtr_reg_basic` and observe new clock restoration failures on previously passing circuits.
Context
ABC strips away clock information when it performs synthesis. This information is required at later stages of the design flow (e.g. VPR) so that multi-clock circuits are handled correctly.
We previously believed that simply name-matching the `.latch`es would be sufficient to uniquely identify them in both the pre- and post-ABC netlists. However, this is not always the case. For instance, before ABC:
and after ABC:
indicating that ABC is significantly modifying the latch names and structure in non-trivial ways. (Investigation shows that some latches are optimized away during the ABC command `ifraig`.) This means that simple name matching is insufficient to correctly restore clock information.
Possible Solution
For (A) it is trivial to fix the script by ensuring it seeks to the beginning of the file correctly. However, fixing this causes (B) to start generating errors (it was a silent error before).
As shown above, it seems that in some instances ABC completely re-names the connections to/from .latches, which breaks the fundamental assumption that the clock restoration script makes (i.e. that it can match .latches based on output signal names).
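For reference, the matching assumption boils down to something like this (the data layout of `(output_name, clock)` pairs is hypothetical; the real script works on BLIF text):

```python
def restore_clocks(pre_latches, post_latches):
    """Index pre-ABC latches by output signal name, then look each
    post-ABC latch up by the same name. Latches that ABC renamed or
    rebuilt land in `unmatched`, which is exactly the failure mode
    described above."""
    clock_of = {out: clk for out, clk in pre_latches}
    restored, unmatched = {}, []
    for out in post_latches:
        if out in clock_of:
            restored[out] = clock_of[out]
        else:
            unmatched.append(out)   # no clock can be recovered for this latch
    return restored, unmatched

# A renamed latch cannot be matched, so its clock is lost:
restored, unmatched = restore_clocks(
    [("top^FF_NODE~50", "clk1")],
    ["top^FF_NODE~50", "n_new_77"],
)
```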
It seems to me that the best course of action is to convert all `.latch`es to black boxes before passing netlists through ABC, and then restore them to `.latch`es after ABC. This would ensure all clock information is preserved, but at the cost of disabling sequential optimizations within ABC (we already disable most sequential optimizations, like re-timing, anyway).
For example, given an initial `.latch`, a script would convert it to a black box as it is sent through ABC:
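A minimal sketch of the conversion (the original netlist fragments were omitted, so signal names here are illustrative; only the `latch_re_3` subckt type comes from the discussion):

```blif
# original: rising-edge latch on clk, initial value 3 (don't care)
.latch n1 n2 re clk 3

# converted for the trip through ABC (port names are an assumption;
# the clock net would also need to be recorded, e.g. in the latch name):
.subckt latch_re_3 I=n1 O=n2
```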
where the subckt type (`latch_re_3`) encodes that it is rising-edge triggered with an initial value set to don't care (`3`). Since ABC leaves black boxes alone, a script could then convert the black box back into a latch without losing any information.
Your Environment