hdl / bazel_rules_hdl

Hardware Description Language (Verilog, VHDL, Chisel, nMigen, etc) with open tools (Yosys, Verilator, OpenROAD, etc) rules for Bazel (https://bazel.build)
Apache License 2.0

Add tests for ASIC flow with ASAP7 #172

Closed lpawelcz closed 10 months ago

lpawelcz commented 11 months ago

From what I see, when it comes to ASAP7 support in bazel_rules_hdl the tests cover only the synthesis step: https://github.com/hdl/bazel_rules_hdl/blob/e30c65c618dabffdcdeb45ad66e27fbaa2dc573e/synthesis/tests/BUILD#L71-L78

It would be good to add tests that will run place_and_route and gds_write flows with the results from verilog_counter_asap7_synth as input. This is a blocker for https://github.com/google/xls/issues/996

I started work on adding this but I encountered errors in Clock Tree Synthesis. It looks like there are problems with placing FILLER cells; here is the log:

[INFO ORD-0030] Using 12 thread(s).
[INFO CTS-0049] Characterization buffer is: BUFx4_ASAP7_75t_R.
[INFO CTS-0039] Number of created patterns = 11880.
[INFO CTS-0084] Compiling LUT.
Min. len    Max. len    Min. cap    Max. cap    Min. slew   Max. slew
2           8           1           34          1           64          
[WARNING CTS-0043] 1584 wires are pure wire and no slew degradation.
TritonCTS forced slew degradation on these wires.
[INFO CTS-0046]     Number of wire segments: 7500.
[INFO CTS-0047]     Number of keys in characterization LUT: 1744.
[INFO CTS-0048]     Actual min input cap: 1.
[INFO CTS-0007] Net "clk" found for clock "core_clock".
[INFO CTS-0010]  Clock net "clk" has 4 sinks.
[INFO CTS-0008] TritonCTS found 1 clock nets.
[INFO CTS-0097] Characterization used 1 buffer(s) types.
[INFO CTS-0027] Generating H-Tree topology for net clk.
[INFO CTS-0028]  Total number of sinks: 4.
[INFO CTS-0090]  Sinks will be clustered based on buffer max cap.
[INFO CTS-0030]  Number of static layers: 0.
[INFO CTS-0020]  Wire segment unit: 20000  dbu (5 um).
[INFO CTS-0023]  Original sink region: [(91411, 144720), (96595, 166320)].
[INFO CTS-0024]  Normalized sink region: [(4.57055, 7.236), (4.82975, 8.316)].
[INFO CTS-0025]     Width:  0.2592.
[INFO CTS-0026]     Height: 1.0800.
[WARNING CTS-0045] Creating fake entries in the LUT.
 Level 1
    Direction: Vertical
    Sinks per sub-region: 2
    Sub-region size: 0.2592 X 0.5400
[INFO CTS-0034]     Segment length (rounded): 1.
    Key: 7500 outSlew: 2 load: 1 length: 1 isBuffered: false
[INFO CTS-0032]  Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035]  Number of sinks covered: 4.
[INFO CTS-0018]     Created 3 clock buffers.
[INFO CTS-0012]     Minimum number of buffers in the clock path: 2.
[INFO CTS-0013]     Maximum number of buffers in the clock path: 2.
[INFO CTS-0015]     Created 3 clock nets.
[INFO CTS-0016]     Fanout distribution for the current clock = 2:2..
[INFO CTS-0017]     Max level of the clock tree: 1.
[INFO CTS-0098] Clock net "clk"
[INFO CTS-0099]  Sinks 4
[INFO CTS-0100]  Leaf buffers 0
[INFO CTS-0101]  Average sink wire length 25.68 um
[INFO CTS-0102]  Path depth 2 - 2
[INFO RSZ-0058] Using max wire length 294um.
Placement Analysis
---------------------------------
total displacement          8.5 u
average displacement        0.3 u
max displacement            3.9 u
original HPWL             163.7 u
legalized HPWL            175.1 u
delta HPWL                    7 %

[INFO RSZ-0040] Inserted 1 buffers.
[INFO RSZ-0041] Resized 1 instances.
[WARNING RSZ-0062] Unable to repair all setup violations.
[INFO RSZ-0046] Found 4 endpoints with hold violations.
[WARNING RSZ-0066] Unable to repair all hold violations.
Startpoint: reset (input port clocked by core_clock)
Endpoint: _23_ (removal check against rising-edge clock core_clock)
Path Group: **async_default**
Path Type: min

     Cap     Slew    Delay     Time   Description
---------------------------------------------------------------------------
                     0.000    0.000   clock core_clock (rise edge)
                     0.000    0.000   clock network delay (propagated)
                     2.000    2.000 v input external delay
   1.001    0.000    0.000    2.000 v reset (in)
            0.247    0.078    2.078 v input1/A (BUFx2_ASAP7_75t_R)
   3.644   26.326   32.692   34.770 v input1/Y (BUFx2_ASAP7_75t_R)
           26.349    0.456   35.226 v _23_/RESET (ASYNC_DFFHx1_ASAP7_75t_R)
                             35.226   data arrival time

                     0.000    0.000   clock core_clock (rise edge)
                     0.000    0.000   clock source latency
   1.781    0.000    0.000    0.000 ^ clk (in)
            0.393    0.124    0.124 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
   2.889   20.818   31.857   31.981 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
           20.842    0.373   32.354 ^ clkbuf_1_0__f_clk/A (BUFx4_ASAP7_75t_R)
   1.982   17.694   37.305   69.659 ^ clkbuf_1_0__f_clk/Y (BUFx4_ASAP7_75t_R)
           17.701    0.180   69.839 ^ _23_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)
                     0.000   69.839   clock reconvergence pessimism
                  9999999848243207295109594873856.000 9999999848243207295109594873856.000   library removal time
                           9999999848243207295109594873856.000   data required time
---------------------------------------------------------------------------
                           9999999848243207295109594873856.000   data required time
                            -35.226   data arrival time
---------------------------------------------------------------------------
                           -9999999848243207295109594873856.000   slack (VIOLATED)

Startpoint: _26_ (rising edge-triggered flip-flop clocked by core_clock)
Endpoint: _26_ (rising edge-triggered flip-flop clocked by core_clock)
Path Group: core_clock
Path Type: min

     Cap     Slew    Delay     Time   Description
---------------------------------------------------------------------------
                     0.000    0.000   clock core_clock (rise edge)
                     0.000    0.000   clock source latency
   1.781    0.000    0.000    0.000 ^ clk (in)
            0.393    0.124    0.124 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
   2.889   20.818   31.857   31.981 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
           20.842    0.373   32.354 ^ clkbuf_1_0__f_clk/A (BUFx4_ASAP7_75t_R)
   1.982   17.694   37.305   69.659 ^ clkbuf_1_0__f_clk/Y (BUFx4_ASAP7_75t_R)
           17.700    0.166   69.825 ^ _26_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)
   2.626   59.998  101.647  171.473 ^ _26_/QN (ASYNC_DFFHx1_ASAP7_75t_R)
           60.006    0.382  171.854 ^ _14_/A (XOR2x1_ASAP7_75t_R)
   0.732   15.797   21.475  193.329 v _14_/Y (XOR2x1_ASAP7_75t_R)
           15.797    0.039  193.368 v _26_/D (ASYNC_DFFHx1_ASAP7_75t_R)
                            193.368   data arrival time

                     0.000    0.000   clock core_clock (rise edge)
                     0.000    0.000   clock source latency
   1.781    0.000    0.000    0.000 ^ clk (in)
            0.393    0.124    0.124 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
   2.889   20.818   31.857   31.981 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
           20.842    0.373   32.354 ^ clkbuf_1_0__f_clk/A (BUFx4_ASAP7_75t_R)
   1.982   17.694   37.305   69.659 ^ clkbuf_1_0__f_clk/Y (BUFx4_ASAP7_75t_R)
           17.700    0.166   69.825 ^ _26_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)
                     0.000   69.825   clock reconvergence pessimism
                     8.565   78.390   library hold time
                             78.390   data required time
---------------------------------------------------------------------------
                             78.390   data required time
                           -193.368   data arrival time
---------------------------------------------------------------------------
                            114.978   slack (MET)

Startpoint: reset (input port clocked by core_clock)
Endpoint: _25_ (recovery check against rising-edge clock core_clock)
Path Group: **async_default**
Path Type: max

     Cap     Slew    Delay     Time   Description
---------------------------------------------------------------------------
                     0.000    0.000   clock core_clock (rise edge)
                     0.000    0.000   clock network delay (propagated)
                     2.000    2.000 v input external delay
   1.001    0.000    0.000    2.000 v reset (in)
            0.247    0.078    2.078 v input1/A (BUFx2_ASAP7_75t_R)
   3.644   26.326   32.692   34.770 v input1/Y (BUFx2_ASAP7_75t_R)
           26.337    0.312   35.082 v _25_/RESET (ASYNC_DFFHx1_ASAP7_75t_R)
                             35.082   data arrival time

                    10.000   10.000   clock core_clock (rise edge)
                     0.000   10.000   clock source latency
   1.781    0.000    0.000   10.000 ^ clk (in)
            0.393    0.124   10.124 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
   2.889   20.818   31.857   41.981 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
           20.825    0.210   42.191 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
   1.390   15.786   36.094   78.285 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
           15.787    0.047   78.333 ^ _25_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)
                     0.000   78.333   clock reconvergence pessimism
                    50.681  129.014   library recovery time
                            129.014   data required time
---------------------------------------------------------------------------
                            129.014   data required time
                            -35.082   data arrival time
---------------------------------------------------------------------------
                             93.932   slack (MET)

Startpoint: _23_ (rising edge-triggered flip-flop clocked by core_clock)
Endpoint: _25_ (rising edge-triggered flip-flop clocked by core_clock)
Path Group: core_clock
Path Type: max

     Cap     Slew    Delay     Time   Description
---------------------------------------------------------------------------
                     0.000    0.000   clock core_clock (rise edge)
                     0.000    0.000   clock source latency
   1.781    0.000    0.000    0.000 ^ clk (in)
            0.393    0.124    0.124 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
   2.889   20.818   31.857   31.981 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
           20.842    0.373   32.354 ^ clkbuf_1_0__f_clk/A (BUFx4_ASAP7_75t_R)
   1.982   17.694   37.305   69.659 ^ clkbuf_1_0__f_clk/Y (BUFx4_ASAP7_75t_R)
           17.701    0.180   69.839 ^ _23_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)
   2.028   51.511   96.019  165.858 ^ _23_/QN (ASYNC_DFFHx1_ASAP7_75t_R)
           51.513    0.199  166.057 ^ _15_/A (INVx2_ASAP7_75t_R)
   1.952   22.485   20.996  187.053 v _15_/Y (INVx2_ASAP7_75t_R)
           22.493    0.245  187.298 v _21_/A (HAxp5_ASAP7_75t_R)
   0.827   38.129   26.500  213.798 ^ _21_/CON (HAxp5_ASAP7_75t_R)
           38.129    0.053  213.851 ^ _20_/A (INVx1_ASAP7_75t_R)
   1.054   20.339   20.017  233.868 v _20_/Y (INVx1_ASAP7_75t_R)
           20.340    0.058  233.926 v _22_/B (HAxp5_ASAP7_75t_R)
   1.722   56.608   33.466  267.392 ^ _22_/CON (HAxp5_ASAP7_75t_R)
   0.691   33.513   25.024  292.417 v _22_/SN (HAxp5_ASAP7_75t_R)
           33.513    0.022  292.438 v _19_/A (INVx1_ASAP7_75t_R)
   0.712   20.685   18.485  310.923 ^ _19_/Y (INVx1_ASAP7_75t_R)
           20.686    0.035  310.958 ^ _25_/D (ASYNC_DFFHx1_ASAP7_75t_R)
                            310.958   data arrival time

                    10.000   10.000   clock core_clock (rise edge)
                     0.000   10.000   clock source latency
   1.781    0.000    0.000   10.000 ^ clk (in)
            0.393    0.124   10.124 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
   2.889   20.818   31.857   41.981 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
           20.825    0.210   42.191 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
   1.390   15.786   36.094   78.285 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
           15.787    0.047   78.333 ^ _25_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)
                     0.000   78.333   clock reconvergence pessimism
                   -18.877   59.456   library setup time
                             59.456   data required time
---------------------------------------------------------------------------
                             59.456   data required time
                           -310.958   data arrival time
---------------------------------------------------------------------------
                           -251.502   slack (VIOLATED)

Placement Analysis
---------------------------------
total displacement          5.1 u
average displacement        0.2 u
max displacement            2.4 u
original HPWL             178.1 u
legalized HPWL            182.0 u
delta HPWL                    2 %

[INFO DPL-0001] Placed 6405 filler instances.
[WARNING DPL-0005] Overlap check failed (6369).
 FILLER_0_1 overlaps FILLER_0_0
 .
 .
 .
 FILLER_35_183 overlaps FILLER_35_181
[ERROR DPL-0033] detailed placement checks failed.
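With thousands of reported overlaps, a summary per filler row can make the pattern easier to see. The following is a minimal Python sketch, assuming the log has been saved as text, that overlap lines have the form `A overlaps B` as in the excerpt above, and that filler instances are named `FILLER_<row>_<index>`. The function name `summarize_overlaps` is hypothetical and not part of OpenROAD or bazel_rules_hdl.

```python
import re
from collections import Counter

# Matches lines such as " FILLER_0_1 overlaps FILLER_0_0" from the log excerpt.
OVERLAP_RE = re.compile(r"^\s*(\S+) overlaps (\S+)\s*$")
# Assumed naming scheme for filler instances: FILLER_<row>_<index>.
FILLER_RE = re.compile(r"FILLER_(\d+)_\d+$")


def summarize_overlaps(log_text):
    """Return (pairs, per_row): overlapping instance pairs and a per-row count."""
    pairs = []
    per_row = Counter()
    for line in log_text.splitlines():
        match = OVERLAP_RE.match(line)
        if not match:
            continue  # skip INFO/WARNING lines and the "." elision markers
        first, second = match.group(1), match.group(2)
        pairs.append((first, second))
        row = FILLER_RE.match(first)
        if row:
            per_row[int(row.group(1))] += 1
    return pairs, per_row
```

If every row shows a similar number of overlaps, that would point at a systematic cause (for example a site or cell-size mismatch) rather than a local placement problem.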

CC @proppy

lpawelcz commented 11 months ago

It is possible to finish the flow by moving the check_placement step before filler_placement so that the check does not fail. That is only for testing purposes, of course. However, the resulting GDS looks very different from the one I got straight from the ORFS flow.

In the GDS from the ORFS flow, filler cells are placed uniformly throughout the whole die and cover the whole unused part of the layout. Multiple types of filler cells are used:

It looks like filler cells are placed along the Power Delivery Network (PDN) lines. In the GDS from the bazel_rules flow, those cells are spaced exactly 4 filler-cell lengths/widths apart from each other. They are also limited to only one cell type (an easy-to-fix problem caused by incorrect configuration here):

The PDN also looks quite different: it has fewer power lines (horizontal, parallel lines running from one edge of the die to the other), and the spacing between those lines in the bazel_rules GDS is 4 times larger than in the GDS from ORFS.

I am attaching the GDS files for comparison: counter_gds_files.zip. They were generated from an 8-bit counter in Verilog with a clk_period of 10ns, using this configuration for the ORFS flow:

export DIE_AREA = 0 0 20 20
export CORE_AREA = 1 1 19 19
export PLACE_DENSITY = 0.65

and for bazel flow:

placement_density = "0.65",            
core_padding_microns = 1,
die_width_microns = 20,
die_height_microns = 20,
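The two configurations above are meant to describe the same floorplan. A minimal Python sketch of that correspondence follows, assuming the die origin is the lower-left corner given in DIE_AREA and that the core margin is the same on all four sides. The attribute names are the ones quoted above; the helper `orfs_to_bazel_floorplan` is hypothetical.

```python
def orfs_to_bazel_floorplan(die_area, core_area):
    """Translate ORFS DIE_AREA / CORE_AREA tuples (llx, lly, urx, ury), in
    microns, into the bazel-flow attributes quoted above.

    Raises ValueError when the core margin is not uniform, because a single
    core_padding_microns value cannot express that case.
    """
    die_llx, die_lly, die_urx, die_ury = die_area
    core_llx, core_lly, core_urx, core_ury = core_area
    paddings = {
        core_llx - die_llx,
        core_lly - die_lly,
        die_urx - core_urx,
        die_ury - core_ury,
    }
    if len(paddings) != 1:
        raise ValueError("non-uniform core margin: %s" % sorted(paddings))
    return {
        "die_width_microns": die_urx - die_llx,
        "die_height_microns": die_ury - die_lly,
        "core_padding_microns": paddings.pop(),
    }
```

For the values in this comment, `orfs_to_bazel_floorplan((0, 0, 20, 20), (1, 1, 19, 19))` yields a 20 x 20 die with a padding of 1, which matches the bazel attributes listed above.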

Additional notes on attached GDS generated from bazel flow:

ASAP7 comes with 2 variants of the technology: regular and scaled (4 times bigger). However, ASAP7 does not provide all required files for the regular variant (e.g. the technology LEF); all files required for the RTL->GDS flow are available only for the scaled variant. ORFS uses the regular variant. It doesn't have ASAP7 pinned as a submodule or anything like that. Instead, at some point in history, the most important files from this PDK were imported straight into the ORFS repository. Missing files for the regular variant were imported from the scaled variant and manually converted to match the regular variant. Then those files were updated with changes from the upstream ASAP7 repository and probably with some additional external changes. Because of that, it is hard to tell which version of ASAP7 is the closest to the one used in ORFS.

On the other hand, bazel_rules_hdl uses the scaled variant in its rules for ASAP7 (even though the cell library target is named asap7_rvt_1x) because it is the only one that has all the required files available directly in the PDK repository.

I also noticed that the ASAP7 version used in bazel_rules_hdl is very old (The-OpenROAD-Project/asap7/157c92cf), certainly much older than the one in ORFS. This could be the cause of some of the differences between the layouts.

Having said that, I feel that most of the issues I now have with the bazel flow with ASAP7 are caused by possible inconsistencies in the scaling between the two variants of the PDK.

For now, I think the most important question is whether the bazel flow implementation should be as close to the ORFS implementation as possible (use the regular PDK variant, modify missing files on the fly, or fix the upstream PDK) or whether it should just use what is currently available in the PDK (stick with the existing implementation, use the scaled x4 variant of the PDK). @proppy and @QuantamHD I would greatly appreciate your opinion here.

proppy commented 11 months ago

ASAP7 comes with 2 variants of the technology: regular and scaled (4 times bigger). However, ASAP7 does not provide all required files for the regular variant (e.g. technology LEF), all files required for RTL->GDS flow are available only for the scaled variant. ORFS uses regular variant. It doesn't have ASAP7 pinned as submodule or anything like that. Instead of that, at some point in history, the most important files from this PDK were imported straight into ORFS repository. Missing files for the regular variant were imported from scaled variant and manually converted to match regular variant. Then those files were updated with changes from upstream ASAP7 repository and probably with some additional external changes. Because of that it is hard to tell which version of ASAP7 is the closest to one used in ORFS.

Is there already an issue in the ASAP7 repo to discuss reconciling the two versions? It sounds like we should be able to import the ORFS version into the ASAP7 repo since it's incomplete there.

I will attach here GDS files for comparison.

I think it would also be useful to compare the TCLs generated by bazel_rules_hdl with the ones coming from ORFS for ASAP7, to see how the two flows differ.

lpawelcz commented 11 months ago

Is there already an issue in the ASAP7 repo to discuss reconciling the two versions? It sounds like we should be able to import the ORFS version into the ASAP7 repo since it's incomplete there.

There was https://github.com/The-OpenROAD-Project/asap7/issues/19, which touched on the problem. A comment there states that the files are missing due to licensing issues.

I think it would also be useful to compare the TCLs generated by bazel_rules_hdl with the ones coming from ORFS for ASAP7, to see how the two flows differ.

I will check TCL scripts, compare them and post the results.

proppy commented 11 months ago

@QuantamHD for visibility

lpawelcz commented 11 months ago

I did an experiment with the following changes to bazel_rules_hdl flow:

The GDS file generated with this modified bazel flow is very similar to the GDS generated with ORFS. The spacings between filler cells and PDN lines are the same now; however, the flow still fails on check_placement, and in order to generate the GDS I had to comment out this check. It is not yet clear to me what causes those placement failures. What is even more interesting is that if you check the frames of the filler cells reported in the placement failure in the bazel flow GDS, you will see that they do indeed overlap each other, but you will see exactly the same overlap in the ORFS GDS, and for some reason the placement check doesn't fail in the ORFS flow.

proppy commented 11 months ago

bumped ASAP7 dependency to current upstream version

Curious what happens if you grab the ASAP7 bits from ORFS (https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/tree/master/flow/platforms/asap7) rather than from the ASAP7 upstream repo. Do you still get the same failures?

proppy commented 11 months ago

GDS file generated with such modified bazel flow is very similar to GDS generated with ORFS

klayout has a gdstxt export which could be useful for diffing: https://github.com/KLayout/klayout/blob/master/src/buddies/src/bd/strm2gdstxt.cc
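Once both GDS files have been converted to text, the comparison itself can be done with any diff tool. Here is a minimal Python sketch using the standard library's `difflib`, assuming the two text dumps have already been produced and read into strings. The function name and the default file labels are hypothetical.

```python
import difflib


def diff_text_dumps(a_text, b_text, a_name="orfs.txt", b_name="bazel.txt"):
    """Return a unified diff (list of lines) between two text dumps.

    The inputs are plain strings; nothing here depends on the dump format.
    An empty list means the two dumps are identical line for line.
    """
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
        )
    )
```

A raw textual diff of two layouts may be noisy if the tools emit cells or shapes in a different order, so sorting or filtering the dumps first may be needed before the output is useful.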

proppy commented 11 months ago

flow still fails on placement_check

do you mean check_placement?

proppy commented 11 months ago

for some reason the placement check wont fail in case of ORFS flow.

did you already compare the *_place_and_route_commands.tcl file generated in bazel-bin with the ORFS TCL scripts in https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/tree/master/flow/scripts?

lpawelcz commented 11 months ago

do you mean check_placement?

Yes, sorry

did you already compare the *_place_and_route_commands.tcl file generated in bazel-bin with the ORFS TCL scripts in https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/tree/master/flow/scripts?

In bazel_rules_hdl, filler cell placement is done in the same script as clock tree synthesis. The clock_tree_synthesis command there has an additional -post_cts_disable arg, which seems to be obsolete, while ORFS passes additional args: -sink_clustering_size, -sink_clustering_max_diameter, -distance_between_buffers and -balance_levels.

Then there is set_placement_padding, which had different values in the bazel flow (-left 2 -right 2) and in ORFS (-left 1 -right 1). I tried setting the values from ORFS, but it didn't change anything.
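A comparison like this can be partly automated. The sketch below is a minimal Python helper, assuming each command invocation starts at the beginning of a line (possibly continued with a trailing backslash) and that flags are plain `-name` tokens. It does not evaluate TCL, so commands built from variables or wrapped in `if`/`eval` will be missed. The name `command_flags` is hypothetical.

```python
import re

# A flag is a dash followed by an identifier; "-1" or "-0.5" are not flags.
FLAG_RE = re.compile(r"-[A-Za-z_]\w*")


def command_flags(tcl_text, command):
    """Collect the -flag names passed to `command` anywhere in tcl_text."""
    # Join backslash-continued lines so each invocation sits on one line.
    joined = re.sub(r"\\\n", " ", tcl_text)
    flags = set()
    for line in joined.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == command:
            flags.update(t for t in tokens[1:] if FLAG_RE.fullmatch(t))
    return flags


def flag_difference(bazel_tcl, orfs_tcl, command):
    """Return (only_in_bazel, only_in_orfs) flag sets for one command."""
    bazel = command_flags(bazel_tcl, command)
    orfs = command_flags(orfs_tcl, command)
    return bazel - orfs, orfs - bazel
```

Calling `flag_difference(bazel_tcl, orfs_tcl, "clock_tree_synthesis")` on the two scripts would list the flags unique to each flow; differences in flag values, such as the set_placement_padding case, still need a manual look.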

I will try to use the ASAP7 files from ORFS and check this gdstxt feature of klayout.

lpawelcz commented 11 months ago

@proppy I was able to finish the flow and pass check_placement with the setup described in https://github.com/hdl/bazel_rules_hdl/issues/172#issuecomment-1651225432 but without specifying more filler cell types in the ASAP7 open_road_pdk_configuration (there was only FILLERxp5_ASAP7_75t_R).

Now I will focus on isolating the fix, which is probably one of the changes described in the linked comment. Then I will try to answer the question: why do those additional filler cells cause placement check failures in the bazel flow? For that I will compare the filler cell definitions from the ORFS files with the definitions from the ASAP7 repository.

EDIT: The crucial changes required for enabling ASAP7 in bazel_rules_hdl flow are:

lpawelcz commented 10 months ago

In order to enable processing designs targeting ASAP7 PDK and to achieve results similar to OpenROAD-flow-scripts it is required to:

With recent updates to ASAP7: