tdene / synth_opt_adders

Prefix tree adder space exploration library
Apache License 2.0
53 stars 8 forks source link

Is there a way to generate higher radix adders? #134

Open rptii opened 1 year ago

rptii commented 1 year ago

I've tried using radix=4 on forest. However I get the following error.

File ~/gits/synth_opt_adders/src/pptrees/ExpressionTree.py:108, in ExpressionTree.__init__(self, width, in_ports, out_ports, name, start_point, alias, no_shape, radix, idem, leaf_labels, node_defs)
    106 cocycle_out_shape = [self.radix * x for x in cocycle_out_shape]
    107 if cocycle_out_shape != self.cocycle_shape:
--> 108     raise ValueError(
    109         (
    110             "The main recurrence node of the tree"
    111             " must have the same input and output shape"
    112         )
    113     )
    114 del cocycle_out_shape
    116 # Check that node shapes are compatible with port shapes
    117 ### NOTE: INPUT SHAPE IS CURRENTLY ASSUMED TO BE [1,1,1,..]
    118 ### TO-DO: Allow for arbitrary input shape
    119 ### Need to check whole file for this implicit assumption

ValueError: The main recurrence node of the tree must have the same input and output shape
tdene commented 1 year ago

Excellent question!

In short, I had to stop supporting higher-radix structures because it was too much effort and it went beyond the papers I was relying on.

Here's what's needed to support higher-radix adders:

  1. Create new templates for higher-radix HDL building blocks (like for example, turning this binary black cell into a radix-4 black cell).
  2. Go into ExpressionTree.py and fix all the places where radix-2 is hard-coded. Example. This might be most of the file; it definitely would require the useful algorithms like rotate / rank to be adjusted.

Item 1 is what's causing the error you see: there are no HDL templates for radix-4 adders.
In the end it would be a lot of work. It's something I actively want to do but do not have the time for.

tdene commented 1 year ago

If, however, the HDL generated by this tool is plugged into a synthesis tool with a logic optimizer, radix-4 adders can be emulated. A radix-2 adder could just be flattened/partitioned in a way that makes the logic optimizer treat it as a radix-4 adder. That kind of functionality is implemented and under active use. If you're interested, we can chat about that.

My experience, however, shows that logic optimizers don't think radix-4 adders are useful and when given the chance will just optimize them down to a radix-2 adder with occasional radix-3 sections. Now, whether that's the reality or just a fault with the tools/cells/PDKs/technology, I don't know.

rptii commented 1 year ago

Hi, thank you for the detailed response and for the awesome project. Actually, I'm not targeting any hardware but trying to replicate the results of the this paper. See, in such approaches you go against the typical hardware idea of minimizing xors and using 2-input gates. There, you need maximum xors and more 4-input gates since they reduce logic depth and cost the same for evaluation.

I'd be really interested in seeing how one can achieve that with your code. Note also, they are replacing the or gate with an xor, because it seems to be equivalent.

I've tried generating a Brent-Kung network using this repo but passing that through ABC generates the following critical path which is twice as long as the one in the paper (since we only AND gates in the critical path, XORs come for free).

WireLoad = "none"  Gates =    106 ( 12.3 %)   Cap = 33.8 ff (  0.0 %)   Area =     3900.00 ( 99.1 %)   Delay =  1926.27 ps  ( 13.2 %)               
Path  0 --      19 : 0    2 pi      A =   0.00  Df =   0.0   -0.0 ps  S =   0.0 ps  Cin =  0.0 ff  Cout =  46.5 ff  Cmax =   0.0 ff  G =    0  
Path  1 --      70 : 2    2 XOR2X1  A =   0.00  Df = 155.4  -28.0 ps  S = 143.5 ps  Cin = 31.9 ff  Cout =  42.2 ff  Cmax = 484.4 ff  G =  131  
Path  2 --      71 : 3    1 AND3X1  A = 100.00  Df = 268.2  -11.6 ps  S =  85.6 ps  Cin = 12.6 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  236  
Path  3 --      74 : 2    3 XNOR2X1 A =   0.00  Df = 438.6  -24.5 ps  S = 171.4 ps  Cin = 31.9 ff  Cout =  55.1 ff  Cmax = 484.6 ff  G =  172  
Path  4 --      75 : 3    1 AND3X1  A = 100.00  Df = 552.8   -5.1 ps  S =  84.1 ps  Cin = 12.6 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  236  
Path  5 --      79 : 2    4 XNOR2X1 A =   0.00  Df = 743.1  -30.0 ps  S = 200.8 ps  Cin = 31.9 ff  Cout =  67.6 ff  Cmax = 484.6 ff  G =  211  
Path  6 --      80 : 4    1 AND4X1  A = 100.00  Df = 858.4   -7.1 ps  S =  84.5 ps  Cin = 12.5 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  236  
Path  7 --      91 : 2    4 XNOR2X1 A =   0.00  Df =1048.8  -29.9 ps  S = 200.8 ps  Cin = 31.9 ff  Cout =  67.6 ff  Cmax = 484.6 ff  G =  211  
Path  8 --      92 : 4    1 AND4X1  A = 100.00  Df =1164.2   -7.0 ps  S =  84.5 ps  Cin = 12.5 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  236  
Path  9 --     103 : 2    3 XNOR2X1 A =   0.00  Df =1334.5  -24.6 ps  S = 171.3 ps  Cin = 31.9 ff  Cout =  55.1 ff  Cmax = 484.6 ff  G =  172  
Path 10 --     104 : 3    1 AND3X1  A = 100.00  Df =1448.6   -5.2 ps  S =  84.1 ps  Cin = 12.6 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  236  
Path 11 --     108 : 2    2 XNOR2X1 A =   0.00  Df =1598.9  -19.3 ps  S = 142.4 ps  Cin = 31.9 ff  Cout =  42.6 ff  Cmax = 484.6 ff  G =  133  
Path 12 --     111 : 2    1 AND2X1  A = 100.00  Df =1711.7   -3.4 ps  S =  84.5 ps  Cin = 12.6 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  234  
Path 13 --     114 : 2    1 XNOR2X1 A =   0.00  Df =1841.4  -14.1 ps  S = 112.5 ps  Cin = 31.9 ff  Cout =  29.7 ff  Cmax = 484.6 ff  G =   92  
Path 14 --     116 : 2    1 XNOR2X1 A =   0.00  Df =1926.3   -2.4 ps  S =  48.0 ps  Cin = 31.9 ff  Cout =   0.0 ff  Cmax = 484.6 ff  G =    0  

Using your generator the reported critical is bellow for tree_type="brent-kung"

WireLoad = "none"  Gates =    131 ( 34.4 %)   Cap = 25.1 ff (  0.0 %)   Area =     5500.00 (100.0 %)   Delay =  2136.77 ps  ( 19.1 %)               
Path  0 --      18 : 0    2 pi      A =   0.00  Df =   0.0   -0.0 ps  S =   0.0 ps  Cin =  0.0 ff  Cout =  46.5 ff  Cmax =   0.0 ff  G =    0  
Path  1 --      50 : 2    2 XOR2X1  A =   0.00  Df = 155.4  -28.0 ps  S = 143.5 ps  Cin = 31.9 ff  Cout =  42.2 ff  Cmax = 484.4 ff  G =  131  
Path  2 --      53 : 3    1 AND3X1  A = 100.00  Df = 232.5  -10.0 ps  S =  40.6 ps  Cin = 12.6 ff  Cout =   9.3 ff  Cmax = 505.5 ff  G =   74  
Path  3 --      54 : 1    1 INVX1   A =   0.00  Df = 275.6   -6.5 ps  S =  41.2 ps  Cin =  9.3 ff  Cout =  12.9 ff  Cmax = 503.8 ff  G =  138  
Path  4 --      57 : 2    2 AND2X1  A = 100.00  Df = 399.7   -6.5 ps  S = 106.1 ps  Cin = 12.6 ff  Cout =  39.0 ff  Cmax = 505.5 ff  G =  308  
Path  5 --      60 : 1    2 INVX1   A =   0.00  Df = 483.8  -15.7 ps  S =  78.7 ps  Cin =  9.3 ff  Cout =  25.4 ff  Cmax = 503.8 ff  G =  271  
Path  6 --      72 : 3    1 AND3X1  A = 100.00  Df = 557.0   -2.1 ps  S =  42.1 ps  Cin = 12.6 ff  Cout =   9.3 ff  Cmax = 505.5 ff  G =   74  
Path  7 --      73 : 1    1 INVX1   A =   0.00  Df = 601.2   -1.9 ps  S =  40.7 ps  Cin =  9.3 ff  Cout =  12.5 ff  Cmax = 503.8 ff  G =  132  
Path  8 --      74 : 3    2 AND3X1  A = 100.00  Df = 725.3   -2.0 ps  S = 106.1 ps  Cin = 12.6 ff  Cout =  39.0 ff  Cmax = 505.5 ff  G =  310  
Path  9 --      77 : 1    3 INVX1   A =   0.00  Df = 828.6   -9.3 ps  S = 106.3 ps  Cin =  9.3 ff  Cout =  38.0 ff  Cmax = 503.8 ff  G =  404  
Path 10 --     111 : 4    1 AND4X1  A = 100.00  Df = 911.3   -7.7 ps  S =  43.7 ps  Cin = 12.5 ff  Cout =   9.3 ff  Cmax = 505.5 ff  G =   74  
Path 11 --     112 : 1    1 INVX1   A =   0.00  Df = 957.8  -11.8 ps  S =  40.9 ps  Cin =  9.3 ff  Cout =  12.5 ff  Cmax = 503.8 ff  G =  132  
Path 12 --     113 : 4    2 AND4X1  A = 100.00  Df =1081.8  -11.7 ps  S = 106.1 ps  Cin = 12.5 ff  Cout =  39.0 ff  Cmax = 505.5 ff  G =  311  
Path 13 --     116 : 1    3 INVX1   A =   0.00  Df =1175.8   -0.4 ps  S = 106.3 ps  Cin =  9.3 ff  Cout =  38.0 ff  Cmax = 503.8 ff  G =  404  
Path 14 --     150 : 4    1 AND4X1  A = 100.00  Df =1267.8  -17.5 ps  S =  43.7 ps  Cin = 12.5 ff  Cout =   9.3 ff  Cmax = 505.5 ff  G =   74  
Path 15 --     151 : 1    1 INVX1   A =   0.00  Df =1314.3  -21.5 ps  S =  40.9 ps  Cin =  9.3 ff  Cout =  12.5 ff  Cmax = 503.8 ff  G =  132  
Path 16 --     152 : 4    2 AND4X1  A = 100.00  Df =1438.4  -21.4 ps  S = 106.1 ps  Cin = 12.5 ff  Cout =  39.0 ff  Cmax = 505.5 ff  G =  311  
Path 17 --     155 : 1    2 INVX1   A =   0.00  Df =1513.2  -12.3 ps  S =  78.7 ps  Cin =  9.3 ff  Cout =  25.4 ff  Cmax = 503.8 ff  G =  271  
Path 18 --     167 : 3    1 AND3X1  A = 100.00  Df =1600.1  -25.9 ps  S =  42.1 ps  Cin = 12.6 ff  Cout =   9.3 ff  Cmax = 505.5 ff  G =   74  
Path 19 --     168 : 1    1 INVX1   A =   0.00  Df =1646.4  -29.9 ps  S =  40.7 ps  Cin =  9.3 ff  Cout =  12.5 ff  Cmax = 503.8 ff  G =  132  
Path 20 --     169 : 3    2 AND3X1  A = 100.00  Df =1770.4  -29.9 ps  S = 106.1 ps  Cin = 12.6 ff  Cout =  39.0 ff  Cmax = 505.5 ff  G =  310  
Path 21 --     172 : 1    1 INVX1   A =   0.00  Df =1820.8  -19.6 ps  S =  55.5 ps  Cin =  9.3 ff  Cout =  12.9 ff  Cmax = 503.8 ff  G =  138  
Path 22 --     173 : 2    1 AND2X1  A = 100.00  Df =1902.5  -29.5 ps  S =  40.1 ps  Cin = 12.6 ff  Cout =   9.3 ff  Cmax = 505.5 ff  G =   73  
Path 23 --     174 : 1    1 INVX1   A =   0.00  Df =1949.3  -33.2 ps  S =  41.3 ps  Cin =  9.3 ff  Cout =  12.9 ff  Cmax = 503.8 ff  G =  138  
Path 24 --     177 : 2    1 AND2X1  A = 100.00  Df =2057.5  -31.1 ps  S =  84.8 ps  Cin = 12.6 ff  Cout =  29.7 ff  Cmax = 505.5 ff  G =  234  
Path 25 --     179 : 2    1 XNOR2X1 A =   0.00  Df =2136.8   -0.4 ps  S =  47.9 ps  Cin = 31.9 ff  Cout =   0.0 ff  Cmax = 484.6 ff  G =    0  

Lmk if it's possible to improve on this.

tdene commented 1 year ago

Really quickly, what width/size are the adders you're generating?

rptii commented 1 year ago

Sorry, width=8.

tdene commented 1 year ago

Just want to update: give me a few days, this work-week is busy for me. I'll look at it when I have a break.

Perhaps this may be relevant though: someone found a bug that I wasn't running into personally. I pushed their fix upstream just now. pip install pptrees --upgrade and see if it changes anything. I doubt it will, but it's worth a shot.

rptii commented 1 year ago

Yeah, no luck. But the critical path changes in the new version. I've written the adder by hand for width=16 and even so I can't reach the 3 levels of AND gates reported on the paper. Is there a way to get boolean equations out instead of Verilog?