Constant and inverter "replication" during packing

rs-dhow commented 2 weeks ago

@KA7E and anyone else working on the packer.

First, I was happy to find: https://docs.verilogtorouting.org/en/latest/tutorials/arch/equivalent_sites/ We will use this feature.

Proposed Behavior

Currently, we don't implement programmable constants or inverters on special inputs (resets and enables). Instead, we arrange for every unused special input to take the value 1, so that 1 is the only constant we have to route. (And of course, LUTs don't need this.)

If we have a signal that drives some FF ENs directly and other FF ENs through an inverter, we would like to pack a copy of that single inverter into each block that needs it. This means the number of inverter atoms increases during packing. After packing, if the original inverter no longer drives anything, it would be dropped from the netlist. Ideally the inverter comes from a .names in the netlist, not a special .subckt.
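To make the proposal concrete, here is a minimal sketch of the replication step on a toy model. It is not VPR code: the function name `replicate_inverters`, the tuple layout, and the cluster names are all hypothetical, and it assumes the packer has already assigned each FF to a cluster.

```python
from collections import defaultdict

def replicate_inverters(sinks, cluster_of):
    """Decide where local inverter copies are needed after clustering.

    sinks: list of (ff_name, signal, inverted) tuples, one per FF enable pin.
           `inverted` is True when the original netlist feeds the pin
           through the shared inverter.
    cluster_of: dict ff_name -> cluster id (hypothetical packer result).

    Returns (local_inverters, routed):
      local_inverters maps cluster id -> set of signals that need an
      inverter copy inside that cluster;
      routed is the set of nets that still have to be routed between
      clusters (only the uninverted signals).
    """
    local_inverters = defaultdict(set)
    routed = set()
    for ff, signal, inverted in sinks:
        routed.add(signal)  # only the true-phase net leaves the cluster
        if inverted:
            local_inverters[cluster_of[ff]].add(signal)
    return dict(local_inverters), routed

# Toy example: one enable signal, used in both phases across two clusters.
sinks = [
    ("ff0", "en", False),
    ("ff1", "en", True),
    ("ff2", "en", True),
    ("ff3", "en", True),
]
cluster_of = {"ff0": "clb0", "ff1": "clb0", "ff2": "clb1", "ff3": "clb1"}
local_inverters, routed = replicate_inverters(sinks, cluster_of)
```

In this toy case the single original inverter becomes two local copies (one per cluster), and only the net `en` is routed between clusters; the inverted net no longer exists at the inter-cluster level.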

We looked into this 18 months ago but didn't figure it out. Maybe we were working on too much at the same time, and we didn't hear about someone else's solution to this. We could also expand some control-signal muxes to accept 1'b0 and 1'b1 and handle constant generators the same way as inverters above.

Current Behavior

Current behavior relies on using an explicit inverter, as well as explicit LUTs generating constants. The real cost is routing their outputs everywhere.

Possible Solution

We are wondering to what degree others have thought about replicating constant generators and inverters as described above. Doing this before the packer (in the netlist) doesn't make sense, since you don't know whether each consumption site is in the same block after packing or not.

Context

Local replication of constant generators and inverters would eliminate at least one dense wire that must route everywhere currently, and more in some cases. This wire doesn't respond well to typical congestion mitigation approaches either since it can be so dense.

Thanks.

vaughnbetz commented 1 week ago

Couldn't (and shouldn't) this be done at the end of synthesis? If the primitive being fed by a signal can handle programmable inversion, then remove the inverter.
LUTs -> remove (already done I believe)
FF enables etc. -> if the FF has a programmable inverter on it, remove the inverter
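The rule above can be sketched as a small post-synthesis pass. This is only an illustration on a toy data model: `absorb_inverters`, the tuple layout, and the `can_invert` set are hypothetical and do not correspond to any existing synthesis or VTR API.

```python
def absorb_inverters(sinks, can_invert):
    """Fold inverters into sink pins that support programmable inversion.

    sinks: list of (primitive_type, pin_name, net, through_inverter) tuples.
    can_invert: set of (primitive_type, pin_name) pairs that have a
                programmable inverter (a hypothetical architecture summary).

    Returns (config, still_inverted):
      config maps sink index -> invert bit for pins that absorbed the
      inversion (False means the pin is used uninverted);
      still_inverted lists the sink indices that must keep an explicit
      inverter because the pin cannot invert on its own.
    """
    config = {}
    still_inverted = []
    for i, (prim, pin, _net, through_inverter) in enumerate(sinks):
        if (prim, pin) in can_invert:
            config[i] = through_inverter   # inversion becomes a config bit
        elif through_inverter:
            still_inverted.append(i)       # explicit inverter must remain
    return config, still_inverted

# Toy example: EN pins can invert, RST pins cannot.
sinks = [
    ("ff", "EN", "en", True),
    ("ff", "EN", "en", False),
    ("ff", "RST", "rst", True),
]
config, still_inverted = absorb_inverters(sinks, can_invert={("ff", "EN")})
```

Here the two EN sinks absorb their phase as a configuration bit, while the RST sink is left with its explicit inverter.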

I think the remaining case you're trying to solve is if you have a programmable inverter at the cluster level, but not the primitive level (e.g. maybe you can invert an enable for the entire cluster?). I think that should be doable by putting an inverter in the cluster architecture definition, with interconnect that lets either the inverter or the enable input pin itself drive the enable signal.

Am I capturing the situation correctly?

rs-dhow commented 1 week ago

Yes, the problem is if the "perverter" is at the cluster level. Imagine a signal goes uninverted to EN on some flops and inverted to EN on other flops (in the original design). Each cluster (CLB) supports say 4 enables, which could be any mixture, from 4 distinct enables to both phases of 2 enables. (UltraScale+ is similar to this.) The "phase needs" of each FF interact with the clustering decisions (it's a signal/phase counting problem, maybe related to the current pin counting), which is why this can't be done in synthesis. Now, if I put the inverter in the cluster level of the arch definition, then I end up back at the replication problem I started with. Make sense? Thanks.
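A minimal sketch of the signal/phase counting check, under one assumed reading of the description above: each distinct (signal, phase) pair used inside a cluster occupies one of its 4 enable slots. The function names and the `max_slots` parameter are hypothetical, and a real architecture may count differently.

```python
def enable_slots_needed(ff_enables):
    """Count the enable slots a candidate cluster would use.

    ff_enables: iterable of (signal, inverted) pairs, one per FF in the
    cluster. Assumption: each distinct (signal, phase) pair takes one slot.
    """
    return len(set(ff_enables))

def fits_in_cluster(ff_enables, max_slots=4):
    """True if the FFs' enable needs fit the cluster's enable slots."""
    return enable_slots_needed(ff_enables) <= max_slots

# Both phases of two signals: 4 slots, legal.
two_signals_two_phases = [("a", False), ("a", True), ("b", False), ("b", True)]

# Four distinct uninverted signals: 4 slots, legal.
four_signals = [("a", False), ("b", False), ("c", False), ("d", False)]

# Adding one FF that wants the inverted phase of "a" needs a 5th slot.
one_too_many = four_signals + [("a", True)]

legal = [fits_in_cluster(c) for c in
         (two_signals_two_phases, four_signals, one_too_many)]
```

Whether the extra FF in the last case is legal depends on which other FFs already share the cluster, which is the reason the phase decision cannot be fixed before packing.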
