verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org
Other
988 stars 380 forks source link

Better comments for reserve_locally_used_opins() and make calling it optional #360

Closed vaughnbetz closed 6 years ago

vaughnbetz commented 6 years ago

This function has a behaviour that is mysterious to most -- it should be better commented. It is for architectures where you have logically equivalent outputs, but they are achieved not by having an output crossbar from the logic block, but instead by changing which LUT/BLE implements which function. If you have an input crossbar that gives the same connectivity to each LUT/BLE input, you can make this change without modifying the routing to the cluster inputs.

However, there is a corner case -- you could use outputs that are not available by assuming you can duplicate a LUT/BLE into multiple locations in the cluster (each of which could reach a different output) even though there are some LUT/BLE locations that are used solely to create functions that are routed within the cluster and never use an output pin. reserve_locally_used_opins() handles this case; it makes sure the nets leaving the cluster don't use more outputs than are available considering this case.

We should also have an architecture or command line option to turn this on or off, as some architectures (those with an output crossbar) shouldn't use it as it will be overly conservative. So there are two ways to get output pin equivalence, and only one should use this function. Other architectures without logically equivalent output pins will not be affected by whether this function is run or not.

Proposed Behaviour

  1. Better comments.
  2. Architecture or command line option to control its use.

Current Behaviour

Always on, for all blocks.

Possible Solution

Architecture file describes this as a different logical equivalence setting, per cluster level block type. And then comment it well.

Context

I've had multiple questions on this function; it's too mysterious. Latest from Mirjana Stojilovic at EPFL. Would like Kevin's thoughts on this solution.

@kmurray

vaughnbetz commented 6 years ago

Adding @mirjanastojilovic to the discussion.

kmurray commented 6 years ago

I have also found this functionality to be mysterious.

It seems like adding a more flexible logical equivalence specification in the architecture, and then using that to drive whether this feature is enabled within the router is the most flexible approach.

Perhaps we need three settings for output port equivalence:

Is that sufficient to cover all the use-cases?

vaughnbetz commented 6 years ago

Yes, I think that coves it. Thanks Kevin. We can talk Thursday about who does what but I think that's a good spec.

kmurray commented 6 years ago

I've updated the architecture parser and VPR to support the new equivalence options, and upgrade the VTR architecture files to use the new options. The options are documented here.

I have also updated the relevant code comments to be more descriptive. For example here and here.

As noted in the comments, currently VPR makes a pessimistic simplifying assumption for the instance equivalence type (which is the same as the old equivalent=true behaviour). It enforces that every OPIN which is unused (i.e. disconnected) in the netlist be reserved. The result of this is that each actual output signal will use only one output pin. While the result is legal, it is pessimistic since this effectively forbids LUT/BLE duplication (i.e. duplicating LUT to generate two output signals which can reach two different output pins), although it does still permit LUT/BLE swapping.

@vaughnbetz Any thoughts about forbidding LUT/BLE duplication? The fix would seem to be to identify the specific LUT/BLE positions used to drive internal signals and specifically reserve the corresponding OPINs. However figuring out the OPIN corresponding to a specific LUT/BLE position is likely non-trivial, since the internal block routing architecture can be arbitrarily specified (it seems like to do so would effectively be routing inside the clusters, which is a longer term fix).

Additionally we don't update the packing based on the router results meaning:

vaughnbetz commented 6 years ago

Looks great Kevin; thanks! I like the comments -- very clear. I added an example of when full equivalence is appropriate (output crossbar) to the documentation, but otherwise looks very good. I agree with the simplifying assumption that there is no BLE/LUT duplication. That doesn't gain much, and it would only gain it for certain architectures anyway (those with input crossbars that allow this kind of duplication, but also have only one output per BLE as otherwise you need to have some more nodes and code to enforce that all the signals from a BLE that gets swapped move to the same location). It's not worth the effort, so this is the right balance.