Support Equivalent Placement Sites

Proposed Behaviour

It is possible that a single logical block type could be mapped to multiple potential physical grid tiles in the FPGA.

Examples of this include:

- IO types associated with different locations (e.g. IO_LEFT, IO_RIGHT, IO_TOP, IO_BOTTOM)
- LUTRAM (CLBL/CLBM, LAB/MLAB)

In general it should be possible to list a set of equivalent placement locations/sites for each block type.

Current Behaviour

VPR assumes that each top-level pb_type can only be placed at a placement location of exactly the same type.

Possible Solution

We should probably decouple the packing decisions (i.e. logical block types exist and can be created), from the placer's decisions (what grid tiles the logical block types can be placed in). Currently there is no distinction.
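As a rough illustration of that decoupling, the placer could consult a table that maps each logical block type to the tile types it may occupy. This is only a sketch: `CompatibilityTable`, `can_place` and `example_table` are hypothetical names, not existing VPR code, and the tile names are made up for the LUTRAM case.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical table: logical block type name -> tile types it may occupy.
using CompatibilityTable = std::map<std::string, std::set<std::string>>;

// Returns true if a block of `logical_type` may be placed on `tile_type`.
bool can_place(const CompatibilityTable& table,
               const std::string& logical_type,
               const std::string& tile_type) {
    auto it = table.find(logical_type);
    return it != table.end() && it->second.count(tile_type) > 0;
}

// Example table for the LUTRAM case (assumed names): a LAB fits both LAB
// and MLAB tiles, while an MLAB only fits MLAB tiles.
CompatibilityTable example_table() {
    CompatibilityTable table;
    table["LAB"] = {"LAB_tile", "MLAB_tile"};
    table["MLAB"] = {"MLAB_tile"};
    return table;
}
```

With such a table the packer only creates logical blocks, and the placer only asks `can_place` when it proposes a location.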

Context

This will help improve the generality of the flow. It will also fix non-intuitive/confusing behaviour such as #268, #512, #349.

Proposal from @mithro

Hi @kmurray. I have been thinking about this issue and how to proceed with its implementation.

Considerations

Currently there are three types of pb_types that have different properties (top level, intermediate and primitives). The idea would be to get to a situation for which the top level pb_type is represented as a physical tile, and this representation will be used by the placer to decide where the clustered blocks obtained from the packer have to be positioned. As far as I understand, there already is a data structure that represents a physical type which is t_type_descriptor and is used to hold information of the top level pb_types of the architecture (correct me if I'm wrong).

How to proceed

The first thing I would focus on is to substitute the idea of the top level pb_type with the idea of a tile. Therefore, in the XML description we will have a new top level tiles tag representing all the previous top level pb_types with all the tags and properties of a top level pb_type. IMO this will help decouple packer and placer decisions. The packer will still use information gathered from the pb_types structure, while the placer will work at the tile level. An idea of what it would look like is the following:

<tiles>
    <tile name="CLBM" capacity="1" width="...">
            <fc ...>
            <pinlocation ...>
    </tile>
    <tile name="CLBL">
            <equivalent_tiles>
                <mode="CLBM">
            </equivalent_tiles>
            <fc ...>
            <pinlocation ...>
    </tile>
      <tile>
            <mode ... />
            ...
      </tile>
      ...
</tiles>
<complexblocklist>
      <pb_type name="CLBM">
            <input/output/>
            <pb_type>
                 ...
            </pb_type>
            <interconnect/>
      </pb_type>
      ...
</complexblocklist>
...

Later on there will be the addition of the feature for which different block types can be placed in the same physical location (e.g. CLBL into a CLBM location). This step will require a mechanism to map the various block types IO pins with the tile type they are allowed to be placed in. For instance, I have tried to produce a very first initial implementation of this feature without taking into account the different connections of a CLBL w.r.t. the CLBM and routing resulted in the following error:

vpr/src/route/rr_graph2.cpp:1319 get_rr_node_index: Assertion 'type->class_inf[ptc].type == DRIVER' failed.

As far as I understand, there already is a data structure that represents a physical type which is t_type_descriptor and is used to hold information of the top level pb_types of the architecture (correct me if I'm wrong).

Yep, that's correct.

The packer will still use information gathered from the pb_types structure, while the placer will work at the tile level.

This makes a lot of sense to me.

The first thing I would focus on is to substitute the idea of the top level pb_type with the idea of a tile. Later on there will be the addition of the feature for which different block types can be placed in the same physical location (e.g. CLBL into a CLBM location).

Splitting this into two stages seems like a reasonable approach.

This step will require a mechanism to map the various block types IO pins with the tile type they are allowed to be placed in.

I think this is the key challenge. We'll need to think carefully how this is handled.

In particular:

What if the underlying <pb_type>'s have different ports/pins?
- Perhaps require that they are identical?
- Perhaps we just build the super-set of pins?
- Perhaps we need the flexibility to specify what the internal mapping is like? However this starts to look a lot like the <direct>/<mux>/<complete> of the packer's <interconnect> section, and duplicating it seems somewhat redundant.
How to handle equivalent pins (pin classes), which may differ between types (even if they have the same pin-outs)?

My initial tendency would be to keep it as simple as possible initially (it can then be generalized incrementally over time). That probably means:

Requiring that the top-level I/Os of the <pb_type>'s which can map to the same tile must be identical (i.e. ports, pins and equivalence are all the same).

This would handle the case of equivalent IO block locations (since they should all have the same pin-outs), and probably LAB/MLAB, CLBL/CLBM as well (if the pb_types are written appropriately).
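The "identical top-level I/Os" requirement could be error-checked with a simple comparison of port lists. The sketch below assumes a simplified, hypothetical `Port` struct (it is not VPR's real port representation) and treats two interfaces as identical only if names, widths, directions and equivalence all match, in the same order.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified port description (not VPR's real data structure).
struct Port {
    std::string name;
    int num_pins;
    bool is_output;
    bool equivalent;  // whether the pins of this port are logically equivalent

    bool operator==(const Port& other) const {
        return name == other.name && num_pins == other.num_pins
            && is_output == other.is_output && equivalent == other.equivalent;
    }
};

// True if both top-level interfaces list the same ports in the same order.
bool same_pin_out(const std::vector<Port>& a, const std::vector<Port>& b) {
    return a == b;
}
```

A stricter or looser rule (e.g. ignoring port order) would be a design choice; this version is deliberately the most conservative one.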

I have tried to produce a very first initial implementation of this feature without taking into account the different connections of a CLBL w.r.t. the CLBM and routing resulted in the following error:
vpr/src/route/rr_graph2.cpp:1319 get_rr_node_index: Assertion 'type->class_inf[ptc].type == DRIVER' failed.

That looks like you are running into inconsistent pin equivalences between the different <pb_type>s. Each set of equivalent pins is treated as an equivalent class of pins. The error indicates a driver class was expected but not found.

I've taken a crack writing up potential ideas for how this could be done.

There are basically 3 approaches (described below) each of increasing difficulty.

I'd strongly suggest starting with something like (1) which is more restrictive but simpler to implement. If needed it could be generalized over time towards something like (2).

I'm very hesitant to even consider (3), as I think it adds significant complexity and blurs the line between packing and placement (and between pb_type and tile). Supporting it in general would break a number of significant assumptions made by packing and placement, which would require substantial effort to fix (and it's not clear it would actually be any better). As a result there would need to be an extremely compelling motivation to make the degradation in code maintainability worthwhile.

Simple Equivalent Placement Sites

<!-- Simple Equivalent Placement Sites (Identical pin-out & equivalence)

    This approach requires that the pin-out of each grid tile *exactly*
    matches the pin-out of each pb_type (including pin equivalence). This
    avoids having to specify a mapping from tile to pb_type pins. (Note that
    this is not that restrictive, you just make each 'equivalent' pb_type
    have the same top-level pin-out. Some of those pins may be unused by
    some pb_types but that is OK).

    Information such as pin-locations and Fc are moved to the tile
    specification (out of the top-level pb_type).  Attributes like block
    width/height/area are also moved to the tile.

    In this formulation capacity is applied to the tile (rather than the
    sites) since this keeps the direct 1:1 mapping from tile to pb_type pins.

    A key advantage of this approach is that it should not require any
    modifications to the RR graph. In particular, since pin outs and equivalence
    are enforced to be the same the SOURCE/SINK/IPIN/OPIN nodes match between
    the tile and any equivalent pb_types. The only change required in the
    RR graph generator would be to drive SOURCE/SINK/IPIN/OPIN creation off
    of the tile (rather than block type) description.
 -->
<tiles>
    <tile name="MLAB_tile" width="1" height="1" capacity="1" area="XXX">
        <input name="inputs" num_pins="50" equivalent="full"/>
        <input name="clk" num_pins="2"/>
        <output name="outputs" num_pins="50"/>
        <pinlocations ...>
        <fc ...>
        <equivalent_sites> <!-- NOTE: both pb_types required to have identical pin outs which match the tile's (this should be error checked) -->
            <site pb_type="LAB"/>
            <site pb_type="MLAB"/>
        </equivalent_sites>
    </tile>

    <tile name="LAB_tile">
        <input name="inputs" num_pins="50" equivalent="full"/>
        <input name="clk" num_pins="2"/>
        <output name="outputs" num_pins="50"/>
        <pinlocations ...>
        <fc ...>
        <equivalent_sites>
            <site pb_type="LAB"/>
        </equivalent_sites>
    </tile>

    <tile name="IOL_tile" capacity="2"> <!-- IO left -->
        <input name="inputs" num_pins="1"/>
        <input name="clk" num_pins="1"/>
        <output name="outputs" num_pins="1"/>
        <pinlocations pattern="custom">
            <loc side="right">IOL_tile.inputs IOL_tile.clk IOL_tile.outputs</loc>
        </pinlocations>
        <fc ...>
        <equivalent_sites>
            <site pb_type="IO"/>
        </equivalent_sites>
    </tile>

    <tile name="IOR_tile" capacity="2"> <!-- IO right -->
        <input name="inputs" num_pins="1"/>
        <input name="clk" num_pins="1"/>
        <output name="outputs" num_pins="1"/>
        <pinlocations pattern="custom">
            <loc side="left">IOR_tile.inputs IOR_tile.clk IOR_tile.outputs</loc>
        </pinlocations>
        <fc ...>
        <equivalent_sites>
            <site pb_type="IO"/>
        </equivalent_sites>
    </tile>
</tiles>

<complexblocklist>
    <pb_type name="IO">
        <input name="inputs" num_pins="1"/>
        <clock name="clk" num_pins="1"/>
        <output name="outputs" num_pins="2"/>
        ...
    </pb_type>

    <pb_type name="MLAB">
        <input name="inputs" num_pins="50" equivalent="full"/>
        <clock name="clk" num_pins="2"/>
        <output name="outputs" num_pins="50"/>
        ...
    </pb_type>

    <pb_type name="LAB">
        <input name="inputs" num_pins="50" equivalent="full"/>
        <clock name="clk" num_pins="2"/>
        <output name="outputs" num_pins="50"/>
        ...
    </pb_type>
</complexblocklist>

Flexible Equivalent Placement Sites

<!-- Flexible Equivalent Placement Sites  (Non-identical pin-out & equivalence)

This approach extends the 'Simple Equivalent Placement Sites' approach
such that the pin-outs of each grid tile do *not* need to *exactly*
match the pin-out of each pb_type.

In particular, each site must specify how the tile pins connect to the
associated pb_type's pins. Furthermore, pin equivalence can not be
specified on tile pins, but is controlled on the pb_types.

This formulation still only allows capacity to be specified on *tiles*,
which ensures invalid architectures can not be specified (see comments on
'Complex Equivalent Placement Sites' for details on this issue).

In addition to the additional complexity of specification, this approach
also requires changes to the RR graph. Effectively the tile pins specify
the RR Graph's IPINs and OPINs (and how they connect to wires), while the
pb_types specify the SOURCEs and SINKs.

We would need to build unique SOURCE/SINK nodes for each pin class of
each site's pb_type. The <direct> specifications then become the edges
between the tile's pins (IPINs/OPINs) and the pb_type's SOURCE/SINKs.

This would require updating the router to ensure it picks the correct
SOURCE/SINK depending on which site a particular cluster was placed at.
-->
<tiles>
<tile name="MLAB_tile" width="1" height="1" capacity="1" area="XXX">
    <input name="inputs" num_pins="50"/>
    <input name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    <pinlocations ...>
    <fc ...>
    <equivalent_sites> <!-- NOTE: each site must specify its pin mapping -->
        <site pb_type="LAB">
            <direct from="MLAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
            <direct from="MLAB_tile.clk" to="LAB.clk"/>
            <direct from="LAB.outputs" to="MLAB_tile.outputs"/>
        </site>
        <site pb_type="MLAB">
            <direct from="MLAB_tile.inputs[19:0]" to="MLAB.addr"/> <!-- Note MLAB inputs are not equivalent -->
            <direct from="MLAB_tile.inputs[29:20]" to="MLAB.data_in"/>
            <direct from="MLAB_tile.clk" to="MLAB.clk"/>
            <direct from="MLAB.data_out" to="MLAB_tile.outputs[9:0]"/>
        </site>
    </equivalent_sites>
</tile>

<tile name="LAB_tile">
    <input name="inputs" num_pins="50" equivalent="full"/>
    <input name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    <pinlocations ...>
    <fc ...>
    <equivalent_sites>
        <site pb_type="LAB">
            <direct from="LAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
            <direct from="LAB_tile.clk" to="LAB.clk"/>
            <direct from="LAB.outputs" to="LAB_tile.outputs"/>
        </site>
    </equivalent_sites>
</tile>

<tile name="IOL_tile" capacity="2"> <!-- IO left -->
    <input name="inputs" num_pins="1"/>
    <input name="clk" num_pins="1"/>
    <output name="outputs" num_pins="1"/>
    <pinlocations pattern="custom">
        <loc side="right">IOL_tile.inputs IOL_tile.clk IOL_tile.outputs</loc>
    </pinlocations>
    <fc ...>
    <equivalent_sites>
        <site pb_type="IO">
            <direct from="IOL_tile.inputs" to="IO.inputs"/>
            <direct from="IOL_tile.clk" to="IO.clk"/>
            <direct from="IO.outputs" to="IOL_tile.outputs"/>
        </site>
    </equivalent_sites>
</tile>

<tile name="IOR_tile" capacity="2"> <!-- IO right -->
    <input name="inputs" num_pins="1"/>
    <input name="clk" num_pins="1"/>
    <output name="outputs" num_pins="1"/>
    <pinlocations pattern="custom">
        <loc side="left">IOR_tile.inputs IOR_tile.clk IOR_tile.outputs</loc>
    </pinlocations>
    <fc ...>
    <equivalent_sites>
        <site pb_type="IO">
            <direct from="IOR_tile.inputs" to="IO.inputs"/>
            <direct from="IOR_tile.clk" to="IO.clk"/>
            <direct from="IO.outputs" to="IOR_tile.outputs"/>
        </site>
    </equivalent_sites>
</tile>
</tiles>
<complexblocklist>
<pb_type name="IO">
    <input name="inputs" num_pins="1"/>
    <clock name="clk" num_pins="1"/>
    <output name="outputs" num_pins="2"/>
    ...
</pb_type>

<pb_type name="MLAB">
    <input name="addr" num_pins="20"/>
    <input name="data_in" num_pins="10"/>
    <clock name="clk" num_pins="2"/>
    <output name="data_out" num_pins="10"/>
    ...
</pb_type>

<pb_type name="LAB">
    <input name="inputs" num_pins="50"/>
    <clock name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    ...
</pb_type>
</complexblocklist>
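Each <direct> in the sites above can be read as a set of per-pin edges between tile pins and pb_type pins. A minimal C++ sketch of that expansion follows, assuming the bit-slices have already been parsed into a hypothetical `PortSlice` struct; `PortSlice`, `PinRef` and `expand_direct` are illustrative names only and do not exist in VPR.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bit-slice of a port, e.g. inputs[29:20] -> {"inputs", 29, 20}.
struct PortSlice {
    std::string port;
    int msb;
    int lsb;
    int width() const { return msb - lsb + 1; }
};

using PinRef = std::pair<std::string, int>;  // (port name, bit index)

// Expand one <direct from=... to=...> into per-pin (from pin, to pin) pairs,
// pairing bits from the least significant end upwards.
std::vector<std::pair<PinRef, PinRef>> expand_direct(const PortSlice& from,
                                                     const PortSlice& to) {
    if (from.width() != to.width()) {
        throw std::invalid_argument("direct connects slices of different width");
    }
    std::vector<std::pair<PinRef, PinRef>> edges;
    for (int i = 0; i < from.width(); ++i) {
        edges.emplace_back(PinRef(from.port, from.lsb + i),
                           PinRef(to.port, to.lsb + i));
    }
    return edges;
}
```

For example, `expand_direct({"inputs", 29, 20}, {"data_in", 9, 0})` would pair tile pin `inputs[20]` with `data_in[0]`, and so on up to `inputs[29]` with `data_in[9]`.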

Complex Placement Sites

<!-- Complex Placement Sites  (Non-identical pin-out & equivalence, internal capacity)

This approach further generalizes 'Flexible Equivalent Placement Sites' to
allow the placer to make top-level operation mode choices on a more general
(but still restricted) set of architectures.  In particular, we no longer
have a single set of equivalent mutually-exclusive *single-slot* sites,
but a set of mutually-exclusive modes which may have *multi-slot* sites.
('Flexible Equivalent Placement Sites' can also be viewed as the placer
making a selection between the mutually exclusive sites, but each of
those sites was independent of the others).

In this formulation capacity is applied as an attribute to the *site*
(rather than the tile) to support multi-slot sites.

The key advantage of this approach is illustrated with the RAM_tile
below, which could allow the placer to choose either a 2xRAM18 or
1xRAM36 mode.

However there is a significant drawback to this approach.  The multi-slot
specification capability makes it easy to describe an architecture which will
lead to impossible-to-route placements.  The comments in 'RAM_tile_invalid'
illustrate how this could be easily specified.

VPR splits placement into two stages (packing and placement) explicitly to
avoid burdening the placer with having to consider these types of detailed
constraints (which are handled by the packer).  As a result the placer
assumes that any block can be placed at any 'site' of matching type,
and that the resulting placement will be free of impossible routing
bottlenecks.
-->
<tiles>
<tile name="RAM_tile">
    <input name="inputs" num_pins="50"/>
    <input name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    <pinlocations ...>
    <fc ...>
    <mode>
        <!-- NOTE: care must be taken to ensure both sites can be used
                   completely independently, otherwise you end up with
                   dependencies between the two sites and you effectively
                   would need the placer to do clustering to produce a legal
                   solution. See 'RAM_tile_invalid' as an illustration.
                   -->
        <site pb_type="RAM18" capacity="2"> 
            <direct from="RAM_tile.inputs[24:0]" to="RAM18[0].inputs"/>
            <direct from="RAM_tile.clk[0]" to="RAM18[0].clk"/>
            <direct from="RAM18[0].outputs" to="RAM_tile.outputs[24:0]"/>

            <direct from="RAM_tile.inputs[49:25]" to="RAM18[1].inputs"/>
            <direct from="RAM_tile.clk[1]" to="RAM18[1].clk"/>
            <direct from="RAM18[1].outputs" to="RAM_tile.outputs[49:25]"/>
        </site>
    </mode>
    <mode>
        <site pb_type="RAM36" capacity="1"> 
            <direct from="RAM_tile.inputs" to="RAM36.inputs"/>
            <direct from="RAM_tile.clk[0]" to="RAM36.clk"/>
            <direct from="RAM36.outputs" to="RAM_tile.outputs"/>
        </site>
    </mode>
</tile>

<tile name="RAM_tile_invalid">
    <input name="inputs" num_pins="50"/>
    <input name="clk" num_pins="1"/>
    <output name="outputs" num_pins="50"/>
    <pinlocations ...>
    <fc ...>
    <mode>
        <!-- NOTE: This tile is *invalid* and could produce unroutable placements

                   Here, the RAM_tile has only a single clock which is
                   shared between the two RAM18's. As a result there is an 
                   additional implicit constraint to the placement: only RAM18's
                   which share the same clock can be placed together in this tile.

                   Since the placer doesn't know this, it could produce a placement
                   which violated this constraint. The result would be a routing
                   failure with the single tile-level clock input pin being congested
                   due to a routing bottleneck (since the two RAM18's required different 
                   clocks, but only a single pin connects them).

                   We would need to detect and explicitly reject the specification
                   of such architectures, to avoid this. Or generalize the idea of
                   placement macros to enforce this constraint - but it is far from
                   clear whether that is actually a good idea or not!
                   -->
        <site pb_type="RAM18" capacity="2"> 
            <direct from="RAM_tile.inputs[24:0]" to="RAM18[0].inputs"/>
            <direct from="RAM_tile.clk[0]" to="RAM18[0].clk"/>
            <direct from="RAM18[0].outputs" to="RAM_tile.outputs[24:0]"/>

            <direct from="RAM_tile.inputs[49:25]" to="RAM18[1].inputs"/>
            <direct from="RAM_tile.clk[0]" to="RAM18[1].clk"/>
            <direct from="RAM18[1].outputs" to="RAM_tile.outputs[49:25]"/>
        </site>
    </mode>
    <mode>
        <site pb_type="RAM36" capacity="1"> 
            <direct from="RAM_tile.inputs" to="RAM36.inputs"/>
            <direct from="RAM_tile.clk[0]" to="RAM36.clk"/>
            <direct from="RAM36.outputs" to="RAM_tile.outputs"/>
        </site>
    </mode>
</tile>

<tile name="MLAB_tile" width="1" height="1" area="XXX"> <!-- Note: capacity no longer a tile attribute -->
    <input name="inputs" num_pins="50"/>
    <input name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    <pinlocations ...>
    <fc ...>
    <mode>
        <site pb_type="LAB">
            <direct from="MLAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
            <direct from="MLAB_tile.clk" to="LAB.clk"/>
            <direct from="LAB.outputs" to="MLAB_tile.outputs"/>
        </site>
    </mode>
    <mode>
        <site pb_type="MLAB">
            <direct from="MLAB_tile.inputs[19:0]" to="MLAB.addr"/> <!-- Note MLAB inputs are not equivalent -->
            <direct from="MLAB_tile.inputs[29:20]" to="MLAB.data_in"/>
            <direct from="MLAB_tile.clk" to="MLAB.clk"/>
            <direct from="MLAB.data_out" to="MLAB_tile.outputs[9:0]"/>
        </site>
    </mode>
</tile>

<tile name="LAB_tile">
    <input name="inputs" num_pins="50" equivalent="full"/>
    <input name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    <pinlocations ...>
    <fc ...>
    <mode>
        <site pb_type="LAB">
            <direct from="LAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
            <direct from="LAB_tile.clk" to="LAB.clk"/>
            <direct from="LAB.outputs" to="LAB_tile.outputs"/>
        </site>
    </mode>
</tile>

<tile name="IOL_tile" capacity="2"> <!-- IO left -->
    <input name="inputs" num_pins="1"/>
    <input name="clk" num_pins="1"/>
    <output name="outputs" num_pins="1"/>
    <pinlocations pattern="custom">
        <loc side="right">IOL_tile.inputs IOL_tile.clk IOL_tile.outputs</loc>
    </pinlocations>
    <fc ...>
    <mode>
        <site pb_type="IO">
            <direct from="IOL_tile.inputs" to="IO.inputs"/>
            <direct from="IOL_tile.clk" to="IO.clk"/>
            <direct from="IO.outputs" to="IOL_tile.outputs"/>
        </site>
    </mode>
</tile>

<tile name="IOR_tile" capacity="2"> <!-- IO right -->
    <input name="inputs" num_pins="1"/>
    <input name="clk" num_pins="1"/>
    <output name="outputs" num_pins="1"/>
    <pinlocations pattern="custom">
        <loc side="left">IOR_tile.inputs IOR_tile.clk IOR_tile.outputs</loc>
    </pinlocations>
    <fc ...>
    <mode>
        <site pb_type="IO">
            <direct from="IOR_tile.inputs" to="IO.inputs"/>
            <direct from="IOR_tile.clk" to="IO.clk"/>
            <direct from="IO.outputs" to="IOR_tile.outputs"/>
        </site>
    </mode>
</tile>
</tiles>
<complexblocklist>
<pb_type name="IO">
    <input name="inputs" num_pins="1"/>
    <clock name="clk" num_pins="1"/>
    <output name="outputs" num_pins="2"/>
    ...
</pb_type>

<pb_type name="MLAB">
    <input name="addr" num_pins="20"/>
    <input name="data_in" num_pins="10"/>
    <clock name="clk" num_pins="2"/>
    <output name="data_out" num_pins="10"/>
    ...
</pb_type>

<pb_type name="LAB">
    <input name="inputs" num_pins="50"/>
    <clock name="clk" num_pins="2"/>
    <output name="outputs" num_pins="50"/>
    ...
</pb_type>
<pb_type name="RAM18">
    <input name="inputs" num_pins="25"/>
    <clock name="clk" num_pins="1"/>
    <output name="outputs" num_pins="25"/>
    ...
</pb_type>
<pb_type name="RAM36">
    <input name="inputs" num_pins="50"/>
    <clock name="clk" num_pins="1"/>
    <output name="outputs" num_pins="50"/>
    ...
</pb_type>
</complexblocklist>
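One conservative way to detect specifications like 'RAM_tile_invalid' would be to flag any tile pin that feeds more than one slot of a multi-slot site. The sketch below assumes the <direct> entries have been flattened into a hypothetical `SlotConnection` list; the names are illustrative, not VPR code, and a shared pin is only a hint that the slots are not independent rather than a full legality check.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical flattened form of a <direct> inside a multi-slot <site>:
// one tile pin connected to one slot (pb_type instance) of that site.
struct SlotConnection {
    std::string tile_pin;  // e.g. "clk[0]"
    int slot;              // e.g. 0 for RAM18[0], 1 for RAM18[1]
};

// Returns the tile pins connected to more than one slot, sorted by name.
// A non-empty result suggests the slots cannot be used independently.
std::vector<std::string> shared_tile_pins(const std::vector<SlotConnection>& conns) {
    std::map<std::string, std::set<int>> slots_per_pin;
    for (const auto& c : conns) {
        slots_per_pin[c.tile_pin].insert(c.slot);
    }
    std::vector<std::string> shared;
    for (const auto& entry : slots_per_pin) {
        if (entry.second.size() > 1) {
            shared.push_back(entry.first);
        }
    }
    return shared;
}
```

Applied to the clock connections of 'RAM_tile' (clk[0] to slot 0, clk[1] to slot 1) this reports nothing, while for 'RAM_tile_invalid' (clk[0] to both slots) it reports `clk[0]`.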

An additional wrinkle which actually affects all 3 of the above approaches relates to differing Fc specifications at different tiles/sites.

The packer currently assumes that it can exit and re-enter the cluster through the general routing (although there are exceptions for Fc=0 pins, e.g. cin/cout of adder chains). Since this approach would decouple the Fc specification from the pb_type there are some corner cases to consider.

For example, what if the packer assumes a path through the general routing, which is true of some sites (Fc > 0) but not others (Fc = 0 on the same pin)?

Potential ways to handle this could be:

- Treat such sites as not-compatible (i.e. get the user to specify them as different pb_types)
- Use the most pessimistic Fc across all sites (safe, but potentially not using full flexibility of the architecture)

@kmurray Thanks for the very detailed explanation.

I have started a WIP implementation that looks similar to the first approach, which implies a strict equivalence between the tile and the pb_type pins. It differs in the way the XML is defined and consequently read in VPR, but the underlying VPR changes should not change between my first implementation and the Simple Equivalent Placement Sites idea. The WIP can be found here https://github.com/SymbiFlow/vtr-verilog-to-routing/pull/36 (if you could provide feedback it would help me understand whether the direction I took is the right one).

I need to think about the Fc problem you have referred to and see how this can be effectively implemented. As far as I understood, the issue is that different tiles, even if they may be considered equivalent (MLAB and LAB), can have different Fc values for the "same" pins, which could end up in inconsistencies, right? A way to avoid inconsistencies could be to check, during placement, whether each pin of the packed pb has an Fc-compatible pin at the equivalent site. If not, the placer discards the equivalent site and selects another site. (Even so, a site like MLAB should have all pins compatible with the LAB to be accepted as equivalent, right?)

An additional wrinkle which actually affects all 3 of the above approaches relates to differing Fc specifications at different tiles/sites.

The packer currently assumes that it can exit and re-enter the cluster through the general routing (although there are exceptions for Fc=0 pins, e.g. cin/cout of adder chains). Since this approach would decouple the Fc specification from the pb_type there are some corner cases to consider.

In our case, I don't believe Fc matters because the interconnect <-> pin connections are constant. Even considering Fc, in our application Fc is constant between the sites. Can you think of the counter example where Fc might vary between sites? In a uniform interconnect, equivalent sites should have equivalent Fc.

I also agree that approach 1 (simplest) is best.

@acomodi For the Fc (corner case, different Fc for different equivalent tiles) I would just iterate through and use the smallest Fc of the equivalent tiles in the code in the packer that checks if a pin can reach general routing. The common case is all the Fc's are the same anyway, and this handles the corner case of them being different in a simple way, with non-CPU-critical code.
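A minimal sketch of that "smallest Fc" rule, assuming the per-tile Fc values for one pin have already been collected into a vector. `pessimistic_fc` and `pin_reaches_general_routing` are hypothetical helper names, not functions in the packer, and returning 0 for an empty list is an assumption made here for safety.

```cpp
#include <algorithm>
#include <vector>

// Smallest Fc for one pin across all tiles a block may be placed in.
// Assumption: an empty list is treated as Fc = 0 (no routing access).
double pessimistic_fc(const std::vector<double>& fc_per_tile) {
    if (fc_per_tile.empty()) {
        return 0.0;
    }
    return *std::min_element(fc_per_tile.begin(), fc_per_tile.end());
}

// The packer-side question: can this pin reach general routing no matter
// which equivalent tile the cluster ends up in?
bool pin_reaches_general_routing(const std::vector<double>& fc_per_tile) {
    return pessimistic_fc(fc_per_tile) > 0.0;
}
```

In the common case where all equivalent tiles share the same Fc, this reduces to the existing single-value check.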

@kmurray Here is a first implementation of the equivalent tiles: https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/559. IMO it follows approach number 1 in a way. The mapping of pins is still required, as a LAB and an MLAB could potentially have different pin names/numbers.

I have done it in a slightly different way, which I think can still be easily changed to follow your suggestion. As it is right now, the XML description looks like this:

<tiles>
    <tile name="CLBM" capacity="1" width="...">
            <fc ...>
            <pinlocation ...>
    </tile>
    <tile name="CLBL">
            <equivalent_tiles>
                <mode="CLBM">
                    <direct from="CLBL.A" to="CLBM.A" num_pins="1">
                    <direct from="CLBL.D" to="CLBM.D" num_pins="1">
                    <direct from="CLBL.DX" to="CLBM.DX1" num_pins="1">
                    ...
                </mode>
            </equivalent_tiles>
            <fc ...>
            <pinlocation ...>
    </tile>
      <tile>
            <mode ... />
            ...
      </tile>
      ...
</tiles>
<complexblocklist>
      <pb_type name="CLBM">
            <inputs />
            <outputs />
            <pb_type>
                 ...
            </pb_type>
            <interconnect/>
      </pb_type>
      ...
</complexblocklist>
...

Inputs and outputs are already defined in the pb_type, I don't think it is necessary to specify them also in the tile.

What I have done is actually to invert the idea: equivalent tiles are specified only for those tiles which have equivalent ones (e.g. CLBL has CLBM) instead of specifying a super-tile which contains possible equivalent ones. I suppose in the long term your approach is the one that should be adopted and I am working to make my solution as thought by you. For now I wanted to test whether my initial approach could fit within VPR.

I have checked PR https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/559 with the SymbiFlow tests and VPR produced valid designs.

Another thing to consider is that I did not yet take into account the Fc issue.

@kmurray, probably the first thing I could do to make my approach more similar to yours is to move all the tile equivalence information to the t_pb_type data structure (or better, to another data structure that I can call t_logical_type, or something like that, so as to keep the concepts of top level pb_types and intermediate/primitive pb_types separate). The new data structure would look something like this:

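#include <unordered_map>
#include <vector>

struct t_pb_type;        // existing VPR type (forward declaration)
struct t_physical_type;  // proposed replacement for t_type_descriptor

struct t_logical_type {
  t_pb_type* pb_type;
  std::vector<t_physical_type*> equivalent_physical_type;
  // Keyed by physical type; each inner map pairs up pin indices.
  std::unordered_map<t_physical_type*, std::unordered_map<int, int>> pin_mapping_with_equivalent_physical_types;
  std::unordered_map<t_physical_type*, std::unordered_map<int, int>> inverse_pin_mapping_with_equivalent_physical_types;
  // Something else I forgot about
};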

t_logical_type will have a one-to-one connection with its corresponding pb_type, and as many pin mappings and equivalent tiles as there are tiles tags in which the logical block appears.
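The forward and inverse pin mappings carry the same information, so one can be derived from the other. A small sketch, assuming (this is not stated above) that the forward map goes from logical pin index to physical pin index and is one-to-one; `invert_pin_mapping` is a hypothetical helper name.

```cpp
#include <stdexcept>
#include <unordered_map>

// Build the inverse of a pin mapping. Assumes the forward mapping is
// one-to-one; a duplicate target pin is reported as an error.
std::unordered_map<int, int> invert_pin_mapping(
        const std::unordered_map<int, int>& forward) {
    std::unordered_map<int, int> inverse;
    for (const auto& entry : forward) {
        bool inserted = inverse.emplace(entry.second, entry.first).second;
        if (!inserted) {
            throw std::invalid_argument("pin mapping is not one-to-one");
        }
    }
    return inverse;
}
```

Deriving the inverse at load time would avoid the two maps drifting out of sync, at the cost of one extra pass over the mapping.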

Then the t_type_descriptor becomes something like before, only without pb_types and other logical type-related members (its name could be changed to something like t_physical_type).

I do not yet foresee how the split of the t_type_descriptor struct affects the whole VPR, but I believe it is necessary.

Does this make sense?

@kmurray I have been working on supporting equivalent sites after https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/941. I have an issue with the Resources Utilization. After packing we have the various device utilization calculations and the problem is the fact that, if the same block can be placed in different physical locations, the device utilization calculation should change.

For instance, there may be more LAB instances than physical LAB locations, but a LAB could potentially be placed in LABM locations, so VPR should not exit with a failure. Moreover, let's say that VPR knows that the utilization should be spread across all the possible tile locations. We could then encounter a situation in which all the LABM physical tiles are occupied by LABM instances, so LAB instances can no longer be placed in the LABM locations.

My question is: what could be the best approach to take this issue into account after the packing stage and before the placement one?

@acomodi: I'm not sure of the specific code failure you're seeing, but here are a few thoughts. We could do a conservative check after packing: if there are more clustered blocks of some type than there are physical locations that can accommodate it (i.e. more LAB instances than LAB + LABM locations) then we could give an error. For more complex interactions (e.g. LAB + LABM instances > LAB + LABM physical locations) we could rely on a failed initial placement to note that we do not have enough resources (or a failure to find any physical block that is empty and that can accommodate the primitives we're trying to pack next).
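The conservative post-packing check could look roughly like the sketch below. It assumes three hypothetical inputs (block counts per logical type, location counts per tile type, and a compatibility table); none of these names are existing VPR structures. As described above, it only compares each block type against all of its compatible locations, so it does not catch the combined LAB + LABM case.

```cpp
#include <map>
#include <set>
#include <string>

using Counts = std::map<std::string, int>;
using Compat = std::map<std::string, std::set<std::string>>;

// Returns false if some block type has more instances than the total number
// of locations across all tile types that can accommodate it.
bool passes_conservative_check(const Counts& instances,
                               const Counts& locations,
                               const Compat& compatible) {
    for (const auto& inst : instances) {
        int available = 0;
        auto tiles = compatible.find(inst.first);
        if (tiles != compatible.end()) {
            for (const auto& tile : tiles->second) {
                auto loc = locations.find(tile);
                if (loc != locations.end()) {
                    available += loc->second;
                }
            }
        }
        if (inst.second > available) {
            return false;
        }
    }
    return true;
}
```

With 4 LAB and 2 LABM locations, 6 LAB instances pass and 7 fail; 5 LAB plus 2 LABM instances also pass this check even though they cannot all be placed, which is the case left to initial placement.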

I believe this is now complete with #988.
