Generalize sub_tile in the grid so sub_tiles can have different width & height

vaughnbetz commented 1 year ago

Proposed Behaviour

We'd like to model multi-die stack FPGAs. This is needed for the Crossroads project, and will likely be useful to others.

Current Behaviour

The sub_tile feature almost has what we need in order to model one block under another, but it has the restriction that all sub_tiles in a tile must be the same size. We'd like to model a large router (or large accelerator block) that is underneath multiple LABs (or maybe RAMs, etc.). Right now we can't do this -- if we make the router large (e.g. width = 3, height = 3), then any LABs in the architecture become wide and tall.

Possible Solution

I think if we make the width, height, x_offset and y_offset in the grid a function of the sub_tile instead of the tile, we can have this feature. The capacity of grid[i][j] would then give the number of sub_tiles overlapping a certain (x,y) location, and you'd go through all those sub_tiles to see where their anchor points are, what their widths and heights are, etc.

We should also use this change as an opportunity to wrap some member functions around access to the sub_tiles to ask questions about them and anchor points etc. (I'm assuming we have code directly accessing this data now).
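
A minimal sketch of the idea above, not taken from the VPR code base. All names (`SubTileInstance`, `GridLocation`, `make_grid`, `place_sub_tile`) are hypothetical. It assumes each sub_tile carries its own width, height and offsets, and that callers query a grid location through member functions rather than reading the data directly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one sub_tile as seen from a single (x, y) grid location.
struct SubTileInstance {
    std::string name; // e.g. "router" or "lab"
    int width;        // footprint of this sub_tile, in grid units
    int height;
    int x_offset;     // distance from the sub_tile's anchor to this location
    int y_offset;

    // The anchor is the location where both offsets are zero.
    bool is_anchor() const { return x_offset == 0 && y_offset == 0; }
};

// Hypothetical: everything overlapping one (x, y) location.
class GridLocation {
  public:
    void add(const SubTileInstance& st) { sub_tiles_.push_back(st); }

    // Number of sub_tiles overlapping this location.
    std::size_t capacity() const { return sub_tiles_.size(); }

    // Number of sub_tiles whose anchor point is this location.
    std::size_t num_anchored() const {
        std::size_t n = 0;
        for (const auto& st : sub_tiles_) {
            if (st.is_anchor()) ++n;
        }
        return n;
    }

  private:
    std::vector<SubTileInstance> sub_tiles_;
};

using SubTileGrid = std::vector<std::vector<GridLocation>>;

// Create an empty width x height grid, indexed as grid[x][y].
inline SubTileGrid make_grid(int width, int height) {
    return SubTileGrid(static_cast<std::size_t>(width),
                       std::vector<GridLocation>(static_cast<std::size_t>(height)));
}

// Stamp a w x h sub_tile onto the grid with its anchor at (ax, ay).
inline void place_sub_tile(SubTileGrid& grid, const std::string& name,
                           int ax, int ay, int w, int h) {
    for (int dx = 0; dx < w; ++dx) {
        for (int dy = 0; dy < h; ++dy) {
            grid[ax + dx][ay + dy].add(SubTileInstance{name, w, h, dx, dy});
        }
    }
}
```

With this layout a 3x3 router and a 1x1 LAB can share location (1,1): the location reports a capacity of 2, while the LAB keeps its own 1x1 footprint.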

Context

This limitation is blocking the exploration of routers with logic over them for the Crossroads project.

Alternative proposal

We could add a z dimension to the grid, making the layers explicit. But I believe this would involve more code updates, and probably a higher burden on more code to think about / set the z-coordinate, which would always be 0 for conventional planar architectures.

vaughnbetz commented 1 year ago

@kmurray @tangxifan : FYI. Sara is going to dig into this, with an eye to discussing in a future vtr meeting.

tangxifan commented 1 year ago

@vaughnbetz Thanks for the info. Will do.

vaughnbetz commented 1 year ago

@saaramahmoudi : can you add a link to your diagrams on this? Adding @MohamedElgammal . The two main options seem to be:

- Add an explicit z dimension. There would be a grid[i][j].layer_cap that says how many physical tiles are stacked at that point, and all current grid data would move to grid[i][j][z_layer].
  - negative: lots of code to change. All code has to work with multiple layers; it may be slightly slower etc.
  - positive: existing code should work, as we don't really change the meaning of a physical tile etc.
- Move and redefine the existing capacity of sub_tiles. grid[i][j].capacity would now exist. Each subtile would know more about itself (width, height, etc.).
  - positive: probably fewer code changes. We'd have one capacity that says what is stacked on top of each other instead of two.
  - negative: physical_tile currently groups multiple subtiles and allows their pins to be grouped and located around the composite physical_tile in various ways, in order to build an rr_graph. It may be hard to keep that behaviour unchanged while re-imagining subtile this way.

saaramahmoudi commented 1 year ago

Different diagrams for our proposed solutions to this problem are shown below. Blocks with green are the ones that we should add to the existing code.

The first diagram is a straightforward approach in which we don't need to change the sub-tile and tile structures or even how the grid exists. But we need to store the sub-tile attributes multiple times (once for each LAB on top of the NoC).

[Image: Screenshot from 2022-11-21 11-01-26]

The second diagram shows the option Vaughn mentioned in the previous comment: adding a new explicit z dimension.

[Image: Screenshot from 2022-11-21 11-03-27]

The final diagram shows how we should change sub_tiles and tiles definitions and add capacity to the grid itself.

[Image: Screenshot from 2022-11-21 11-05-23]

Moving the sub_tiles array from the physical_tile structure to the grid might also solve the problem. The following diagram shows how this solution is different from number 3.

[Image: Screenshot from 2022-11-21 14-37-56]

saaramahmoudi commented 1 year ago

@vaughnbetz Updating the issue after our meeting with Kevin. Presentation is attached for @tangxifan to review and discuss later. Multi-die stack FPGAs.pdf

tangxifan commented 1 year ago

@vaughnbetz @saaramahmoudi Thanks for the input and the detailed explanation.

I personally prefer the solution that adds an explicit z dimension, i.e., grid[i][j][z_layer]. Here are my reasons:

- I believe that it is clean and easy to understand for developers. Grid layout is separated for each die of the 3D stacked FPGAs.
- I suggest not adding capacity to grids, which may cause confusion between the capacity of subtiles and grids.
- From the perspective of OpenFPGA, when generating netlists or bitstreams, grid[i][j][z_layer] allows me to perform a top-down walkthrough on the FPGA fabric, which is more natural in coding.

Actually, I am not 100% clear on what level of 3D stacked FPGAs VPR would like to support. My understanding is that, as an architecture exploration tool, VPR aims to support very flexible FPGA architectures. I have a few questions:

- Do we allow independent/correlated routing architectures on each die? Are we going to model the routing architecture of all the dies in one RRGraph, or may each die have a separate RRGraph?
- For a given tile coordinate (x, y), will all the tiles on different dies have the same dimensions? Also, will these tiles on different dies have different pin locations?
- If we apply width, height, x_offset and y_offset to the subtile, it is not clear to me how to determine the grid/physical_tile dimensions. Are we considering a sum of all the subtile dimensions? Some more syntax may be required and the maths may be complicated.

These are my views based on current knowledge. I am definitely interested in this technical feature, and OpenFPGA will support it. I would like to discuss more details and am open to changing my views.

vaughnbetz commented 1 year ago

Thanks @tangxifan . Your opinion matches the emerging consensus -- add an explicit z for clarity. So we'll go with that.

- For routing architecture, for now we're going to leave the routing graph alone. The stacked dice we're modeling in the immediate future still have only one layer of programmable routing that the lower dice connect to. So the placement view would be generalized and the rr-graph generator would have some additional code to hook things up with some flexibility, but after that we'd have an (x,y) routing graph. If future stacked dice have multiple programmable routing dice we could add a z dimension to the rr-graph too though.
- For a given (x,y), no restrictions will be enforced on the other z coordinates; the blocks on them could have different dimensions (and that's actually an important use case we want to explore, so this code will be exercised right away). Similarly, those tiles could have different pin outs.
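
As a rough illustration of the explicit-z option agreed on above, the sketch below stores one tile record per (x, y, layer) and lets the layer default to 0 so that planar callers need no changes. The names `LayeredTile` and `LayeredGrid` are hypothetical, and the flat storage layout is an assumption made for the example, not a description of VPR's grid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-location record; field names are illustrative only.
struct LayeredTile {
    std::string type; // "EMPTY" until a block is placed
    int width;        // footprint of the block covering this location
    int height;
    int x_offset;     // offset of this location from the block's anchor
    int y_offset;
};

class LayeredGrid {
  public:
    // num_layers defaults to 1, which models a conventional planar device.
    LayeredGrid(int width, int height, int num_layers = 1)
        : width_(width), height_(height), num_layers_(num_layers),
          tiles_(static_cast<std::size_t>(width) * height * num_layers,
                 LayeredTile{"EMPTY", 1, 1, 0, 0}) {}

    int num_layers() const { return num_layers_; }

    // layer defaults to 0, so planar code can keep calling at(x, y).
    LayeredTile& at(int x, int y, int layer = 0) {
        return tiles_[index(x, y, layer)];
    }

    // Stamp a w x h block on one layer with its anchor at (ax, ay).
    void place(const std::string& type, int ax, int ay, int w, int h, int layer = 0) {
        for (int dx = 0; dx < w; ++dx) {
            for (int dy = 0; dy < h; ++dy) {
                at(ax + dx, ay + dy, layer) = LayeredTile{type, w, h, dx, dy};
            }
        }
    }

  private:
    // Map (x, y, layer) to a position in the flat vector, with bounds checks.
    std::size_t index(int x, int y, int layer) const {
        assert(x >= 0 && x < width_);
        assert(y >= 0 && y < height_);
        assert(layer >= 0 && layer < num_layers_);
        return (static_cast<std::size_t>(layer) * height_ + y) * width_ + x;
    }

    int width_;
    int height_;
    int num_layers_;
    std::vector<LayeredTile> tiles_;
};
```

Because each layer holds its own records, a 1x1 LAB on layer 0 and a 3x3 router on layer 1 can overlap at the same (x, y) without affecting each other's dimensions.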

tangxifan commented 1 year ago

Thanks @vaughnbetz for the details. Now it is clear to me. Do we expect any changes on the arch XML when supporting the 3D-stacked FPGAs? I am thinking about where we should define the z for the grids. For example,

    <fixed_layout name="2x2" width="4" height="4" num_layers="2">
      <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
      <perimeter type="io" priority="100"/>
      <corners type="EMPTY" priority="101"/>
      <!--Fill with 'clb'-->
      <fill type="clb_die0" priority="10" layer="0"/>
      <fill type="clb_die1" priority="10" layer="1"/>
    </fixed_layout>

saaramahmoudi commented 1 year ago

@tangxifan This is how we discussed implementing it in the architecture file. The layer tag is going to be optional, so we don't need to update existing architecture files, and the die number is taken to be 0 if unspecified. The number of available layers can also be calculated from the die attribute on the layer tags (it will be 1 if left unspecified). Does this sound like the right way to do it to you?

    <fixed_layout name="first_layer">
        <layer die="0">
          <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
          <perimeter type="io" priority="100"/>
          <corners type="EMPTY" priority="101"/>
          <!--Fill with 'clb'-->
          <fill type="clb_die0" priority="10"/>
        </layer>
    </fixed_layout>
    <fixed_layout name="second_layer">
        <layer die="1">
          <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
          <perimeter type="io" priority="100"/>
          <corners type="EMPTY" priority="101"/>
          <!--Fill with 'clb'-->
          <fill type="clb_die1" priority="10"/>
        </layer>
    </fixed_layout>
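
The defaulting rule described above can be sketched as follows. This is an assumption-laden illustration rather than the actual parser: `LayerTag`, `resolve_die` and `count_layers` are hypothetical names, and the XML is assumed to have been read already into a list of tags with an optional die number.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical: the die attribute of one <layer> tag after XML parsing.
struct LayerTag {
    bool has_die; // false when the attribute was omitted
    int die;      // only meaningful when has_die is true
};

// An omitted die attribute is treated as die 0.
inline int resolve_die(const LayerTag& tag) {
    return tag.has_die ? tag.die : 0;
}

// Number of layers = highest die number + 1; with no tags at all this is 1.
inline int count_layers(const std::vector<LayerTag>& tags) {
    int max_die = 0;
    for (const auto& tag : tags) {
        max_die = std::max(max_die, resolve_die(tag));
    }
    return max_die + 1;
}
```

Under this rule an existing architecture file with no layer information still yields a single layer, so it behaves as before.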

tangxifan commented 1 year ago

@saaramahmoudi Yes. This is a good one to me. Thanks!

vaughnbetz commented 1 year ago

We could make either of these work (they're pretty close in meaning). Sara, I suggest going through the proposed arch syntax (or both alternatives) in a future vtr meeting (this Thursday if you're ready already). Could just show the alternatives from this issue.

ganeshgore commented 1 year ago

Hello, based on the above discussion, is this the correct way to interpret how pins on different z levels will be flattened in the 2D RRGraph?

vaughnbetz commented 1 year ago

Discussed in meeting today. Consensus: go with a z coordinate, go with Xifan's proposed syntax, and make the layer attribute optional (defaulting to 0 so existing archs work). For the rr-graph: the current plan is not to change the rr-graph much; there is still one layer of programmable routing. Devices on the z=1 layer may connect to the programmable routing via connection boxes though (or they may connect via a NoC). If connecting to the programmable routing, their pins should have z=1 so we can tell what layer they are on. So we'll likely wind up putting a z-coordinate (default 0) on the rr-graph just for annotation of the pins, drawing, etc. To minimize memory bloat, we may be able to put such a z-coordinate in the flyweight (rr-indexed data) where the cost_index points to it (since we wouldn't have many z coordinates at all).
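
The flyweight idea in the last sentence could look roughly like the sketch below. It is only an illustration under stated assumptions: `SharedRRData`, `RRNodeSketch` and `RRGraphSketch` are hypothetical, and the real rr-indexed data in VPR holds different fields. The point shown is that the layer lives in a small shared table, so each node pays nothing extra beyond the index it already stores.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shared record: one per distinct (base_cost, layer) pair.
struct SharedRRData {
    float base_cost;
    std::int8_t layer; // z coordinate, 0 for conventional planar devices
};

// Hypothetical per-node record: no layer field, only an index into the shared table.
struct RRNodeSketch {
    std::int16_t x;
    std::int16_t y;
    std::uint16_t cost_index;
};

class RRGraphSketch {
  public:
    // Add a node; layer defaults to 0 so planar callers are unchanged.
    std::size_t add_node(int x, int y, float base_cost, int layer = 0) {
        RRNodeSketch node{static_cast<std::int16_t>(x),
                          static_cast<std::int16_t>(y),
                          intern(base_cost, layer)};
        nodes_.push_back(node);
        return nodes_.size() - 1;
    }

    // Look the layer up through the shared record.
    int node_layer(std::size_t node) const {
        return shared_[nodes_[node].cost_index].layer;
    }

    std::size_t num_shared_records() const { return shared_.size(); }

  private:
    // Return the index of a matching shared record, appending one if needed.
    std::uint16_t intern(float base_cost, int layer) {
        for (std::size_t i = 0; i < shared_.size(); ++i) {
            if (shared_[i].base_cost == base_cost && shared_[i].layer == layer) {
                return static_cast<std::uint16_t>(i);
            }
        }
        shared_.push_back(SharedRRData{base_cost, static_cast<std::int8_t>(layer)});
        return static_cast<std::uint16_t>(shared_.size() - 1);
    }

    std::vector<RRNodeSketch> nodes_;
    std::vector<SharedRRData> shared_;
};
```

Since only a handful of z values exist, the shared table at most doubles or triples in size while the per-node footprint stays the same.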

tangxifan commented 1 year ago

@vaughnbetz I gave this a second thought. The shortcoming of my syntax is that it may not be easy for engineers to spot the grids on a specific layer when the <layout> block grows. Imagine there are 100 lines of grid definitions under a layout.

I suggest combining my syntax with @saaramahmoudi 's, as follows.

    <fixed_layout name="2-layer-stacked-fpga" height="4" width="4">
        <layer die="0">
          <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
          <perimeter type="io" priority="100"/>
          <corners type="EMPTY" priority="101"/>
          <!--Fill with 'clb'-->
          <fill type="clb_die0" priority="10"/>
        </layer>
        <layer die="1">
          <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
          <perimeter type="io" priority="100"/>
          <corners type="EMPTY" priority="101"/>
          <!--Fill with 'clb'-->
          <fill type="clb_die1" priority="10"/>
        </layer>
    </fixed_layout>

This will allow us to have a clear view of what each layer looks like. Engineers/developers can remove or rework a layer in a straightforward way.

Existing architecture files do not need to be updated. If no <layer> is defined, we create a default layer when parsing the architecture XML.

Let me know what you think. I can help @saaramahmoudi if she needs advice on developing the parser.

vaughnbetz commented 1 year ago

Sure, that is fine with me.

I think with your earlier syntax people could still make it readable by grouping all the layer 0 grid entries together, then all the layer 1 entries, with a comment line in between if they liked. The syntax wouldn't force them to do that, but they could if they wished.

Forcing such an organization may be cleaner though, as you suggest. I'm OK with either.

tangxifan commented 1 year ago

@vaughnbetz Yes. I see that people can always find a way to make their architecture XML clean. But most of the time they eventually fail to do so, for various reasons (for instance, very tight deadlines and too many revisions). Once we have many lines in a <layout> block, it is difficult for developers to spot the layout_id in each line and ensure that they are correct. Therefore, I believe we should provide developers with a syntax that forces them to follow a clean way.

In addition, it is also straightforward to implement the parser for the <layout>. You can pre-allocate memory by counting the number of <layer> blocks under a <fixed_layout> or an <auto_layout>. Otherwise, you may have to parse all the lines to find out the number of layers, and only then allocate memory.
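
A small sketch of that parsing flow, assuming the XML has already been read into plain structs. `LayerBlock`, `ParsedLayout`, `add_default_layer` and `allocate_grid` are hypothetical names, and the rules are kept as opaque strings for brevity. A legacy file with no <layer> blocks gets a default die-0 layer, and the grid is then sized once from the number of layer blocks.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one <layer> block and the grid rules written inside it.
struct LayerBlock {
    int die;
    std::vector<std::string> rules;
};

// Hypothetical: one <fixed_layout> after XML parsing.
struct ParsedLayout {
    int width;
    int height;
    std::vector<std::string> bare_rules; // rules written directly under the layout (legacy files)
    std::vector<LayerBlock> layers;      // one entry per <layer> child
};

// Legacy files define no <layer>: wrap their rules in a default die-0 layer.
inline void add_default_layer(ParsedLayout& layout) {
    if (layout.layers.empty()) {
        layout.layers.push_back(LayerBlock{0, std::move(layout.bare_rules)});
        layout.bare_rules.clear();
    }
}

// Grid of block type names, indexed as grid[layer][x][y].
using BlockTypeGrid = std::vector<std::vector<std::vector<std::string>>>;

// Allocate the whole grid up front from the number of <layer> blocks.
inline BlockTypeGrid allocate_grid(const ParsedLayout& layout) {
    const std::vector<std::string> column(static_cast<std::size_t>(layout.height),
                                          std::string("EMPTY"));
    const std::vector<std::vector<std::string>> plane(
        static_cast<std::size_t>(layout.width), column);
    return BlockTypeGrid(layout.layers.size(), plane);
}
```

With the flat per-line syntax, the equivalent code would first have to scan every rule for its layer number before any allocation could happen.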

vaughnbetz commented 1 year ago

Good points; I agree.
