verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org

VPR architecture description: BLE with two outputs (LUT output and flip-flop output) #233

Closed pgaraccio closed 7 years ago

pgaraccio commented 7 years ago

I am trying to describe an architecture in which the logic block consists of N Basic Logic Elements (BLEs). Unlike the classic architecture, the BLE does not use a multiplexer to select its output: the LUT output and the flip-flop output are both BLE outputs.

[Figures: classic BLE (classic_ble) and wanted BLE (ble)]

To describe this BLE behavior, I use a block structure equivalent to the following (attachment: [archive.zip](https://github.com/verilog-to-routing/vtr-verilog-to-routing/files/1275169/archive.zip)):

<pb_type name="clb">
  <input name="I" num_pins="22" equivalent="true"/>
  <output name="O" num_pins="20" equivalent="true"/>
  <clock name="clk" equivalent="false"/>

  <pb_type name="ble" num_pb="10">
    <input name="in" num_pins="4"/>
    <output name="out" num_pins="2"/>
    <clock name="clk"/>

    <pb_type name="lut_4" blif_model=".names" num_pb="1" class="lut">
      <input name="in" num_pins="4" port_class="lut_in"/>
      <output name="out" num_pins="1" port_class="lut_out"/>
    </pb_type>
    <pb_type name="ff" blif_model=".latch" num_pb="1" class="flipflop">
      <input name="D" num_pins="1" port_class="D"/>
      <output name="Q" num_pins="1" port_class="Q"/>
      <clock name="clk" port_class="clock"/>
    </pb_type>

    <interconnect>
      <direct input="ble.clk" output="ff.clk"/>
      <direct input="ble.in" output="lut_4.in"/>
      <direct input="lut_4.out" output="ff.D"/>
      <direct input="lut_4.out" output="ble.out[0]"/>
      <direct input="ff.Q"         output="ble.out[1]"/>
    </interconnect>
  </pb_type>

  <interconnect>
    <complete input="{clb.I ble[9:0].out}" output="ble[9:0].in"/>
    <complete input="clb.clk" output="ble[9:0].clk"/>
    <direct input="ble[9:0].out" output="clb.O"/>
  </interconnect>

  <!-- Describe complex block relation with FPGA -->

  <fc_in type="frac">0.150000</fc_in>
  <fc_out type="frac">0.125000</fc_out>

  <pinlocations pattern="spread"/>
  <gridlocations>
    <loc type="fill" priority="1"/>
  </gridlocations>
</pb_type>

Expected Behaviour

To check this new architecture, I implemented a test case in Verilog that highlights the reuse of the BLE's combinational path (the LUT output). When a design uses both the registered and the combinational version of a signal, the classic implementation consumes 3 BLEs. If the BLE exposes both the combinational and the registered path, both versions can drive the next BLE inside the CLB, so only 2 BLEs are needed.

Current Behaviour

I get the same resource usage for the two descriptions, whereas I expected a decrease in the number of BLEs used. Moreover, during the VPR routing phase, I get many messages of this kind:

Routing net nx12375z1 is impossible

I attached an archive containing the XML, BLIF, and vpr_stdout files.

Best regards,

Pierre Garaccio

pgaraccio commented 7 years ago

archive.zip

kmurray commented 7 years ago

Measuring BLE usage

How are you measuring the BLE usage? VPR only reports the top-level block usage and not the sub-block usage. To see the sub-block (e.g. BLE) usage you would need to inspect the .net file.

For a small architecture change like this it is possible that the total number of top-level blocks (e.g. CLBs) will not change.

Log Messages

To clarify, the log messages of the form:

Routing net nx12375z1 is impossible

are actually generated during packing. The packer routes the internal connections of a proposed cluster to ensure it is legal.

pgaraccio commented 7 years ago

Measuring BLE usage

Indeed, I expected the decrease in the number of BLEs to reduce the number of CLBs. When I observed that the new architecture had no impact on the number of CLBs, I inspected the .net file to determine the BLE usage. I saw that no BLE used both outputs at the same time (each used BLE drove either the LUT output or the flip-flop output, never both). I concluded that the result was the same as with a classic architecture (where a multiplexer selects the BLE output).

Log Messages

What are the implications of such a message? Does it imply that the architecture is not correctly defined?

kmurray commented 7 years ago

BLE Usage

It will likely depend on the benchmark being evaluated. Do you know how prevalent the usage of both the combinational and registered version of a signal is in the benchmark circuit(s) of interest? I suspect it is relatively rare, and as a result it would have a small impact that may not change the number of CLBs.

I tested with some simple microbenchmarks (attached), which work with the latest master version of VPR. If you compare mux/reg_and_comb/reg_and_comb.net with direct/reg_and_comb/reg_and_comb.net, you will see that the mux version uses two BLEs (10 and 11), while the direct version uses a single BLE (using both BLE outputs). So the packer can make use of this type of architecture.

However, since the packer is a heuristic (i.e. non-optimal) algorithm that tries to balance multiple objectives (wirelength, resource usage, legality, timing), it is possible that the packer does not exploit it in larger, more realistic circuits (even if the pattern exists in the benchmarks).
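For readers without the attachment, here is a minimal sketch of what the BLE in a mux variant could look like, for comparison with the direct (two-output) BLE at the top of this issue. It is an assumption based on the classic BLE described there, not a copy of the attached files. The interconnect names (`clk_to_ff`, `in_to_lut`, `lut_to_ff`, `out_mux`) are made up for illustration.

```xml
<!-- Sketch of a classic (mux) BLE. Primitive and port names mirror the
     dual-output description in this issue; interconnect names are hypothetical. -->
<pb_type name="ble" num_pb="10">
  <input name="in" num_pins="4"/>
  <!-- A single output pin: the mux below selects either the LUT or the flip-flop. -->
  <output name="out" num_pins="1"/>
  <clock name="clk"/>

  <pb_type name="lut_4" blif_model=".names" num_pb="1" class="lut">
    <input name="in" num_pins="4" port_class="lut_in"/>
    <output name="out" num_pins="1" port_class="lut_out"/>
  </pb_type>
  <pb_type name="ff" blif_model=".latch" num_pb="1" class="flipflop">
    <input name="D" num_pins="1" port_class="D"/>
    <output name="Q" num_pins="1" port_class="Q"/>
    <clock name="clk" port_class="clock"/>
  </pb_type>

  <interconnect>
    <direct name="clk_to_ff" input="ble.clk" output="ff.clk"/>
    <direct name="in_to_lut" input="ble.in" output="lut_4.in"/>
    <direct name="lut_to_ff" input="lut_4.out" output="ff.D"/>
    <!-- Only one of the two signals can leave the BLE, so a net needed in both
         its combinational and registered form occupies a second BLE. -->
    <mux name="out_mux" input="ff.Q lut_4.out" output="ble.out"/>
  </interconnect>
</pb_type>
```

With one output pin per BLE, the enclosing `clb` would presumably need only 10 output pins instead of 20, with its interconnect adjusted to match.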

Log Messages

The unroutable messages indicate that there are some assignments of primitives to locations in the CLB which cannot be connected to other primitives in the CLB. This will slow the packer down, since speculative packing (assume everything in a cluster is routable and only route once at the end to verify) will fail and it will fall back to a slower mode where it attempts routing every time a primitive is added to the CLB.

Whether or not this implies the architecture is correctly defined will depend on whether this characteristic accurately reflects the architecture you are trying to model.

pgaraccio commented 7 years ago

Your attachment is empty. Could you send it to me again?

kmurray commented 7 years ago

Here is the full attachment