intel / rohd

The Rapid Open Hardware Development (ROHD) framework is a framework for describing and verifying hardware in the Dart programming language.
https://intel.github.io/rohd-website
BSD 3-Clause "New" or "Revised" License
374 stars 67 forks source link

Building a module with a lot of sequential logic and little combinational logic takes a long time #312

Closed chykon closed 1 year ago

chykon commented 1 year ago

Describe the bug

It takes a VERY long time to build a circuit in which combinational logic interacts with a large number of sequential ones. An hour later, the assembly was not completed.

I will provide the code to reproduce. There are two modules:

It is worth noting that in the case of using <=, you can wait until the end of the build, in contrast to the use case of the Combinational set. Most likely this can also be linked to https://github.com/intel/rohd/issues/286.

Using the debugger, you can see that the build does not hang, but is EXTREMELY SLOW while in one of the getCombinationalPaths function loops.

To Reproduce

import 'dart:math';
import 'package:rohd/rohd.dart';

enum MCUInterfaceTag { input, output }

class MCUInterface extends Interface<MCUInterfaceTag> {
  MCUInterface({this.smallMemory = false}) {
    setPorts([
      Port('clock'),
      Port('enable'),
      Port('write'),
      Port('selectByte'),
      Port('address', 16),
      Port('inputData', 16),
    ], [
      MCUInterfaceTag.input
    ]);

    setPorts([
      Port('outputData', 16),
    ], [
      MCUInterfaceTag.output
    ]);
  }

  Logic get clock => port('clock');
  Logic get enable => port('enable');
  Logic get write => port('write');
  Logic get selectByte => port('selectByte');
  Logic get address => port('address');
  Logic get inputData => port('inputData');

  Logic get outputData => port('outputData');

  final bool smallMemory;
}

class MemoryControllerUnit extends Module {
  MemoryControllerUnit(MCUInterface intf) {
    this.intf = MCUInterface(smallMemory: intf.smallMemory)
      ..connectIO(this, intf,
          inputTags: {MCUInterfaceTag.input},
          outputTags: {MCUInterfaceTag.output});

    _buildLogic();
  }

  void _buildLogic() {
    memory = <Logic>[];
    for (var i = 0;
        i < (intf.smallMemory ? 256 : pow(2, intf.address.width));
        ++i) {
      memory.add(Logic(name: 'memoryElement$i', width: 8));
    }

    final writeByteCaseItems = <CaseItem>[];
    final readByteCaseItems = <CaseItem>[];
    final writeHalfwordCaseItems = <CaseItem>[];
    final readHalfwordCaseItems = <CaseItem>[];
    for (var i = 0; i < memory.length; ++i) {
      writeByteCaseItems.add(CaseItem(Const(i, width: intf.address.width),
          [memory[i] < intf.inputData.slice(7, 0)]));
      readByteCaseItems.add(CaseItem(Const(i, width: intf.address.width),
          [intf.outputData < intf.outputData.withSet(0, memory[i])]));

      if (i.isEven) {
        writeHalfwordCaseItems
            .add(CaseItem(Const(i, width: intf.address.width), [
          memory[i] < intf.inputData.slice(7, 0),
          memory[i + 1] < intf.inputData.slice(15, 8)
        ]));
        readHalfwordCaseItems
            .add(CaseItem(Const(i, width: intf.address.width), [
          intf.outputData < [memory[i + 1], memory[i]].swizzle()
        ]));
      }
    }

    Sequential(intf.clock, [
      If(intf.enable, then: [
        If(intf.write, then: [
          If(intf.selectByte,
              then: [Case(intf.address, writeByteCaseItems)],
              orElse: [Case(intf.address, writeHalfwordCaseItems)])
        ], orElse: [
          If(intf.selectByte,
              then: [Case(intf.address, readByteCaseItems)],
              orElse: [Case(intf.address, readHalfwordCaseItems)])
        ])
      ])
    ]);
  }

  late final MCUInterface intf;
  late final List<Logic> memory;
}

class CombinationalWrapper extends Module {
  CombinationalWrapper(MCUInterface intf) {
    this.intf = MCUInterface()
      ..connectIO(this, intf,
          inputTags: {MCUInterfaceTag.input},
          outputTags: {MCUInterfaceTag.output});

    _buildLogic();
  }

  void _buildLogic() {
    final mcu = MCUInterface();
    MemoryControllerUnit(mcu);

    // completed relatively quickly
    /*
    mcu.clock <= intf.clock;
    mcu.enable <= intf.enable;
    mcu.write <= intf.write;
    mcu.selectByte <= intf.selectByte;
    mcu.address <= intf.address;
    mcu.inputData <= intf.inputData;
    intf.outputData <= mcu.outputData;
    */

    // build takes too long
    /*
    Combinational([mcu.clock < intf.clock]);
    Combinational([mcu.enable < intf.enable]);
    Combinational([mcu.write < intf.write]);
    Combinational([mcu.selectByte < intf.selectByte]);
    Combinational([mcu.address < intf.address]);
    Combinational([mcu.inputData < intf.inputData]);
    Combinational([intf.outputData < mcu.outputData]);
    */

    // build takes too long
    Combinational([
      mcu.clock < intf.clock,
      mcu.enable < intf.enable,
      mcu.write < intf.write,
      mcu.selectByte < intf.selectByte,
      mcu.address < intf.address,
      mcu.inputData < intf.inputData,
      intf.outputData < mcu.outputData,
    ]);
  }

  late final MCUInterface intf;
}

void main() async {
  print(DateTime.now());

  final combinationalWrapper = CombinationalWrapper(MCUInterface());

  print(DateTime.now());

  await combinationalWrapper.build();

  print(DateTime.now());
}

Expected behavior

Build does not take long.

Actual behavior

Build takes a long time.

Additional: Dart SDK info

Dart SDK version: 2.19.3 (stable) (Tue Feb 28 15:52:19 2023 +0000) on "linux_x64"

Additional: pubspec.yaml

No response

Additional: Context

ROHD v0.4.2

mkorbel1 commented 1 year ago

Thank you for reporting this and providing a great example for reproducing!

I'm rethinking the overall rules around Combinational, especially around https://github.com/intel/rohd/issues/290 and sensitivity list calculations, which it sounds like may be responsible for the slow build time you're experiencing. I think we should be able to dramatically accelerate this to nearly the speed of a simple assignment.

I think this type of thing is also a great thing to throw into the ROHD benchmarks to help protect this type of performance.

mkorbel1 commented 1 year ago

Slightly modified your code into a benchmark and confirmed that the ongoing refactor of Combinational has fixed the performance issues seen in this bug.

Note that if you're trying to generate 64KiB of memory, you might want to lean for a sparse memory model using a Map for overall performance unless you really plan to synthesize the memory like that.

mkorbel1 commented 1 year ago

See #344 which includes the fix and benchmark