Strange input signal behavior

chykon commented 2 years ago

Describe the bug

The incoming signal behaves incorrectly in a certain case. Adding a useless assignment operator solves the problem.

To Reproduce

Steps to reproduce the behavior:

Get project
Go to file and comment highlighted line
Run another file

Expected behavior

The "codepoint" input does not require the use of a useless assignment operator to work correctly.

Actual behavior

The "codepoint" input does not work correctly.

Additional details

Dart SDK version: 2.18.0 (stable) (Fri Aug 26 10:22:54 2022 +0000) on "linux_x64"

name: utf8encoder_hw
version: 0.1.0-prototype.1
description: A hardware UTF-8 encoder written in Dart using the ROHD framework.
repository: https://github.com/chykon/utf8encoder-hw
issue_tracker: https://github.com/chykon/utf8encoder-hw/issues

environment:
  sdk: '>=2.18.0 <3.0.0'

dependencies:
  rohd: ^0.3.2

dev_dependencies:
  test: ^1.21.5
  very_good_analysis: ^3.0.1

Additional context To me, this is some weird issue that occurs with code point U+2020 but does not occur with U+2019 and U+2021. I also want to note that I encountered a problem when testing using this file. Before the appearance of the U+2020 code point, there are many other three-byte sequences in the file, but there are no problems with them.

chykon commented 2 years ago

Made a minimal example using one file.

I also stumbled upon the sudden appearance of unknown bits. There are comments in the code regarding this. Maybe these problems are somehow related? Or should I open a separate issue?

import 'dart:convert';
import 'dart:io';
import 'package:rohd/rohd.dart';

class UTF8Encoder extends Module {
  UTF8Encoder(Logic codepoint) {
    codepoint = addInput('codepoint', codepoint, width: 21);
    final bytes = addOutput('bytes', width: 32);

    final count = Logic(name: 'count', width: 2);
    final offset = Logic(name: 'offset', width: 8);
    final byte = Logic(name: 'byte', width: 8);

    final debug1 = Logic(name: 'debug_1', width: 21);

    Combinational([
      count < 0,

      If((codepoint >= 0) & codepoint.lte(0x7F), then: [
        bytes < codepoint.zeroExtend(32)
      ], orElse: [
        If((codepoint >= 0x800) & codepoint.lte(0xFFFF), then:[
          count < 2,
          offset < 0xE0
        ])
      ]),

      If(~count.eq(0), then: [
        byte < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
                offset.zeroExtend(21)).slice(7, 0),
        // NOTE: Codepoint doesn't work correctly without this.
        debug1 < codepoint,
        // An attempt to directly substitute "byte" into "bytes" in this case
        // leads to the error "Cannot convert invalid LogicValue to int:
        // 32'bxxxxxxxx101000001000000011100010", but is it possible for
        // unknown bits to appear if "zeroExtend" is executed?
        /*
        bytes < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
                 offset.zeroExtend(21)).slice(7, 0).zeroExtend(32),
        */
        bytes < byte.zeroExtend(32),
        bytes < bytes.withSet(8, Const(0x80, width: 8) |
                 ((codepoint >>> (Const(6, width: 4) * (count - 1).zeroExtend(4)))
                   .slice(7, 0) & Const(0x3F, width: 8))),
        count < count - 1,
        If(~count.eq(0), then: [
          bytes < bytes.withSet(16, Const(0x80, width: 8) |
                   ((codepoint >>> (Const(6, width: 3) * (count - 1).zeroExtend(3)))
                     .slice(7, 0) & Const(0x3F, width: 8))),
          count < count - 1,
          If(~count.eq(0), then: [
            bytes < bytes.withSet(24, Const(0x80, width: 8) |
                     (codepoint.slice(7, 0) & Const(0x3F, width: 8))),
          ])
        ])
      ])
    ]);
  }

  Logic get bytes => output('bytes');
}

Future<void> main() async {
  // Preparing inputs.
  final codepoint = Logic(width: 21);

  // Create and build module instance.
  final utf8encoder = UTF8Encoder(codepoint);
  await utf8encoder.build();

  // Generate SystemVerilog code.
  File('code.sv').writeAsStringSync(utf8encoder.generateSynth());

  // Prepare WaveDumper for simulation tracking.
  WaveDumper(utf8encoder);

  // Prepare input and output.
  final codepoints = '†† †† † †'.runes;
  final utf8Bytes = <int>[];

  // Run simulation.
  SimpleClockGenerator(10);
  await Simulator.tick();
  await Simulator.tick();
  for (final inputCodepoint in codepoints) {
    codepoint.inject(inputCodepoint);
    await Simulator.tick();
    await Simulator.tick();
    await Simulator.tick();
    for (var i = 0; i < 4; ++i) {
      final tempByte = (utf8encoder.bytes.value.toInt() >> (8 * i)) & 0xFF;
      if ((tempByte != 0) || (i == 0)) {
        utf8Bytes.add(tempByte);
      }
    }
  }
  codepoint.inject(0);
  await Simulator.tick();
  await Simulator.tick();

  // Print result.
  stdout.writeln(utf8.decode(utf8Bytes));
}

mkorbel1 commented 2 years ago

I'm able to reproduce a failure in your example code, but I'm still trying to figure out what the failure means since I'm not that familiar with the details of UTF8 encoding. It is certainly strange that some extraneous conditional assignments have an effect on the behavior, so I expect something suspicious happening within ROHD here, so thank you for filing this!

As a side note, if you use put instead of inject, you could ditch all the ticks (of which there are more than necessary anyways), since your logic is purely combinational -- the Simulator doesn't need to be involved. Also, the clock generator is not doing anything (since the clk is not attached to anything).

chykon commented 2 years ago

Further shortening of the code:

import 'dart:io';
import 'package:rohd/rohd.dart';

class ExampleModule extends Module {
  ExampleModule(Logic codepoint) {
    codepoint = addInput('codepoint', codepoint, width: 21);
    final bytes = addOutput('bytes', width: 32);
    final count = Logic(name: 'count', width: 2);

    Combinational([
      If(codepoint.eq(0x2020), then:[
        count < 2,
        bytes < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
                 Const(0xE0, width: 21)).slice(7, 0).zeroExtend(32),
        count < count - 2,
      ]),
    ]);
  }

  Logic get bytes => output('bytes');
}

Future<void> main() async {
  final codepoint = Logic(width: 21);
  final exampleModule = ExampleModule(codepoint);
  await exampleModule.build();
  File('code.sv').writeAsStringSync(exampleModule.generateSynth());
  final codepoints = '†† †† † † q†† †'.runes;
  for (final inputCodepoint in codepoints) {
    codepoint.put(inputCodepoint);
    print(exampleModule.bytes.value);
  }
}

I don't know how to reduce it yet. When count < count - 2 is removed, the error disappears and everything works, also everything starts working when you try to substitute Const(2, width: 5) instead of count.zeroExtend(5).

Tried to discard all the details regarding UTF-8 encoding. Just watch the console output.

32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000

Ignore completely unknown lines - these are spaces and the character q. The logic for handling them has been dropped, but without them the problem is not reproducible. Up to the first whitespace character, the first bytes are well-formed, which is E2, but then the output becomes E0. Also note that after the q character and up to the first whitespace character, everything also works fine.

My guess is that the codepoint value stops updating. If we shift the code point U+2020 (†) 12 bits to the right, we get 2, to which we then add the offset E0. But in the case of the code point U+0020 (space), after the shift we get 0, and after adding the offset we get the offset itself.

Thanks for the advice! But in fact, I used inject with tick to generate the timing diagram, and SimpleClockGenerator had to be used, because otherwise the message No more to tick. is sent to the console for every tick. However, I took your suggestions and got a more compact version of the code above.

chykon commented 2 years ago

I also seem to have missed the fact that the 24 most significant bits are unknown even though the zeroExtend operation is in progress.

mkorbel1 commented 2 years ago

Thank you for the shortened example, it's very helpful towards debugging this!

mkorbel1 commented 2 years ago

I think I've root caused a collection of issues in Combinational sensitivity lists that causes the behavior you're seeing, as well as some unexpected intermediate signal contention and incomplete swizzle execution. I've written a small suite of tests around your examples and I think I have things fixed up now. I'm cleaning up my code and will create a pull request with proposed fixes soon.

mkorbel1 commented 2 years ago

I've created a PR that should fix this issue: https://github.com/intel/rohd/pull/166

You can try it out and let me know if you see any issues still. I plan to merge it in soon.

Thanks, again, for filing this @chykon! It was an interesting and fruitful debug.

chykon commented 2 years ago

Checked - everything works. Great job!

intel / rohd