Closed chykon closed 2 years ago
Made a minimal example using one file.
I also stumbled upon the sudden appearance of unknown bits. There are comments in the code regarding this. Maybe these problems are somehow related? Or should I open a separate issue?
import 'dart:convert';
import 'dart:io';
import 'package:rohd/rohd.dart';
class UTF8Encoder extends Module {
UTF8Encoder(Logic codepoint) {
codepoint = addInput('codepoint', codepoint, width: 21);
final bytes = addOutput('bytes', width: 32);
final count = Logic(name: 'count', width: 2);
final offset = Logic(name: 'offset', width: 8);
final byte = Logic(name: 'byte', width: 8);
final debug1 = Logic(name: 'debug_1', width: 21);
Combinational([
count < 0,
If((codepoint >= 0) & codepoint.lte(0x7F), then: [
bytes < codepoint.zeroExtend(32)
], orElse: [
If((codepoint >= 0x800) & codepoint.lte(0xFFFF), then:[
count < 2,
offset < 0xE0
])
]),
If(~count.eq(0), then: [
byte < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
offset.zeroExtend(21)).slice(7, 0),
// NOTE: Codepoint doesn't work correctly without this.
debug1 < codepoint,
// An attempt to directly substitute "byte" into "bytes" in this case
// leads to the error "Cannot convert invalid LogicValue to int:
// 32'bxxxxxxxx101000001000000011100010", but is it possible for
// unknown bits to appear if "zeroExtend" is executed?
/*
bytes < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
offset.zeroExtend(21)).slice(7, 0).zeroExtend(32),
*/
bytes < byte.zeroExtend(32),
bytes < bytes.withSet(8, Const(0x80, width: 8) |
((codepoint >>> (Const(6, width: 4) * (count - 1).zeroExtend(4)))
.slice(7, 0) & Const(0x3F, width: 8))),
count < count - 1,
If(~count.eq(0), then: [
bytes < bytes.withSet(16, Const(0x80, width: 8) |
((codepoint >>> (Const(6, width: 3) * (count - 1).zeroExtend(3)))
.slice(7, 0) & Const(0x3F, width: 8))),
count < count - 1,
If(~count.eq(0), then: [
bytes < bytes.withSet(24, Const(0x80, width: 8) |
(codepoint.slice(7, 0) & Const(0x3F, width: 8))),
])
])
])
]);
}
Logic get bytes => output('bytes');
}
Future<void> main() async {
// Preparing inputs.
final codepoint = Logic(width: 21);
// Create and build module instance.
final utf8encoder = UTF8Encoder(codepoint);
await utf8encoder.build();
// Generate SystemVerilog code.
File('code.sv').writeAsStringSync(utf8encoder.generateSynth());
// Prepare WaveDumper for simulation tracking.
WaveDumper(utf8encoder);
// Prepare input and output.
final codepoints = '†† †† † †'.runes;
final utf8Bytes = <int>[];
// Run simulation.
SimpleClockGenerator(10);
await Simulator.tick();
await Simulator.tick();
for (final inputCodepoint in codepoints) {
codepoint.inject(inputCodepoint);
await Simulator.tick();
await Simulator.tick();
await Simulator.tick();
for (var i = 0; i < 4; ++i) {
final tempByte = (utf8encoder.bytes.value.toInt() >> (8 * i)) & 0xFF;
if ((tempByte != 0) || (i == 0)) {
utf8Bytes.add(tempByte);
}
}
}
codepoint.inject(0);
await Simulator.tick();
await Simulator.tick();
// Print result.
stdout.writeln(utf8.decode(utf8Bytes));
}
I'm able to reproduce a failure in your example code, but I'm still trying to figure out what the failure means since I'm not that familiar with the details of UTF8 encoding. It is certainly strange that some extraneous conditional assignments have an effect on the behavior, so I expect something suspicious happening within ROHD here, so thank you for filing this!
As a side note, if you use put
instead of inject
, you could ditch all the tick
s (of which there are more than necessary anyways), since your logic is purely combinational -- the Simulator
doesn't need to be involved. Also, the clock generator is not doing anything (since the clk is not attached to anything).
Further shortening of the code:
import 'dart:io';
import 'package:rohd/rohd.dart';
class ExampleModule extends Module {
ExampleModule(Logic codepoint) {
codepoint = addInput('codepoint', codepoint, width: 21);
final bytes = addOutput('bytes', width: 32);
final count = Logic(name: 'count', width: 2);
Combinational([
If(codepoint.eq(0x2020), then:[
count < 2,
bytes < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
Const(0xE0, width: 21)).slice(7, 0).zeroExtend(32),
count < count - 2,
]),
]);
}
Logic get bytes => output('bytes');
}
Future<void> main() async {
final codepoint = Logic(width: 21);
final exampleModule = ExampleModule(codepoint);
await exampleModule.build();
File('code.sv').writeAsStringSync(exampleModule.generateSynth());
final codepoints = '†† †† † † q†† †'.runes;
for (final inputCodepoint in codepoints) {
codepoint.put(inputCodepoint);
print(exampleModule.bytes.value);
}
}
I don't know how to reduce it yet. When count < count - 2
is removed, the error disappears and everything works, also everything starts working when you try to substitute Const(2, width: 5)
instead of count.zeroExtend(5)
.
Tried to discard all the details regarding UTF-8 encoding. Just watch the console output.
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
Ignore completely unknown lines - these are spaces
and the character q
. The logic for handling them has been dropped, but without them the problem is not reproducible. Up to the first whitespace character, the first bytes are well-formed, which is E2
, but then the output becomes E0
. Also note that after the q
character and up to the first whitespace character, everything also works fine.
My guess is that the codepoint
value stops updating. If we shift the code point U+2020
(†) 12 bits to the right, we get 2
, to which we then add the offset E0
. But in the case of the code point U+0020
(space), after the shift we get 0
, and after adding the offset we get the offset itself.
Thanks for the advice! But in fact, I used inject
with tick
to generate the timing diagram, and SimpleClockGenerator
had to be used, because otherwise the message No more to tick.
is sent to the console for every tick
. However, I took your suggestions and got a more compact version of the code above.
I also seem to have missed the fact that the 24 most significant bits are unknown even though the zeroExtend
operation is in progress.
Thank you for the shortened example, it's very helpful towards debugging this!
I think I've root caused a collection of issues in Combinational
sensitivity lists that causes the behavior you're seeing, as well as some unexpected intermediate signal contention and incomplete swizzle
execution. I've written a small suite of tests around your examples and I think I have things fixed up now. I'm cleaning up my code and will create a pull request with proposed fixes soon.
I've created a PR that should fix this issue: https://github.com/intel/rohd/pull/166
You can try it out and let me know if you see any issues still. I plan to merge it in soon.
Thanks, again, for filing this @chykon! It was an interesting and fruitful debug.
Checked - everything works. Great job!
Describe the bug
The incoming signal behaves incorrectly in a certain case. Adding a useless assignment operator solves the problem.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The "codepoint" input does not require the use of a useless assignment operator to work correctly.
Actual behavior
The "codepoint" input does not work correctly.
Additional details
Additional context To me, this is some weird issue that occurs with code point U+2020 but does not occur with U+2019 and U+2021. I also want to note that I encountered a problem when testing using this file. Before the appearance of the U+2020 code point, there are many other three-byte sequences in the file, but there are no problems with them.