llvm / circt

Circuit IR Compilers and Tools
https://circt.org
Other
1.63k stars 283 forks source link

[Handshake][IntegrationTests] Using `--handshake-insert-buffers=strategy=all` on IR with sequential buffer produces unexpected output. #3180

Closed Dinistro closed 2 years ago

Dinistro commented 2 years ago

I'm playing around with the CIRCT integration tests. While doing so, I discovered some very weird behaviour related to buffers with more than one initial value.

I'm lowering the following to Verilog (through FIRRTL) and run a simulation with a driver that looks as integration_test/Dialect/Handshake/driver.sv, except that it expects a i1 output. I'm using Vivado to run the simulation.

module {
  handshake.func @top(%ctrl: none) -> (i1, none) attributes {argNames = ["inCtrl"], resNames = ["out0", "outCtrl"]} {
    // Using this causes the result to be 0/false
    %c20 = buffer [2] seq %c20 {initValues = [20 : i64, 20 : i64]} : i64 

    // Using one of the following results in the expected 1/true
    //%c20 = buffer [1] seq %c20 {initValues = [20 : i64]} : i64
    //%c20 = constant %ctrl {value = 20 : i64} : i64

    %c10 = constant %ctrl {value = 10 : i64} : i64
    %cond = arith.cmpi ne, %c20, %c10 : i64
    return %cond, %ctrl : i1, none
  }
}

As you see in the inline comments, this produces faulty results. The code can be found in the following branch: dinistro/handshake-buffer-problem

It turns out that removing the --handshake-insert-buffers=strategy=all flag fixes the issue. Does this pass have assumptions that my IR does not satisfy or is this a bug in the pass?

mortbopet commented 2 years ago

This seems to be related to the --handshake-insert-buffers=strategy=all flag passed to circt-opt. When i remove it, the output is as expected.

Interesting!... I'm just updating circt-hls so I can take it through cosimulation and get a better view of what's going on here...

mortbopet commented 2 years ago

image And there's our problem right away - the handshake_buffer_in_ui64_out_ui64_2slots_seq module is being used for all 32-bit buffers with 2 slots, and thus aliasing occurs between buffer uses where buffers have initial values, and those which have not.

This has presumably not yet been hit because all use of initial buffer values (i suspect I've been the sole user prior to you), has been with single-slot buffers, hence the aliasing with the default 2-slot buffers of strategy=all didn't occur.

The simple fix to this problem would be to include initial values into the name of a buffer module (i.e. initial values should be part of what uniques a buffer).

Dinistro commented 2 years ago

Thanks for investigating this issue. I definitely have to look into the cosimulation part as it might help me hunt down bugs I will encounter.

The suggestion you make seems reasonable, as there are no integration tests using buffers with more than one value and the StandardToHandshake pass does not produce any of them either. I'm going to work on a fix right away.

mortbopet commented 2 years ago

There's some info on how to use hlstool here. Cosimulation is fairly geared towards HLS use (and relies on Polygeist) but should still allow you to write tests against the Handshake-level IR that you generate.

For this specific case, i ended up with a setup as the follows (all files in the same directory).

a tst_top.c file:

int top();
int main() { return top(); }

A top_cf.mlir file: (necessary for having a software interface for the simulator).

module {
  func.func private @top() -> (i1)
}

A top.mlir file:

module {
  handshake.func @top(%arg0: none, ...) -> (i1, none) attributes {argNames = ["inCtrl"], resNames = ["out0", "outCtrl"]} {
    %0:2 = fork [2] %arg0 : none
    %1 = buffer [2] seq %2#1 {initValues = [20, 20]} : i64
    %2:2 = fork [2] %1 : i64
    %3 = constant %0#1 {value = 10 : i64} : i64
    %4 = arith.cmpi ne, %2#0, %3 : i64
    return %4, %0#0 : i1, none
  }
}

and running:

// Lower the kernel to verilog, lower testbench to LLVM, and simulate.
hlstool --rebuild  --kernel_name top --tb_file tst_top.c --kernel_file top_handshake.mlir dynamic-polygeist --hs_kernel --run_sim
// Run hsdbg; this will automatically run it using the vcd file generated from the above simulator run
hlstool  --checkpoint dynamic-polygeist --hsdbg
Dinistro commented 2 years ago

Thanks for the example, this will be very helpful to set things up.

Dinistro commented 2 years ago

Weird, the approach you mentioned works nicely for the case I have shown here, but it breaks down for the most simple case of a buffer with initial values:

handshake.func @top(%ctrl: none) -> (i64, none) attributes {argNames = ["inCtrl"], resNames = ["out0", "outCtrl"]} {
  %c42 = buffer [1] seq %c42 {initValues = [42 : i64]} : i64
  return %c42, %ctrl : i64, none
}

This always returns 0.

I'll dive deeper, but until now I'm a bit clueless.

Note: I made sure that the correct driver is used, see the last commit on the previously linked branch.

Dinistro commented 2 years ago

Well, the problem is an inserted buffer that seems to yield zero to the output.

Here the above file after fork and buffer insertions:

  handshake.func @top(%arg0: none, ...) -> (i64, none) attributes {argNames = ["inCtrl"], resNames = ["out0", "outCtrl"]} {
    %0 = buffer [2] seq %arg0 : none
    %1 = buffer [1] seq %2#1 {initValues = [42]} : i64
    %2:2 = fork [2] %1 : i64
    %3 = buffer [2] seq %2#0 : i64
    return %3, %0 : i64, none
  }

Shouldn't these be FIFOs instead of sequential buffers?

mortbopet commented 2 years ago

Shouldn't these be FIFOs instead of sequential buffers?

Assuming that we have the lowerings correct, this again will not affect the semantics of the program. FIFOs are just an optimization to skip empty buffer slots.

Are you also inspecting the VCD trace in debugging this? It should contain the answers we're looking for (albeit in a difficult to read format, which is where hsdbg comes to the rescue!).

Dinistro commented 2 years ago

Assuming that we have the lowerings correct, this again will not affect the semantics of the program. FIFOs are just an optimization to skip empty buffer slots.

I figured the same. Note that it works with the strategy allFIFO, so the problem is most certainly related to the sequential buffers.

Are you also inspecting the VCD trace in debugging this? It should contain the answers we're looking for (albeit in a difficult to read format, which is where hsdbg comes to the rescue!).

I'm going to set this up now, I guess ;)

Dinistro commented 2 years ago

I set up the hsdbg tool and checked my circuit. There seems to be no issue, as 0x0 is only shown when the arrow is red:

image

As soon as the arrow changes color (I'm assuming this is related ot the valid and ready signals) the value is there

image

Any ideas on how to debug this further?

mortbopet commented 2 years ago

As soon as the arrow changes color (I'm assuming this is related ot the valid and ready signals) the value is there

Could you remind me again what the mismatch is between expected behavior and what we see here? As far as i can tell, this shows correct execution.

Dinistro commented 2 years ago

When I ran the integration test with the code

  handshake.func @top(%arg0: none, ...) -> (i64, none) attributes {argNames = ["inCtrl"], resNames = ["out0", "outCtrl"]} {
    %0 = buffer [2] seq %arg0 : none
    %1 = buffer [1] seq %2#1 {initValues = [42]} : i64
    %2:2 = fork [2] %1 : i64
    %3 = buffer [2] seq %2#0 : i64
    return %3, %0 : i64, none
  }

It did produce the result 0 instead of 42. The result is only printed once the out0_valid signal is asserted. Note that this test is using Vivado's simulator.

The simulation shown above seems to produce the correct result. So we seem to have different behaviour here, which seems very odd to me.

mortbopet commented 2 years ago

Hmm yea, that is indeed odd. Mind posting a vcd trace from the vivado simulation? For reference

Dinistro commented 2 years ago

I'm trying to get such a trace, but I never used these tools before.

Dinistro commented 2 years ago

Well, I found my problem. Having a bitwidth mismatch seems to cause Vivado xsim to simply assign zero instead of doing something reasonable. The driver uses 32 bit integers and the above module used 64 bit integers. I guess this is UB and thus xsim does whatever it wants.

mortbopet commented 2 years ago

The joys of hardware development tooling...