phanrahan / magma

magma circuits
Other
244 stars 23 forks source link

CoreIR Simulator And Fault/Verilator Simulator Cycle Timing Bug #349

Open David-Durst opened 5 years ago

David-Durst commented 5 years ago

There is a bug in both the CoreIR simulator (accessed through magma) and the Verilator simulator (accessed through fault) with an and gate that is receiving mismatched signals.

There were two signals being and together: db.WE and db.CE (https://github.com/David-Durst/aetherling/blob/d27f05846583749d05b5d58665950b3129dca559/aetherling/modules/native_linebuffer/two_dimensional_native_linebuffer.py#L320-L321)

The and was being used to drive a RAM WE port (https://github.com/David-Durst/aetherling/blob/d27f05846583749d05b5d58665950b3129dca559/aetherling/modules/delayed_buffer.py#L84).

In the following Aetherling test (https://github.com/David-Durst/aetherling/blob/d27f05846583749d05b5d58665950b3129dca559/tests/helper_test_readyvalid.py#L18-L55) (note the specific commit, Aetherling has since been fixed), a bug in Aetherling caused the two signals to be mismatched by 1 cycle:

screen shot 2019-02-11 at 10 27 10 pm

Thanks to the help of @THofstee @rdaly525 and @rsetaluri , this image was produced by VCS by compiling the Magma circuit to verilog and recreating the test bench in Verilog.

The Verilator simulator and the CoreIR simulator showed an impossible output: that the result of the &, the signal driving the RAM WE, was high on the correct cycles. It should always have been low. However, the simulators were merging the &'s of the alternating signals to be high. Please see the below text output from the test in helper_test_readyvalid. Look at cycle i18 where RAMCE is high, WDATA is all 1's, but the next cycle RDATA is all 0's still.

Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=2
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=2
Numargs=1
Numargs=1
Numargs=1
Found raddr
Found raddr
Starting topological sort
topo_order.size() = 276
numVertices(g)    = 276
i0
undelayed out: [0, 0, 0, 1]
out: [0, 0, 0, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 0, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i1
undelayed out: [0, 0, 0, 1]
out: [0, 0, 0, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 0, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i2
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i3
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i4
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i5
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i6
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i7
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i8
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i9
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i10
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i11
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i12
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i13
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i14
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i15
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i16
undelayed out: [0, 1, 1, 1]
out: [0, 1, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i17
undelayed out: [0, 1, 1, 1]
out: [0, 1, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i18
undelayed out: [1, 1, 1, 1]
out: [1, 1, 1, 1]
valid: True
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i19
undelayed out: [1, 1, 1, 1]
out: [1, 1, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i20
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i21
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i22
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i23
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i24
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i25
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i26
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: True
db CE: True
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 1
db WADDR: 0
db RAM WE: False

F
tests/test_readyvalid.py:17 (test_2dlb_flicker_ce_with_2x2_stride)
(True != 0 or 0 != 0)

Expected :0 or 0 != 0)
Actual   :(True
 <Click to see difference>

def test_2dlb_flicker_ce_with_2x2_stride():
        scope = Scope()
        c = coreir.Context()
        cirb = CoreIRBackend(c)

        testcircuit = DefineTwoDimensionalLineBuffer(cirb, Array(8, In(Bit)), 1, 1, 2, 2, 8, 8, 2, 2, 0, 0, True)

        sim = CoreIRSimulator(testcircuit, testcircuit.CLK, context=c,
                              namespaces=["aetherlinglib", "commonlib", "mantle", "coreir", "global"])

        for i in range(100000):
            if i % 2 == 0:
                sim.set_value(testcircuit.I[0][0], int2seq(1, 8), scope)
                sim.set_value(testcircuit.CE, 1, scope)
            else:
                sim.set_value(testcircuit.I[0][0], int2seq(2, 8), scope)
                sim.set_value(testcircuit.CE, 0, scope)
            sim.evaluate()
            sim.advance_cycle()
            sim.evaluate()
            print("i" + str(i))
            print("undelayed out: " + str([seq2int(sim.get_value(testcircuit.undelayedO[0][r][c], scope)) for r in range(2) for c in range(2)]))
            print("out: " + str([seq2int(sim.get_value(testcircuit.O[0][r][c], scope)) for r in range(2) for c in range(2)]))
            print("valid: " + str(sim.get_value(testcircuit.valid, scope)))
            print("db CE: " + str(sim.get_value(testcircuit.dbCE, scope)))
            print("db WE: " + str(sim.get_value(testcircuit.dbWE, scope)))
            print("db RDATA: " + str([seq2int(sim.get_value(testcircuit.RDATA, scope)[0][r][c]) for r in range(2) for c in range(2)]))
            print("db WDATA: " + str([seq2int(sim.get_value(testcircuit.WDATA, scope)[0][r][c]) for r in range(2) for c in range(2)]))
            print("db RADDR: " + str(seq2int(sim.get_value(testcircuit.RADDR, scope)[0])))
            print("db WADDR: " + str(seq2int(sim.get_value(testcircuit.WADDR, scope)[0])))
            print("db RAM WE: " + str(sim.get_value(testcircuit.RAMWE, scope)))
            print("")
            print("")

            # for some reason, lb going to 0 when flickering valid on and off for ce
            for r in range(2):
                for c in range(2):
>                   assert (sim.get_value(testcircuit.valid, scope) == 0 or seq2int(sim.get_value(testcircuit.O[0][r][c], scope)) != 0)
E                   assert (True == 0 or 0 != 0)
E                    +  where True = <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>>(TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.valid, <magma.scope.Scope object at 0x1071c65f8>)
E                    +    where <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>> = <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>.get_value
E                    +    and   TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.valid = TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin = DefineCircuit("TwoD...E, TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.RAMWE)\nEndCircuit().valid
E                    +  and   0 = seq2int([False, False, False, False, False, False, ...])
E                    +    where [False, False, False, False, False, False, ...] = <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>>(TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.O[0][0][0], <magma.scope.Scope object at 0x1071c65f8>)
E                    +      where <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>> = <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>.get_value

test_readyvalid.py:55: AssertionError
David-Durst commented 5 years ago

https://github.com/David-Durst/aetherling/blob/2043af4641f5bf46f5dd111d0812acf98f4eb932/tests/helper_test_readyvalid.py#L19 has the fault exmaple. @rsetaluri can explain how to get this example running in fault.

rsetaluri commented 5 years ago

Thanks for documenting all of this @David-Durst. I'll take a look.

David-Durst commented 5 years ago

@dillonhuff What is the plan for fixing this in the CoreIR simulator?

rdaly525 commented 5 years ago

@rsetaluri, @dillonhuff: I have a hunch this is due to implementing the simulation step function as

def step():
  eval()
  flip_clock()

vs:

def step():
  flip_clock()
  eval()

The former will possibly report outputs appearing on the previous cycle as the current cycle. This is because it actually takes two calls to step() in order to propagate the clock change from the first step()

In general you should not ever be evaluating a clock change at the same time as an input change.

leonardt commented 5 years ago

I think we need to consider why this is happening in both verilator and the coreir simulator. Is this a fundamental issue w.r.t. to cycle accurate vs event based simulation?

It would be worth reporting this issue to the verilator community to see if they have any insight. Ideally we can provide them a simpler way to reproduce, is there a simple verilog circuit we can construct to demonstrate this issue? Maybe we can just splice the subcircuit of interest from the above test case out? The CPP test bench for verilator should be easy enough to write. What are the necessary components required to reproduce the issue?

rsetaluri commented 5 years ago

I tried a simple circuit according to @David-Durst 's description of that slice:

import magma
import mantle

circ = magma.DefineCircuit("top2", "clk", magma.In(magma.Clock), "counter_out", magma.Out(magma.Bit), "counter_reg_out", magma.Out(magma.Bit), "counter_out_and_counter_reg_out", magma.Out(magma.Bit))
counter = mantle.DefineCounterModM(2, 1, cout=False)()
magma.wire(counter.O[0], circ.counter_out)
reg = mantle.DefineRegister(1)()
magma.wire(counter.O, reg.I)
magma.wire(reg.O[0], circ.counter_reg_out)
magma.wire(reg.O[0] & counter.O[0], circ.counter_out_and_counter_reg_out)
magma.EndDefine()

And ran it through fault:

import fault

def prnt():
    tester.print(circ.counter_out)
    tester.print(circ.counter_reg_out)
    tester.print(circ.counter_out_and_counter_reg_out)

for i in range(10):
    prnt()
    tester.step(1)

Using both target="coreir" and target="verilator" (except with tester.step(2) -- this is something we also need to synchronize), we get:

top2.counter_out = 0
top2.counter_reg_out = 0
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 1
top2.counter_reg_out = 0
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 0
top2.counter_reg_out = 1
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 1
top2.counter_reg_out = 0
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 0
top2.counter_reg_out = 1
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 1
top2.counter_reg_out = 0
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 0
top2.counter_reg_out = 1
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 1
top2.counter_reg_out = 0
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 0
top2.counter_reg_out = 1
top2.counter_out_and_counter_reg_out = 0
top2.counter_out = 1
top2.counter_reg_out = 0
top2.counter_out_and_counter_reg_out = 0

I also tried wrapping the following verilog and get the same output:

module modcounter (
                   clk,
                   out
                   );

   input clk;
   output reg out;

   always @(posedge clk) begin
      out <= out + 1;
   end

endmodule // modcounter

module register (
            clk,
            in,
            out
            );

   input clk;
   input in;
   output reg out;

   always @(posedge clk) begin
      out <= in;
   end

endmodule

module top (
            clk,
            counter_out,
            counter_reg_out,
            counter_out_and_counter_reg_out
            );

   input clk;

   output wire counter_out;
   output wire counter_reg_out;
   output wire counter_out_and_counter_reg_out;

   modcounter modcounter_inst(clk, counter_out);
   register register_inst(clk, counter_out, counter_reg_out);

   assign counter_out_and_counter_reg_out = counter_out && counter_reg_out;

endmodule // top
David-Durst commented 5 years ago

@dillonhuff here are the coreir and verilog for delayed buffer that was behaving poorly. The issue was that it's CE and WE ports were alternating. example_bad_delayed_buffer.zip

David-Durst commented 5 years ago

The code at the time of the issue: https://github.com/David-Durst/aetherling/tree/be2105e73d40e4b2168d1492e45a0a5c6f0cc6c6

The commit that fixed the issue: https://github.com/David-Durst/aetherling/commit/8450c2b8beb61008f42633b3212af20c5dbbe489 . This shifted the phase of the inputs to the delayed buffer so that they are on the same cycle.